CN109101474B

CN109101474B - Address aggregation method, package aggregation method and equipment

Info

Publication number: CN109101474B
Application number: CN201710468203.7A
Authority: CN
Inventors: 王国印; 郑耸
Original assignee: Cainiao Smart Logistics Holding Ltd
Current assignee: Cainiao Smart Logistics Holding Ltd
Priority date: 2017-06-20
Filing date: 2017-06-20
Publication date: 2022-09-30
Anticipated expiration: 2037-06-20
Also published as: CN109101474A

Abstract

The embodiments of the application disclose an address aggregation method, a parcel aggregation method and equipment, and relate to the technical field of data processing. The address aggregation equipment comprises: a door address acquisition device, used for acquiring a plurality of standard door addresses nationwide; an address conversion device, used for converting the plurality of standard door addresses into a plurality of structured addresses; a feature extraction device, used for extracting features from the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses; a similarity determining device, used for determining the similarity between any two of the standard door addresses according to the feature sets corresponding to them; and an address aggregation device, used for aggregating the standard door addresses according to the similarity to obtain a plurality of clusters. With the method and the equipment, standard door addresses belonging to the same area are aggregated under the same cluster, which improves the efficiency of subsequent parcel aggregation.

Description

Address aggregation method, package aggregation method and equipment

Technical Field

The present application relates to the field of data processing technologies, and in particular, to an address aggregation method, a package aggregation method, an address aggregation device, and a package aggregation device.

Background

At present, in last-mile package delivery, the delivery range of each courier generally comprises several residential communities or office buildings. In the prior art, packages belonging to the same community or office building are generally sorted manually according to their delivery addresses and then processed together per community or office building; for example, users in the same community are notified in batches, packages for the same community are placed into a self-service locker in batches, or packages for the same community are all assigned to a particular courier.

With the rapid development of the logistics industry and geographic information technology, expectations for the timeliness of delivery keep rising, and the existing last-mile delivery approach cannot meet the demand for fast delivery. Because the prior art sorts packages manually in the last-mile scenario, delivery efficiency is low and user experience suffers; manual sorting also introduces sorting errors, which reduce delivery efficiency further.

Therefore, a technical problem to be solved in this field is how to develop a new scheme that can aggregate packages in the last-mile delivery scenario, identify whether different packages belong to the same area (such as the same residential community or office building), and sort packages automatically according to the aggregation result.

Disclosure of Invention

The embodiment of the application aims to provide an address aggregation method, a package aggregation method and equipment, which are used for identifying whether different standard addresses belong to the same area or not, so that the standard addresses belonging to the same area are aggregated under the same cluster, and the package aggregation efficiency is improved subsequently.

In order to solve the above technical problem, the embodiment of the present application is implemented as follows:

according to a first aspect of the present application, a method of address aggregation is presented, comprising:

acquiring a plurality of standard door addresses;

converting the plurality of standard addresses into a plurality of structured addresses;

performing feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses, wherein the feature sets comprise area-of-interest attribute information;

determining the similarity between any two standard addresses in the standard addresses according to a plurality of feature sets corresponding to the standard addresses;

and aggregating the plurality of standard addresses according to the similarity to obtain a plurality of clusters.

According to a second aspect of the present application, an apparatus for address aggregation is provided, including:

the door address acquisition device is used for acquiring a plurality of standard door addresses nationwide;

the address translation device is used for translating the plurality of standard addresses into a plurality of structured addresses;

the feature extraction device is used for performing feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses, and the feature sets at least comprise road and road number information of the area of interest and/or the name of the area of interest;

the similarity determining device is used for determining the similarity between any two standard addresses in the standard addresses according to a plurality of feature sets corresponding to the standard addresses;

and the address aggregation device is used for aggregating the standard addresses according to the similarity to obtain a plurality of clusters.

According to a third aspect of the present application, there is provided a method of parcel aggregation comprising:

acquiring a plurality of standard door addresses;

determining the similarity between any two standard addresses in the standard addresses, and aggregating the standard addresses according to the similarity to obtain a plurality of clusters;

creating a package aggregation model for the plurality of clusters respectively;

and acquiring a communication address of a package, matching the communication address with the package aggregation model to obtain a cluster corresponding to the package, and aggregating the package under the cluster.

According to a fourth aspect of the present application, there is provided an apparatus for parcel aggregation, comprising:

the door address acquisition device is used for acquiring a plurality of standard door addresses;

the address aggregation device is used for determining the similarity between any two standard addresses in the standard addresses and aggregating the standard addresses according to the similarity to obtain a plurality of clusters;

a package aggregation model establishing device for respectively establishing package aggregation models for the plurality of clusters;

and the model matching device is used for acquiring a communication address of the package, matching the communication address with the package aggregation model to obtain a cluster corresponding to the package, and aggregating the package under the cluster.

According to the technical scheme provided by the embodiments of the application, a plurality of standard door addresses are first acquired and converted into structured addresses; feature extraction based on areas of interest is performed on the structured addresses to obtain feature sets; and the standard door addresses are aggregated according to their similarity to obtain a plurality of clusters, so that standard door addresses belonging to the same area fall under the same cluster. A package aggregation model is then constructed from the clusters. Finally, the communication address on a logistics package is matched against the package aggregation model to obtain the best-matching cluster, so that packages in the same area are aggregated under the same cluster and package aggregation efficiency is improved.

In order to make the aforementioned and other objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a schematic view of a scenario of an apparatus for parcel aggregation according to the present application;

FIG. 2 is a block diagram of a first embodiment of an address aggregation device according to the present application;

FIG. 3 is a block diagram of a second embodiment of an address aggregation device according to the present application;

FIG. 4 is a flowchart of a first embodiment of a method for address aggregation according to the present application;

FIG. 5 is a flowchart of a second embodiment of a method for address aggregation according to the present application.

Detailed Description

The embodiment of the application provides a parcel aggregation method, an address aggregation method, a parcel aggregation device and an address aggregation device.

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms referred to in the present application will be first described below.

Feature (Feature): is an abstract result of the characteristics of an object or a group of objects and is used to describe concepts.

Clustering (Clustering): the process of dividing a collection of physical or abstract objects into classes consisting of similar objects is called clustering.

n-gram: if a sentence $S$ is made up of $m$ words $(w_1 w_2 w_3 \dots w_m)$, then the n-gram is defined as $\{\, w_i w_{i+1} \dots w_{i+n-1} \mid 1 \le i \le m-n+1 \,\}$.

k-skip-n-gram: if a sentence $S$ consists of $m$ words $(w_1 w_2 w_3 \dots w_m)$, then the k-skip-n-gram is defined as $\{\, w_{i_1} w_{i_2} \dots w_{i_n} \mid \sum_{j} i_j - i_{j-1} < k \,\}$.

gram: several words are combined to form a gram, typically referring to the feature instances extracted by the ngram.
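The two definitions above can be illustrated with a short Python sketch. The function names are illustrative, and the skip condition is an assumption: it uses the common reading in which an n-word subsequence may skip at most k words in total, which is stated more tersely in the definition above.

```python
from itertools import combinations


def ngrams(words, n):
    """Contiguous n-grams: w_i ... w_{i+n-1} for every valid start position i."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def skip_ngrams(words, n, k):
    """n-word subsequences (order preserved) that skip at most k words in total.

    Assumption: the number of skipped words is (last index - first index) - (n - 1).
    """
    result = []
    for idx in combinations(range(len(words)), n):
        skipped = (idx[-1] - idx[0]) - (n - 1)
        if skipped <= k:
            result.append(tuple(words[i] for i in idx))
    return result


words = ["Zhejiang", "Hangzhou", "Xihu", "Sandun"]
bigrams = ngrams(words, 2)
one_skip_bigrams = skip_ngrams(words, 2, 1)
```

With `k = 0` the skip variant reduces to ordinary n-grams; larger `k` adds combinations that tolerate omitted words, which is useful when users leave out parts of an address.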

Communication address: a string of characters containing the names of the province, city, district or county, town or street, house number, housing estate, building and the like, optionally followed by the floor number, room number and so on. A valid address is unique.

Receiving address: the address at which a person receives packages or letters.

Structured address: the word string with structural labels generated after word segmentation of the communication address, with labels such as province, city, county, street, community, road, house number, POI identification, building number, unit number, room number, etc.

Detailed address: the part of the communication address that excludes the administrative divisions.

Point of interest (POI): a term from geographic information systems that generally refers to any geographic object that can be abstracted as a point, especially geographic entities closely related to people's lives, such as schools, banks, restaurants, gas stations, hospitals, supermarkets, and the like. The main purpose of points of interest is to describe the addresses of objects or events, which greatly enhances the ability to describe and query their positions and improves the accuracy and speed of geographic positioning.

Area of interest (AOI): a geographic object that covers a certain geographic area, such as a residential community, a village, an office building, a school, a hospital, an industrial park, a science park, etc.; broadly, a POI that covers a larger extent.

TF-IDF: TF is the term frequency and IDF is the inverse document frequency. TF represents the frequency with which a term t occurs in a document d. The main idea of IDF is that the fewer documents contain the term t (that is, the smaller the number n of such documents), the larger the IDF and the better the term t distinguishes between categories. If a word or phrase appears frequently in one article (its TF is high) and rarely appears in other articles, it is considered to have good category-discriminating ability.
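As a small worked example of this definition, the sketch below computes a TF-IDF weight in Python. It assumes the classic form tf × log(N / n), where N is the number of documents and n the number of documents containing the term; other variants (for example with smoothing) are equally common, and the function name and sample data are illustrative.

```python
import math
from collections import Counter


def tf_idf(term, doc, docs):
    """TF-IDF weight of `term` in `doc`, given the whole collection `docs`."""
    if not doc:
        return 0.0
    tf = Counter(doc)[term] / len(doc)          # frequency of the term in this document
    n = sum(1 for d in docs if term in d)       # number of documents containing the term
    if n == 0:
        return 0.0
    return tf * math.log(len(docs) / n)         # rarer terms get a larger IDF


docs = [
    ["Hangzhou City", "Liangmu Road", "No. 999"],
    ["Hangzhou City", "Wenyi West Road"],
    ["Hangzhou City", "Yuhangtang Road"],
]
common = tf_idf("Hangzhou City", docs[0], docs)   # appears everywhere, so IDF is 0
rare = tf_idf("Liangmu Road", docs[0], docs)      # appears in one document only
```

A term shared by every document carries no discriminating weight, while a term unique to one document receives the largest weight.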

FIG. 1 is a schematic view of a scenario of a package aggregation device according to the present application. In last-mile package delivery, the number of packages keeps increasing with the rapid development of the logistics industry and geographic information technology. Judging whether different packages belong to the same area of interest with a natural boundary, such as the same residential community or office building, has become a key factor limiting the efficiency of the industry; for example, in a logistics scenario, stacking packages by area of interest can greatly improve the efficiency of automatic sorting, collection and dispatch. FIG. 2 is a block diagram of a first embodiment of an address aggregation device according to the present application. Referring to FIG. 2, the address aggregation device includes:

the door address obtaining device 100 is used for obtaining a plurality of standard door addresses. In a specific embodiment, the standard door addresses may be obtained through existing tools (such as Baidu Maps or Gaode Maps); they are the standard door addresses within a certain range, such as nationwide or province-wide, and each generally includes a standard address and the longitude and latitude associated with it. A standard address must be accurate to four administrative levels (province, city, district/county, and town/street), its house number must correspond to the AOI, its administrative division information must correspond to the detailed address, and it must be clean, well-formed and free of errors.

The address translation device 200 is used for translating a plurality of standard addresses into a plurality of structured addresses.

In one embodiment of the present application, a standard door address may be converted into a structured address with a word segmentation tool. Specifically, the standard door address is first segmented in order to extract the place-name information it contains. A semantic label is then added to each piece of place-name information; the labels mainly include the provincial administrative division (prov), prefecture-level administrative division (city), county-level administrative division (discrict), township-level administrative division (town), development zone (devZone), community/village committee (Community), main road (road), sub road (subRoad), main road number (roadNo), sub road number, AOI, smaller-range AOI (subAoi), building number (houso), unit number (cellNo), floor number (floor), room number (roomNo), entities inside the room and the like, and include at least the road and road number information of the area of interest and/or the name of the area of interest. Finally, the place-name information is placed into a structured address template according to the semantic labels, yielding the structured address.

Taking the standard address "Zhejiang University Zijingang Campus, No. 866 Yuhangtang Road, Sandun Street, Xihu District, Hangzhou City, Zhejiang Province" as an example, the structured address obtained by conversion is: Zhejiang Province/province, Hangzhou City/city, Xihu District/district, Sandun Street/street, Yuhangtang Road/road, No. 866/road number, Zhejiang University Zijingang Campus/AOI.
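The last step of the conversion, placing labeled place names into a structured object, can be sketched as follows. This is only an illustration: the word segmentation and labeling are assumed to have been done already by a separate tool, the `text/label` input format mirrors the example above, and the label names and the function name `to_structured` are hypothetical.

```python
def to_structured(tagged_tokens):
    """Turn 'text/label' strings into a label -> text mapping (insertion-ordered)."""
    structured = {}
    for item in tagged_tokens:
        text, sep, label = item.rpartition("/")
        if not sep:
            continue  # token without a label: ignored in this sketch
        structured[label] = text
    return structured


tagged = [
    "Zhejiang Province/prov",
    "Hangzhou City/city",
    "Xihu District/district",
    "Sandun Street/town",
    "Yuhangtang Road/road",
    "No. 866/roadNo",
    "Zhejiang University Zijingang Campus/AOI",
]
structured_address = to_structured(tagged)
```

Keeping the address as a mapping from label to text makes the later template-based feature extraction a matter of looking up fields by name.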

A feature extraction device 300, configured to perform feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses. Since the addresses are aggregated along the area of interest (AOI) dimension, the feature extraction device extracts features centred on the AOI; combining administrative division information (province, city, district or county, street, community, and the like) with the core determining factors of the AOI greatly increases the number of extracted features. The feature set thus includes area-of-interest attribute information, which includes the road and road number information of the area and/or the name of the area. Each standard door address is converted into one structured address, and feature extraction on one structured address yields one feature set.

In an embodiment of the present application, feature extraction may be performed directly on a structured address, with all features in the structured address extracted to form a feature set. Taking the standard address "Zhejiang University Zijingang Campus, No. 866 Yuhangtang Road, Sandun Street, Xihu District, Hangzhou City, Zhejiang Province" in a standard door address as an example, the converted structured address is Zhejiang Province/province, Hangzhou City/city, Xihu District/district, Sandun Street/street, Yuhangtang Road/road, No. 866/road number, Zhejiang University Zijingang Campus/AOI, and the extracted feature set is: Zhejiang Province, Hangzhou City, Xihu District, Sandun Street, Yuhangtang Road, No. 866, Zhejiang University Zijingang Campus. In this embodiment, the feature set includes both the road and road number information and the area name.

After the standard address is structured, the standard address is converted into a structured object, so that the feature extraction can be realized by combining fields in the structured object, namely the feature can be templated. In one embodiment of the present application, the features in the structured address can be extracted by defining a feature template in advance. For example, if features in a feature template are all present in a structured address, the structured address may be translated into a plurality of features, and if a feature is not present, the feature is not output into the feature set. A plurality of features are predefined in the feature template.

The feature template is composed of member fields of address structured objects, features are extracted only when all the member fields of the structured objects contained in the template are not empty, and otherwise, the extracted result is empty. Since the object of the present application is to group packages together according to the AOI dimension, the feature extraction must be performed with AOI as the center (specifically, each feature must contain at least information capable of determining AOI), and the AOI determining factors mainly include:

1) the name of the AOI, mainly the name of a residential community, office building, school or hospital, such as Lejia International Building;

2) the house number of the AOI, such as No. 999 Liangmu Road or No. 969 Wenyi West Road.

By combining administrative division information (province, city, county, street, community, etc.) and the core determinants of AOI, the amount of extracted features is greatly increased. In one embodiment of the present application, the structural definition of the feature template may be as shown in table 1 below:

TABLE 1

The embodiment shown in Table 1 uses template-based n-gram feature extraction with 6 predefined templates. The structured address is "Zhejiang Province/prov Hangzhou City/city Yuhang District/discrict Cangqian Street/town Liangmu Road/road No. 999/roadNo Lejia International Building/AOI office building/AOI category", where the part after each slash "/" is the semantic label identifying the structural role of the preceding word in the address. The features extracted with the feature templates of Table 1 are shown on the right side of Table 1.
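The template mechanism described above can be sketched in Python as follows. The six templates are hypothetical stand-ins, not the templates of Table 1; the label names and the function name `extract_features` are also illustrative. What the sketch does take from the text is the rule that a template yields a feature only when every member field it names is non-empty.

```python
# Hypothetical templates: each is a tuple of structured-address field names.
TEMPLATES = [
    ("prov", "city", "district", "AOI"),
    ("city", "district", "town", "AOI"),
    ("district", "road", "roadNo"),
    ("road", "roadNo"),
    ("road", "roadNo", "AOI"),
    ("AOI", "subAoi"),
]


def extract_features(structured, templates=TEMPLATES):
    """Apply each template; emit a feature only if all its fields are present."""
    features = []
    for template in templates:
        values = [structured.get(field) for field in template]
        if all(values):  # any missing or empty field makes the template yield nothing
            features.append(" ".join(values))
    return features


address = {
    "prov": "Zhejiang Province",
    "city": "Hangzhou City",
    "district": "Yuhang District",
    "town": "Cangqian Street",
    "road": "Liangmu Road",
    "roadNo": "No. 999",
    "AOI": "Lejia International Building",
}
features = extract_features(address)
```

Because this address has no `subAoi` field, the last template produces nothing and five features are returned. Every template contains either the AOI name or the road number, in line with the requirement that features be centred on the AOI.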

In another embodiment of the present application, the feature template may further include other information, such as an ambiguity level, as shown in Table 2:

TABLE 2

In the example of the feature template shown in Table 2, the template-based n-gram extraction includes 6 n-gram templates. The first column is the n-gram of the feature template, and the second column is its ambiguity level, which indicates whether a feature extracted with that n-gram can uniquely determine a specific address and, if not, how ambiguous it is: 0 indicates no ambiguity (if two AOI addresses both contain such a feature, they can be directly judged to be synonymous AOI addresses), 1 indicates slight ambiguity, 2 indicates greater ambiguity, and further levels can be defined in the same way. The features extracted with the feature templates of Table 2 are shown on the right side of Table 2. In the embodiment shown in Table 2, the ambiguity of each feature in the feature set of a standard door address is the ambiguity level of the n-gram template that produced it. For example, the ambiguity level of the n-gram feature templates road, roadNo and AOI in Table 2 is set to 1, so the ambiguities of the features "Liangmu Road", "No. 999" and "Lejia International Building" in the feature set extracted in this embodiment are all 1, i.e. slightly ambiguous.

Referring to FIG. 2, the apparatus further includes a similarity determining device 400, configured to determine the similarity between any two of the standard door addresses according to the feature sets corresponding to them. In one embodiment of the present application, the similarity between each pair of standard door addresses is determined in turn by a similarity formula. Specifically, the similarity formula may be the Jaccard similarity (Formula 1), the cosine similarity (Formula 2), Formula 3, or Formula 4, as follows:

Formula 1: $\mathrm{sim}_1(A, B) = \frac{|A \cap B|}{|A \cup B|}$

Formula 2: $\mathrm{sim}_2(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$

Formula 3: $\mathrm{sim}_3(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$

Formula 4: $\mathrm{sim}_4(A, B) = \frac{|A \cap B|}{\max(|A|, |B|)}$

Here A and B denote the feature vectors of the two feature sets corresponding to the two standard door addresses. Formula 1 computes the Jaccard similarity of the two feature sets. Formula 2 is the cosine similarity. Formula 3 is the similarity score obtained by dividing the number of features in the intersection of the two feature sets by the number of features in the smaller of the two sets. Formula 4 is the similarity score obtained by dividing the number of features in the intersection of the two feature sets by the number of features in the larger of the two sets.
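The four similarity measures described above can be sketched in Python over plain feature sets. This is an illustration under one assumption: the cosine similarity is computed on binary indicator vectors (a feature is either present or absent), so the dot product equals the intersection size; weighted vectors would need a different implementation. Function names are illustrative.

```python
import math


def jaccard(a, b):
    """Intersection size over union size."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of binary indicator vectors (assumption: unweighted features)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def overlap_min(a, b):
    """Intersection size over the size of the smaller set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def overlap_max(a, b):
    """Intersection size over the size of the larger set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


A = {"x", "y", "z"}
B = {"y", "z", "w", "v"}
scores = (jaccard(A, B), cosine(A, B), overlap_min(A, B), overlap_max(A, B))
```

The min-based score is the most forgiving when one address lists far fewer features than the other, which matters when one of the two addresses omits several place names.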

Referring to fig. 2, the address aggregation apparatus further includes an address aggregation device 500, configured to aggregate the plurality of standard addresses according to the similarity to obtain a plurality of clusters.

In one embodiment of the present application, the address aggregation device 500 includes:

a similar address determining module, used for determining the similar standard door addresses corresponding to each standard door address. In an embodiment of the present application, when the similarity between two standard door addresses calculated by the similarity determining device 400 is greater than or equal to a preset threshold, the two standard door addresses are regarded as similar standard door addresses. In actual use, when the similarity determining device 400 determines the similarity according to Formula 1, the preset threshold may be 0.33: when the similarity is greater than or equal to 0.33, the two standard door addresses are considered similar.

In another embodiment of the present application, when there are identical features in feature sets corresponding to two standard addresses and an ambiguity level of the identical features is unambiguous, the two standard addresses are regarded as similar standard addresses. That is, in the embodiment shown in table 2, if there is the same feature in the feature set corresponding to each of the two standard addresses and the ambiguity level of the same feature is 0, that is, there is no ambiguity, it indicates that the two standard addresses can be merged.

In another embodiment of the present application, when there is a synonymous interest region in the feature set corresponding to two standard addresses, the two standard addresses are used as similar standard addresses. That is, if the names of the two AOIs are synonymous under the four-level administrative division, the feature sets in which they are located may be merged.

Specifically, in the last-mile package delivery scenario shown in FIG. 1, assume there are 8 packages numbered 1, 2, ..., 8. The 8 corresponding standard door addresses can be obtained from the package addresses, and the similar standard door addresses corresponding to each standard door address can be determined by the similar address determining module, as shown in Table 3:

TABLE 3

the judging module is used for judging whether each standard door address or any of its corresponding similar standard door addresses is already in a cluster; if not, the first adding module is executed, and if so, the second adding module is executed;

the first adding module is used for adding the standard door address and its corresponding similar standard door addresses into a newly created cluster;

the second adding module is used for adding the standard door address and its corresponding similar standard door addresses into the existing cluster;

and the address aggregation module is used for adding the standard door address and its corresponding similar standard door addresses into the cluster.

In a specific implementation, all the standard door addresses are traversed, and for each one it is judged in turn whether the standard door address or its corresponding similar standard door addresses are already in a cluster. If none of them is, no cluster exists for them yet, so the standard door address and its similar standard door addresses are added to a newly created cluster; otherwise they are added to the existing cluster. In the embodiment shown in Table 3, when traversing from number 1 to number 8, the standard door address numbered 1 and its similar standard door addresses 2 and 6 have not yet been added to any cluster, so a new cluster numbered 1 is created, which at this point contains 1, 2 and 6. The traversal then continues through number 8, and two clusters are finally obtained by aggregation, as shown in Table 4.

TABLE 4

Cluster ID	Numbering of standard addresses in the cluster
1	1, 2, 4, 6, 7, 8
2	3, 5

That is, if A, B and C represent 3 different AOI addresses, where A and B are synonymous AOI addresses and B and C are synonymous AOI addresses, then A, B and C are all synonymous with one another and can be merged into the same cluster.
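The traversal described above (judge, then either create a cluster or join an existing one) can be sketched in Python as follows. The similar-address lists in `SIMILAR` are hypothetical values, chosen only so that the result matches the two clusters of Table 4; the function and variable names are likewise illustrative.

```python
# Hypothetical similar-address lists: address number -> numbers of similar addresses.
SIMILAR = {
    1: [2, 6], 2: [1, 4], 3: [5], 4: [2, 7],
    5: [3], 6: [1, 8], 7: [4], 8: [6],
}


def aggregate(similar):
    """Assign each address and its similar addresses to clusters in one pass."""
    cluster_of = {}   # address number -> cluster ID
    clusters = {}     # cluster ID -> set of address numbers
    for addr, neighbours in similar.items():
        group = [addr, *neighbours]
        # Judging step: is any member of the group already in a cluster?
        existing = next((cluster_of[a] for a in group if a in cluster_of), None)
        if existing is None:
            # First adding step: open a new cluster.
            existing = len(clusters) + 1
            clusters[existing] = set()
        # Second adding step: put every not-yet-clustered member into that cluster.
        for a in group:
            if a not in cluster_of:
                cluster_of[a] = existing
                clusters[existing].add(a)
    return clusters


result = aggregate(SIMILAR)
```

This single pass relies on similar addresses being reachable through an already clustered member, which is the transitivity argument made above. If a group touched two clusters that had been created separately, this sketch would not merge them; that case needs a separate merging step.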

In the embodiment shown in fig. 1, after the address aggregation device of the present application processes, 8 parcels are finally divided into two interest areas, wherein one interest area has 6 parcels, and the other interest area has 2 parcels, so that in the logistics scene shown in fig. 1, the parcels are stacked according to the interest areas, and the efficiency of automatic parcel sorting, collecting and dispatching can be greatly improved.

In another embodiment of the present application, the address aggregation device 500 may first find, through a double-loop traversal, the standard door addresses similar to each standard door address, and then merge the similar clusters together through a cluster merging algorithm. Specifically, the pseudo code of the algorithm for finding the standard door addresses similar to each standard door address through double-loop traversal is as follows:
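A minimal Python sketch of such a double-loop traversal is given here. It assumes Jaccard similarity over feature sets and the 0.33 threshold mentioned earlier; the function names and the sample feature sets are illustrative and are not taken from the pseudo code of the application.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(feature_sets, threshold=0.33):
    """Map each address number to the numbers of the addresses similar to it."""
    ids = list(feature_sets)
    similar = {i: [] for i in ids}
    for i in ids:            # outer loop: the address being examined
        for j in ids:        # inner loop: every candidate address
            if i != j and jaccard(feature_sets[i], feature_sets[j]) >= threshold:
                similar[i].append(j)
    return similar


feature_sets = {
    1: {"Liangmu Road", "No. 999", "Lejia International Building"},
    2: {"Liangmu Road", "No. 999"},
    3: {"Yuhangtang Road", "No. 866"},
}
similar = find_similar(feature_sets)
```

The double loop compares every pair, so its cost grows quadratically with the number of addresses; in practice the comparison would typically be restricted to addresses in the same administrative division.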

In a last-mile package delivery scenario such as that shown in FIG. 1, assuming that there are 8 packages numbered 1, 2, ..., 8, corresponding to 8 standard door addresses, the result output by the double-loop traversal is shown in Table 5:

TABLE 5

The clusters are then merged by a cluster merging algorithm. The algorithm traverses the list of each standard door address in turn and checks whether the key of each standard door address is already in the cluster list; if so, the standard door address is merged with the corresponding cluster in the cluster list. The aggregated clusters are obtained at the end. The pseudo code of the algorithm is as follows:

Input: cluster_in[], a list of dicts of the form {cluster_key_id, list(cluster)}, where cluster_key_id is the number of the current initial cluster and list(cluster) is the list of initial cluster numbers that are the same as or similar to the current cluster.

Output: cluster_out[], a list of dicts.
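A possible Python rendering of an index-based merge with this input and output shape is sketched below. The dict keys `cluster_key_id`, `cluster` and `cluster_id`, the sample input, and the way overlapping clusters are folded together are assumptions made for illustration.

```python
def merge_clusters(cluster_in):
    """Merge initial clusters that share any member, using a member -> slot index."""
    index = {}         # address number -> position of its cluster in cluster_out
    cluster_out = []   # list of sets; slots folded into another cluster become None
    for entry in cluster_in:
        members = {entry["cluster_key_id"], *entry["cluster"]}
        # Existing clusters that already contain one of these members.
        hits = sorted({index[m] for m in members if m in index})
        if hits:
            target = hits[0]
            for other in hits[1:]:            # fold every other touched cluster in
                members |= cluster_out[other]
                cluster_out[other] = None
        else:
            target = len(cluster_out)
            cluster_out.append(set())
        cluster_out[target] |= members
        for m in cluster_out[target]:         # keep the index pointing at the target
            index[m] = target
    merged = [c for c in cluster_out if c is not None]
    return [{"cluster_id": n, "cluster": sorted(c)} for n, c in enumerate(merged, start=1)]


# Hypothetical output of the double-loop traversal for 8 addresses.
cluster_in = [
    {"cluster_key_id": 1, "cluster": [2, 6]}, {"cluster_key_id": 2, "cluster": [1, 4]},
    {"cluster_key_id": 3, "cluster": [5]}, {"cluster_key_id": 4, "cluster": [2, 7]},
    {"cluster_key_id": 5, "cluster": [3]}, {"cluster_key_id": 6, "cluster": [1, 8]},
    {"cluster_key_id": 7, "cluster": [4]}, {"cluster_key_id": 8, "cluster": [6]},
]
cluster_out = merge_clusters(cluster_in)
```

The index lets each lookup find the cluster of a member directly instead of scanning all clusters, and folding all touched clusters into one handles entries that bridge two previously separate clusters.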

As shown in table 6, the algorithm mainly merges similar clusters together by means of indexing, and finally merges 8 standard gate addresses into two clusters, that is, (1, 2, 4, 6, 7, 8) is a cluster, and (3, 5) is a cluster.

TABLE 6

As described above, in the embodiment of the present application, standard door addresses nationwide are first converted into structured addresses, and feature extraction is performed on the structured addresses to obtain feature sets, where each feature set at least includes the road and road number information of the area and/or the name of the area. The similarity between the standard door addresses is then determined from the feature sets, and finally the standard door addresses are aggregated according to the similarity to obtain a plurality of clusters, so that different standard door addresses belonging to the same area are aggregated under the same cluster.

In the first embodiment, the standard addresses of the same area are grouped under one cluster based on an address aggregation algorithm, and in other embodiments of the present invention, a package aggregation model can be constructed accordingly. Fig. 3 is a block diagram of a second embodiment of an address aggregation device according to the present application, and referring to fig. 3, in the second embodiment, the device further includes:

a cluster naming device 600, configured to name the obtained multiple clusters, so as to obtain names of the clusters. In one embodiment of the present application, the cluster naming apparatus 600 may obtain the name of the cluster as a cluster ID, as shown in table 4.

In an embodiment of the present application, when the cluster ID is used as the name of the cluster, the apparatus further includes a package aggregation model establishing device 700, configured to: obtain the feature sets corresponding to the standard portal addresses that make up the cluster; associate the features in the feature sets with the cluster ID; and create an inverted index according to the cluster ID to form a package aggregation model, where the package aggregation model is a text model. In this embodiment, the feature grams extracted by the templates are used as terms and the associated cluster ID is used as the document ID to create the inverted index. The weight of each gram with respect to a cluster ID can be calculated by tf-idf, and the index supports retrieval by gram, which provides the basis for the subsequent retrieval process. The text model addresses the problem of matching the best AOI directly by text similarity: the features are stored in an inverted index together with the cluster IDs.
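The text model described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: the function names build_text_model and search, the gram strings, the cluster IDs and the particular tf-idf variant (term frequency times a smoothed logarithmic idf) are all hypothetical choices made here.

```python
import math
from collections import Counter, defaultdict


def build_text_model(cluster_grams):
    """Build an inverted index {gram: {cluster_id: tf-idf weight}}.

    cluster_grams maps a cluster ID (the "document") to the list of
    feature grams (the "terms") extracted from its addresses.
    """
    n_clusters = len(cluster_grams)
    doc_freq = Counter()
    for grams in cluster_grams.values():
        doc_freq.update(set(grams))  # number of clusters containing each gram

    index = defaultdict(dict)
    for cluster_id, grams in cluster_grams.items():
        counts = Counter(grams)
        total = sum(counts.values())
        for gram, count in counts.items():
            tf = count / total
            idf = math.log(n_clusters / doc_freq[gram]) + 1.0  # smoothing is an assumption
            index[gram][cluster_id] = tf * idf
    return dict(index)


def search(index, query_grams):
    """Return the cluster ID with the highest summed weight, or None."""
    scores = defaultdict(float)
    for gram in query_grams:
        for cluster_id, weight in index.get(gram, {}).items():
            scores[cluster_id] += weight
    return max(scores, key=scores.get) if scores else None


# Hypothetical grams for two clusters.
cluster_grams = {
    "c1": ["road_a|no_1", "building_x", "town_t|building_x"],
    "c2": ["road_b|no_2", "campus_y", "town_t|campus_y"],
}
index = build_text_model(cluster_grams)
best = search(index, ["building_x", "road_a|no_1"])
print(best)
```

Keying the index by gram means a query only touches clusters that share at least one gram with it, which is what makes lookup by partial address cheap.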

In an embodiment of the present application, the apparatus provided by the present application further includes a model matching device 800, configured to obtain a communication address of a package, match the communication address with the package aggregation model, obtain a cluster corresponding to the package, and aggregate the package under the cluster.

That is, once the text model has been established, for packages that subsequently arrive at the logistics end point for collection and dispatch, the communication address of each package can be searched directly in the text model; the best AOI is matched by text similarity, which gives the area of interest corresponding to the package address. A user's receiving address is generally a standard address with some place names omitted, that is, it is generally a substring of the standard address. In this embodiment, combining features in various ways greatly improves the matching rate of the text model.

In one embodiment of the present application, when the cluster ID is used as the name of the cluster, the apparatus further comprises a package aggregation model building device for building a longitude and latitude model. The longitude and latitude model in this embodiment mainly attaches commercial points around an AOI (which do not match the AOI textually) to the nearby AOI. For example, points of a commercial nature around AOIs such as residential communities, schools and office buildings (convenience stores, small shops of various kinds, restaurants and so on) have addresses that generally do not match the AOI address, but because they surround the AOI they can be gathered into the cluster of that AOI.

In this embodiment, the package aggregation model building device is configured to: obtain the standard portal addresses that make up the cluster, where each standard portal address includes a standard address and the longitude and latitude corresponding to that standard address; associate the standard portal addresses with the cluster ID; determine the center point and boundary of the area of interest from the longitudes and latitudes of the standard portal addresses; establish a polygon of the area of interest from the center point and the boundary; establish a grid in the polygon; and establish a mapping between the grid cells and the cluster ID to form a package aggregation model, where the package aggregation model is a longitude and latitude model.

Specifically, each standard portal address is associated with a cluster ID, so a cluster ID covers a set of standard addresses. Since each standard address carries longitude and latitude information, the cluster ID also covers a set of longitude and latitude points, and the mean of this point set is the center point of the AOI. The boundary formed by connecting the outermost points of the set is the AOI polygon. An AOI polygon can be represented in several ways; a grid is used as an example below:

i. A grid of 100 meters by 100 meters, which can be implemented by keeping 3 decimal places of the longitude and latitude. The grid ID is the string formed by joining the truncated longitude and latitude; for example, for longitude 116.379861 and latitude 40.077701, the grid ID is 116379_40077.

ii. A grid of 50 meters by 50 meters, which can be implemented by keeping 4 decimal places of the longitude and latitude. The grid ID is formed in the same way; for example, for longitude 116.379861 and latitude 40.077701, the grid ID is 1163798_400777.

iii. A grid formed by the first 7 characters of the geohash.

iv. A mapping between grid ID and cluster ID is established by any of the methods above.

v. Polygon retrieval: the grid ID of a user is calculated from the longitude and latitude of the user address, and the cluster ID is then looked up from the grid ID.
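The decimal-truncation grid and the grid-to-cluster lookup can be sketched as follows. The function names grid_id and lookup_cluster and the cluster ID "c1" are hypothetical; the sample coordinates and the resulting grid IDs are the ones given in the text. Decimal arithmetic is used here (an implementation choice, not something the text prescribes) so that truncation is not disturbed by binary floating-point error.

```python
from decimal import Decimal, ROUND_DOWN


def grid_id(lon, lat, decimals=3):
    """Join longitude and latitude, truncated to `decimals` places, into a grid ID."""
    step = Decimal(10) ** -decimals  # e.g. Decimal('0.001') for 3 places

    def part(value):
        truncated = Decimal(str(value)).quantize(step, rounding=ROUND_DOWN)
        return str(truncated).replace(".", "")

    return f"{part(lon)}_{part(lat)}"


def lookup_cluster(grid_to_cluster, lon, lat, decimals=3):
    """Polygon retrieval: user coordinates -> grid ID -> cluster ID (or None)."""
    return grid_to_cluster.get(grid_id(lon, lat, decimals))


# Mapping between grid ID and cluster ID; "c1" is a hypothetical cluster ID.
grid_to_cluster = {grid_id(116.379861, 40.077701): "c1"}

print(grid_id(116.379861, 40.077701))              # 116379_40077
print(grid_id(116.379861, 40.077701, decimals=4))  # 1163798_400777
print(lookup_cluster(grid_to_cluster, 116.379200, 40.077950))  # c1
```

Because the ID is a plain string, the mapping can be held in any key-value store, and a lookup costs one truncation plus one dictionary access.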

In an embodiment of the present application, the model matching device 800 is further configured to obtain the communication address of a package, match it against the package aggregation model to obtain the cluster corresponding to the package, and aggregate the package under that cluster. That is, once the longitude and latitude model has been established, for packages that subsequently arrive at the logistics end point for collection and dispatch, the package address can be searched directly in the longitude and latitude model to obtain the best AOI, which is the area of interest corresponding to the package address.

In an embodiment of the present application, when the cluster ID is used as the name of the cluster, the apparatus may further include a package aggregation model building device that comprises both a text model building device and a longitude and latitude model building device. Once both the text model and the longitude and latitude model have been established, packages that newly arrive at the logistics end point for collection and dispatch can be matched against either model. The text model is more accurate than the longitude and latitude model, because converting a package address into longitude and latitude introduces error and a user's package address is generally not a standard portal address. In this embodiment, the text model can therefore be searched first when looking up the address on a user's package; if the text model returns a result, the cluster is determined directly, and otherwise the longitude and latitude model is searched.

In this way, the equipment provided by the present application can recognize different expressions of the same AOI address (synonymous AOI addresses) and gather the packages of the same AOI together, while also gathering packages at commercial addresses around the AOI together with the packages inside the AOI, which greatly improves the efficiency of package aggregation. Packages are gathered by AOI, such as a residential community, a school or an office building, so that they can be handled per AOI in business operations, which greatly improves operational efficiency. The package aggregation technique has the following advantages over other techniques:

1. Packages gathered into the same cluster are mutually reachable in space.

2. Packages on opposite sides of natural obstacles, such as main roads, rivers, residential community walls and mountains, are not gathered together.

3. The straight-line distance between the longitude and latitude points of the gathered packages is close to the actual walking distance.

4. Inside the same cluster, from provincial administrative district to AOI level, the addresses are synonymous addresses, so the invention is beneficial to realizing address standardization.

In an embodiment of the present application, for the convenience of couriers, the cluster naming device 600 may also give each cluster a name that couriers can easily recognize; in this case the cluster name is not the cluster ID.

In one embodiment, the cluster naming device 600 includes:

a feature set acquisition module, configured to acquire the feature sets corresponding to the standard portal addresses that make up the cluster;

a frequency determining module, configured to determine in turn, from the feature sets, the frequency of each piece of road and road number information and the frequency of each name;

and the name selection module is used for taking the feature with the highest frequency as the name of the cluster.

In this embodiment, the frequency of occurrence of road and road number information and the frequency of occurrence of names in the feature set of all standard addresses in a cluster are counted, and the feature with the highest frequency is used as the name of the cluster.
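A minimal sketch of this frequency-based naming follows. The function name name_cluster and the feature strings are hypothetical, and counting each feature at most once per address is an assumption made here; the text only states that the most frequent feature becomes the cluster name.

```python
from collections import Counter


def name_cluster(feature_sets):
    """Return the feature that occurs in the most addresses of a cluster."""
    counts = Counter()
    for features in feature_sets:
        counts.update(set(features))  # count each feature once per address
    if not counts:
        return None
    name, _frequency = counts.most_common(1)[0]
    return name


# Hypothetical feature sets of three standard portal addresses in one cluster.
feature_sets = [
    ["road_r|no_9", "campus_x"],
    ["campus_x"],
    ["road_s|no_2", "campus_x"],
]
cluster_name = name_cluster(feature_sets)
print(cluster_name)
```

When two features tie for the highest frequency, this sketch does not define which one wins; a real system would need an explicit tie-break rule.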

In still another embodiment of the present application, the cluster naming apparatus 600 includes:

the characteristic set acquisition module is used for acquiring a characteristic set corresponding to a plurality of standard addresses forming the cluster;

the region name screening module is used for screening out the names of a plurality of interest regions from the feature set;

and the name selection module is used for taking the name of the interest area with the highest actual use frequency as the name of the cluster.

In this embodiment, all names in the feature set of all standard addresses in a cluster are counted, and the name of the region of interest with the highest actual use frequency is used as the name of the cluster.

In other embodiments of the present application, the selection criteria for a cluster name may also be: the name must contain a determining factor of the AOI (the AOI name or the house number); it should have strong coverage, that is, a large scope; and it should be a name that users frequently use. In a specific embodiment, this can be done as follows. The word frequency of each gram extracted by the feature templates is counted, and features that cannot serve as cluster names are filtered out, keeping only the feature grams extracted by three templates: (AOI, AOI category), (road, roadNo), and (road, roadNo, AOI). The top-N grams by frequency are taken from these 3 templates. Because the template (road, roadNo, AOI) contains both (road, roadNo) and AOI, it expresses the co-occurrence of a house number and an AOI, so this co-occurrence needs to be added to the features extracted by (road, roadNo) and by AOI. This is done by traversing all grams of the (road, roadNo, AOI) template, splitting each of them into two grams, (road, roadNo) and AOI, and adding the word frequency of the gram extracted by the (road, roadNo, AOI) template to the (road, roadNo) gram and to the AOI gram. The grams extracted with AOI as the template are then sorted by AOI category, the top-ranked AOI gram and the top-ranked (road, roadNo) gram are taken, and the ratio of their frequencies is computed. If the ratio is greater than a set threshold, the cluster is named with the AOI name; otherwise it is named with the house number (road, roadNo).

As described above, in the embodiments of the present application, the standard portal addresses of the same area are first identified by the address aggregation algorithm and aggregated under one cluster; a package aggregation model is then built; and finally the communication address of a logistics package is matched against the package aggregation model to obtain the best cluster, so that packages of the same area are aggregated under the same cluster.

The present application further provides an apparatus for package aggregation, the apparatus comprising:

In one embodiment of the present application, the apparatus further comprises:

and the cluster naming device is used for naming the clusters to obtain the cluster names, and the cluster names comprise cluster IDs.

The package aggregation model building apparatus in one embodiment comprises:

the characteristic set acquisition module is used for acquiring a plurality of characteristic sets corresponding to a plurality of standard addresses forming the cluster;

a feature association module, configured to associate a feature in the feature set with the cluster ID;

an index creating module, configured to create an inverted index according to the cluster ID to form a package aggregation model, where the package aggregation model is a text model.

In this embodiment, the feature grams extracted by the templates are used as terms and the associated cluster ID is used as the document ID to create the inverted index. The weight of each gram with respect to a cluster ID can be calculated by tf-idf, and the index supports retrieval by gram, which provides the basis for the subsequent retrieval process. The text model addresses the problem of matching the best AOI directly by text similarity: the features are stored in an inverted index together with the cluster IDs. Once the text model has been established, for packages that subsequently arrive at the logistics end point for collection and dispatch, the communication address of each package can be searched directly in the text model; the best AOI is matched by text similarity, which gives the area of interest corresponding to the package address. A user's receiving address is generally a standard address with some place names omitted, that is, it is generally a substring of the standard address. In this embodiment, combining features in various ways greatly improves the matching rate of the text model.

The package aggregation model building apparatus in one embodiment comprises:

a portal address acquisition module, configured to acquire the standard portal addresses that make up the cluster, where each standard portal address comprises a standard address and the longitude and latitude corresponding to that standard address;

a portal address correlation module for correlating the plurality of standard portal addresses with the cluster ID;

the boundary determining module is used for determining the central point and the boundary of the interest area according to a plurality of longitudes and latitudes contained in the plurality of standard door addresses;

the polygon establishing module is used for establishing a polygon of the interest area according to the central point and the boundary of the interest area;

a mesh establishing module for establishing a mesh in the polygon;

and the mapping relation establishing module is used for establishing the mapping relation between the grids and the cluster IDs to form a package aggregation model, and the package aggregation model is a longitude and latitude model.

In this embodiment, once the longitude and latitude model has been established, the addresses of packages that subsequently arrive at the logistics end point for collection and dispatch can be searched directly in the longitude and latitude model to obtain the best AOI, which is the area of interest corresponding to the package address.

Having described the apparatus of the present application, the package aggregation method and the address aggregation method of the present application are described below with reference to the accompanying drawings. For the implementation of the methods, reference may be made to the implementation of the apparatus above; details already covered are not repeated.

Fig. 4 is a flowchart of a first embodiment of a method for address aggregation according to the present application, and referring to fig. 4, the method for address aggregation according to the present application includes:

S101: obtain a plurality of standard portal addresses. In a specific embodiment, a standard portal address can be obtained with existing tools (such as Baidu Maps or Gaode Maps), and generally comprises a standard address and the longitude and latitude associated with it. The standard address must be accurate to four administrative levels (province, city, district/county, and town/street), the house number must correspond to the AOI, the administrative division information must correspond to the detailed address, and the address must be clean and free of errors.

S102: the plurality of standard gate addresses are converted into a plurality of structured addresses.

In one embodiment of the present application, a standard portal address may be converted into a structured address with a word segmentation tool. Specifically, the standard portal address is segmented in order to extract the place name information it contains, and a semantic label is added to each piece of place name information. The labels mainly include: provincial administrative district prov, prefecture-level administrative district city, county-level administrative district district, township-level administrative district town, development zone devZone, community/village committee community, main road road, sub-road subRoad, main road number roadNo, sub-road number subRoadNo, AOI, smaller-range AOI subAoi, building number houseNo, unit number cellNo, floor number floorNo, room number roomNo, entities inside a room, and so on; they include at least the road and road number information of the area of interest and/or the name of the area of interest. Finally, the place name information is placed into a structured template according to the semantic labels to obtain the structured address.

Taking the address of the Zhejiang University Zijingang Campus, No. 866 Yuhangtang Road, Sandun, Xihu (West Lake) District, Hangzhou, Zhejiang Province as an example, the structured address obtained by conversion is: Zhejiang Province/province, Hangzhou City/city, Xihu District/district, Sandun/street, Yuhangtang Road/road, No. 866/road number, Zhejiang University Zijingang Campus/AOI.

S103: perform feature extraction on the structured addresses to obtain the feature sets corresponding to the standard portal addresses. Since addresses are aggregated along the AOI dimension, features must be extracted with the AOI at the center; combining administrative division information (province, city, district/county, street, community and so on) with the core determinants of the AOI greatly increases the number of extracted features. Each feature set therefore includes attribute information of the area of interest, which includes at least the road and road number information and/or the area name of the area. Each standard portal address is converted into one structured address, and feature extraction on one structured address yields one feature set.

Once a standard address has been structured it becomes a structured object, so features can be extracted by combining fields of that object; in other words, feature extraction can be driven by templates. In one embodiment of the present application, the features of a structured address are extracted with feature templates defined in advance, in which a plurality of features are predefined. If all the fields of a feature in a template are present in a structured address, that feature is output, so one structured address may yield several features; if a field is missing, the feature is not output into the feature set.

The feature template is composed of member fields of address structured objects, features are extracted only when all the member fields of the structured objects contained in the template are not null, and otherwise, the extracted result is null. Since the object of this application is to group packages together according to the AOI dimension, AOI-centric feature extraction is necessary (specifically, each feature must contain at least information that can determine AOI).
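The all-fields-non-null rule can be sketched as follows. The four templates, the field values and the "|" separator are hypothetical; the actual six templates are those of table 1, which is not reproduced here. Each template in this sketch contains the AOI or the road and road number, in line with the requirement that every feature carry information that can determine the AOI.

```python
# Hypothetical templates: each is a tuple of structured-address field names.
TEMPLATES = [
    ("aoi",),
    ("road", "roadNo"),
    ("road", "roadNo", "aoi"),
    ("town", "aoi"),
]


def extract_features(structured, templates=TEMPLATES):
    """Emit one feature per template whose fields are all non-empty."""
    features = []
    for template in templates:
        values = [structured.get(field) for field in template]
        if all(values):  # skip the template if any field is missing or empty
            features.append("|".join(values))
    return features


# Hypothetical structured address with no township-level field.
structured = {
    "prov": "P Province", "city": "C City", "district": "D District",
    "town": None, "road": "R Road", "roadNo": "No. 9", "aoi": "X Building",
}
features = extract_features(structured)
print(features)
```

The ("town", "aoi") template produces nothing for this address because the town field is empty, so the feature set holds three features rather than four.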

Combining administrative division information (province, city, district, street, community and so on) with the core determinants of the AOI greatly increases the number of extracted features. In one embodiment of the present application, the structure of the feature templates may be defined as shown in table 1 below. The embodiment shown in table 1 uses template-based n-gram feature extraction with 6 predefined templates. The structured address is "Zhejiang Province/prov Hangzhou City/city Yuhang District/district Cangqian Street/town Liangmu Road/road No. 999/roadNo Lejia International Building/AOI office building/AOI category", where the part after each slash "/" is the semantic label identifying the structural role of the preceding word in the address. The features extracted with the feature templates of table 1 are shown on the right side of table 1.

In another embodiment of the present application, the preset templates may carry further information, such as an ambiguity level, as shown in table 2. In the example shown in table 2 there are 6 template-based N-grams: the first column is the N-gram of the feature template and the second column is its ambiguity level, which indicates whether a feature extracted with that N-gram uniquely determines a specific address and, if not, how ambiguous it is. 0 indicates no ambiguity at all (if two AOI addresses both contain such a feature, they can be directly determined to be synonymous AOI addresses), 1 indicates slight ambiguity, and 2 indicates greater ambiguity; further levels can be defined in the same way. The features extracted with the feature templates of table 2 are shown on the right side of table 2. In the embodiment shown in table 2, the ambiguity of each feature in the feature set of a standard portal address is the ambiguity level of the N-gram that produced it. For example, the ambiguity level of the N-gram feature template (road, roadNo, AOI) in table 2 is set to 1, so the features "Liangmu Road", "No. 999" and "Lejia International Building" extracted with it in this embodiment all have ambiguity 1, that is, they are slightly ambiguous.

Referring to fig. 4, the method further includes step S104: determine the similarity between any two of the standard portal addresses according to the feature sets corresponding to the standard portal addresses. In one embodiment of the present application, the similarity between each pair of standard portal addresses is determined in turn by a similarity formula. Specifically, the similarity formula may be the Jaccard similarity (formula 1), the cosine similarity (formula 2), formula 3, or formula 4. With formula 1, the similarity of the feature sets corresponding to two standard portal addresses is calculated as the Jaccard similarity, where A and B denote the feature vectors of the two feature sets. Formula 2 is the cosine similarity. Formula 3 divides the number of features in the intersection of the two feature sets by the number of features in the smaller of the two sets to obtain a similarity score. Formula 4 divides the number of features in the intersection of the two feature sets by the number of features in the larger of the two sets to obtain a similarity score.
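The four formulas are not reproduced in this text. The following sketch shows one plausible reading of them over plain feature sets; the function names are hypothetical, and the cosine variant assumes each feature set is treated as a binary vector, which the text does not state.

```python
import math


def jaccard(a, b):
    """Formula 1: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Formula 2 for binary vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def overlap_smaller(a, b):
    """Formula 3: intersection size divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def overlap_larger(a, b):
    """Formula 4: intersection size divided by the size of the larger set."""
    larger = max(len(a), len(b))
    return len(a & b) / larger if larger else 0.0


# Hypothetical feature sets of two standard portal addresses.
A = {"f1", "f2", "f3"}
B = {"f2", "f3", "f4", "f5"}
print(jaccard(A, B), cosine(A, B), overlap_smaller(A, B), overlap_larger(A, B))
```

For these two sets the intersection has 2 features, so the scores are 2/5, 2/sqrt(12), 2/3 and 2/4 respectively; formula 3 is the most permissive and the Jaccard similarity the strictest.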

S105: and aggregating the plurality of standard addresses according to the similarity to obtain a plurality of clusters.

In one embodiment of the present application, step S105 includes:

S201: determine the similar standard portal addresses of each standard portal address. In an embodiment of the present application, when the similarity between two standard portal addresses calculated in S104 is greater than or equal to a preset threshold, the two are treated as similar standard portal addresses. In practice, when the similarity is determined according to formula 1 in S104, the preset threshold may be 0.33: when the similarity is greater than or equal to 0.33, the two standard portal addresses are considered similar.

In another embodiment of the present application, when there is the same feature in the feature set corresponding to two standard addresses and the ambiguity level of the same feature is unambiguous, the two standard addresses are used as similar standard addresses. That is, in the embodiment shown in table 2, if there is the same feature in the feature set corresponding to each of the two standard addresses and the ambiguity level of the same feature is 0, that is, there is no ambiguity, it indicates that the two standard addresses can be merged.
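The threshold rule of S201 and the ambiguity-level rule above can be sketched side by side. The function names, the feature strings and their ambiguity levels are hypothetical; the threshold 0.33 and the meaning of level 0 come from the text.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_by_threshold(features_a, features_b, threshold=0.33):
    """Rule of S201: similar if the Jaccard similarity reaches the threshold."""
    return jaccard(set(features_a), set(features_b)) >= threshold


def similar_by_unambiguous_feature(features_a, features_b):
    """Similar if both addresses share a feature whose ambiguity level is 0."""
    return any(
        level == 0 and features_b.get(feature) == 0
        for feature, level in features_a.items()
    )


# Hypothetical feature sets, each mapping a feature to its ambiguity level.
addr_a = {"campus_x": 0, "road_r|no_9": 1}
addr_b = {"campus_x": 0, "town_t|campus_x": 2}
addr_c = {"road_r|no_9": 1, "q": 2, "r": 2, "s": 2}

print(similar_by_unambiguous_feature(addr_a, addr_b))  # True
print(similar_by_unambiguous_feature(addr_a, addr_c))  # False
print(similar_by_threshold(addr_a, addr_c))            # False
```

The pair addr_a and addr_c shares only a level-1 feature, so the ambiguity rule rejects it; its Jaccard similarity is 1/5, which is also below the 0.33 threshold.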

In another embodiment of the present application, when there is a synonymous region of interest in a feature set corresponding to two standard door addresses, the two standard door addresses are used as similar standard door addresses. That is, if the names of the two AOIs are synonymous under the four-level administrative division, the feature sets in which they are located may be merged.

Specifically, in the logistics end-point collection and dispatch scenario shown in fig. 1, assuming there are 8 packages numbered 1, 2, ..., 8, 8 standard portal addresses can be obtained from the package addresses, and the similar standard portal addresses of each standard portal address can be determined, as shown in table 3.

S202: determine whether each standard portal address and its similar standard portal addresses are already in a cluster; if not, execute S203, otherwise execute S204;

S203: add the standard portal address and its similar standard portal addresses to a newly created cluster;

S204: add the standard portal address and its similar standard portal addresses to the cluster;

S205: add the standard portal address and its similar standard portal addresses to the cluster.

In a specific implementation, all standard portal addresses are traversed, and for each one it is determined whether the standard portal address and its similar standard portal addresses are already in a cluster. If they are not, no cluster exists for them yet, so they are added to a newly created cluster; otherwise they are added to the existing cluster. In the embodiment shown in table 3, the traversal runs from number 1 to number 8. The standard portal address numbered 1 and its similar standard portal addresses 2 and 6 have not yet been added to any cluster, so a new cluster is created; if its cluster number is set to 1, cluster 1 now contains 1, 2 and 6. The traversal then continues up to number 8, and two clusters are finally obtained, as shown in table 4.
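The single-pass traversal of S202 to S204 can be sketched as follows. The function name assign_clusters is hypothetical, and so is the similarity table, except that address 1 being similar to addresses 2 and 6 is taken from the text; the remaining rows were chosen so that the result is two clusters of 6 and 2 addresses.

```python
def assign_clusters(similar):
    """Greedy single pass: {address_no: [similar address numbers]} -> clusters."""
    cluster_of = {}  # address number -> cluster ID
    clusters = {}    # cluster ID -> set of address numbers
    next_id = 1
    for address, neighbours in similar.items():
        group = [address, *neighbours]
        # S202: is the address or any of its similar addresses already in a cluster?
        existing = next((cluster_of[m] for m in group if m in cluster_of), None)
        if existing is None:
            # S203: none of them is clustered yet, so create a new cluster.
            existing = next_id
            clusters[existing] = set()
            next_id += 1
        # S204: add every member that is not yet clustered to that cluster.
        for member in group:
            if member not in cluster_of:
                cluster_of[member] = existing
                clusters[existing].add(member)
    return {cid: sorted(members) for cid, members in clusters.items()}


# Hypothetical similarity table for the 8 addresses.
similar = {
    1: [2, 6], 2: [1, 4], 3: [5], 4: [2, 7],
    5: [3], 6: [1, 8], 7: [4], 8: [6],
}
result = assign_clusters(similar)
print(result)
```

A greedy pass like this never joins two clusters that already exist, so an address that bridges them later is simply placed in the first one found; a separate merging step is needed to handle that case.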

In the embodiment shown in fig. 1, after processing by the address aggregation device of the present application, the 8 packages are finally divided into two areas of interest, one with 6 packages and the other with 2. In the logistics scenario shown in fig. 1, stacking packages by area of interest can therefore greatly improve the efficiency of automatic sorting, collection and dispatch.

In another embodiment of the present application, S105 may first find, by a double-loop traversal, the standard portal addresses similar to each standard portal address, and then merge similar clusters with a cluster merging algorithm. Specifically, the algorithm traverses the list of each standard portal address in turn and checks whether the key of that standard portal address is already in the cluster list; if so, the standard portal address is merged into the corresponding cluster in the cluster list, and the aggregated clusters are finally obtained.

In the first embodiment, the standard portal addresses of the same area are aggregated under one cluster by the address aggregation algorithm; in other embodiments of the present application, a package aggregation model can be built on this basis. Fig. 5 is a flowchart of a second embodiment of the address aggregation method of the present application. Referring to fig. 5, in the second embodiment the method further includes:

S106: name the obtained clusters to obtain the name of each cluster.

In an embodiment of the present application, the name of the cluster obtained in S106 may be a cluster ID, and as shown in table 4, the cluster ID is used as the name of the cluster.

In one embodiment of the present application, the method further comprises:

S107: when the cluster ID is used as the name of the cluster, establish a package aggregation model, where the package aggregation model is a text model. This includes: acquiring the feature sets corresponding to the standard portal addresses that make up the cluster; associating the features in the feature sets with the cluster ID; and creating an inverted index according to the cluster ID to form the text model. In this embodiment, the feature grams extracted by the templates are used as terms and the associated cluster ID is used as the document ID to create the inverted index. The weight of each gram with respect to a cluster ID can be calculated by tf-idf, and the index supports retrieval by gram, which provides the basis for the subsequent retrieval process. The text model addresses the problem of matching the best AOI directly by text similarity: the features are stored in an inverted index together with the cluster IDs.

In one embodiment of the present application, the method provided by the present application further comprises:

S108: match against the package aggregation model. Specifically, the communication address of a package is acquired and matched against the package aggregation model to obtain the cluster corresponding to the package, and the package is aggregated under that cluster. That is, once the text model has been established, the address of a package that newly arrives at the logistics end point for collection and dispatch can be searched directly in the text model; the best AOI is matched by text similarity, which gives the area of interest corresponding to the package address. A user's receiving address is generally a standard address with some place names omitted, that is, it is generally a substring of the standard address. In this embodiment, combining features in various ways greatly improves the matching rate of the text model.

In one embodiment of the present application, when the cluster ID is used as the name of the cluster, the method further comprises establishing a package aggregation model that is a longitude and latitude model. The longitude and latitude model in this embodiment mainly attaches commercial points around an AOI (which do not match the AOI textually) to the nearby AOI. For example, points of a commercial nature around AOIs such as residential communities, schools and office buildings (convenience stores, small shops of various kinds, restaurants and so on) have addresses that generally do not match the AOI address, but because they surround the AOI they can be gathered into the cluster of that AOI.

In this embodiment, establishing the latitude and longitude model includes: acquiring a plurality of standard door addresses forming the cluster, wherein the standard door addresses comprise standard addresses and longitude and latitude; associating the plurality of standard addresses with the cluster ID; determining a central point and a boundary of the interest area according to a plurality of longitudes and latitudes included by the plurality of standard door addresses; establishing a polygon of the interest region according to the central point and the boundary of the interest region; establishing a mesh in the polygon; and establishing a mapping relation between the grids and the cluster IDs to form a longitude and latitude model.

Specifically, the standard door addresses are associated with a cluster ID, that is, the cluster ID covers a set of standard addresses. Since each standard address has longitude and latitude information, the cluster ID also covers a set of longitude and latitude points, and the average value of the point set is the center point of the AOI. The boundary formed by connecting the outermost points of the point set is the AOI polygon. The AOI polygon may be represented in a number of ways, such as a grid.
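The center point and boundary described above can be sketched as follows. This is an illustrative sketch under stated assumptions: points are (longitude, latitude) tuples, and the "boundary formed by connecting the outermost points" is interpreted as the convex hull, computed here with Andrew's monotone chain algorithm. The source does not name a specific boundary algorithm, and the function names are hypothetical.

```python
def center_point(points):
    """Average of a non-empty list of (lon, lat) points: the AOI center."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def _cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Outermost points in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

A plain average and a planar hull are reasonable only because a single AOI spans a very small area; an AOI with a concave outline would be over-covered by a convex hull.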

In an embodiment of the present application, the method further includes a step of matching against the package aggregation model. Specifically, a communication address of a package is acquired, the communication address is matched with the package aggregation model to obtain the cluster corresponding to the package, and the package is aggregated under that cluster. That is, after the longitude and latitude model is successfully established, packages that subsequently arrive at the end of the logistics chain for pickup and delivery can be searched directly in the longitude and latitude model to obtain the optimal AOI, and thus the interest area corresponding to the package address.
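The grid mapping and lookup of the longitude and latitude model can be sketched as below. This is a sketch under assumptions, not the implementation of the application: cells are squares of a hypothetical size `cell_size` in degrees, a cell belongs to a cluster when its center lies inside the AOI polygon (ray casting test), and the first cluster to claim a cell keeps it. None of these choices is stated in the source.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cell_of(lon, lat, cell_size):
    """Integer grid cell that contains the point."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def build_grid_model(polygons, cell_size):
    """Map grid cells to cluster IDs; polygons is {cluster ID: vertices}."""
    grid = {}
    for cluster_id, polygon in polygons.items():
        lons = [p[0] for p in polygon]
        lats = [p[1] for p in polygon]
        i0, j0 = cell_of(min(lons), min(lats), cell_size)
        i1, j1 = cell_of(max(lons), max(lats), cell_size)
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                cx, cy = (i + 0.5) * cell_size, (j + 0.5) * cell_size
                if point_in_polygon(cx, cy, polygon):
                    grid.setdefault((i, j), cluster_id)
    return grid


def lookup(grid, lon, lat, cell_size):
    """Cluster ID for a geocoded package address, or None if no cell matches."""
    return grid.get(cell_of(lon, lat, cell_size))
```

Precomputing the cells turns each package lookup into a single dictionary access instead of a polygon test against every AOI.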

In one embodiment of the present application, when the cluster ID is used as the name of the cluster, the method provided by the present application may further include establishing a package aggregation model that includes both a text model and a longitude and latitude model. After both models are successfully established, they are used to match packages newly arriving at the end of the logistics chain for pickup and delivery. Because converting a package address into longitude and latitude introduces error, and the user's package address is generally not a standard door address, the text model is more accurate than the longitude and latitude model. In this embodiment, the text model is therefore searched first when the address on a user package is looked up; if the text model returns a result, the area is determined directly, and otherwise the longitude and latitude model is searched.
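The search order described in this embodiment can be sketched as follows. The three callables `text_lookup`, `geocode`, and `grid_lookup` are hypothetical stand-ins for the text model search, an address-to-coordinate service, and the longitude and latitude model search; they are assumptions made for the example and are not interfaces defined by the source.

```python
def match_package(address, text_lookup, geocode, grid_lookup):
    """Try the text model first, then fall back to the lat/lon model.

    text_lookup(address) -> cluster ID or None
    geocode(address) -> (lon, lat) or None
    grid_lookup(lon, lat) -> cluster ID or None
    """
    cluster_id = text_lookup(address)
    if cluster_id is not None:
        return cluster_id  # the more accurate text model produced a result
    location = geocode(address)
    if location is None:
        return None  # the address could not be converted to coordinates
    lon, lat = location
    return grid_lookup(lon, lat)
```

Passing the models in as callables keeps the fallback order independent of how either model is stored.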

Therefore, the method provided by the application can identify different expressions of the same AOI address (synonymous AOI addresses), so that packages in the same AOI are gathered together, and packages addressed to commercial points around the AOI are gathered together with the packages inside the AOI, which greatly improves package aggregation efficiency. The packages are gathered according to AOIs such as residential communities, school districts, and office buildings, so that the packages of one AOI can be processed together in the business, which greatly improves service efficiency.

In an embodiment of the present application, to make dispatch easier for the dispatcher, S106 may also give the cluster a name that is not the cluster ID, so that the dispatcher can recognize the cluster easily.

In one embodiment, S106 includes: acquiring a feature set corresponding to a plurality of standard addresses forming the cluster;

sequentially determining, according to the feature set, the frequency of each piece of road and road number information and the frequency of each name;

and taking the feature with the highest frequency as the name of the cluster.
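The frequency-based naming above can be sketched as follows, assuming each standard address contributes one feature set containing its road and road number information and its AOI name. The function name and the input layout are hypothetical, and the source does not say how ties are resolved.

```python
from collections import Counter


def name_cluster_by_frequency(feature_sets):
    """Return the feature that occurs in the most addresses, or None.

    feature_sets: one iterable of candidate name features per standard
    address in the cluster (hypothetical layout).
    """
    counts = Counter()
    for features in feature_sets:
        # Count each feature at most once per address.
        counts.update(set(features))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```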

In yet another embodiment of the present application, S106 includes:

acquiring a feature set corresponding to a plurality of standard addresses forming the cluster;

screening out names of a plurality of interest areas from the feature set;

and taking the name of the interest area with the highest actual use frequency as the name of the cluster.

In other embodiments of the present application, the selection criteria for a cluster name may also be as follows: the name must contain the determining factor of the AOI, namely the AOI name or the house number; it should have strong covering capacity, that is, the range it covers should be large; and it should be frequently used by users. In a specific embodiment, this can be done as follows. The word frequency of each gram extracted from the feature templates is counted, and features that cannot serve as a cluster name are filtered out, keeping only the feature grams extracted from three templates: (AOI, AOI category), (Road, RoadNo), and (Road, RoadNo, AOI). The grams with the top-N frequencies in these three templates are taken. Because the (Road, RoadNo, AOI) template contains both (Road, RoadNo) and AOI and expresses the co-occurrence relation between a house number and an AOI, this co-occurrence needs to be added to the features extracted from the (Road, RoadNo) template and from the AOI template: all grams of the (Road, RoadNo, AOI) template are traversed, and the word frequency of each such gram is added to the corresponding (Road, RoadNo) gram and AOI gram. The grams extracted with the AOI template are then sorted by AOI category, the top-ranked AOI gram and the top-ranked (Road, RoadNo) gram are taken, and the ratio of the AOI gram frequency to the (Road, RoadNo) gram frequency is computed. If the ratio is greater than a set threshold, the cluster is better named by the AOI name; otherwise, the house number (Road and RoadNo) is used.
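One possible reading of the ratio test above is sketched below. It is an interpretation only: the argument names, the default threshold value, and the simplification of ignoring the top-N cut and the sorting by AOI category are all assumptions made for the example.

```python
from collections import Counter


def choose_cluster_name(aoi_grams, road_grams, cooccurrence_grams, threshold=1.0):
    """Pick an AOI name or a house number as the cluster name.

    aoi_grams: grams from the AOI template, one entry per occurrence.
    road_grams: grams from the (Road, RoadNo) template, one per occurrence.
    cooccurrence_grams: (road_gram, aoi_gram) pairs from the
        (Road, RoadNo, AOI) template, one pair per occurrence.
    """
    aoi_freq = Counter(aoi_grams)
    road_freq = Counter(road_grams)
    # Add each co-occurrence to both of its component grams.
    for road_gram, aoi_gram in cooccurrence_grams:
        road_freq[road_gram] += 1
        aoi_freq[aoi_gram] += 1
    if not aoi_freq and not road_freq:
        return None
    if not road_freq:
        return aoi_freq.most_common(1)[0][0]
    if not aoi_freq:
        return road_freq.most_common(1)[0][0]
    top_aoi, aoi_count = aoi_freq.most_common(1)[0]
    top_road, road_count = road_freq.most_common(1)[0]
    # Name by AOI only when it is used sufficiently more often than the road.
    return top_aoi if aoi_count / road_count > threshold else top_road
```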

As described above, the present application performs feature extraction centering on AOI, determines whether different AOI are synonymous based on similarity, and merges synonymous AOI addresses into the same cluster. The method comprises the steps of firstly identifying standard door addresses of the same area based on an address aggregation algorithm, aggregating the door addresses of the same area under one cluster, secondly constructing a package aggregation model, and finally matching a communication address on a logistics package with the package aggregation model to obtain an optimal cluster, so that the packages of the same area are aggregated under the same cluster.

The present application further provides a method of package aggregation, the method comprising:

acquiring a plurality of standard door addresses;

and acquiring a communication address of the package, matching the communication address with the package aggregation model to obtain a cluster corresponding to the package, and aggregating the package under the cluster.

In one embodiment of the present application, the method further comprises:

and naming the clusters to obtain the cluster names, wherein the cluster names comprise cluster IDs.

In one embodiment, creating a package aggregation model for each of the plurality of clusters comprises:

acquiring a plurality of feature sets corresponding to a plurality of standard addresses forming the cluster;

associating features in the feature set with the cluster ID;

and creating an inverted index according to the cluster ID to form a package aggregation model, wherein the package aggregation model is a text model.

In this embodiment, the feature grams extracted based on the templates are taken as terms and the associated cluster ID is taken as the document ID to create the inverted index. The weight of each gram with respect to a cluster ID can be calculated by tf-idf, and the gram retrieval function provides a basis for the subsequent retrieval process. The text model addresses the problem of directly matching the optimal AOI through text similarity by building an inverted index over the features together with the cluster IDs. After the text model is successfully established, the communication addresses of packages that subsequently arrive at the end of the logistics chain for pickup and delivery can be searched directly in the text model, and the optimal AOI is matched through text similarity to obtain the interest area corresponding to the package address. The receiving address written by a user is generally the standard address with some place names omitted, that is, the user address is generally a substring of the standard address; by indexing various combinations of features, this embodiment greatly improves the matching rate of the text model.

In another embodiment, creating a package aggregation model for each of the plurality of clusters comprises: obtaining a plurality of standard door addresses forming the cluster, wherein the standard door addresses comprise standard addresses and the longitudes and latitudes corresponding to the standard addresses; associating the standard door addresses with the cluster ID; determining a central point and a boundary of an interest area according to the plurality of longitudes and latitudes included in the standard door addresses; establishing a polygon of the interest area according to the central point and the boundary of the interest area; establishing grids in the polygon; and establishing a mapping relation between the grids and the cluster ID to form a package aggregation model, wherein the package aggregation model is a longitude and latitude model.

It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

Although the present application provides method steps as described in an embodiment or flowchart, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in processes, methods, articles, or apparatus that include the recited elements is not excluded.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by a hardware physical module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer "integrates" a digital system onto a single PLD by programming it, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), with VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog being the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by lightly programming the method flow in one of the hardware description languages above and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. The means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is brief, and for relevant points reference may be made to the corresponding description of the method embodiment.

The above description is only an example of the present application and is not intended to limit the present application. For those skilled in the art, various modifications and changes are possible in the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method of address aggregation, the method comprising:

acquiring a plurality of standard door addresses;

performing feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses, wherein the feature sets comprise attribute information of interest areas;

aggregating the plurality of standard addresses according to the similarity to obtain a plurality of clusters;

wherein aggregating the plurality of standard addresses according to the similarity to obtain a plurality of clusters includes:

determining a similar standard door address of each standard door address;

judging whether each standard door address and the corresponding similar standard door address are in a cluster or not;

if not, adding the standard door address and the corresponding similar standard door address into a newly built cluster;

otherwise, adding the standard door address and the corresponding similar standard door address into the cluster;

taking the newly-built clusters and the clusters as a plurality of clusters obtained after aggregation;

wherein performing feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard addresses includes: performing feature extraction on the structured address through a template-based N-gram model, wherein the N-gram model comprises an ambiguity level, and the ambiguity level corresponding to each of the plurality of features in the extracted feature set corresponding to the standard door address is the ambiguity level of the N-gram model.

2. The method of claim 1, wherein translating the plurality of standard addresses into the plurality of structured addresses comprises:

extracting place name information in the standard door address;

annotating semantic labeling information for each place name information, wherein the semantic labeling information at least comprises road and road number information of an interest area and/or a name of the interest area;

and putting the place name information into a structured template according to the semantic annotation information to obtain a structured address.

3. The method of claim 1, wherein determining the similarity between any two of the plurality of standard addresses according to the plurality of feature sets corresponding to the plurality of standard addresses comprises: and determining the similarity between any two standard door addresses in the plurality of standard door addresses through a similarity formula.

4. The method of claim 3, wherein determining similar standard addresses for each standard address comprises: when the same characteristics exist in the characteristic sets corresponding to the two standard door addresses and the ambiguity level of the same characteristics is unambiguous; or when the similarity between the two standard door addresses is not less than a preset threshold value; or when the feature sets corresponding to the two standard door addresses have synonymous interest areas, taking the two standard door addresses as similar standard door addresses.

5. The method according to any one of claims 3 to 4, wherein the region of interest attribute information comprises road and route number information of a region of interest and/or a name of the region of interest, and the method further comprises naming the plurality of clusters to obtain the names of the clusters.

6. The method of claim 5, wherein the name of the cluster comprises a cluster ID, the method further comprising:

associating features in the feature set with the cluster ID;

7. The method of claim 5, wherein the name of the cluster comprises a cluster ID, the method further comprising:

acquiring a plurality of standard addresses forming the cluster, wherein the standard addresses comprise standard addresses and longitudes and latitudes corresponding to the standard addresses;

associating the plurality of standard addresses with the cluster ID;

determining a central point and a boundary of the interest area according to a plurality of longitudes and latitudes included by the plurality of standard addresses;

establishing a polygon of the interest area according to the central point and the boundary of the interest area;

establishing a mesh in the polygon;

and establishing a mapping relation between the grids and the cluster ID to form a package aggregation model, wherein the package aggregation model is a longitude and latitude model.

8. The method according to any one of claims 6 or 7, further comprising: acquiring a communication address of a package; matching the communication address with the package aggregation model to obtain a cluster corresponding to the package; aggregating the package under the cluster.

9. The method of claim 8, wherein naming the plurality of clusters comprises:

sequentially determining the frequency of each road and road number information and the frequency of the area names according to the feature set;

and taking the feature with the highest frequency as the name of the cluster.

10. The method of claim 8, wherein naming the plurality of clusters comprises:

screening a plurality of area names from the feature set;

and taking the area name with the highest actual use frequency as the name of the cluster.

11. A method of package aggregation, comprising:

acquiring a plurality of standard door addresses;

acquiring a communication address of a package, matching the communication address with the package aggregation model to obtain a cluster corresponding to the package, and aggregating the package under the cluster;

determining a similarity between any two of the plurality of standard addresses comprises:

determining a similar standard door address of each standard door address;

judging whether each standard door address and the corresponding similar standard door address are in the cluster;

otherwise, adding the standard door address and the corresponding similar standard door address into the cluster;

12. The method of claim 11, further comprising:

13. The method of claim 12, wherein creating a package aggregation model for the plurality of clusters, respectively, comprises:

associating features in the feature set with the cluster ID;

14. The method of claim 12, wherein creating a package aggregation model for the plurality of clusters, respectively, comprises:

associating the plurality of standard addresses with the cluster ID;

establishing a polygon of the interest region according to the central point and the boundary of the interest region;

establishing a mesh in the polygon;

and establishing a mapping relation between the grids and the cluster IDs to form a package aggregation model, wherein the package aggregation model is a longitude and latitude model.

15. An apparatus for address aggregation, the apparatus comprising:

the feature extraction device is used for performing feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses, and the feature sets comprise interest area attribute information;

the address aggregation device is used for aggregating the standard addresses according to the similarity to obtain a plurality of clusters;

the address aggregation apparatus includes:

the similar door address determining module is used for determining a similar standard door address of each standard door address;

the first adding module is used for adding the standard door address and the corresponding similar standard door address into a newly built cluster;

the second adding module is used for adding the standard door address and the corresponding similar standard door address into the cluster;

the address aggregation module is used for adding the standard door address and the corresponding similar standard door address into the cluster;

the feature extraction device is used for: performing feature extraction on the structured address through a template-based N-gram model, wherein the N-gram model comprises an ambiguity level, and the ambiguity level corresponding to each of the plurality of features in the extracted feature set corresponding to the standard door address is the ambiguity level of the N-gram model.

16. The apparatus of claim 15, wherein the address translation device is configured to: extracting place name information in the standard door address; annotating semantic labeling information for each place name information, wherein the semantic labeling information at least comprises road and road number information of an interest area and/or a name of the interest area; and putting the place name information into a structured template according to the semantic annotation information to obtain a structured address.

17. The apparatus according to claim 16, wherein the similarity determining means comprises: and determining the similarity between any two standard addresses in the plurality of standard addresses through a similarity formula.

18. The apparatus of claim 17, wherein the similar door address determining module is configured to: when the same characteristics exist in the characteristic sets corresponding to the two standard door addresses and the ambiguity level of the same characteristics is unambiguous; or when the similarity between the two standard door addresses is not less than a preset threshold value; or when the feature sets corresponding to the two standard door addresses have synonymous interest areas, taking the two standard door addresses as similar standard door addresses.

19. The apparatus according to any one of claims 15 to 18, wherein the interest area attribute information comprises road and road number information of an interest area and/or a name of the interest area, the apparatus further comprising cluster naming means for naming the obtained plurality of clusters to obtain the name of each cluster.

20. The apparatus according to claim 19, wherein the name of the cluster includes a cluster ID, and the apparatus further comprises a package aggregation model building means for obtaining a plurality of feature sets corresponding to a plurality of standard addresses constituting the cluster; associating features in the feature set with the cluster ID; and creating an inverted index according to the cluster ID to form a package aggregation model, wherein the package aggregation model is a text model.
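As a rough illustration of the text model, the sketch below builds an inverted index from features to cluster IDs. The claim does not specify the index layout; mapping each feature to the set of cluster IDs it occurs in is an assumption, and the vote-counting `match` helper is a hypothetical lookup strategy added only to show how such an index could be queried.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


def build_inverted_index(
        cluster_features: Dict[str, Iterable[Set[str]]]) -> Dict[str, Set[str]]:
    """Map each feature to the IDs of the clusters whose addresses contain it.

    `cluster_features` maps a cluster ID to the feature sets of the
    standard addresses that make up that cluster.
    """
    index = defaultdict(set)
    for cluster_id, feature_sets in cluster_features.items():
        for feature_set in feature_sets:
            for feature in feature_set:
                index[feature].add(cluster_id)
    return dict(index)


def match(index: Dict[str, Set[str]],
          package_features: Iterable[str]) -> Optional[str]:
    """Hypothetical lookup: pick the cluster hit by the most package features."""
    votes = defaultdict(int)
    for feature in package_features:
        for cluster_id in index.get(feature, ()):
            votes[cluster_id] += 1
    if not votes:
        return None
    # Sorting first makes ties resolve deterministically to the smallest ID.
    return max(sorted(votes), key=votes.get)
```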

21. The apparatus of claim 19, wherein the name of the cluster comprises a cluster ID, the apparatus further comprising a package aggregation model building means for: obtaining a plurality of standard addresses together with the longitude and latitude corresponding to each standard address; associating the plurality of standard addresses with the cluster ID; determining a central point and a boundary of the interest area according to the plurality of longitudes and latitudes of the plurality of standard addresses; establishing a polygon of the interest area according to the central point and the boundary of the interest area; establishing a mesh in the polygon; and establishing a mapping relation between the mesh cells and the cluster ID to form a package aggregation model, wherein the package aggregation model is a longitude and latitude model.
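The longitude and latitude model can be approximated by a bounding-box grid lookup, sketched below under strong simplifying assumptions: the polygon of each interest area is reduced to the axis-aligned rectangle spanned by its addresses, cells are fixed-size squares in degrees, and the first cluster to claim a cell keeps it. The function names and the `cell_deg` parameter are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def build_grid_model(clusters: Dict[str, List[Tuple[float, float]]],
                     cell_deg: float = 0.01) -> Dict[Cell, str]:
    """Map grid cells to cluster IDs.

    `clusters` maps a cluster ID to the (latitude, longitude) pairs of its
    standard addresses. Assumption: the interest-area polygon is simplified
    to the bounding rectangle of those points.
    """
    grid: Dict[Cell, str] = {}
    for cluster_id, points in clusters.items():
        lats = [lat for lat, _ in points]
        lons = [lon for _, lon in points]
        row_lo = math.floor(min(lats) / cell_deg)
        row_hi = math.floor(max(lats) / cell_deg)
        col_lo = math.floor(min(lons) / cell_deg)
        col_hi = math.floor(max(lons) / cell_deg)
        for row in range(row_lo, row_hi + 1):
            for col in range(col_lo, col_hi + 1):
                # The first cluster to claim a cell keeps it.
                grid.setdefault((row, col), cluster_id)
    return grid


def locate(grid: Dict[Cell, str], lat: float, lon: float,
           cell_deg: float = 0.01) -> Optional[str]:
    """Return the cluster ID whose grid cell contains the point, if any."""
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    return grid.get(cell)
```

A production model would rasterize the actual polygon rather than its bounding rectangle, and would need a rule for cells shared by neighbouring interest areas.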

22. The apparatus according to claim 20 or 21, further comprising a model matching device configured to obtain a communication address of a package, match the communication address with the package aggregation model to obtain a cluster corresponding to the package, and aggregate the package under the cluster.

23. The apparatus of claim 22, wherein the cluster naming means comprises:

the feature set acquisition module is used for acquiring the feature sets corresponding to the plurality of standard door addresses forming the cluster;

the frequency determining module is used for sequentially determining, according to the feature sets, the frequency of each piece of road and road number information and the frequency of each name;

24. The apparatus of claim 22, wherein the cluster naming means comprises:

25. An apparatus for parcel aggregation, the apparatus comprising:

the model matching device is used for acquiring a communication address of a package, matching the communication address with the package aggregation model to obtain a cluster corresponding to the package, and aggregating the package under the cluster;

the address aggregation apparatus includes:

the address conversion module is used for converting the standard addresses into a plurality of structured addresses;

the feature extraction module is used for carrying out feature extraction on the plurality of structured addresses to obtain a plurality of feature sets corresponding to the plurality of standard door addresses, and the feature sets comprise interest area attribute information;

the similarity determining module is used for determining the similarity between any two standard door addresses in the standard door addresses according to a plurality of feature sets corresponding to the standard door addresses;

the address aggregation device is specifically configured to:

determining a similar standard door address of each standard door address;

the feature extraction module is specifically configured to: perform feature extraction on the structured address through a template-based N-element model, wherein the N-element model has an ambiguity level, and the ambiguity level of each of the plurality of features in the extracted feature set corresponding to the standard door address is the ambiguity level of the N-element model.

26. The apparatus of claim 25, further comprising:

27. The apparatus of claim 26, wherein the package aggregation model building means comprises:

the feature set acquisition module is used for acquiring a plurality of feature sets corresponding to a plurality of standard door addresses forming the cluster;

and the index creating module is used for creating an inverted index according to the cluster ID to form a package aggregation model, and the package aggregation model is a text model.

28. The apparatus of claim 26, wherein the package aggregation model building means comprises:

a portal address association module for associating the plurality of standard portal addresses with the cluster ID;

a mesh establishing module for establishing a mesh in the polygon;