CN112818684B - Address element ordering method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112818684B
Authority
CN
China
Legal status
Active
Application number
CN202110126026.0A
Other languages
Chinese (zh)
Other versions
CN112818684A (en
Inventor
周筠
Current Assignee
Shanghai Xunmeng Information Technology Co Ltd
Original Assignee
Shanghai Xunmeng Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xunmeng Information Technology Co Ltd
Priority to CN202110126026.0A
Publication of CN112818684A
Application granted
Publication of CN112818684B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/387 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data

Abstract

The invention provides an address element ordering method and device, an electronic device, and a storage medium. The address element ordering method comprises the following steps: acquiring address information to be processed; extracting a plurality of element fields from the address information to be processed; calculating the inverse document frequency (IDF) of each element field; and sorting the element fields of the address information to be processed in descending order of IDF. The invention ranks address elements by importance, which supplies an importance ordering and related parameters to subsequent address-processing algorithms, optimizes those algorithms, and improves the user's logistics experience.

Description

Address element ordering method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer applications, and in particular, to a method and apparatus for ordering address elements, an electronic device, and a storage medium.
Background
Currently, in logistics scenarios, a shipping address given by a user must be recognized and processed, for example to calculate a logistics route, assign a pickup courier, or assign a delivery courier. In the various address-processing algorithms, the importance of each address element is a key component, and the ability to score address-element importance underlies many of these algorithms.
Therefore, a technical problem to be solved in this field is how to rank address elements by importance, so as to supply an importance ordering and related parameters to subsequent address-processing algorithms, optimize those algorithms, and thereby improve the user's logistics experience.
Disclosure of Invention
To overcome the shortcomings of the related art, the invention provides an address element ordering method and device, an electronic device, and a storage medium that rank address elements by importance, supply an importance ordering and related parameters to subsequent address-processing algorithms, and optimize those algorithms, thereby improving the user's logistics experience.
According to one aspect of the present invention, there is provided an address element ordering method, including:
acquiring address information to be processed;
extracting a plurality of element fields from the address information to be processed;
calculating the inverse document frequency (IDF) of each element field; and
sorting the element fields of the address information to be processed in descending order of IDF.
In some embodiments of the invention, calculating the inverse document frequency of each element field comprises:
for each element field:
counting the number n of address records in an address information base that contain the element field; and
calculating the inverse document frequency F of the element field according to the following formula:
F = lg(N/n),
where N is the total number of address records in the address information base and lg is the base-10 logarithm.
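As a minimal sketch of the formula above, assume each address record in the base has already been segmented into a set of element fields; the data layout and the names `document_frequency` and `idf` are hypothetical, not taken from the patent. The patent does not say how to treat a field with n = 0, so the sketch assumes such a field is counted as n = 1.

```python
import math
from collections import Counter
from typing import Iterable, Set


def document_frequency(address_base: Iterable[Set[str]]) -> Counter:
    """Count n for every element field: the number of address records containing it."""
    df: Counter = Counter()
    for fields in address_base:
        df.update(set(fields))  # a field counts at most once per address record
    return df


def idf(field: str, df: Counter, total: int) -> float:
    """Inverse document frequency F = lg(N / n) with a base-10 logarithm."""
    n = df.get(field, 0)
    if n == 0:
        # Assumption: the formula is undefined for n = 0, so an unseen field is
        # treated as appearing once and receives the maximum weight lg(N).
        n = 1
    return math.log10(total / n)


# Hypothetical, already-segmented address records.
base = [
    {"Shanghai", "Pudong", "Lane 1"},
    {"Shanghai", "Pudong", "Lane 2"},
    {"Shanghai", "Minhang", "Lane 1"},
    {"Shanghai", "Minhang", "Lane 9"},
]
df = document_frequency(base)
for name in ("Shanghai", "Pudong", "Lane 9"):
    print(name, round(idf(name, df, len(base)), 3))
```

Here "Shanghai" appears in every record and gets F = 0.0, while "Pudong" (2 of 4) and "Lane 9" (1 of 4) get 0.301 and 0.602.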
In some embodiments of the invention, the address information base is constructed as follows:
acquiring historical logistics orders;
acquiring the recipient address and the sender address of each historical logistics order;
merging recipient addresses and sender addresses that represent the same address information; and
constructing the address information base from the merged recipient and sender addresses.
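The construction steps above might be sketched as follows. The `Order` record is hypothetical, and two addresses are treated as "the same address information" when they are equal after whitespace normalization; the patent does not specify the matching rule, so `normalize` is only a placeholder for a real address-standardization step.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    """Hypothetical minimal historical logistics order (step S101)."""
    recipient_address: str
    sender_address: str


def normalize(address: str) -> str:
    # Placeholder assumption: collapse runs of whitespace. A real system would
    # standardize the address before deciding two addresses are the same.
    return " ".join(address.split())


def build_address_base(orders: List[Order]) -> List[str]:
    seen = set()
    base: List[str] = []
    for order in orders:
        # Step S102: take both the recipient and the sender address.
        for address in (order.recipient_address, order.sender_address):
            key = normalize(address)
            # Step S103: merge duplicates by keeping only the first occurrence.
            if key not in seen:
                seen.add(key)
                base.append(key)
    return base  # Step S104: the deduplicated addresses form the base.


orders = [
    Order("No. 1 Century Ave  Pudong", "No. 8 Hongqiao Rd"),
    Order("No. 1 Century Ave Pudong", "No. 5 Nanjing Rd"),
]
print(build_address_base(orders))
```

The two spellings of the Century Ave address collapse into one entry, so the base holds three distinct addresses.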
In some embodiments of the present invention, after the address information base is constructed, the method further includes:
taking each piece of address information in the address information base as the address information to be processed, so as to obtain the ordering of the element fields of each piece of address information;
storing the ordered element fields of each piece of address information in the address information base as a field sequence associated with that address information;
clustering according to the first m element fields of the field sequences in the address information base to obtain a plurality of clusters, wherein each cluster is associated with a sub-field sequence formed by m element fields, and m is an integer greater than or equal to 1; and
dividing address areas according to the plurality of clusters.
In some embodiments of the invention, the address areas are used to indicate pickup fences and/or delivery fences for logistics packages.
In some embodiments of the invention, the method further includes:
taking the sender address of a logistics package awaiting pickup, or the recipient address of a logistics package awaiting delivery, as the address to be processed;
acquiring the sub-field sequence of the address to be processed;
determining, according to the sub-field sequence of the address to be processed, the cluster to which the sender address or recipient address of each logistics package belongs, so as to determine the pickup address area or delivery address area of the logistics package; and
picking up or delivering the logistics package according to the pickup address area or delivery address area.
In some embodiments of the present invention, clustering according to the first m element fields of the field sequences in the address information base includes:
grouping address information whose field sequences share the same first m element fields into a quasi-cluster, and associating the quasi-cluster with the sub-field sequence formed by those first m element fields; and
performing a merging step on the sub-field sequence associated with each quasi-cluster:
judging whether the sub-field sequence associated with the quasi-cluster exists in the field sequence associated with another quasi-cluster;
if so, merging the quasi-cluster with the other quasi-cluster to form a cluster, and updating the sub-field sequence associated with the cluster according to the inverse document frequencies of the sub-field sequences of the two quasi-clusters;
if not, taking the quasi-cluster as a cluster.
In some embodiments of the present invention, dividing address areas according to the plurality of clusters includes:
acquiring the recipient addresses and/or sender addresses of historical logistics orders within a preset time period;
counting, for each cluster, the number of historical logistics orders whose recipient address and/or sender address belongs to that cluster;
for each cluster:
judging whether that number of historical logistics orders is greater than a preset number threshold;
if so, taking the cluster as an address area;
if not, taking the cluster as a candidate cluster; and
merging the candidate clusters into address areas based on semantic distance and/or physical distance.
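A minimal sketch of the area division above, under stated assumptions: order counts per cluster (the counting step) are already available, each cluster has a hypothetical planar centroid in kilometres, and only physical distance is used to merge candidate clusters (the semantic-distance option is not shown). Grouping each candidate greedily against the first member of an existing group is an illustrative choice, not the patent's prescribed merging rule.

```python
import math
from typing import Dict, List, Tuple


def divide_areas(
    order_counts: Dict[str, int],
    centroids: Dict[str, Tuple[float, float]],
    threshold: int,
    max_km: float,
) -> List[List[str]]:
    """Return address areas, each a list of cluster IDs."""
    areas: List[List[str]] = []
    candidates: List[str] = []
    for cluster_id in sorted(order_counts):
        if order_counts[cluster_id] > threshold:
            areas.append([cluster_id])  # enough orders: the cluster is an area by itself
        else:
            candidates.append(cluster_id)  # too few orders: merge with neighbours
    groups: List[List[str]] = []
    for cluster_id in candidates:
        x, y = centroids[cluster_id]
        for group in groups:
            gx, gy = centroids[group[0]]  # compare with the group's first member
            if math.hypot(x - gx, y - gy) <= max_km:
                group.append(cluster_id)
                break
        else:  # no nearby group found: start a new one
            groups.append([cluster_id])
    return areas + groups


counts = {"c1": 50, "c2": 3, "c3": 2, "c4": 1}
centres = {"c1": (5.0, 5.0), "c2": (0.0, 0.0), "c3": (0.5, 0.0), "c4": (10.0, 0.0)}
print(divide_areas(counts, centres, threshold=10, max_km=1.0))
```

With these made-up numbers, c1 becomes an area on its own, the nearby low-volume clusters c2 and c3 are combined, and the distant c4 forms its own area. A production system would more likely use a proper clustering or geodesic distance.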
According to another aspect of the present invention, there is also provided an address element ordering apparatus, including:
an acquisition module configured to acquire address information to be processed;
an extraction module configured to extract a plurality of element fields from the address information to be processed;
a calculation module configured to calculate the inverse document frequency of each element field; and
an ordering module configured to sort the element fields of the address information to be processed in descending order of inverse document frequency.
According to still another aspect of the present invention, there is also provided an electronic apparatus including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to a further aspect of the present invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Compared with the prior art, the invention has the following advantages:
The invention extracts a plurality of element fields from the address information to be processed and sorts them in descending order of their inverse document frequencies. Address elements are thus ranked by importance, which supplies an importance ordering and related parameters to subsequent address-processing algorithms and optimizes those algorithms, thereby improving the user's logistics experience.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flow chart of a method of ordering address elements according to an embodiment of the invention.
Fig. 2 shows a flow chart of building an address information base according to an embodiment of the invention.
Fig. 3 shows a flow chart of partitioning address areas according to a constructed address information base according to an embodiment of the present invention.
Fig. 4 shows a flow chart of area matching for pickup/delivery based on address information according to an embodiment of the present invention.
Fig. 5 shows a flow chart of clustering the element fields according to the first m element fields of the field sequence of the address information base according to an embodiment of the invention.
FIG. 6 illustrates a flow chart of partitioning address regions according to the plurality of class clusters according to an embodiment of the present invention.
FIG. 7 shows a block diagram of an address element ordering apparatus according to an embodiment of the invention.
Fig. 8 schematically illustrates a computer-readable storage medium according to an exemplary embodiment of the present invention.
Fig. 9 schematically illustrates an electronic device according to an exemplary embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
In various embodiments of the present invention, the address element ordering method provided in the present invention may be applied to a logistics platform, an electronic commerce platform, or any platform where a third party needs to use address information, but the application scenario of the present invention is not limited thereto, and is not described herein.
FIG. 1 shows a flow chart of a method of ordering address elements according to an embodiment of the invention. The address element ordering method comprises the following steps:
Step S110: Acquire the address information to be processed.
Specifically, the address information to be processed may be address information input by the user, may be historical address information input by the user, or may be address information extracted according to the user identifier/order identifier in the system, which is not limited in the present invention. In different application scenarios, different ways of obtaining the address information to be processed are all within the protection scope of the invention.
Step S120: Extract a plurality of element fields from the address information to be processed.
Specifically, step S120 may acquire only the element fields, or the element fields together with their address labels (e.g., province, city, district).
Specifically, in step S120, a word segmentation algorithm, a labeling algorithm, or the like may be used to extract the element fields from the address information to be processed. The segmentation and labeling may be based on string matching: the string is scanned, and a match is recorded whenever a substring is identical to a word in the standard address lexicon. This kind of segmentation usually incorporates heuristic rules such as forward/reverse maximum matching and longest-word-first. Alternatively, segmentation and labeling may be based on statistics and machine learning, which model Chinese text from manually annotated parts of speech and statistical features, i.e., model parameters are trained on observed data (an annotated corpus). Those skilled in the art can implement many different segmentation and labeling algorithms, which are not described in detail here.
Further, the number of element fields to be acquired can be set as required, and the invention can be configured to acquire only element fields carrying specified address labels (for example, point of interest, road name, town name, or street name). The invention is not limited in this regard.
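The string-matching heuristic mentioned above can be illustrated with forward maximum matching. This is a sketch only: the small `lexicon` stands in for the standard address lexicon, `max_len` is an assumed upper bound on word length, and unmatched characters fall back to single-character tokens. Reverse maximum matching is the mirror image, scanning from the end of the string.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Segment text left to right, always taking the longest lexicon word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            # Accept a lexicon hit, or fall back to a single unmatched character.
            if piece in lexicon or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


lexicon = {"上海市", "浦东新区", "世纪大道", "100号"}
print(forward_max_match("上海市浦东新区世纪大道100号", lexicon))
```

This prints the four fields 上海市, 浦东新区, 世纪大道 and 100号. Real address segmenters usually combine such matching with the statistical or learned models described above.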
Step S130: Calculate the inverse document frequency of each element field.
Specifically, the inverse document frequency (IDF) is an index for measuring the importance of a word.
Among N documents, the weight of a word is inversely related to its document frequency (DF), i.e., the number of documents among the N in which the word appears, so the inverse document frequency can be defined with a logarithm as F = lg(N/n). Accordingly, in step S130, for each element field, the number n of address records in the address information base that contain the element field is counted, and the inverse document frequency F of the element field is calculated as F = lg(N/n), where N is the total number of address records in the address information base.
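As an illustrative calculation with hypothetical numbers (not data from the patent), suppose the address information base holds N = 1,000,000 address records, a province name appears in n = 500,000 of them, and a building name appears in n = 20:

```latex
\begin{aligned}
F_{\text{province}} &= \lg\frac{1\,000\,000}{500\,000} = \lg 2 \approx 0.301,\\
F_{\text{building}} &= \lg\frac{1\,000\,000}{20} = \lg 50\,000 \approx 4.699.
\end{aligned}
```

The rarer building name therefore ranks well ahead of the province name when the fields are sorted in step S140.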
Step S140: Sort the element fields of the address information to be processed in descending order of inverse document frequency.
Specifically, step S140 uses the inverse document frequency as the importance weight of each element field in the address information and sorts the element fields by weight from largest to smallest. In other words, in this embodiment, the less frequently an element field appears across the address information, the more important it is, and the better it serves to distinguish this address information from other address information.
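Step S140 can be sketched as a descending sort keyed on the IDF values from step S130. The sketch assumes an IDF value is available for every field and additionally drops repeated fields, which the patent does not mention; because Python's sort is stable (also with `reverse=True`), fields with equal IDF keep their original order. The IDF values below are made up for illustration.

```python
from typing import Dict, List


def order_fields(fields: List[str], idf: Dict[str, float]) -> List[str]:
    """Sort element fields by inverse document frequency, largest first."""
    # Assumption: repeated fields are dropped, keeping the first occurrence.
    unique = list(dict.fromkeys(fields))
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(unique, key=lambda f: idf[f], reverse=True)


idf_values = {"Shanghai": 0.1, "Pudong": 0.8, "Century Ave": 2.3, "No. 100": 2.3}
print(order_fields(["Shanghai", "Pudong", "Century Ave", "No. 100"], idf_values))
```

The result is Century Ave, No. 100, Pudong, Shanghai: the two tied fields stay in input order ahead of the more common district and city names.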
In the address element ordering method provided by the invention, a plurality of element fields are extracted from the address information to be processed and sorted in descending order of their inverse document frequencies. Address elements are thus ranked by importance, supplying an importance ordering and related parameters to subsequent address-processing algorithms and optimizing those algorithms, thereby improving the user's logistics experience.
Referring now to FIG. 2, FIG. 2 illustrates a flow chart for building an address information library according to an embodiment of the present invention. Fig. 2 shows the following steps in total:
Step S101: Acquire historical logistics orders.
Specifically, on a logistics platform, the historical logistics orders may be the logistics orders generated by that platform. In an e-commerce scenario, they may be logistics orders generated by the e-commerce platform and associated with shopping orders, or logistics orders generated by a logistics platform and obtained through interaction between the e-commerce platform and the logistics platform. The historical logistics orders can therefore be acquired from a server of the logistics platform or of the e-commerce platform; the invention is not limited in this respect, and the different acquisition modes, data sources, and meanings of logistics orders in different scenarios all fall within its protection scope.
Step S102: Acquire the recipient address and the sender address of each historical logistics order.
Specifically, because both recipient and sender addresses are used in address processing, step S102 acquires the recipient address and the sender address of each historical logistics order.
Step S103: Merge recipient addresses and sender addresses that represent the same address information.
Specifically, many historical logistics orders may share identical recipient addresses or identical sender addresses. This embodiment processes each distinct address only once, so in step S103 the identical recipient and sender addresses acquired in step S102 are merged, reducing the amount of address data used to build the address information base.
Step S104: Construct the address information base from the merged recipient and sender addresses.
Specifically, the merged recipient and sender addresses can be used directly as the address information in the address information base. The invention is not limited in this regard.
Thus, through steps S101 to S104, the address information base is built from the addresses of historical logistics orders, making it suitable for the address information to be processed that is acquired in logistics scenarios.
Referring now to fig. 3, fig. 3 illustrates a flow chart of partitioning address regions according to a constructed address information base, according to an embodiment of the present invention. Specifically, fig. 3 shows the steps after construction of the address information base:
Step S105: Take each piece of address information in the address information base as the address information to be processed, so as to obtain the ordering of the element fields of each piece of address information.
Specifically, step S105 may perform steps S110 to S140 shown in Fig. 1 on each piece of address information in the address information base.
Step S106: Store the ordered element fields of each piece of address information in the address information base as a field sequence associated with that address information.
Step S107: Cluster according to the first m element fields of the field sequences in the address information base to obtain a plurality of clusters, wherein each cluster is associated with a sub-field sequence formed by m element fields, and m is an integer greater than or equal to 1.
Specifically, m can be set as required: the smaller m is, the fewer clusters there are and the more address information each cluster contains; the larger m is, the more clusters there are and the less address information each cluster contains.
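Grouping by the first m fields of each field sequence (the quasi-clusters later formed in step S1071) can be sketched as below; `group_by_prefix`, the address IDs, and the field values are hypothetical. The example shows the trade-off just described: raising m splits the same addresses into more, smaller clusters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_prefix(
    field_sequences: Dict[str, List[str]], m: int
) -> Dict[Tuple[str, ...], List[str]]:
    """Group address IDs whose IDF-ordered field sequences share the first m fields."""
    if m < 1:
        raise ValueError("m must be an integer >= 1")
    clusters: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for address_id, sequence in field_sequences.items():
        clusters[tuple(sequence[:m])].append(address_id)  # key = sub-field sequence
    return dict(clusters)


sequences = {
    "a1": ["Tower A", "Room 101", "Pudong"],
    "a2": ["Tower A", "Room 202", "Pudong"],
    "a3": ["Tower B", "Room 101", "Pudong"],
}
for m in (1, 2):
    print(m, group_by_prefix(sequences, m))
```

With m = 1 the three addresses fall into two clusters (Tower A and Tower B); with m = 2 each address forms its own cluster.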
Step S108: Divide address areas according to the plurality of clusters.
Thus, steps S105 to S108 reuse the address element ordering method to sort the element fields of the address information in the address information base. Because the ordering follows the importance of the element fields, the earlier a field appears in the ordering, the better it distinguishes address information. On this basis, the address information in the address information base is clustered by its first m fields into a plurality of clusters, thereby dividing address areas. Address information in the same cluster shares the same first m high-importance fields, which distinguish it from other address information, and such addresses are usually located at the same point of interest, residential compound, or road section. The address areas can be used for logistics pickup, delivery, transportation, and so on. Furthermore, the size of the address areas can be adjusted through m: consistent with step S107, the smaller m is, the larger the address areas obtained; the larger m is, the smaller the address areas obtained.
In some embodiments of the present invention, the address areas obtained in step S108 indicate pickup fences and/or delivery fences for logistics packages. In other words, addresses within the same pickup fence can be served by the same pickup station or courier, and addresses within the same delivery fence can be served by the same delivery station or courier.
Referring now to Fig. 4, Fig. 4 is a flow chart of area matching for pickup/delivery based on address information according to an embodiment of the present invention. Fig. 4 shows the following steps:
Step S150: Take the sender address of a logistics package awaiting pickup, or the recipient address of a logistics package awaiting delivery, as the address to be processed.
Step S160: Acquire the sub-field sequence of the address to be processed.
Specifically, step S160 amounts to applying the element field ordering of steps S110 to S140 to the sender address of the package awaiting pickup or the recipient address of the package awaiting delivery, and deriving the sub-field sequence from the ordering result. The sub-field sequence may contain only m fields, but the invention is not limited thereto; other numbers of fields also fall within its scope.
Step S170: Determine, according to the sub-field sequence of the address to be processed, the cluster to which the sender address or recipient address of each logistics package belongs, so as to determine the pickup address area or delivery address area of the logistics package.
Specifically, because each cluster is obtained by clustering on the first m fields of the field sequences of the address information, each cluster can be associated with the first m fields shared by its address information. Matching the first m fields of the sub-field sequence obtained in step S160 against the m fields associated with each cluster therefore determines the cluster to which the sub-field sequence belongs, and hence the address area in which the sender or recipient address of the corresponding logistics package lies.
Thus, step S170 can determine the pickup address area from the sender address, and the delivery address area from the recipient address.
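Matching in step S170 can be sketched as a dictionary lookup keyed on the first m ordered fields. Because merged clusters may keep several sub-field sequences (one option described for step S1072), the hypothetical index below maps every stored sub-field sequence to its cluster ID; an address whose prefix is not indexed returns `None`, and how such addresses are handled is left open here.

```python
from typing import Dict, List, Optional, Tuple

SubSeq = Tuple[str, ...]


def build_index(clusters: Dict[str, List[SubSeq]]) -> Dict[SubSeq, str]:
    """Map every sub-field sequence stored for a cluster back to the cluster ID."""
    return {sub_seq: cluster_id for cluster_id, subs in clusters.items() for sub_seq in subs}


def match_cluster(field_sequence: List[str], index: Dict[SubSeq, str], m: int) -> Optional[str]:
    """Look up the cluster whose sub-field sequence equals the first m ordered fields."""
    return index.get(tuple(field_sequence[:m]))


clusters = {
    "area-1": [("Tower A", "Room 101")],
    "area-2": [("Tower B", "Room 101"), ("Tower B", "Room 102")],
}
index = build_index(clusters)
print(match_cluster(["Tower B", "Room 102", "Pudong"], index, m=2))  # area-2
print(match_cluster(["Tower C", "Room 101", "Pudong"], index, m=2))  # None
```

The lookup is constant-time per package, which suits the high volume of pickup and delivery requests; exact prefix equality is the simplest reading of "matching" and may be too strict for noisy addresses.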
Step S180: Pick up or deliver the logistics package according to the pickup address area or delivery address area.
Specifically, step S180 may determine the pickup station or pickup courier from the pickup address area that step S170 derived from the sender address, and may determine the delivery station or delivery courier from the delivery address area that step S170 derived from the recipient address. In a further embodiment of the present invention, pickup/delivery route planning may also be performed based on the pickup/delivery address areas.
When a logistics package needs to be picked up or delivered, the ordered fields of its pickup or delivery address can be obtained with the above ordering method to produce the sub-field sequence; cluster matching then determines the pickup/delivery area, which helps guide pickup/delivery.
Referring now to fig. 5, fig. 5 shows a flowchart of clustering the element fields according to the first m element fields of the field sequence of the address information base according to an embodiment of the present invention. Fig. 5 shows the following steps in total:
Step S1071: Group the address information in the address information base whose field sequences share the same first m element fields into quasi-clusters, and associate each quasi-cluster with the sub-field sequence formed by those first m element fields.
Specifically, clustering on the sub-field sequence formed by the first m element fields does not allow precise control over the amount of address information in each cluster. Therefore, in this embodiment, the clusters obtained this way are first treated as quasi-clusters and subjected to the further processing steps below.
Step S1072: perform a merging step on the sub-field sequence associated with each quasi-cluster.
The merging step comprises: judging whether the sub-field sequence associated with the quasi-cluster exists in the field sequence associated with another quasi-cluster. If it does, the quasi-cluster and the other quasi-cluster are merged to form a class cluster, and the sub-field sequence associated with that class cluster is updated according to the reverse file frequencies of the sub-field sequences of the two quasi-clusters. If it does not, the quasi-cluster is taken as a class cluster.
Specifically, step S1072 accounts for incomplete address information. When some address information is missing fields, the sub-field sequence of one quasi-cluster can appear among the later fields of another quasi-cluster's field sequence. The two can then be merged into a new class cluster, which avoids wrongly dividing one region into separate areas. Further, when a quasi-cluster is merged with another quasi-cluster, the fields in the sub-field sequences of both can be reordered according to their reverse file frequencies (with repeated fields merged), and the m fields ranked first after reordering form the new sub-field sequence. This reduces the storage required for the sub-field sequences associated with the class clusters. In other embodiments, the sub-field sequences of all the quasi-clusters that were merged can be associated with the merged class cluster, which can improve the accuracy of subsequent sub-field sequence matching.
Further, step S1072 may be performed iteratively. For example, when a quasi-cluster is merged with another quasi-cluster to form a new quasi-cluster, step S1072 is performed again, until no quasi-cluster's sub-field sequence exists in the field sequence associated with any other quasi-cluster. Complete merging of the class clusters is thus achieved.
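The iterative merge can be sketched in Python as below, under several assumptions not fixed by the text. "Exists in" is read as the sub-field sequence appearing as a contiguous run inside the field sequence of any address in the other quasi-cluster. The updated sub-field sequence is the union of both sub-field sequences reordered by descending reverse file frequency, with repeats removed and the first m fields kept (the storage-saving variant described above). Ties are broken by field name. All identifiers and the IDF values in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, ...]  # a sub-field sequence


def contains_run(seq: Sequence[str], sub: Sequence[str]) -> bool:
    """True if `sub` occurs as a contiguous run of fields inside `seq`."""
    k = len(sub)
    return any(tuple(seq[i:i + k]) == tuple(sub) for i in range(len(seq) - k + 1))


def merge_quasi_clusters(
    quasi_clusters: Dict[Key, List[str]],
    field_sequences: Dict[str, List[str]],
    idf: Dict[str, float],
    m: int,
) -> Dict[Key, List[str]]:
    """Step S1072 sketch, repeated until no further merge is possible."""
    clusters = {key: list(ids) for key, ids in quasi_clusters.items()}
    merged = True
    while merged:
        merged = False
        for key_a in list(clusters):
            for key_b in list(clusters):
                if key_a == key_b:
                    continue
                # Does A's sub-field sequence appear in the field sequence of any address in B?
                if any(contains_run(field_sequences[aid], key_a) for aid in clusters[key_b]):
                    members = clusters.pop(key_a) + clusters.pop(key_b)
                    # Reorder the union of both sub-field sequences by descending IDF
                    # (repeats merged, ties broken by name) and keep the first m fields.
                    fields = sorted(set(key_a) | set(key_b), key=lambda f: (-idf.get(f, 0.0), f))
                    clusters.setdefault(tuple(fields[:m]), []).extend(members)
                    merged = True
                    break
            if merged:
                break  # restart the scan over the updated clusters
    return clusters


seqs = {
    "a1": ["Bldg5", "Lakeview", "Xuhui"],
    "a2": ["Bldg9", "Riverside", "Minhang"],
    "a3": ["Lakeview", "Xuhui"],  # incomplete address: building missing
}
quasi = {("Bldg5", "Lakeview"): ["a1"], ("Bldg9", "Riverside"): ["a2"], ("Lakeview", "Xuhui"): ["a3"]}
idf = {"Bldg5": 1.2, "Bldg9": 1.2, "Lakeview": 0.9, "Riverside": 0.9, "Xuhui": 0.3, "Minhang": 0.3}
result = merge_quasi_clusters(quasi, seqs, idf, m=2)
print(result)
# {('Bldg9', 'Riverside'): ['a2'], ('Bldg5', 'Lakeview'): ['a3', 'a1']}
```

Each merge removes at least one cluster, so the loop terminates. The quadratic rescan is fine for illustration but would need indexing for a large address base.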
Referring now to fig. 6, fig. 6 illustrates a flow chart of partitioning address regions according to the plurality of class clusters, according to an embodiment of the present invention. Fig. 6 shows the following steps:
Step S1081: acquire the receiving addresses and/or sending addresses of historical logistics orders within a preset time period.
Step S1082: for each class cluster, count the number of historical logistics orders whose receiving address and/or sending address belongs to that class cluster.
Step S1083: for each class cluster, judge whether that number of historical logistics orders is greater than a predetermined number threshold.
If the number is greater than the predetermined number threshold, step S1084 is executed: the class cluster is taken as an address area.
If the number is not greater than the predetermined number threshold, step S1085 is executed: the class cluster is taken as a candidate class cluster.
Step S1086: merge a plurality of candidate class clusters into one address area based on semantic distance and/or physical distance.
Specifically, steps S1081 to S1085 above use statistics on historical logistics order volume to determine whether each class cluster can serve alone as an address area or needs to be merged with others into one. In this way, the historical logistics order volume belonging to a class cluster indicates whether an address area divided according to that cluster is reasonable.
In a specific implementation, when the number of historical logistics orders whose receiving address and/or sending address belongs to the class cluster is greater than the predetermined number threshold, making the class cluster its own address area is reasonable, so the class cluster can be taken as an address area. The predetermined number threshold can be set as needed. For example, it can be set according to the minimum average daily pickup volume (or the minimum pickup volume within a set time period) of the pickup fence represented by the address area, ensuring that the historical logistics order volume of the address area meets the workload requirement of the pickup couriers assigned to that pickup fence. As another example, it can be set according to the minimum average daily dispatch volume (or the minimum dispatch volume within a set time period) of the dispatch fence represented by the address area, ensuring that the historical logistics order volume of the address area meets the workload requirement of that dispatch fence.
When the number of historical logistics orders whose receiving address and/or sending address belongs to the class cluster is not greater than the predetermined number threshold, the cluster's order volume cannot meet the workload requirement of a pickup fence/dispatch fence, so the cluster needs to be merged with others to determine a suitable address area. In the above embodiment, a suitable address area is determined by merging a plurality of candidate class clusters into one address area based on semantic distance and/or physical distance. Specifically, after the merge, the judgment of step S1083 can be applied again to the resulting address area to determine whether it is suitable; if it is not, further merging is needed. Further, because the sub-field sequences in this embodiment are already ordered by the importance of their element fields, the closer the semantic distance between two pieces of address information, the closer their actual distance generally is. Step S1086 can therefore merge class clusters based on the semantic distance and/or physical distance of the clusters' sub-field sequences. The present invention can also merge class clusters in other ways, which are not described here.
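Steps S1082 to S1086 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure. It assumes each historical order's address has already been matched to a cluster key. It uses a Jaccard distance between sub-field sequences as a hypothetical stand-in for "semantic distance". It also merges candidates greedily: starting from one candidate, it repeatedly absorbs the nearest remaining candidate until the merged area passes the S1083 threshold check again. Function names and sample data are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, ...]  # a class cluster's sub-field sequence


def count_orders(order_keys: Iterable[Key], clusters: Iterable[Key]) -> Dict[Key, int]:
    """Step S1082: orders per class cluster (order addresses assumed already matched)."""
    counts = Counter(order_keys)
    return {k: counts.get(k, 0) for k in clusters}


def jaccard_distance(a: Key, b: Key) -> float:
    """Hypothetical stand-in for the semantic distance between two sub-field sequences."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def divide_areas(
    counts: Dict[Key, int],
    threshold: int,
    distance: Callable[[Key, Key], float] = jaccard_distance,
) -> List[List[Key]]:
    """Steps S1083-S1086 sketch with a greedy merge of candidate clusters (an assumption)."""
    areas = [[k] for k, c in counts.items() if c > threshold]  # S1084
    candidates = [k for k, c in counts.items() if c <= threshold]  # S1085
    while candidates:  # S1086
        group = [candidates.pop(0)]
        total = counts[group[0]]
        # Absorb the nearest remaining candidate until the merged area passes
        # the S1083 check again, or no candidates are left.
        while total <= threshold and candidates:
            nearest = min(candidates, key=lambda k: min(distance(k, g) for g in group))
            candidates.remove(nearest)
            group.append(nearest)
            total += counts[nearest]
        areas.append(group)
    return areas


clusters = [("B1", "Lakeview"), ("B2", "Lakeview"), ("B3", "Lakeview"), ("B9", "Riverside")]
orders = [clusters[0]] * 5 + [clusters[1]] * 2 + [clusters[2]] * 2 + [clusters[3]]
counts = count_orders(orders, clusters)
print(divide_areas(counts, threshold=3))
# [[('B1', 'Lakeview')], [('B2', 'Lakeview'), ('B3', 'Lakeview')], [('B9', 'Riverside')]]
```

In this sketch a leftover candidate that still falls below the threshold, such as `('B9', 'Riverside')` here, remains its own area. A production system might instead attach it to the nearest existing area or use physical distance. The text leaves that choice open.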
The above are merely several specific implementations of the address element ordering method of the present invention. Each may be implemented independently or in combination, and the present invention is not limited in this respect. Further, the flowcharts of the present invention are merely illustrative and do not limit the execution order of the steps; splitting, merging, or reordering steps, and executing them synchronously or asynchronously, all fall within the scope of protection of the present invention.
Referring now to FIG. 7, FIG. 7 is a block diagram illustrating an address element ordering apparatus according to an embodiment of the invention. The address element ordering device 200 includes an acquisition module 210, an extraction module 220, a calculation module 230, and an ordering module 240.
The acquisition module 210 is configured to acquire address information to be processed;
the extracting module 220 is configured to extract a plurality of element fields from the address information to be processed;
the calculation module 230 is configured to calculate a reverse file frequency for each of the element fields;
the sorting module 240 is configured to sort the element fields of the address information to be processed from large to small based on the reverse file frequency.
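The four modules can be sketched as plain Python functions. This is illustrative only. The extraction step is a hypothetical placeholder that assumes fields are already separated by commas or "/". Real Chinese addresses would need a proper address segmenter, which the source does not specify. The calculation step uses F = lg(N/n), the formula given in claim 2, where "reverse file frequency" corresponds to the usual inverse document frequency (IDF). Treating fields absent from the base as rarest is an assumption.

```python
import math
import re
from typing import Dict, List


def extract_fields(address: str) -> List[str]:
    """Extraction module (hypothetical): assumes fields are separated by commas or '/'."""
    return [f.strip() for f in re.split(r"[,，/]", address) if f.strip()]


def compute_idf(base: List[List[str]]) -> Dict[str, float]:
    """Calculation module: F = lg(N / n), where n counts base entries containing the field."""
    n_total = len(base)
    doc_freq: Dict[str, int] = {}
    for fields in base:
        for field in set(fields):  # each entry counts at most once per field
            doc_freq[field] = doc_freq.get(field, 0) + 1
    return {field: math.log10(n_total / n) for field, n in doc_freq.items()}


def sort_fields(fields: List[str], idf: Dict[str, float]) -> List[str]:
    """Sorting module: descending reverse file frequency. Fields absent from the base are
    treated as rarest (an assumption); sorted() is stable, so ties keep input order."""
    unseen = max(idf.values(), default=0.0) + 1.0
    return sorted(fields, key=lambda f: -idf.get(f, unseen))


base = [extract_fields(a) for a in (
    "Shanghai, Xuhui, Lakeview, Bldg5",
    "Shanghai, Xuhui, Lakeview, Bldg6",
    "Shanghai, Minhang, Riverside, Bldg5",
    "Shanghai, Xuhui, Parkside, Bldg2",
)]
idf = compute_idf(base)
print(sort_fields(extract_fields("Shanghai, Xuhui, Lakeview, Bldg6"), idf))
# ['Bldg6', 'Lakeview', 'Xuhui', 'Shanghai']
```

A field present in every address, such as the city here, gets F = lg(1) = 0 and sinks to the end. Rare, highly specific fields rise to the front.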
In the address element sorting apparatus of the exemplary embodiment of the present invention, a plurality of element fields are extracted from the address information to be processed, and those element fields are sorted from large to small based on the reverse file frequency of each element field. This ranks the address elements by importance, which provides a new importance-based ordering and related parameters for subsequent address information processing algorithms and helps optimize them, thereby improving users' logistics experience.
Fig. 7 is merely schematic, and the address element sorting apparatus 200 provided by the present invention is not limited to it; splitting, merging, and adding modules all fall within the scope of protection of the present invention. The address element sorting apparatus 200 may be implemented by software, hardware, firmware, plug-ins, or any combination thereof, and the present invention is not limited in this respect.
In an exemplary embodiment of the invention, a computer readable storage medium is also provided, on which a computer program is stored, which program, when being executed by, for example, a processor, can implement the steps of the address element ordering method described in any of the above embodiments. In some possible embodiments, the aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the address element ordering method section of this specification, when said program product is run on the terminal device.
Referring to fig. 8, a program product 700 for implementing the above-described method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
In an exemplary embodiment of the invention, an electronic device is also provided, which may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the address element ordering method of any of the embodiments described above via execution of the executable instructions.
Those skilled in the art will appreciate that aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 9. The electronic device 500 shown in fig. 9 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 9, the electronic device 500 is embodied in the form of a general purpose computing device. The components of electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 connecting the different system components (including the memory unit 520 and the processing unit 510), a display unit 540, etc.
Wherein the storage unit stores program code that is executable by the processing unit 510 such that the processing unit 510 performs the steps according to various exemplary embodiments of the present invention described in the address element ordering method section of the present specification. For example, the processing unit 510 may perform the steps shown in any one or more of fig. 1-6.
The memory unit 520 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 5201 and/or cache memory unit 5202, and may further include Read Only Memory (ROM) 5203.
The storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., a keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (e.g., a router, modem, etc.) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 550. The electronic device 500 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 560. The network adapter 560 may communicate with the other modules of the electronic device 500 via the bus 530. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 500, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, server, or network device) to perform the address element sorting method according to the embodiments of the present invention.
Compared with the prior art, the invention has the advantages that:
According to the invention, a plurality of element fields are extracted from the address information to be processed, and those element fields are sorted from large to small based on their reverse file frequencies. This ranks the address elements by importance, which provides a new importance-based ordering and related parameters for subsequent address information processing algorithms and helps optimize them, thereby improving users' logistics experience.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (8)

1. A method of ordering address elements, comprising:
acquiring address information to be processed;
extracting a plurality of element fields from the address information to be processed;
calculating the reverse file frequency of each element field;
sorting the element fields of the address information to be processed from large to small based on the reverse file frequency;
the address element ordering method further comprises:
acquiring an address information base, and taking all address information in the address information base as the address information to be processed, so as to obtain the ordering of the element fields of all the address information;
forming the ordered element fields of each piece of address information into a field sequence, which is associated with that address information and stored in the address information base;
clustering the element fields according to the first m element fields of the field sequences in the address information base to obtain a plurality of class clusters, wherein each class cluster is associated with a sub-field sequence formed by m element fields, and m is an integer greater than or equal to 1;
dividing address areas according to the plurality of class clusters;
the clustering of the element fields according to the first m element fields of the field sequences in the address information base comprises:
grouping the address information in the address information base whose field sequences have identical first m element fields into quasi-clusters, and associating each quasi-cluster with the sub-field sequence formed by those first m element fields;
executing a merging step for the sub-field sequence associated with each quasi-cluster:
judging whether the sub-field sequence associated with the quasi-cluster exists in the field sequence associated with another quasi-cluster;
if yes, merging the quasi-cluster with the other quasi-cluster to form a class cluster, and
updating the sub-field sequence associated with the class cluster according to the reverse file frequencies of the sub-field sequences of the two quasi-clusters;
if not, taking the quasi-cluster as a class cluster;
the dividing of address areas according to the plurality of class clusters comprises:
acquiring the receiving addresses and/or sending addresses of historical logistics orders within a preset time period;
counting, for each class cluster, the number of historical logistics orders whose receiving address and/or sending address belongs to that class cluster;
for each of the class clusters:
judging whether the number of historical logistics orders whose receiving address and/or sending address belongs to the class cluster is greater than a predetermined number threshold;
if yes, taking the class cluster as an address area;
if not, taking the class cluster as a candidate class cluster;
merging a plurality of the candidate class clusters into an address area based on semantic distance and/or physical distance.
2. The method of address element ordering of claim 1, wherein said calculating a reverse file frequency for each of said element fields comprises:
for each of the element fields:
counting the number n of pieces of address information in an address information base that contain the element field;
calculating the reverse file frequency F of the element field according to the following formula:
F = lg(N/n),
where N is the total number of pieces of address information in the address information base.
3. The address element ordering method of claim 2, wherein the address information base is constructed according to the steps of:
acquiring a historical logistics order;
acquiring a receiving address and a sending address of the historical logistics order;
merging receiving addresses and sending addresses that represent the same address information;
constructing the address information base from the merged receiving and sending addresses.
4. The address element ordering method of claim 1, wherein the address area is used to indicate pickup fences and/or dispatch fences for logistics packages.
5. The address element ordering method of claim 4, further comprising:
taking the sending address of a logistics package to be picked up, or the receiving address of a logistics package to be dispatched, as the address to be processed;
acquiring the sub-field sequence of the address to be processed;
determining, according to the sub-field sequence of the address to be processed, the class cluster to which the sending address/receiving address of each logistics package belongs, so as to determine the pickup address area or dispatch address area of the logistics package;
carrying out pickup/dispatch of the logistics package according to the pickup address area or dispatch address area.
6. An address element ordering apparatus, configured to perform the address element ordering method according to any one of claims 1 to 5, the address element ordering apparatus comprising:
an acquisition module configured to acquire address information to be processed;
an extraction module configured to extract a plurality of element fields from the address information to be processed;
a calculation module configured to calculate a reverse file frequency for each of the element fields;
and a sorting module configured to sort the element fields of the address information to be processed from large to small based on the reverse file frequency.
7. An electronic device, the electronic device comprising:
A processor;
a memory having stored thereon a computer program which, when executed by the processor, performs:
the address element ordering method of any one of claims 1 to 5.
8. A storage medium having a computer program stored thereon, the computer program when executed by a processor performing:
the address element ordering method of any one of claims 1 to 5.
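Claim 3 builds the address information base by merging receiving and sending addresses that represent the same address information, but it does not state how "the same" is decided. The Python sketch below assumes a simple normalisation rule: Unicode NFKC folding (e.g. full-width digits become ASCII), removal of all whitespace, and lowercasing. Two addresses that normalise to the same string are treated as one entry. The function names and sample addresses are hypothetical.

```python
import re
import unicodedata
from typing import Dict, Iterable, List, Tuple


def normalize_address(address: str) -> str:
    """Hypothetical rule for deciding that two addresses are the same address information."""
    text = unicodedata.normalize("NFKC", address)  # e.g. full-width digits -> ASCII
    text = re.sub(r"\s+", "", text)  # drop all whitespace
    return text.lower()


def build_address_base(orders: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build the base from (receiving_address, sending_address) pairs of historical orders.

    Receiving and sending addresses that normalise to the same key are merged into
    one entry; the raw spellings seen for that entry are kept for reference.
    """
    base: Dict[str, List[str]] = {}
    for receiving, sending in orders:
        for raw in (receiving, sending):
            spellings = base.setdefault(normalize_address(raw), [])
            if raw not in spellings:
                spellings.append(raw)
    return base


history = [
    ("Lakeview Garden Bldg 5", "Riverside Rd 12"),
    ("Riverside Rd 12", "lakeview garden bldg５"),  # full-width digit 5
]
base = build_address_base(history)
print(len(base))  # 2
```

Exact matching after normalisation is deliberately conservative. Fuzzier matching (e.g. edit distance or geocoding) would merge more spellings, at the risk of conflating distinct addresses.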
CN202110126026.0A 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium Active CN112818684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126026.0A CN112818684B (en) 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126026.0A CN112818684B (en) 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112818684A CN112818684A (en) 2021-05-18
CN112818684B true CN112818684B (en) 2024-04-19

Family

ID=75860193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110126026.0A Active CN112818684B (en) 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112818684B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420074B (en) * 2021-06-30 2024-03-01 中国航空油料有限责任公司 Flight display information display method and device, electronic equipment and storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
EP1832845A1 (en) * 2006-03-09 2007-09-12 Klick Tel AG Navigation device comprising size dependent address element sorting
CN104200369A (en) * 2014-08-27 2014-12-10 北京京东尚科信息技术有限公司 Method and device for determining commodity delivery range
CN108460046A (en) * 2017-02-21 2018-08-28 菜鸟智能物流控股有限公司 Address aggregation method and equipment
CN109101474A (en) * 2017-06-20 2018-12-28 菜鸟智能物流控股有限公司 Address aggregation method, package aggregation method and equipment
CN109271462A (en) * 2018-11-23 2019-01-25 河北航天信息技术有限公司 A kind of taxpayer's tax registration registered address information cluster method based on K-means algorithm model
CN110019617A (en) * 2017-12-05 2019-07-16 腾讯科技(深圳)有限公司 The determination method and apparatus of address mark, storage medium, electronic device
CN111159974A (en) * 2019-12-30 2020-05-15 北京明略软件系统有限公司 Address information standardization method and device, storage medium and electronic equipment
CN111382922A (en) * 2018-12-29 2020-07-07 顺丰科技有限公司 Information acquisition task allocation method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10348745B2 (en) * 2017-01-05 2019-07-09 Cisco Technology, Inc. Associating a user identifier detected from web traffic with a client address

Non-Patent Citations (1)

Title
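Research and Application of Address Classification Based on Text Mining (基于文本挖掘的地址分类研究及应用); Ma Xiongfei (马雄飞); China Master's Theses Full-text Database, Information Science and Technology Series; 2020-01-15 (No. 1); I138-2670 *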

Also Published As

Publication number Publication date
CN112818684A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
WO2022141861A1 (en) Emotion classification method and apparatus, electronic device, and storage medium
CN113157927B (en) Text classification method, apparatus, electronic device and readable storage medium
CN112818685A (en) Address matching method and device, electronic equipment and storage medium
CN112328909B (en) Information recommendation method and device, computer equipment and medium
CN111967808B (en) Method, device, electronic equipment and storage medium for determining commodity circulation object receiving mode
CN109034199B (en) Data processing method and device, storage medium and electronic equipment
CN111553556A (en) Business data analysis method and device, computer equipment and storage medium
CN113360768A (en) Product recommendation method, device and equipment based on user portrait and storage medium
CN113393306A (en) Product recommendation method and device, electronic equipment and computer readable medium
CN112818684B (en) Address element ordering method and device, electronic equipment and storage medium
CN107506399B (en) Method, system, device and storage medium for fast segmentation of data unit
CN107729944B (en) Identification method and device of popular pictures, server and storage medium
CN111368189B (en) Goods source sorting recommendation method and device, electronic equipment and storage medium
CN112288362A (en) Parcel re-delivery method, parcel delivery method and related equipment
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
US20220188292A1 (en) Data processing method, apparatus, electronic device and readable storage medium
CN112016321B (en) Method, electronic device and storage medium for mail processing
CN115017385A (en) Article searching method, device, equipment and storage medium
CN114282121A (en) Service node recommendation method, system, device and storage medium
CN114297235A (en) Risk address identification method and system and electronic equipment
CN113672703A (en) User information updating method, device, equipment and storage medium
CN111125272B (en) Regional characteristic acquisition method, regional characteristic acquisition device, computer equipment and medium
CN113076402A (en) Comment data analysis method and system, electronic device and storage medium
CN112488199A (en) Logistics distribution mode prediction method, system, equipment and storage medium
CN112132500A (en) Method for constructing dispatch fence, method for identifying super dispatch list and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant