CN112818684A - Address element sorting method and device, electronic equipment and storage medium - Google Patents

Address element sorting method and device, electronic equipment and storage medium

Info

Publication number
CN112818684A
Authority
CN
China
Prior art keywords
address
address information
cluster
processed
fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110126026.0A
Other languages
Chinese (zh)
Other versions
CN112818684B (en)
Inventor
周筠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xunmeng Information Technology Co Ltd
Original Assignee
Shanghai Xunmeng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xunmeng Information Technology Co Ltd
Priority to CN202110126026.0A
Publication of CN112818684A
Application granted
Publication of CN112818684B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/387 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0838 Historical data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Library & Information Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Remote Sensing (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an address element sorting method and device, electronic equipment, and a storage medium. The address element sorting method comprises: acquiring address information to be processed; extracting a plurality of element fields from the address information to be processed; calculating the inverse document frequency of each element field; and sorting the element fields of the address information to be processed in descending order of inverse document frequency. The invention thereby ranks address elements by importance, providing an importance ranking and related parameters to subsequent address information processing algorithms so that those algorithms can be optimized and the user's logistics experience improved.

Description

Address element sorting method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer applications, and in particular, to an address element sorting method and apparatus, an electronic device, and a storage medium.
Background
Currently, in a logistics scenario, the receiving and sending addresses given by a user must be identified and processed, for example to calculate a logistics route or to assign a pickup courier or a dispatcher. In the various algorithms that process addresses, the importance of address elements is a key component: scoring the importance of address elements underlies many of these algorithms.
Therefore, how to rank address elements by importance, so as to provide an importance ranking and related parameters to subsequent address information processing algorithms, optimize those algorithms, and thereby improve the user's logistics experience, is a technical problem to be solved urgently in the field.
Disclosure of Invention
To overcome the defects of the related art, the invention provides an address element sorting method and device, electronic equipment, and a storage medium that rank address elements by importance, so as to provide an importance ranking and related parameters to subsequent address information processing algorithms, optimize those algorithms, and improve the user's logistics experience.
According to an aspect of the present invention, there is provided an address element sorting method, including:
acquiring address information to be processed;
extracting a plurality of element fields from the address information to be processed;
calculating the inverse document frequency of each element field;
and sorting the element fields of the address information to be processed in descending order of inverse document frequency.
In some embodiments of the invention, calculating the inverse document frequency of each element field comprises:
for each of the element fields:
counting the number n of pieces of address information in an address information base that contain the element field;
calculating the inverse document frequency F of the element field according to the following formula:
F = lg(N/n),
where N is the total number of pieces of address information in the address information base.
In some embodiments of the invention, the address information base is constructed according to the following steps:
acquiring a historical logistics order;
acquiring the receiving address and the sending address of each historical logistics order;
merging receiving addresses and sending addresses that represent the same address information;
and constructing the address information base from the merged receiving addresses and sending addresses.
In some embodiments of the present invention, building the address information base further includes:
taking each piece of address information in the address information base as the address information to be processed, to obtain the ordering of the element fields of each piece of address information;
forming the sorted element fields of each piece of address information into a field sequence that is associated with the address information and stored in the address information base;
clustering the element fields according to the first m element fields of the field sequences in the address information base to obtain a plurality of class clusters, wherein each class cluster is associated with a sub-field sequence formed by the m element fields, and m is an integer greater than or equal to 1;
and dividing the address area according to the plurality of class clusters.
In some embodiments of the invention, the address area is used to indicate a pick-up fence and/or a dispatch fence for a logistics package.
In some embodiments of the invention, the method further comprises:
taking the sending address of a logistics package to be picked up, or the receiving address of a logistics package to be dispatched, as the address to be processed;
obtaining a sub-field sequence of the address to be processed;
determining, according to the sub-field sequence of the address to be processed, the class cluster to which the sending address/receiving address of each logistics package belongs, so as to determine a pickup address area or a dispatch address area of the logistics package;
and picking up/dispatching the logistics package according to the pickup address area or the dispatch address area.
In some embodiments of the present invention, the clustering the element fields according to the first m element fields of the field sequence of the address information base includes:
grouping the pieces of address information in the address information base whose field sequences have the same first m element fields into a quasi-cluster, and associating the quasi-cluster with the sub-field sequence formed by those first m element fields;
for the sub-field sequence associated with each quasi-cluster, performing a merging step:
judging whether the sub-field sequence associated with the quasi-cluster exists in a field sequence associated with another quasi-cluster;
if so, merging the quasi-cluster and the other quasi-cluster to form a class cluster;
updating the sub-field sequence associated with the class cluster according to the inverse document frequencies of the sub-field sequences of the two quasi-clusters;
if not, taking the quasi-cluster as a class cluster.
In some embodiments of the present invention, said dividing the address area according to the plurality of class clusters comprises:
acquiring a receiving address and/or a sending address of a historical logistics order in a preset time period;
counting the historical logistics order quantity of the receiving address and/or the sending address belonging to each cluster;
for each of the clusters:
judging whether the historical logistics order quantity of the receiving address and/or the sending address belonging to the cluster is larger than a preset quantity threshold value or not;
if yes, the cluster is used as an address area;
if not, the cluster is taken as a candidate cluster;
and for each candidate class cluster, merging a plurality of candidate class clusters into one address area based on the semantic distance and/or the physical distance.
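The area division steps above can be sketched as follows. This is only an illustration under stated assumptions: the per-cluster order counts, the planar centroid coordinates, the `max_distance` parameter, and the greedy grouping rule are all hypothetical, since the text does not specify how the physical (or semantic) distance is measured or how candidates are grouped.

```python
import math

def divide_address_areas(order_counts, centroids, threshold, max_distance):
    """Split class clusters into address areas by historical order volume.

    order_counts: dict cluster id -> historical order quantity (assumed input).
    centroids:    dict cluster id -> (x, y) planar position (assumed input).
    """
    areas = []
    candidates = []
    for cluster, count in order_counts.items():
        if count > threshold:
            areas.append([cluster])      # busy cluster becomes its own area
        else:
            candidates.append(cluster)   # quiet cluster waits to be merged

    # Greedy merge (an assumption): a candidate joins the first group whose
    # first member lies within max_distance, otherwise it starts a new group.
    merged = []
    for cluster in candidates:
        for group in merged:
            if math.dist(centroids[cluster], centroids[group[0]]) <= max_distance:
                group.append(cluster)
                break
        else:
            merged.append([cluster])
    return areas + merged

order_counts = {"c1": 120, "c2": 5, "c3": 8, "c4": 3}
centroids = {"c1": (3.0, 3.0), "c2": (0.0, 0.0), "c3": (0.5, 0.0), "c4": (10.0, 10.0)}
areas = divide_address_areas(order_counts, centroids, threshold=50, max_distance=1.0)
```

With these hypothetical inputs, `c1` forms an area by itself, the nearby low-volume clusters `c2` and `c3` are merged, and the remote `c4` stays alone.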
According to another aspect of the present invention, there is also provided an address element sorting apparatus, including:
an acquisition module configured to acquire address information to be processed;
an extraction module configured to extract a plurality of element fields from the address information to be processed;
a calculation module configured to calculate the inverse document frequency of each of the element fields;
and a sorting module configured to sort the element fields of the address information to be processed in descending order of inverse document frequency.
According to still another aspect of the present invention, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps as described above.
According to yet another aspect of the present invention, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps as described above.
Compared with the prior art, the invention has the advantages that:
According to the method, a plurality of element fields are extracted from the address information to be processed, and these element fields are sorted in descending order of their inverse document frequency. Address elements are thereby ranked by importance, which provides an importance ranking and related parameters to subsequent address information processing algorithms, allows those algorithms to be optimized, and improves the user's logistics experience.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 shows a flow diagram of an address element ordering method according to an embodiment of the invention.
Fig. 2 shows a flow chart for building an address information base according to an embodiment of the invention.
Fig. 3 illustrates a flowchart of dividing address regions according to a constructed address information base according to an embodiment of the present invention.
Fig. 4 shows a flowchart of matching address areas according to address information for pickup/dispatch according to an embodiment of the present invention.
Fig. 5 shows a flowchart for clustering the element fields according to the first m element fields of the field sequence of the address information base according to the embodiment of the present invention.
Fig. 6 is a flowchart illustrating address area division according to the plurality of class clusters according to an embodiment of the present invention.
Fig. 7 is a block diagram illustrating an address element sorting apparatus according to an embodiment of the present invention.
Fig. 8 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the invention.
Fig. 9 schematically illustrates an electronic device in an exemplary embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
In each embodiment of the present invention, the address element sorting method may be applied to a logistics platform, an e-commerce platform, or any third-party platform that needs to use address information; the application scenarios of the present invention are not limited thereto and are not described further here.
FIG. 1 shows a flow diagram of an address element ordering method according to an embodiment of the invention. The address element sorting method comprises the following steps:
Step S110: acquiring address information to be processed.
Specifically, the address information to be processed may be address information input by the user, historical address information already input by the user, or address information extracted according to the user identifier/order identifier in the system, which is not limited in the present invention. In different application scenarios, different acquisition modes of the address information to be processed are within the protection scope of the present invention.
Step S120: extracting a plurality of element fields from the address information to be processed.
Specifically, step S120 may obtain only the element field, or may obtain the element field and its address label (e.g., province, city, district, etc.).
Specifically, in step S120, a word segmentation algorithm, a labeling algorithm, and the like may be used to extract the element fields from the address information to be processed. The word segmentation and labeling may be based on character string matching: the string is scanned, and whenever a substring is identical to a word in the standard address lexicon, it is taken as a match. Such segmentation usually incorporates heuristic rules such as "forward/backward maximum matching" and "long word first". The word segmentation and labeling may also be based on statistics and machine learning. Statistical and machine-learning methods model Chinese text using manually annotated parts of speech and statistical features, that is, model parameters are trained on observed data (an annotated corpus). Those skilled in the art can implement other word segmentation and labeling algorithms, which are not described here.
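As an illustration of the string-matching approach, the following is a minimal sketch of forward maximum matching with the "long word first" heuristic. The function name, the tiny lexicon, and the sample address are hypothetical; emitting unknown characters as single-character fields is an assumption of this sketch, not something the text prescribes.

```python
def forward_max_match(address, lexicon, max_len=None):
    """Segment `address` left to right, taking the longest lexicon word at each position."""
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    fields = []
    i = 0
    while i < len(address):
        # Try the longest candidate first ("long word first").
        for size in range(min(max_len, len(address) - i), 0, -1):
            candidate = address[i:i + size]
            # Assumption: a character not covered by the lexicon
            # is emitted as a one-character field.
            if candidate in lexicon or size == 1:
                fields.append(candidate)
                i += size
                break
    return fields

# Hypothetical standard address lexicon and input address.
lexicon = {"上海市", "长宁区", "娄山关路", "533号"}
fields = forward_max_match("上海市长宁区娄山关路533号", lexicon)
# fields == ["上海市", "长宁区", "娄山关路", "533号"]
```

Backward maximum matching would scan from the end of the string instead; a production system would likely also attach an address label (province, city, district, and so on) to each field.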
Further, the number of element fields to be acquired may be set as needed. The element fields to be acquired may also be restricted, as needed, to those carrying set address labels (the set address labels may be points of interest, road names, town names, street names, and so on). The invention is not limited thereto.
Step S130: calculating the inverse document frequency of each element field.
Specifically, the inverse document frequency of each element field may be calculated as follows.
The inverse document frequency (IDF) is an index for measuring the importance of a word.
Among N documents, the weight of a word is inversely related to its document frequency (DF), that is, the number of those N documents in which the word appears; the inverse document frequency can therefore be defined with a logarithmic function as F = lg(N/n). Thus, in step S130, for each element field, the number n of pieces of address information in an address information base that contain the element field is counted, and the inverse document frequency F of the element field is calculated according to the formula F = lg(N/n), where N is the total number of pieces of address information in the address information base.
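The calculation can be sketched as below, reading lg as the base-10 logarithm. The function name and the four-entry address base are hypothetical; the text does not say what to do when a field never occurs in the base (n = 0), so raising an error here is an assumption.

```python
import math

def inverse_document_frequency(field, address_base):
    """Return F = lg(N / n) for one element field.

    address_base: list of addresses, each given as a list of element fields.
    """
    total = len(address_base)                                           # N
    containing = sum(1 for fields in address_base if field in fields)  # n
    if containing == 0:
        # Assumption: an unseen field is treated as an error.
        raise ValueError("field does not occur in the address base")
    return math.log10(total / containing)

# Hypothetical address information base with N = 4 entries.
address_base = [
    ["Shanghai", "Changning", "Loushanguan Road", "No. 533"],
    ["Shanghai", "Changning", "Tianshan Road", "No. 8"],
    ["Shanghai", "Pudong", "Century Avenue", "No. 100"],
    ["Shanghai", "Pudong", "Zhangyang Road", "No. 8"],
]

f_city = inverse_document_frequency("Shanghai", address_base)          # lg(4/4) = 0
f_district = inverse_document_frequency("Changning", address_base)     # lg(4/2)
f_road = inverse_document_frequency("Loushanguan Road", address_base)  # lg(4/1)
```

A field that appears in every address, such as the city here, gets F = 0 and therefore carries no distinguishing weight, while the road name that appears once gets the largest value.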
Step S140: sorting the element fields of the address information to be processed in descending order of inverse document frequency.
Specifically, step S140 uses the inverse document frequency as the importance weight of each element field in the address information, so the element fields can be sorted in descending order of importance weight. In other words, in this embodiment, the less frequently an element field appears across the address information, the more important it is, and the better it serves as key information for distinguishing this address information from other address information.
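A minimal sketch of the sorting step, assuming the inverse document frequencies have already been computed and are available in a dictionary (the `idf` values below are made up for illustration):

```python
def sort_fields_by_idf(fields, idf):
    """Order element fields from largest to smallest inverse document frequency."""
    # sorted() is stable, so fields with equal weights keep their input order.
    return sorted(fields, key=lambda field: idf[field], reverse=True)

# Hypothetical precomputed weights.
idf = {
    "Shanghai": 0.0,
    "Changning": 0.301,
    "Loushanguan Road": 0.602,
    "No. 533": 0.602,
}
ordered = sort_fields_by_idf(
    ["Shanghai", "Changning", "Loushanguan Road", "No. 533"], idf
)
# ordered == ["Loushanguan Road", "No. 533", "Changning", "Shanghai"]
```

The most distinctive fields come first, so a prefix of the result can be used directly as a compact key for the address.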
In the address element sorting method provided by the invention, a plurality of element fields are extracted from the address information to be processed, and these element fields are sorted in descending order of their inverse document frequency. Address elements are thereby ranked by importance, which provides an importance ranking and related parameters to subsequent address information processing algorithms, allows those algorithms to be optimized, and improves the user's logistics experience.
Referring now to Fig. 2, Fig. 2 shows a flowchart for building an address information base according to an embodiment of the invention. Fig. 2 shows the following steps:
Step S101: acquiring historical logistics orders.
Specifically, when applied to a logistics platform, the historical logistics orders may be logistics orders generated by the logistics platform. When applied to an e-commerce scenario, the historical logistics orders may be logistics orders generated by the e-commerce platform in association with shopping orders, or logistics orders generated by a logistics platform and obtained through interaction between the e-commerce platform and the logistics platform. The historical logistics orders can therefore be obtained from a server of the logistics platform or from a server of the e-commerce platform. The invention is not limited thereto; different acquisition modes, different data sources, and different meanings of logistics orders in different scenarios are all within the protection scope of the invention.
Step S102: acquiring the receiving address and the sending address of each historical logistics order.
Specifically, since both the receiving address and the sending address are used in address information processing, step S102 acquires the receiving address and the sending address of each historical logistics order.
Step S103: merging the receiving addresses and sending addresses that represent the same address information.
Specifically, across a plurality of historical logistics orders there may be several identical receiving addresses, several identical sending addresses, and receiving addresses that are identical to sending addresses. This embodiment processes only distinct addresses, so the identical receiving and sending addresses acquired in step S102 are merged in step S103 to reduce the amount of address information used to construct the address information base.
Step S104: constructing the address information base from the merged receiving addresses and sending addresses.
Specifically, the merged receiving addresses and sending addresses can be used directly as the address information in the address information base, thereby constructing the address information base. The invention is not limited thereto.
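Steps S101 to S104 can be sketched as a simple deduplication pass. The order record layout (`receiving_address`, `sending_address` keys) is hypothetical, and treating two addresses as "the same address information" when they match after whitespace and case normalization is an assumption of this sketch; the text does not define the equivalence test.

```python
def build_address_base(orders):
    """Collect the distinct receiving and sending addresses of historical orders."""
    base = {}
    for order in orders:
        for key in ("receiving_address", "sending_address"):
            address = order[key]
            # Assumed equivalence test: collapse whitespace, ignore case.
            normalized = " ".join(address.split()).lower()
            # Keep the first spelling seen for each distinct address.
            base.setdefault(normalized, address)
    return list(base.values())

# Hypothetical historical logistics orders.
orders = [
    {"receiving_address": "A Road 1", "sending_address": "B Road 2"},
    {"receiving_address": "B  Road 2", "sending_address": "a road 1"},
]
address_base = build_address_base(orders)
# address_base == ["A Road 1", "B Road 2"]
```

Four address strings from two orders collapse to two entries, which is the data reduction the merging step is meant to achieve.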
Thus, through steps S101 to S104, the address information base is constructed from the address information of historical logistics orders, so that it suits the address information to be processed that is acquired in logistics scenarios.
Referring now to fig. 3, fig. 3 illustrates a flow diagram for partitioning address regions according to a constructed address information base, according to an embodiment of the present invention. In particular, fig. 3 shows the steps after the construction of the address information base:
Step S105: taking each piece of address information in the address information base as the address information to be processed, to obtain the ordering of the element fields of each piece of address information.
Specifically, step S105 may perform steps S110 to S140 shown in fig. 1 for each address information in the address information base to obtain the ordering of the element fields of each address information.
Step S106: forming the sorted element fields of each piece of address information into a field sequence that is associated with the address information and stored in the address information base.
Step S107: clustering the element fields according to the first m element fields of the field sequences in the address information base to obtain a plurality of class clusters, wherein each class cluster is associated with a sub-field sequence formed by the m element fields, and m is an integer greater than or equal to 1.
Specifically, m can be set as needed, and when m is smaller, the number of class clusters is smaller, and each class cluster contains more address information; when m is larger, the number of class clusters is larger, and each class cluster contains less address information.
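The effect of m can be seen in a small sketch that groups addresses by the first m fields of their sorted field sequences. The address ids and field sequences are hypothetical, and the sequences are assumed to be already sorted in descending order of inverse document frequency.

```python
from collections import defaultdict

def cluster_by_prefix(field_sequences, m):
    """Group address ids whose field sequences share the same first m fields."""
    if m < 1:
        raise ValueError("m must be an integer >= 1")
    clusters = defaultdict(list)
    for address_id, sequence in field_sequences.items():
        # The tuple of the first m fields is the cluster's sub-field sequence.
        clusters[tuple(sequence[:m])].append(address_id)
    return dict(clusters)

# Hypothetical field sequences, most distinctive field first.
field_sequences = {
    "a1": ["Sunshine Estate", "Building 3", "Loushanguan Road", "Changning"],
    "a2": ["Sunshine Estate", "Building 5", "Loushanguan Road", "Changning"],
    "a3": ["Harbor Plaza", "Tower A", "Century Avenue", "Pudong"],
}

coarse = cluster_by_prefix(field_sequences, m=1)  # 2 clusters
fine = cluster_by_prefix(field_sequences, m=2)    # 3 clusters
```

With m = 1 the two addresses in the same estate fall into one cluster; with m = 2 every address gets its own cluster, matching the observation that a larger m yields more clusters with fewer addresses each.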
Step S108: dividing address areas according to the plurality of class clusters.
Thus, steps S105 to S108 reuse the address element sorting method to sort the element fields of the address information in the address information base. Because the sorting follows the importance of the element fields, the fields ranked first are those that best distinguish one piece of address information from another. Based on this ordering, the address information in the address information base is clustered according to the first m fields to obtain a plurality of class clusters. The pieces of address information in one class cluster share the same first m fields, which are the most important fields and distinguish them from other address information; address information sharing these fields is usually located at the same point of interest, residential community, or road section. Address areas are thereby divided. The divided address areas can be used for the pickup, dispatch, transport, and so on of logistics packages. Further, the size of the resulting address areas can be adjusted by adjusting m. Since a smaller m yields fewer class clusters, each containing more address information, in general the smaller m is, the larger the resulting address areas; the larger m is, the smaller the resulting address areas.
In some embodiments of the present invention, the address areas obtained by the division in step S108 are used to indicate a pickup fence and/or a dispatch fence for logistics packages. In other words, address areas within the same pickup fence can be served by the same pickup outlet/pickup courier; address areas within the same dispatch fence can be served by the same dispatch outlet/dispatcher.
Referring now to Fig. 4, Fig. 4 shows a flowchart of matching address areas according to address information for pickup/dispatch according to an embodiment of the present invention. Fig. 4 shows the following steps:
Step S150: taking the sending address of a logistics package to be picked up, or the receiving address of a logistics package to be dispatched, as the address to be processed.
Step S160: acquiring the sub-field sequence of the address to be processed.
Specifically, step S160 amounts to performing the element field sorting of steps S110 to S140 on the sending address of the logistics package to be picked up or the receiving address of the logistics package to be dispatched, and obtaining the sub-field sequence from the sorting result. The sub-field sequence may contain only m fields; the invention is not limited thereto, and other numbers of fields are within the scope of the invention.
Step S170: determining, according to the sub-field sequence of the address to be processed, the class cluster to which the sending address/receiving address of each logistics package belongs, so as to determine the pickup address area or dispatch address area of the logistics package.
Specifically, each class cluster is obtained by clustering on the first m fields of the field sequences of the address information, so each class cluster can be associated with the first m fields of the address information it contains. The first m fields of the sub-field sequence obtained in step S160 can therefore be matched against the m fields associated with each class cluster to determine the class cluster to which the sub-field sequence belongs, and hence the class cluster in which the sending address/receiving address of the corresponding logistics package is located.
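The matching described here reduces to a keyed lookup once each class cluster is indexed by its m associated fields. In the sketch below the index layout, the area ids, and the behaviour of returning `None` when no cluster matches are all assumptions made for illustration.

```python
def match_cluster(field_sequence, cluster_index, m):
    """Look up the class cluster whose m associated fields equal the first m fields.

    field_sequence: element fields of the address, already sorted by importance.
    cluster_index:  dict mapping a tuple of m fields to a cluster/area id.
    """
    key = tuple(field_sequence[:m])
    # Assumption: an address that matches no cluster yields None.
    return cluster_index.get(key)

# Hypothetical index built from previously obtained class clusters (m = 1).
cluster_index = {
    ("Sunshine Estate",): "area-1",
    ("Harbor Plaza",): "area-2",
}

area = match_cluster(
    ["Sunshine Estate", "Building 9", "Loushanguan Road"], cluster_index, m=1
)
unknown = match_cluster(["Lakeside Villas", "No. 2"], cluster_index, m=1)
```

Here `area` is `"area-1"`, so the package would be handled by whichever outlet or courier serves that area, while `unknown` is `None` and would need a fallback that the text does not describe.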
Thus, step S170 may determine the pickup address area based on the sending address information. Step S170 may also determine the dispatch address area based on the receiving address information.
Step S180: picking up/dispatching the logistics package according to the pickup address area or the dispatch address area.
Specifically, step S180 may determine the pickup outlet/pickup courier according to the pickup address area determined from the sending address information in step S170. Step S180 may also determine the dispatch outlet/dispatcher according to the dispatch address area determined from the receiving address information in step S170. In a further embodiment of the invention, pickup/dispatch routes can be planned based on the pickup address area/dispatch address area.
Therefore, when a logistics package to be picked up or dispatched appears, the sorted fields of its sending address/receiving address can be obtained with the sorting method described above, the sub-field sequence obtained from them, and class cluster matching performed to determine the pickup/dispatch area, which helps guide pickup/dispatch.
Referring now to Fig. 5, Fig. 5 shows a flowchart of clustering the element fields according to the first m element fields of the field sequences of the address information base according to an embodiment of the present invention. Fig. 5 shows the following steps:
Step S1071: grouping the pieces of address information in the address information base whose field sequences have the same first m element fields into a quasi-cluster, and associating the quasi-cluster with the sub-field sequence formed by those first m element fields.
Specifically, considering that the number of address information in a cluster can not be accurately controlled only based on the clustering mode of the sub-field sequences formed by the first m element fields, and the clustering result is relatively accurately controlled. Therefore, in this embodiment, a class cluster obtained based on a sub-field sequence cluster formed by the first m element fields is first used as a quasi-class cluster to be subjected to the subsequent further processing steps.
Step S1072: and executing a merging step for each sub-field sequence associated with the quasi-class cluster.
The merging step comprises: it is determined whether the sub-field sequence associated with the quasi-cluster is present in a field sequence associated with another quasi-cluster. If the sub-field sequence associated with the quasi-cluster exists in the field sequence associated with another quasi-cluster, merging the quasi-cluster and the other quasi-cluster to form a class cluster, and updating the sub-field sequence associated with the class cluster according to the reverse file frequency of the sub-field sequences of the two quasi-clusters. And if the sub-field sequence associated with the quasi-class cluster does not exist in the field sequence associated with another quasi-class cluster, the quasi-class cluster is taken as a class cluster.
Specifically, in step S1072, it is considered that partial address information is incomplete, resulting in sub-field sequences of partial clusters being in subsequent field sequences of another cluster, so that the two clusters can be merged to form a new cluster, thereby avoiding the situation of region division error. Further, when merging the quasi-cluster with another quasi-cluster to form a cluster, reordering (merging duplicate fields) may be performed according to the inverse file frequency of each field in the sub-field sequences of the two quasi-clusters, so that the sub-field sequences (still m top-ranked fields) may be obtained based on the reordering, whereby the storage requirements of the sub-field sequences associated with the cluster may be reduced. In other embodiments, the merged cluster may be associated with sub-field sequences of multiple quasi-clusters before merging, thereby improving the matching accuracy of subsequent sub-field sequences.
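The reordering described above can be sketched as follows, assuming the reverse file frequency of every field (more commonly called inverse document frequency) is already available in a dictionary. The function name `merge_subfield_sequences` and the tie-breaking rule are illustrative assumptions rather than details given in the specification.

```python
from typing import Dict, List, Sequence


def merge_subfield_sequences(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    frequency: Dict[str, float],
    m: int,
) -> List[str]:
    """Merge two sub-field sequences and keep the m top-ranked fields."""
    # Order-preserving de-duplication of the fields of both sequences.
    merged = list(dict.fromkeys(list(seq_a) + list(seq_b)))
    # list.sort is stable, so fields with equal frequency keep first-seen order.
    # Fields missing from the table are ranked last (an assumption).
    merged.sort(key=lambda field: frequency.get(field, 0.0), reverse=True)
    return merged[:m]
```

Because only m fields are kept, the merged class cluster stores no more than either quasi-cluster did, which is the storage benefit the paragraph refers to.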
Further, step S1072 may be performed iteratively. For example, when the quasi-cluster is merged with another quasi-cluster to form a new quasi-cluster, step S1072 is performed again until there is no sub-field sequence associated with the quasi-cluster in each quasi-cluster. Thus, complete merging of class clusters is achieved.
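Steps S1071 and S1072, including the iteration, can be sketched as below. The sketch makes two assumptions that the specification leaves open: a sub-field sequence counts as "present in" another quasi-cluster when all of its fields occur in the field sequence of at least one member of that quasi-cluster, and the merged cluster simply keeps the sub-field sequence of the absorbing quasi-cluster. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

FieldSeq = Tuple[str, ...]
Clusters = Dict[FieldSeq, List[FieldSeq]]


def build_quasi_clusters(sequences: Sequence[Sequence[str]], m: int) -> Clusters:
    """Step S1071 analogue: group addresses whose first m fields are identical."""
    clusters: Clusters = defaultdict(list)
    for seq in sequences:
        clusters[tuple(seq[:m])].append(tuple(seq))
    return dict(clusters)


def contained_in(key: FieldSeq, members: List[FieldSeq]) -> bool:
    """Assumed test: every field of `key` occurs in one member's field sequence."""
    return any(set(key) <= set(member) for member in members)


def merge_quasi_clusters(clusters: Clusters) -> Clusters:
    """Step S1072 analogue, repeated until no further merge is possible."""
    clusters = {key: list(members) for key, members in clusters.items()}
    merged_any = True
    while merged_any:
        merged_any = False
        for key_a in list(clusters):
            for key_b in list(clusters):
                if key_a != key_b and contained_in(key_a, clusters[key_b]):
                    # Fold cluster A into cluster B; B keeps its sub-field sequence.
                    clusters[key_b].extend(clusters.pop(key_a))
                    merged_any = True
                    break
            if merged_any:
                break  # restart the scan on the updated set of clusters
    return clusters
```

Each merge removes one quasi-cluster, so the loop terminates after at most as many passes as there are quasi-clusters.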
Referring now to fig. 6, fig. 6 is a flow chart illustrating address area division according to the plurality of class clusters according to an embodiment of the present invention. Fig. 6 shows the following steps:
Step S1081: acquiring the recipient addresses and/or sender addresses of the historical logistics orders within a predetermined time period.
Step S1082: counting, for each class cluster, the number of historical logistics orders whose recipient address and/or sender address belongs to that class cluster.
Step S1083: for each class cluster, judging whether the number of historical logistics orders whose recipient address and/or sender address belongs to the class cluster is larger than a predetermined number threshold.
If so, step S1084 is executed: the class cluster is used as an address area.
If not, step S1085 is executed: the class cluster is taken as a candidate class cluster.
Step S1086: merging a plurality of candidate class clusters into one address area based on the semantic distance and/or the physical distance.
Specifically, the steps S1081 to S1085 determine whether each cluster can be used as an address area alone or needs to be combined as an address area based on the historical logistics order quantity statistics. Thus, whether or not the address area divided according to the class cluster is reasonable can be determined based on the amount of historical logistics orders belonging to the class cluster.
In a specific implementation, when the number of historical logistics orders whose recipient addresses and/or sender addresses belong to the class cluster is larger than the predetermined number threshold, the address area division of the class cluster is reasonable, so the class cluster can be used as an address area. The predetermined number threshold may be set as needed. For example, it may be set according to the minimum daily average pickup quantity (or the minimum pickup quantity within a set time period) of the pickup outlet/pickup courier of the pickup fence represented by the address area, so that the historical logistics order quantity of the address area meets the workload requirement of the pickup outlet/pickup courier of the pickup fence. As another example, it may be set according to the minimum daily average dispatch quantity (or the minimum dispatch quantity within a set time period) of the dispatch outlet/dispatch courier of the dispatch fence represented by the address area, so that the historical logistics order quantity of the address area meets the workload requirement of the dispatch outlet/dispatch courier of the dispatch fence.
When the number of historical logistics orders whose recipient addresses and/or sender addresses belong to the class cluster is not larger than the predetermined number threshold, the historical logistics order quantity of the class cluster cannot meet the workload requirement of a pickup fence/dispatch fence, so class clusters need to be merged to determine a suitable address area. In the above embodiment, a suitable address area may be determined by merging a plurality of candidate class clusters into one address area based on the semantic distance and/or the physical distance. Specifically, after merging into an address area, the judgment of step S1083 may be performed again on that address area to determine whether it is suitable; if not, merging is performed again. Further, in this embodiment, because the sub-field sequences have been sorted by the importance of the element fields, a smaller semantic distance between two pieces of address information generally corresponds to a smaller actual distance. Step S1086 may therefore merge class clusters based on the semantic distance and/or the physical distance of their sub-field sequences. The present invention can also be implemented with further variations of class cluster merging, which are not described herein.
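The flow of steps S1083 to S1086 can be sketched as below. The specification does not fix a merging strategy, so the greedy nearest-neighbour grouping used here is only one possible choice. The function name `divide_address_areas` and the `distance` callback (standing in for a semantic or physical distance) are hypothetical.

```python
from typing import Callable, Dict, List, Set


def divide_address_areas(
    order_counts: Dict[str, int],
    threshold: int,
    distance: Callable[[str, str], float],
) -> List[Set[str]]:
    """Return address areas, each a set of class cluster identifiers."""
    areas: List[Set[str]] = []
    candidates: List[str] = []
    for cluster, count in order_counts.items():
        if count > threshold:
            areas.append({cluster})    # S1084: the cluster is an area by itself
        else:
            candidates.append(cluster)  # S1085: keep as candidate class cluster

    # S1086 (assumed greedy strategy): grow a group with the nearest remaining
    # candidate and re-check the threshold after every merge.
    remaining = list(candidates)
    while remaining:
        group = [remaining.pop(0)]
        total = order_counts[group[0]]
        while total <= threshold and remaining:
            nearest = min(
                remaining,
                key=lambda c: min(distance(c, g) for g in group),
            )
            remaining.remove(nearest)
            group.append(nearest)
            total += order_counts[nearest]
        # The last group may stay below the threshold if candidates run out.
        areas.append(set(group))
    return areas
```

Re-checking the running total after each merge mirrors the remark that the judgment of step S1083 may be applied again to a merged address area.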
The above are merely a plurality of specific implementations of the address element sorting method of the present invention, and each implementation may be implemented independently or in combination, and the present invention is not limited thereto. Furthermore, the flow charts of the present invention are merely schematic, the execution sequence between the steps is not limited thereto, and the steps can be split, combined, exchanged sequentially, or executed synchronously or asynchronously in other ways within the protection scope of the present invention.
Referring now to fig. 7, fig. 7 is a block diagram illustrating an address element sorting apparatus according to an embodiment of the present invention. The address element sorting apparatus 200 includes an obtaining module 210, an extraction module 220, a calculation module 230, and a sorting module 240.
The obtaining module 210 is configured to obtain address information to be processed;
the extraction module 220 is configured to extract a plurality of element fields from the address information to be processed;
the calculation module 230 is configured to calculate a reverse file frequency for each of the element fields;
the sorting module 240 is configured to sort element fields of the address information to be processed from large to small based on the reverse file frequency.
In the address element sorting apparatus according to the exemplary embodiment of the present invention, a plurality of element fields are extracted from the address information to be processed, and each element field of the address information to be processed is sorted from large to small based on a reverse file frequency of each element field. Therefore, the importance ordering of the address elements is realized, so that important new ordering and related parameters are provided for a subsequent address information processing algorithm, the address information processing algorithm is optimized, and the logistics experience of a user is improved.
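The work of the calculation module and the sorting module can be sketched as follows, using the formula F = lg(N/n) from the method, where lg is taken to be the base-10 logarithm. The "reverse file frequency" is what is more commonly called inverse document frequency. The function names, the error raised for an unseen field, and the sample address base are illustrative assumptions; field extraction is taken as already done.

```python
import math
from typing import List, Sequence


def reverse_file_frequency(field: str, address_base: Sequence[Sequence[str]]) -> float:
    """F = lg(N / n): N addresses in the base, n of them containing `field`."""
    total = len(address_base)
    containing = sum(1 for fields in address_base if field in fields)
    if containing == 0:
        # The specification does not cover unseen fields; failing is an assumption.
        raise ValueError(f"field {field!r} does not occur in the address base")
    return math.log10(total / containing)


def sort_element_fields(
    fields: Sequence[str], address_base: Sequence[Sequence[str]]
) -> List[str]:
    """Sort the element fields of one address from largest to smallest F."""
    scores = {f: reverse_file_frequency(f, address_base) for f in fields}
    return sorted(fields, key=lambda f: scores[f], reverse=True)


# Hypothetical address base of four addresses with extracted element fields.
BASE = [
    ["Shanghai", "Changning Road", "Sunshine Garden"],
    ["Shanghai", "Changning Road", "Riverside Tower"],
    ["Shanghai", "Zhongshan Road", "Harbor Plaza"],
    ["Shanghai", "Zhongshan Road", "Harbor Plaza"],
]
```

With this base, a field that occurs in every address (the city) scores 0 and is ranked last, while a field that occurs in only one address is ranked first, which is the sense in which the ordering reflects how strongly a field distinguishes an address.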
Fig. 7 is only a schematic diagram of the address element sorting apparatus 200 provided by the present invention; splitting, merging, and adding modules fall within the protection scope of the present invention as long as they do not depart from its concept. The address element sorting apparatus 200 provided by the present invention can be implemented by software, hardware, firmware, plug-ins, or any combination thereof, which is not limited by the present invention.
In an exemplary embodiment of the present invention, a computer-readable storage medium is also provided, on which a computer program is stored, which when executed by, for example, a processor, may implement the steps of the address element sorting method described in any one of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the present invention described in the address element sorting method section above in this description, when said program product is run on the terminal device.
Referring to fig. 8, a program product 700 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user computing device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
In an exemplary embodiment of the invention, there is also provided an electronic device that may include a processor and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the address element ordering method of any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 9. The electronic device 500 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 that couples various system components including the memory unit 520 and the processing unit 510, a display unit 540, and the like.
The memory unit 520 stores program code that is executable by the processing unit 510 and causes the processing unit 510 to perform the steps according to the various exemplary embodiments of the present invention described in the address element sorting method section of this specification. For example, the processing unit 510 may perform the steps shown in any one or more of figs. 1 to 6.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
The memory unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 560. The network adapter 560 may communicate with other modules of the electronic device 500 via the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, or a network device, etc.) to execute the above address element sorting method according to the embodiment of the present invention.
Compared with the prior art, the invention has the advantages that:
according to the method, a plurality of element fields are extracted from the address information to be processed, and the element fields of the address information to be processed are sorted from big to small based on the reverse file frequency of the element fields. Therefore, the importance ordering of the address elements is realized, so that important new ordering and related parameters are provided for a subsequent address information processing algorithm, the address information processing algorithm is optimized, and the logistics experience of a user is improved.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (11)

1. An address element sorting method, comprising:
acquiring address information to be processed;
extracting a plurality of element fields from the address information to be processed;
calculating the reverse file frequency of each element field;
and sequencing all element fields of the address information to be processed from big to small based on the reverse file frequency.
2. The address element ordering method according to claim 1, wherein said calculating a reverse file frequency for each of said element fields comprises:
for each of the element fields:
counting the number n of pieces of address information in an address information base that contain the element field;
calculating the reverse file frequency F of the element field according to the following formula:
F=lg(N/n),
where N is the total number of pieces of address information in the address information base.
3. The address element ordering method according to claim 2, wherein the address information base is constructed according to the following steps:
acquiring a historical logistics order;
acquiring a recipient address and a sender address of the historical logistics order;
merging the recipient addresses and sender addresses that represent the same address information;
and constructing the address information base according to the merged recipient addresses and sender addresses.
4. The address element ordering method according to claim 3, wherein building the address information base further comprises:
taking each address information in the address information base as the address information to be processed to obtain the sequencing of element fields of each address information;
sorting element fields of each address information to form a field sequence which is associated with the address information and stored in the address information base;
clustering the element fields according to the first m element fields of the field sequences of the address information base to obtain a plurality of class clusters, wherein each class cluster is associated with a sub-field sequence formed by the m element fields, and m is an integer greater than or equal to 1;
and dividing the address area according to the plurality of class clusters.
5. The address element ordering method according to claim 4, wherein the address area is used to indicate a pickup fence and/or a dispatch fence of a logistics package.
6. The address element ordering method according to claim 5, further comprising:
taking the sender address of a logistics package to be picked up or the recipient address of a logistics package to be dispatched as the address to be processed;
obtaining a sub-field sequence of the address to be processed;
determining the class cluster to which the sender address/recipient address of each logistics package belongs according to the sub-field sequence of the address to be processed, so as to determine a pickup address area or a dispatch address area of the logistics package;
and picking up/dispatching the logistics package according to the pickup address area or the dispatch address area.
7. The address element ordering method according to claim 4, wherein said clustering said element fields according to the first m element fields of a field sequence of said address information base comprises:
dividing the address information whose field sequences in the address information base share the same first m element fields into one quasi-cluster, and associating the quasi-cluster with the sub-field sequence formed by the first m element fields;
for the sub-field sequence associated with each quasi-cluster, performing a merging step:
judging whether the sub-field sequence associated with the quasi-cluster exists in the field sequence associated with another quasi-cluster;
if so, merging the quasi-cluster and the other quasi-cluster to form a class cluster;
updating the sub-field sequence associated with the class cluster according to the reverse file frequency of the sub-field sequences of the two quasi-clusters;
if not, taking the quasi-cluster as a class cluster.
8. The address element ordering method according to claim 4, wherein said dividing the address area according to the plurality of class clusters comprises:
acquiring a receiving address and/or a sending address of a historical logistics order in a preset time period;
counting the historical logistics order quantity of the receiving address and/or the sending address belonging to each cluster;
for each of the clusters:
judging whether the historical logistics order quantity of the receiving address and/or the sending address belonging to the cluster is larger than a preset quantity threshold value or not;
if yes, the cluster is used as an address area;
if not, the cluster is taken as a candidate cluster;
and for each candidate class cluster, merging a plurality of candidate class clusters into one address area based on the semantic distance and/or the physical distance.
9. An address element sorting apparatus, comprising:
an acquisition module configured to acquire address information to be processed;
an extraction module configured to extract a plurality of element fields from the address information to be processed;
a calculation module configured to calculate a reverse file frequency for each of the element fields;
and a sorting module configured to sort the element fields of the address information to be processed from large to small based on the reverse file frequency.
10. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory having stored thereon a computer program that, when executed by the processor, performs:
a method of sorting address elements according to any one of claims 1 to 8.
11. A storage medium having a computer program stored thereon, the computer program when executed by a processor performing:
a method of sorting address elements according to any one of claims 1 to 8.
CN202110126026.0A 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium Active CN112818684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126026.0A CN112818684B (en) 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126026.0A CN112818684B (en) 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112818684A true CN112818684A (en) 2021-05-18
CN112818684B CN112818684B (en) 2024-04-19

Family

ID=75860193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110126026.0A Active CN112818684B (en) 2021-01-29 2021-01-29 Address element ordering method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112818684B (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant