CN111723165A - Address interest point determining method, device and system - Google Patents
- Publication number
- CN111723165A (application CN201910205506.9A)
- Authority
- CN
- China
- Prior art keywords
- interest
- point
- address data
- address
- candidate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Remote Sensing (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The application discloses an address interest point prediction method, device and system. The address interest point prediction method comprises the following steps: acquiring first address data to be processed; determining at least one candidate point of interest corresponding to the first address data; generating a second address data set comprising at least one candidate point of interest based on the first address data and the at least one candidate point of interest; obtaining a language model score of at least one second address data through a language model as an address context correlation score of the candidate interest point, wherein the language model is obtained by learning from an address data set; and determining the interest point corresponding to the first address data from the at least one candidate interest point according to the address context correlation score. With this processing, the candidate interest points are scored and ranked by combining the address language model with the address context information, and the interest point of the first address data is determined from that ranking; the address interest point prediction accuracy can therefore be effectively improved.
Description
Technical Field
The application relates to the technical field of natural language processing, in particular to a method, a device and a system for determining address interest points.
Background
With the increasing use of information systems, Point of Interest (POI) prediction for addresses has become important and carries great business value. Take logistics distribution in the e-commerce field as an example: when a user's delivery address lacks accurate POI information, the delivery step becomes difficult. For example, for the delivery address "information center, Temple Back Street, Chengdu, Sichuan Province", both a provincial department and a provincial commission are located on Temple Back Street, and without prior knowledge the courier cannot tell whether the information center belongs to the department or to the commission, which seriously reduces delivery efficiency. To improve delivery efficiency, the corresponding POI information needs to be determined from the user's address; that is, address interest point prediction needs to be performed.
Currently, a typical address interest point prediction method predicts the POI by electronic map retrieval, and works as follows. The "road" and "road number" information is extracted from the user address, "road + road number" is searched on an electronic map service such as Gaode Map, and a list of candidate POIs is recalled. For example, when "road + road number" is "Wenyi West Road 969", multiple POIs such as "Alibaba Xixi Park" can be recalled.
However, in the process of implementing the invention, the inventor found that this technical scheme has at least the following problems: 1) the POI retrieval results are heavily redundant; for example, searching "Wenyi West Road 969" on Gaode Map recalls nearly 50 candidate POIs, including redundant POIs such as "Building 2, Alibaba Xixi Park" and "Alibaba Taobao City"; 2) the ranking quality of the POI retrieval results is poor; for example, "Building 2, Alibaba Xixi Park" is ranked first, although judging by how active the POIs are, the most suitable POI is intuitively "Alibaba Xixi Park"; 3) POI prediction cannot take the address context into account; for example, for the Sichuan Province address on "Temple Back Street" that ends in "information center", both the provincial department and the provincial commission can be recalled by "road + street", but given the context "information center", the POI should be the provincial department. In summary, the prior art has a problem of low address interest point prediction accuracy.
Disclosure of Invention
The application provides an address interest point prediction method, which aims to solve the problem of low address interest point prediction accuracy in the prior art. The present application additionally provides an address point of interest prediction apparatus.
The application provides an address interest point prediction method, which comprises the following steps:
acquiring first address data to be processed;
determining at least one candidate point of interest corresponding to the first address data;
generating a second address data set comprising at least one candidate point of interest based on the first address data and the at least one candidate point of interest;
obtaining a language model score of at least one second address data through a language model, and taking the language model score as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
and according to the address context correlation score, determining the interest point corresponding to the first address data from the at least one candidate interest point.
Optionally, the method further includes:
collecting third address data to form a third address data set;
identifying the interest points in the third address data through the interest point identification model;
learning the language model from a fourth address data set in which the points of interest have been identified.
Optionally, the third address data is collected via the internet.
Optionally, the method further includes:
and performing data cleaning on the acquired third address data according to the data cleaning rule.
Optionally, the determining at least one candidate point of interest corresponding to the first address data includes:
at least one candidate point of interest corresponding to the first address data is determined based on a first set of correspondence between address data of a first address granularity and the point of interest.
Optionally, the method further includes:
performing point of interest aggregation at the first address granularity according to a second set of correspondence between fourth address data and points of interest;
and generating the first corresponding relation set according to the aggregated interest points.
Optionally, the generating the first set of correspondence relationships according to the aggregated interest points includes:
obtaining the interest point category of the aggregated interest points through an interest point classification model;
and determining, according to the number of occurrences of the aggregated interest points, the interest points corresponding to each interest point category, and taking them as the interest points corresponding to the address data of the first address granularity.
Optionally, the method further includes:
and learning the interest point classification model from a training data set comprising the interest points and interest point categories.
Optionally, the interest point category in the training data set is determined by the following method:
and acquiring the interest point category of the interest point through the electronic map.
Optionally, the generating the first set of correspondence relationships according to the aggregated interest points includes:
obtaining synonymous interest points among the aggregated interest points through an interest point synonym identification model;
and determining, according to the number of occurrences of the aggregated interest points, which of the synonymous interest points to retain, and taking the retained interest points as the interest points corresponding to the address data of the first address granularity.
Optionally, the method further includes:
and learning the interest point synonym identification model from a training data set comprising a first interest point, a second interest point and synonym labeling information.
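The synonym-based reduction described above can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: the function name `dedupe_synonyms`, the pairwise predicate `toy_is_synonym` (a stand-in for the trained synonym identification model) and the occurrence counts are all hypothetical. Candidates are visited from most to least frequent, and a candidate is retained only if it is not synonymous with one already retained.

```python
from typing import Callable, Dict, List


def dedupe_synonyms(
    poi_counts: Dict[str, int],
    is_synonym: Callable[[str, str], bool],
) -> List[str]:
    """Keep the most frequent member of each group of synonymous POIs."""
    kept: List[str] = []
    # Most frequent first; ties are broken by name so the result is deterministic.
    for poi in sorted(poi_counts, key=lambda p: (-poi_counts[p], p)):
        if not any(is_synonym(poi, other) for other in kept):
            kept.append(poi)
    return kept


# Hypothetical stand-in for the synonym identification model: a fixed set of pairs.
SYNONYM_PAIRS = {frozenset({"Alibaba Xixi Park", "Xixi Park"})}


def toy_is_synonym(a: str, b: str) -> bool:
    return frozenset({a, b}) in SYNONYM_PAIRS


# Hypothetical occurrence counts for one road-number-level address.
counts = {"Alibaba Xixi Park": 120, "Xixi Park": 45, "China Merchants Bank": 8}
retained = dedupe_synonyms(counts, toy_is_synonym)
```

A greedy pass is used here for brevity; if the synonym relation is not transitive, a different grouping strategy may be preferable.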
The present application further provides an address interest point prediction system, including:
the client is used for sending an interest point prediction request aiming at the target address data to the server; receiving the interest points corresponding to the target address data returned by the server;
the server is used for receiving the interest point prediction request; determining at least one candidate point of interest corresponding to the target address data; generating an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest; obtaining a language model score of at least one address data through a language model as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set; determining a point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score; and returning the interest points corresponding to the target address data to the client.
The application also provides an address interest point prediction method, which comprises the following steps:
receiving a point of interest prediction request for target address data;
determining at least one candidate point of interest corresponding to the target address data;
generating an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest;
obtaining a language model score of at least one address data through a language model as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
determining a point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score;
and returning the interest points corresponding to the target address data to the requester.
The present application further provides an address interest point prediction apparatus, including:
a first address data acquisition unit configured to acquire first address data to be processed;
a candidate interest point determining unit configured to determine at least one candidate interest point corresponding to the first address data;
a second address data set generating unit, configured to generate a second address data set including at least one candidate point of interest according to the first address data and the at least one candidate point of interest;
the candidate interest point scoring unit is used for acquiring a language model score of at least one second address data through a language model to serve as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
and the interest point determining unit is used for determining the interest point corresponding to the first address data from the at least one candidate interest point according to the address context correlation score.
The present application further provides an address interest point prediction apparatus, including:
a request receiving unit configured to receive a point of interest prediction request for target address data;
a candidate interest point determining unit for determining at least one candidate interest point corresponding to the target address data;
an address data set generating unit, configured to generate an address data set including at least one candidate point of interest according to the target address data and the at least one candidate point of interest;
the candidate interest point scoring unit is used for acquiring a language model score of at least one address data through a language model to serve as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
an interest point determining unit, configured to determine, according to the address context correlation score, an interest point corresponding to the target address data from the at least one candidate interest point;
and the interest point returning unit is used for returning the interest point corresponding to the target address data to the requester.
The present application also provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to perform the various methods described above.
The present application also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the various methods described above.
Compared with the prior art, the method has the following advantages:
According to the address interest point prediction method provided by the embodiments of the application, first address data to be processed is acquired; at least one candidate point of interest corresponding to the first address data is determined; a second address data set comprising at least one candidate point of interest is generated based on the first address data and the at least one candidate point of interest; a language model score of at least one second address data is obtained through a language model and taken as the address context correlation score of the candidate interest point, the language model being obtained by learning from an address data set; and the point of interest corresponding to the first address data is determined from the at least one candidate point of interest according to the address context correlation score. With this processing, the candidate interest points are scored and ranked using the address language model together with the address context information, and the interest point of the first address data is determined from that ranking; the address interest point prediction accuracy can therefore be effectively improved.
Drawings
FIG. 1 is a flow chart of an embodiment of an address interest point prediction method provided by the present application;
FIG. 2 is a flowchart of a language model generation method for predicting an address interest point according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of an embodiment of an address interest point prediction method provided by the present application for generating a first set of correspondence relationships;
FIG. 4 is a detailed flowchart of step S303 of an address interest point prediction method according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating step S303 of an address interest point prediction method according to another embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an embodiment of an address interest point prediction apparatus provided in the present application;
FIG. 7 is a schematic diagram of an embodiment of an address point of interest prediction system provided herein;
FIG. 8 is a diagram illustrating an embodiment of an address interest point prediction method provided herein;
FIG. 9 is a schematic diagram of an embodiment of an address interest point prediction apparatus provided in the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.
The application provides an address interest point prediction method and device and an address interest point prediction system. Each of the schemes is described in detail in the following examples.
The core technical idea of the technical scheme provided by the application is as follows: at least one candidate interest point corresponding to the first address data is determined, and the candidate interest points are scored and ranked by combining address context information with an address language model trained on an address data set, so as to determine the interest point of the first address data. Because the language model scores the addresses that contain the candidate interest points, and the order of the candidate interest points is optimized according to those scores, the prediction accuracy of address interest points can be effectively improved.
First embodiment
Please refer to fig. 1, which is a flowchart illustrating an embodiment of a method for predicting an address interest point according to the present application, wherein an execution body of the method includes an address interest point predicting apparatus. The address interest point prediction method provided by the application comprises the following steps:
Step S101: first address data to be processed is acquired.
The first address data comprises address data for which point of interest prediction is required. It may be address data that lacks point of interest information, address data that includes erroneous point of interest information, or the like. For example, the address "information center, Temple Back Street, Chengdu, Sichuan Province" is an address in which the point of interest information is missing.
After the first address data is acquired, the next step may be performed to determine at least one candidate interest point corresponding to the first address data.
Step S103: at least one candidate point of interest corresponding to the first address data is determined.
The first address data may correspond to a plurality of candidate interest points, and to determine the plurality of candidate interest points corresponding to the first address data, various embodiments may be adopted, such as determining the candidate interest points by using an electronic map retrieval method, and the like.
In this embodiment, the candidate interest points are determined as follows: at least one candidate point of interest corresponding to the first address data is determined based on a first set of correspondence between address data of a first address granularity and the point of interest.
The first set of correspondence relationships, also referred to as a POI predictive search library, includes a plurality of first correspondence relationships. The first correspondence includes address data of a first address granularity and at least one point of interest.
The first address granularity includes road-number-level granularity; address data of this granularity may not include point of interest information or a precise house-number-level address. For example, address data of the first address granularity is "No. 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang Province" rather than "Room 201, 2nd Floor, No. 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang Province", where "Room 201, 2nd Floor" is the precise part of the address.
To recall the at least one candidate interest point corresponding to the first address data from the first correspondence set, the first address data may first be structurally parsed, using Named Entity Recognition (NER) or a similar technique, to obtain coarse-grained (first address granularity) address elements such as province, city, district, street, road, and road number. For example, if the first address data is "No. 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang Province", the parsed address elements include the road and road number "No. 969 Wenyi West Road". Once the address elements of the first address granularity have been identified, they may be used to recall at least one candidate point of interest corresponding to the first address data from the POI predictive search library.
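The recall step can be sketched as a keyed lookup. This is a minimal illustration under stated assumptions: the address is taken to be already parsed into elements (the NER step is not shown), and the field names, `POI_LIBRARY` and both function names are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical first-granularity key: province, city, district, road, road number.
KEY_FIELDS = ("province", "city", "district", "road", "road_no")
AddressKey = Tuple[str, ...]

# Toy stand-in for the POI predictive search library (first correspondence set).
POI_LIBRARY: Dict[AddressKey, List[str]] = {
    ("Zhejiang", "Hangzhou", "Yuhang", "Wenyi West Road", "969"): [
        "Alibaba Xixi Park",
        "small post office",
        "science and technology city",
    ],
}


def to_first_granularity(elements: Dict[str, str]) -> AddressKey:
    """Keep only coarse-grained elements; finer ones (building, room) are dropped."""
    return tuple(elements.get(field, "") for field in KEY_FIELDS)


def recall_candidates(elements: Dict[str, str]) -> List[str]:
    """Look up candidate POIs for address elements that were already parsed."""
    return list(POI_LIBRARY.get(to_first_granularity(elements), []))


# Parsed elements of a house-number-level address; "room" is ignored by the key.
parsed = {
    "province": "Zhejiang", "city": "Hangzhou", "district": "Yuhang",
    "road": "Wenyi West Road", "road_no": "969", "room": "201",
}
candidates = recall_candidates(parsed)
```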
After determining at least one candidate interest point corresponding to the first address data, the next step may be entered, and at least one second address data including the candidate interest point is generated according to the first address data and the at least one candidate interest point.
Step S105: generating a second address data set comprising at least one candidate point of interest from the first address data and the at least one candidate point of interest.
According to the method provided by the embodiment of the application, all recalled candidate interest points are respectively spliced with the first address data to generate a plurality of second address data with the candidate interest points, and the second address data form a second address data set.
For example, if the first address data is "Wenyi West Road 969, Yuhang District, Hangzhou, Zhejiang Province" and step S103 determines that the corresponding candidate points of interest include "Alibaba Xixi Park", "small post office", and "science and technology city", then the second address data includes: "Wenyi West Road 969 Alibaba Xixi Park, Yuhang District, Hangzhou, Zhejiang Province", "Wenyi West Road 969 small post office, Yuhang District, Hangzhou, Zhejiang Province" and "Wenyi West Road 969 science and technology city, Yuhang District, Hangzhou, Zhejiang Province".
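The splicing in step S105 can be sketched as below. This is an assumption-laden illustration: the function name `build_second_address_set` is hypothetical, and joining with a single space is a choice made for this English rendering (Chinese addresses would typically be concatenated directly).

```python
from typing import List, Tuple


def build_second_address_set(
    first_address: str, candidates: List[str]
) -> List[Tuple[str, str]]:
    """Pair each candidate POI with the address obtained by appending it."""
    return [(poi, f"{first_address} {poi}") for poi in candidates]


second_set = build_second_address_set(
    "Zhejiang Hangzhou Yuhang District Wenyi West Road 969",
    ["Alibaba Xixi Park", "small post office", "science and technology city"],
)
```

Keeping the candidate next to its spliced address makes it straightforward to map a score back to the candidate it belongs to.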
After generating a plurality of second address data, the next step may be entered, and a language model score of each second address data is determined by the language model as an address context correlation score of the candidate interest point.
Step S107: obtaining a language model score of at least one second address data through a language model, to serve as the address context correlation score of the candidate interest point.
The language model is an address language model obtained by learning from the address data set, and can score address fragments. The address data in the address data set may include the interest point information, an address language model may be learned from the address data set including the interest point information, the address fragments are scored through the language model, and the score is used as the score of the candidate interest point, so that the score of the candidate interest point is affected by the context address information of the candidate interest point, and is therefore referred to as the address context correlation score. The address context correlation score is more accurate than a score based on the number of occurrences of the candidate point of interest.
For example, "Wenyi West Road 969 Alibaba Xixi Park, Yuhang District, Hangzhou, Zhejiang Province" scores 0.91, "Wenyi West Road 969 small post office, Yuhang District, Hangzhou, Zhejiang Province" scores 0.73, and "Wenyi West Road 969 science and technology city, Yuhang District, Hangzhou, Zhejiang Province" scores 0.54.
As another example, the address language model gives a higher score to the second address data in which "information center" follows the provincial department on Temple Back Street in Sichuan Province. The reason is that the address data set used to train the language model contains Internet address samples, such as one ending in "provincial department, information center, 3rd floor", in which the information center appears as a subordinate unit of that department.
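Scoring and selection can be sketched as follows. The scores are the example values quoted above (0.91, 0.73, 0.54); the lookup table `EXAMPLE_SCORES` and the function `toy_score` merely stand in for a trained language model, and all names are hypothetical.

```python
from typing import Callable, List, Tuple


def rank_by_context_score(
    second_set: List[Tuple[str, str]],
    score: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score each spliced address and sort candidates from best to worst."""
    scored = [(poi, score(address)) for poi, address in second_set]
    return sorted(scored, key=lambda item: item[1], reverse=True)


PREFIX = "Zhejiang Hangzhou Yuhang District Wenyi West Road 969"

# Stand-in for the language model: a table holding the example scores.
EXAMPLE_SCORES = {
    f"{PREFIX} Alibaba Xixi Park": 0.91,
    f"{PREFIX} small post office": 0.73,
    f"{PREFIX} science and technology city": 0.54,
}


def toy_score(address: str) -> float:
    return EXAMPLE_SCORES.get(address, 0.0)


second_set = [
    (poi, f"{PREFIX} {poi}")
    for poi in ("science and technology city", "Alibaba Xixi Park", "small post office")
]
ranking = rank_by_context_score(second_set, toy_score)
predicted_poi = ranking[0][0]  # the top-ranked candidate is taken as the POI
```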
Please refer to fig. 2, which is a flowchart illustrating a language model generated according to an embodiment of an address interest point prediction method provided in the present application. In this embodiment, the language model may be generated by the following steps:
Step S201: third address data is collected to form a third address data set.
The third address data refers to address data related to the training language model, and may be collected by various embodiments, such as purchasing a third address data set from a data company, and so on.
In this embodiment, the third address data is collected through an internet address mining method, such as collecting a receiving address of an e-commerce order and the like.
Because the collected raw address data may be non-standard, duplicated, and so on, the method provided by this embodiment may further include the following step: performing data cleaning on the collected third address data according to data cleaning rules. In a specific implementation, data cleaning and normalization may include the following: full-width/half-width conversion, traditional/simplified Chinese conversion, conversion of digits to Chinese numerals, punctuation processing, and the like.
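Part of such a cleaning pass can be sketched with the standard library. This is a minimal sketch under stated assumptions: it covers only full-width to half-width conversion (via Unicode NFKC normalization), punctuation handling and exact-duplicate removal; traditional/simplified conversion and numeral conversion need additional resources and are not shown. The function names and the chosen punctuation set are hypothetical.

```python
import re
import unicodedata
from typing import Iterable, List

# Whitespace and a small, assumed set of punctuation marks to collapse into a space.
_SEPARATORS = re.compile(r"[\s,;:!?()\[\]、。]+")


def clean_address(raw: str) -> str:
    """Normalize one address string."""
    # NFKC maps full-width letters, digits, commas and spaces to half-width forms.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse runs of whitespace and punctuation into a single space.
    return _SEPARATORS.sub(" ", text).strip()


def clean_dataset(addresses: Iterable[str]) -> List[str]:
    """Clean every address, dropping empty results and exact duplicates."""
    seen = set()
    cleaned: List[str] = []
    for raw in addresses:
        address = clean_address(raw)
        if address and address not in seen:
            seen.add(address)
            cleaned.append(address)
    return cleaned


sample = clean_dataset([
    "Wenyi West Road 969",
    "Ｗｅｎｙｉ　Ｗｅｓｔ　Ｒｏａｄ　９６９，",  # full-width variant of the same address
    "   ",
])
```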
Step S203: identifying the interest points in the third address data through the interest point identification model.
The interest point identification model is a model obtained by learning from a training data set labeled with interest point information. The training data may include addresses and interest point labeling information. For example, one item of training data is a correspondence record between the address "No. 11 Fuxing Road, Haidian District, Beijing" and the interest point "Central Television Station". Through the interest point identification model, it is possible to identify whether the third address data includes interest points and, if so, which ones.
The interest point identification technology can be implemented on the basis of named entity recognition; since named entity recognition is a mature technology, it is not described in detail here.
Step S205: learning the language model from the fourth address data set in which the points of interest have been identified.
After the points of interest in the third address data are identified, a fourth address data set whose items include points of interest may be filtered out of the third address data set. In this embodiment, millions of fourth address data items with POIs are finally generated, such as "small post office, Building 6, Alibaba Xixi Park, Wenyi West Road 969, Hangzhou, Zhejiang Province" and "3rd floor, information center, provincial department, Temple Back Street, Wangjiaguai Subdistrict, Qingyang District, Chengdu, Sichuan Province".
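The filtering from the third to the fourth address data set can be sketched as below. The identifier used here is a plain substring match against a list of known POI names; it is a hypothetical stand-in for the trained interest point identification model, and the function names and sample data are likewise assumptions.

```python
from typing import List, Sequence, Tuple


def identify_pois(address: str, known_pois: Sequence[str]) -> List[str]:
    """Toy identifier: report every known POI name that occurs in the address."""
    return [poi for poi in known_pois if poi in address]


def build_fourth_dataset(
    third_dataset: Sequence[str], known_pois: Sequence[str]
) -> List[Tuple[str, List[str]]]:
    """Keep only addresses in which at least one POI was identified."""
    fourth: List[Tuple[str, List[str]]] = []
    for address in third_dataset:
        pois = identify_pois(address, known_pois)
        if pois:
            fourth.append((address, pois))
    return fourth


third_dataset = [
    "Wenyi West Road 969 Alibaba Xixi Park Building 6 small post office",
    "Wenyi West Road 969",  # no POI, so it is filtered out
]
known_pois = ["Alibaba Xixi Park", "small post office"]
fourth_dataset = build_fourth_dataset(third_dataset, known_pois)
```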
The language model can then be learned from the fourth address data set. The language model takes an address fragment as input, such as "Wenyi West Road Alibaba Xixi Park", estimates the probability that the fragment is a valid address, and outputs a score between 0 and 1; for example, such a fragment may receive a score of 0.67. A higher score indicates that the fragment is more likely to be a valid address.
In particular, the address language model may be trained in conjunction with existing language model training tools, such as by deep learning language model training tools (e.g., RNN models) or using statistical language models. Since the language model generation technology belongs to a mature technology, it is not described herein again.
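As one possible instance of the statistical option, the sketch below trains an add-one-smoothed bigram model over whitespace-separated tokens. Several choices are assumptions made for illustration only: the class name `BigramAddressLM`, whitespace tokenization, add-one smoothing, and the use of the geometric mean of the bigram probabilities to obtain a length-normalized score in (0, 1]. The training addresses are toy data.

```python
import math
from collections import Counter
from typing import Iterable


class BigramAddressLM:
    """Add-one-smoothed bigram model over whitespace tokens (illustrative only)."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for address in addresses:
            tokens = ["<s>"] + address.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # At least 1, so that scoring also works for an empty training set.
        self.vocab_size = max(len(self.unigrams), 1)

    def score(self, address: str) -> float:
        """Geometric mean of the smoothed bigram probabilities, in (0, 1]."""
        tokens = ["<s>"] + address.split() + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(log_prob / (len(tokens) - 1))


# Toy training set in which "information center" follows the department.
lm = BigramAddressLM([
    "Temple Back Street Provincial Department information center 3rd floor",
    "Temple Back Street Provincial Department information center",
    "Temple Back Street Provincial Commission office",
])
department_score = lm.score("Temple Back Street Provincial Department information center")
commission_score = lm.score("Temple Back Street Provincial Commission information center")
```

With this toy data the department variant scores higher than the commission variant, because "Department information" was seen in training and "Commission information" was not.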
Please refer to fig. 3, which is a flowchart illustrating an embodiment of a method for predicting an address interest point to generate a first set of corresponding relationships according to the present application. In this embodiment, the first set of correspondence relationships may be generated by:
Step S301: performing point of interest aggregation at the first address granularity according to a second set of correspondence between fourth address data and points of interest.
Step S303: generating the first correspondence set according to the aggregated interest points.
Based on step S205, interest point data is aggregated along the dimension of the first address granularity (e.g., "province and city + road number") to obtain the full list of interest points that correspond to the same "province and city + road number" address. For example, the interest point list corresponding to "Wenyi West Road 969, Yuhang District, Hangzhou, Zhejiang Province" is "Alibaba Xixi Park; Xixi Park; Taobao Park; Building 2, Alibaba Xixi Park; Taobao Xixi Park; Alibaba Taobao City; China Merchants Bank", and so on. The first correspondence includes address data of the first address granularity and the corresponding interest point list.
However, as the above example shows, if the aggregated interest points are used directly as the interest points of the first correspondence, the interest point list may contain many redundant interest points; for example, "Alibaba Xixi Park" and "Xixi Park" are a pair of redundant interest points with the same meaning. To simplify the candidate POI list, the method provided by this embodiment may eliminate redundant POIs using the following two removal methods, so that an accurate candidate POI index library can be constructed and accurate candidate POIs obtained.
Please refer to fig. 4, which is a flowchart illustrating a step S303 of an embodiment of an address interest point prediction method according to the present application. In this embodiment, step S303 may include the following sub-steps:
Step S3031: obtaining the interest point categories of the aggregated interest points through an interest point classification model.
The interest points may be classified by an interest point classification model; the categories may be "industrial park", "residential community", and so on.
The interest point classification model can be learned from a training data set comprising interest points and interest point categories by a machine learning method. The interest point category in the training data set may be determined as follows: the category of an interest point is obtained from an electronic map; for example, when a query for a given interest point is submitted to the Amap (Gaode Maps) service, the service returns the category to which the interest point belongs.
Step S3033: determining, according to the number of occurrences of the aggregated interest points, the interest point corresponding to each interest point category, and taking it as the interest point corresponding to the address data of the first address granularity.
After the interest points in the interest point list are classified with the interest point classification model, only the most frequent interest point may be retained among multiple interest points belonging to the same category. For example, the classification model assigns both "Alibaba Xixi Park" and "Alibaba Taobao City" to the "industrial park" category; since "Alibaba Xixi Park" occurs more frequently than "Alibaba Taobao City", "Alibaba Taobao City" is deleted and "Alibaba Xixi Park" is retained.
The frequency of an interest point may be derived from the POI data aggregation result of step S301. For example, suppose an address of the first address granularity corresponds to 100 pieces of fourth address data and its interest point list includes "Alibaba Xixi Park" and "Alibaba Taobao City". If 10 of the 100 addresses contain "Alibaba Xixi Park", the number of occurrences of that interest point is 10; if 5 of the 100 addresses contain "Alibaba Taobao City", its number of occurrences is 5.
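The category-based removal of steps S3031 and S3033 can be sketched as follows. This is a hedged illustration: the classification model is replaced by a plain lookup table, and the function name `dedupe_by_category` and the bank entry with its count are assumptions added for the example.

```python
def dedupe_by_category(poi_counts, categorize):
    """Keep, for each category, only the POI with the highest occurrence count."""
    best = {}
    for poi, count in poi_counts.items():
        category = categorize(poi)
        if category not in best or count > poi_counts[best[category]]:
            best[category] = poi
    return best


# Occurrence counts as produced by the aggregation step (illustrative values).
poi_counts = {
    "Alibaba Xixi Park": 10,
    "Alibaba Taobao City": 5,
    "China Merchants Bank": 3,
}

# Stand-in for the interest point classification model: a fixed lookup table.
category_of = {
    "Alibaba Xixi Park": "industrial park",
    "Alibaba Taobao City": "industrial park",
    "China Merchants Bank": "bank",
}

retained = dedupe_by_category(poi_counts, category_of.get)
```

With these counts, "Alibaba Taobao City" is dropped because "Alibaba Xixi Park" belongs to the same category and occurs more often, while the bank is kept as the only member of its category.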
One method of removing redundant points of interest has been described thus far, and another method of removing redundant points of interest is described below.
Please refer to fig. 5, which is a flowchart illustrating a step S303 of an embodiment of an address interest point prediction method according to the present application. In this embodiment, step S303 may further include the following sub-steps:
Step S3035: obtaining synonymous interest points among the aggregated interest points through an interest point synonym recognition model.
The interest point synonym recognition model identifies whether two interest points are synonyms; the recognition result is yes or no.
The interest point synonym recognition model can be learned, by a machine learning method, from a training data set comprising a first interest point, a second interest point, and synonym labeling information. The synonym labeling information may be produced manually, or an algorithm may automatically label whether the two interest points are synonyms.
Step S3037: determining, according to the number of occurrences of the aggregated interest points, the synonymous interest point to be retained, and taking it as the interest point corresponding to the address data of the first address granularity.
The interest point synonym recognition model is used to judge whether POIs are synonyms. If two POIs in the candidate POI list are synonyms, for example "Alibaba Xixi Park" and "Xixi Park", the POI with the higher frequency is retained and the other, here "Xixi Park", is deleted.
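The synonym-based removal of steps S3035 and S3037 can be sketched as a greedy pass over the interest points in descending order of frequency. The greedy strategy, the function name `dedupe_synonyms`, and the pair table standing in for the synonym recognition model are assumptions made for illustration.

```python
def dedupe_synonyms(poi_counts, is_synonym):
    """Visit POIs by descending occurrence count and drop any POI that is
    a synonym of one already kept."""
    kept = []
    for poi in sorted(poi_counts, key=poi_counts.get, reverse=True):
        if not any(is_synonym(poi, other) for other in kept):
            kept.append(poi)
    return kept


# Stand-in for the synonym recognition model: a fixed set of synonym pairs.
synonym_pairs = {frozenset(("Alibaba Xixi Park", "Xixi Park"))}


def is_synonym(a, b):
    return frozenset((a, b)) in synonym_pairs


poi_counts = {"Xixi Park": 4, "Alibaba Xixi Park": 10, "Small Post Office": 3}
simplified = dedupe_synonyms(poi_counts, is_synonym)
```

Because the more frequent interest point is visited first, it is always the one retained from a synonym pair, which matches the retention rule described above.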
Applying the two methods yields the simplified POI prediction list; for example, the simplified POI list corresponding to "No. 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang Province" is "Alibaba Xixi Park; Small Post Office; Science and Technology City".
In a specific implementation, the POI candidate address data obtained in step S3037 (in the format "province, city, district, road, road number, simplified candidate POI list") may be indexed with an open-source search engine tool (Solr, Elasticsearch, etc.) to build the POI prediction search library. Inverted indexes are established on the five fields "province", "city", "district", "road", and "road number", so that a query such as "Wenyi West Road 969" can recall a search result such as "No. 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang Province: Alibaba Xixi Park; Small Post Office; Science and Technology City".
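To make the role of the inverted index concrete, the following is a minimal in-memory sketch. It does not use Solr or Elasticsearch and is not their API; the class `CandidatePoiIndex`, its methods, and the exact-match semantics are assumptions made for illustration.

```python
from collections import defaultdict

FIELDS = ("province", "city", "district", "road", "road_number")


class CandidatePoiIndex:
    """Toy inverted index over the five address fields."""

    def __init__(self):
        self._docs = []                    # doc id -> (address dict, POI list)
        self._postings = defaultdict(set)  # (field, value) -> set of doc ids

    def add(self, address, pois):
        doc_id = len(self._docs)
        self._docs.append((address, list(pois)))
        for field in FIELDS:
            self._postings[(field, address[field])].add(doc_id)

    def search(self, **query):
        """Return the POI lists of documents matching all given field values."""
        ids = None
        for field, value in query.items():
            hits = self._postings.get((field, value), set())
            ids = hits if ids is None else ids & hits
        return [self._docs[i][1] for i in sorted(ids or ())]


index = CandidatePoiIndex()
index.add(
    {"province": "Zhejiang", "city": "Hangzhou", "district": "Yuhang",
     "road": "Wenyi West Road", "road_number": "969"},
    ["Alibaba Xixi Park", "Small Post Office", "Science and Technology City"],
)
hits = index.search(road="Wenyi West Road", road_number="969")
```

A query may supply any subset of the five fields; here only the road and road number are given, mirroring the partial query in the example above.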
This concludes the description of scoring the candidate interest points. After the address context correlation scores of the candidate interest points are determined, the next step can be performed to determine the interest point of the first address data according to the scores.
Step S109: determining, according to the address context correlation score, the interest point corresponding to the first address data from the at least one candidate interest point.
The address context correlation score may be used to rank the candidate points of interest to select, from the plurality of candidate points of interest, a point of interest with a higher score as the point of interest of the first address data, e.g., the candidate point of interest with the highest score as the point of interest of the first address data.
For example, for the first address data "No. 969 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang Province", the candidate interest point "Alibaba Xixi Park" scores 0.91, "Small Post Office" scores 0.73, and "Science and Technology City" scores 0.54, so the finally predicted interest point of the first address data is "Alibaba Xixi Park".
For another example, for the first address data "temple street information center", the score of the corresponding candidate interest point "sichuan province living room" is higher than the score of "sichuan province living room", and the interest point of the first address data finally obtained by prediction is "sichuan province living room".
In the above steps, generating the language model and the first set of correspondence relationships may be performed offline, while steps S101 to S109 belong to the online flow. In this embodiment, the offline flow learns an address language model from the processed fourth address data set and removes redundant POIs to obtain valid POI data, thereby constructing a candidate interest point search library for POI prediction. The online flow parses the structure of a new address (query), retrieves candidate POIs from the candidate POI search library by "province/city/district + road + road number", scores and ranks the retrieved candidate POIs with the address language model, and takes the POI with the highest score as the final predicted POI.
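The online flow can be sketched end to end as follows. This is a sketch under stated assumptions: the address is taken to be already parsed into a lookup key, the second address data is formed by appending each candidate to the address text (one possible construction), and `predict_poi`, `stub_score`, and the dictionary used as the search library are hypothetical stand-ins.

```python
def predict_poi(address_key, address_text, candidate_index, score_fn):
    """Return (best_poi, scores), or (None, {}) when no candidates are found."""
    candidates = candidate_index.get(address_key, [])
    if not candidates:
        return None, {}
    # Second address data: the input address with each candidate appended.
    scores = {poi: score_fn(f"{address_text} {poi}") for poi in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Stand-in for the candidate POI search library built offline.
key = ("Zhejiang", "Hangzhou", "Yuhang", "Wenyi West Road", "969")
candidate_index = {
    key: ["Alibaba Xixi Park", "Small Post Office", "Science and Technology City"],
}

# Stand-in for the address language model, returning illustrative scores.
example_scores = {
    "Alibaba Xixi Park": 0.91,
    "Small Post Office": 0.73,
    "Science and Technology City": 0.54,
}


def stub_score(text):
    return next(s for poi, s in example_scores.items() if text.endswith(poi))


best, scores = predict_poi(
    key, "Zhejiang Hangzhou Yuhang Wenyi West Road 969", candidate_index, stub_score
)
```

Passing the scorer and the search library as parameters keeps the online function independent of how the offline artifacts were produced.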
As can be seen from the foregoing embodiments, the address interest point prediction method provided in the embodiments of the present application: obtains first address data to be processed; determines at least one candidate point of interest corresponding to the first address data; generates a second address data set comprising at least one candidate point of interest based on the first address data and the at least one candidate point of interest; obtains, through a language model, a language model score of at least one piece of second address data as the address context correlation score of the candidate interest point, the language model being learned from an address data set; and determines the point of interest corresponding to the first address data from the at least one candidate point of interest according to the address context correlation score. In this way, the candidate interest points are scored and ranked using the address language model together with the address context information, and the interest point of the first address data is determined accordingly; therefore, the accuracy of address interest point prediction can be effectively improved.
In the foregoing embodiments, an address interest point prediction method is provided, and correspondingly, the present application also provides an address interest point prediction apparatus. The apparatus corresponds to an embodiment of the method described above.
Second embodiment
Please refer to fig. 6, which is a schematic diagram of an embodiment of an address poi predicting apparatus according to the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The present application further provides an address interest point prediction apparatus, including:
a first address data acquisition unit 601 configured to acquire first address data to be processed;
a candidate interest point determining unit 603, configured to determine at least one candidate interest point corresponding to the first address data;
a second address data set generating unit 605, configured to generate a second address data set including at least one candidate point of interest according to the first address data and the at least one candidate point of interest;
a candidate interest point scoring unit 607, configured to obtain a language model score of at least one second address data through a language model, as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
the interest point determining unit 609 is configured to determine, according to the address context correlation score, an interest point corresponding to the first address data from the at least one candidate interest point.
Optionally, the apparatus further includes:
a third address data acquisition unit, configured to acquire third address data to form a third address data set;
an interest point identification unit, configured to identify the interest points in the third address data through an interest point identification model;
a language model learning unit, configured to learn the language model from a fourth address data set in which the interest points have been identified.
Optionally, the third address data acquisition unit is specifically configured to acquire the third address data through the internet.
Optionally, the apparatus further includes:
a data cleaning unit, configured to clean the acquired third address data according to a data cleaning rule.
Optionally, the candidate interest point determining unit 603 is configured to determine at least one candidate interest point corresponding to the first address data according to a first set of correspondence relationships between the address data of the first address granularity and the interest point.
Optionally, the apparatus further includes:
an interest point aggregation unit, configured to perform interest point aggregation at the first address granularity according to a second set of correspondence relationships between fourth address data and interest points;
a first correspondence relationship set generating unit, configured to generate the first set of correspondence relationships according to the aggregated interest points.
Optionally, the first correspondence relationship set generating unit includes:
an interest point category obtaining subunit, configured to obtain, through an interest point classification model, an interest point category of the aggregated interest points;
and the first redundancy removing subunit is configured to determine, according to the occurrence number of the aggregated interest points, interest points corresponding to each interest point category as interest points corresponding to address data of the first address granularity.
Optionally, the apparatus further includes:
an interest point classification model building unit, configured to learn the interest point classification model from a training data set comprising interest points and interest point categories.
Optionally, the interest point category in the training data set is determined as follows:
acquiring the interest point category of the interest point through an electronic map.
Optionally, the first correspondence relationship set generating unit includes:
a synonym interest point obtaining subunit, configured to obtain, through an interest point synonym identification model, a synonym interest point in the aggregated interest points;
and the second redundancy removing subunit is used for determining the reserved synonym interest points according to the occurrence frequency of the aggregated interest points, and the reserved synonym interest points are used as the interest points corresponding to the address data with the first address granularity.
Optionally, the apparatus further includes:
a synonym recognition model building unit, configured to learn the interest point synonym recognition model from a training data set comprising a first interest point, a second interest point, and synonym labeling information.
Third embodiment
Please refer to fig. 7, which is a diagram illustrating an embodiment of an address poi prediction system according to the present application. Since the system embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The system embodiments described below are merely illustrative.
An address interest point prediction system of this embodiment includes: a client 701 and a server 702. The client 701 includes, but is not limited to, a mobile communication device, a personal computer, a PAD, an iPad, and other terminal devices.
For example, the client 701 is a smartphone that sends an interest point prediction request for target address data to the server and receives the interest point corresponding to the target address data returned by the server. The server is configured to: receive the interest point prediction request; determine at least one candidate point of interest corresponding to the target address data; generate an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest; obtain, through a language model, a language model score of at least one piece of address data as the address context correlation score of the candidate interest point, the language model being learned from an address data set; determine the point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score; and return the point of interest corresponding to the target address data to the client.
As can be seen from the foregoing embodiments, in the address interest point prediction system provided in the embodiments of the present application, the server: receives an interest point prediction request for target address data sent by a client; determines at least one candidate point of interest corresponding to the target address data; generates an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest; obtains, through a language model, a language model score of at least one piece of address data as the address context correlation score of the candidate interest point, the language model being learned from an address data set; determines the point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score; and returns the point of interest corresponding to the target address data to the client so that the interest point information can be displayed to the user. In this way, the candidate interest points are scored and ranked using the address language model together with the address context information, and the interest point of the target address data is determined accordingly; therefore, the accuracy of address interest point prediction can be effectively improved.
In the foregoing embodiment, an address interest point prediction system is provided, and correspondingly, the present application further provides an address interest point prediction method. The method corresponds to the embodiment of the system described above.
Fourth embodiment
Please refer to fig. 8, which is a flowchart illustrating an embodiment of an address poi prediction method according to the present application. Since the method embodiment is basically similar to the system embodiment, the description is simple, and the relevant points can be referred to the partial description of the system embodiment. The method embodiments described below are merely illustrative.
The present application further provides an address interest point prediction method, including:
Step S801: receiving a point of interest prediction request for target address data;
Step S803: determining at least one candidate point of interest corresponding to the target address data;
Step S805: generating an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest;
Step S807: obtaining a language model score of at least one piece of address data through a language model as the address context correlation score of the candidate interest point, the language model being learned from an address data set;
Step S809: determining a point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score;
Step S811: returning the point of interest corresponding to the target address data to the requester.
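The request and response exchange of steps S801 and S811 can be sketched as a small handler. The JSON message layout, the field names `target_address` and `poi`, and the function `handle_request` are assumptions made for illustration; the source does not specify a wire format. Steps S803 to S809 are represented by an injected callable.

```python
import json


def handle_request(body, predictor):
    """Parse a JSON prediction request, run the predictor, and
    return a JSON response string."""
    request = json.loads(body)                # step S801: receive the request
    address = request["target_address"]
    poi = predictor(address)                  # steps S803 to S809, injected
    response = {"target_address": address, "poi": poi}
    return json.dumps(response, ensure_ascii=False)  # step S811: return the result


# Usage with a stand-in predictor that always returns the same interest point.
reply = handle_request(
    '{"target_address": "Wenyi West Road 969"}',
    lambda address: "Alibaba Xixi Park",
)
decoded = json.loads(reply)
```

Setting `ensure_ascii=False` keeps non-ASCII address text readable in the response, which is relevant when the addresses are written in Chinese.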
As can be seen from the foregoing embodiments, the address interest point prediction method provided in the embodiments of the present application: receives an interest point prediction request for target address data; determines at least one candidate point of interest corresponding to the target address data; generates an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest; obtains, through a language model, a language model score of at least one piece of address data as the address context correlation score of the candidate interest point, the language model being learned from an address data set; determines the point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score; and returns the point of interest corresponding to the target address data to the requester. In this way, the candidate interest points are scored and ranked using the address language model together with the address context information, and the interest point of the target address data is determined accordingly; therefore, the accuracy of address interest point prediction can be effectively improved.
In the foregoing embodiments, an address interest point prediction method is provided, and correspondingly, the present application also provides an address interest point prediction apparatus. The apparatus corresponds to an embodiment of the method described above.
Fifth embodiment
Please refer to fig. 9, which is a schematic diagram of an embodiment of an address poi predicting apparatus according to the present application. Since the apparatus embodiments are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for relevant points. The device embodiments described below are merely illustrative.
The present application further provides an address interest point prediction apparatus, including:
a request receiving unit 901 configured to receive a point of interest prediction request for target address data;
a candidate interest point determining unit 903, configured to determine at least one candidate interest point corresponding to the target address data;
an address data set generating unit 905, configured to generate an address data set including at least one candidate point of interest according to the target address data and the at least one candidate point of interest;
a candidate interest point scoring unit 907, configured to obtain a language model score of at least one address data through a language model, as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
an interest point determining unit 909 for determining an interest point corresponding to the target address data from the at least one candidate interest point according to the address context correlation score;
an interest point returning unit 9011, configured to return, to the requester, an interest point corresponding to the target address data.
Although the present application has been described with reference to the preferred embodiments, these embodiments are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.
2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Claims (15)
1. An address interest point prediction method, comprising:
acquiring first address data to be processed;
determining at least one candidate point of interest corresponding to the first address data;
generating a second address data set comprising at least one candidate point of interest based on the first address data and the at least one candidate point of interest;
obtaining a language model score of at least one second address data through a language model, and taking the language model score as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
and according to the address context correlation score, determining the interest point corresponding to the first address data from the at least one candidate interest point.
2. The method of claim 1, further comprising:
collecting third address data to form a third address data set;
identifying the interest points in the third address data through the interest point identification model;
learning the language model from a fourth address data set in which the interest points have been identified.
3. The method of claim 2, wherein the third address data is collected over the internet.
4. The method of claim 2, further comprising:
and performing data cleaning on the acquired third address data according to the data cleaning rule.
5. The method of claim 2, wherein determining at least one candidate point of interest corresponding to the first address data comprises:
at least one candidate point of interest corresponding to the first address data is determined based on a first set of correspondence between address data of a first address granularity and the point of interest.
6. The method of claim 5, further comprising:
performing point of interest aggregation at the first address granularity according to a second set of correspondence between fourth address data and points of interest;
and generating the first corresponding relation set according to the aggregated interest points.
7. The method of claim 6, wherein the generating the first set of correspondence relationships from the aggregated points of interest comprises:
obtaining the interest point category of the aggregated interest points through an interest point classification model;
and determining interest points corresponding to the interest point categories according to the occurrence times of the aggregated interest points, and taking the interest points as the interest points corresponding to the address data of the first address granularity.
8. The method of claim 7, further comprising:
and learning the interest point classification model from a training data set comprising the interest points and interest point categories.
9. The method of claim 8, wherein the interest point category in the training data set is determined as follows:
and acquiring the interest point category of the interest point through the electronic map.
10. The method of claim 6, wherein the generating the first set of correspondence relationships from the aggregated points of interest comprises:
obtaining synonym interest points in the aggregated interest points through an interest point synonym identification model;
and determining the reserved synonym interest points as the interest points corresponding to the address data of the first address granularity according to the occurrence frequency of the aggregated interest points.
11. The method of claim 10, further comprising:
and learning the interest point synonym recognition model from a training data set comprising the first interest point, the second interest point and synonym labeling information.
12. An address point of interest prediction system, comprising:
the client is used for sending an interest point prediction request aiming at the target address data to the server; receiving the interest points corresponding to the target address data returned by the server;
the server is used for receiving the interest point prediction request; determining at least one candidate point of interest corresponding to the target address data; generating an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest; obtaining a language model score of at least one address data through a language model as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set; determining a point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score; and returning the interest points corresponding to the target address data to the client.
13. An address interest point prediction method, comprising:
receiving a point of interest prediction request for target address data;
determining at least one candidate point of interest corresponding to the target address data;
generating an address data set comprising at least one candidate point of interest based on the target address data and the at least one candidate point of interest;
obtaining a language model score of at least one address data through a language model as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
determining a point of interest corresponding to the target address data from the at least one candidate point of interest according to the address context correlation score;
and returning the interest points corresponding to the target address data to the requester.
14. An address interest point prediction apparatus, comprising:
a first address data acquisition unit configured to acquire first address data to be processed;
a candidate interest point determining unit configured to determine at least one candidate interest point corresponding to the first address data;
a second address data set generating unit, configured to generate a second address data set including at least one candidate point of interest according to the first address data and the at least one candidate point of interest;
the candidate interest point scoring unit is used for acquiring a language model score of at least one second address data through a language model to serve as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
and the interest point determining unit is used for determining the interest point corresponding to the first address data from the at least one candidate interest point according to the address context correlation score.
15. An address interest point prediction apparatus, comprising:
a request receiving unit configured to receive a point of interest prediction request for target address data;
a candidate interest point determining unit for determining at least one candidate interest point corresponding to the target address data;
an address data set generating unit, configured to generate an address data set including at least one candidate point of interest according to the target address data and the at least one candidate point of interest;
a candidate interest point scoring unit configured to acquire, through a language model, a language model score of at least one address data as an address context correlation score of the candidate interest point; the language model is obtained by learning from the address data set;
an interest point determining unit, configured to determine, according to the address context correlation score, an interest point corresponding to the target address data from the at least one candidate interest point;
and an interest point returning unit configured to return the interest point corresponding to the target address data to the requester.
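The unit structure of the request-driven apparatus can be pictured as a small pipeline in which each unit is a pluggable function. This is only a sketch under stated assumptions: the class and field names (`PoiPredictionRequest`, `PoiPredictionResponse`, `PoiPredictionApparatus`, `find_candidates`, `build_address`, `score_address`) are hypothetical, and the scorer used in the usage lines is a fixed lookup table standing in for a trained language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PoiPredictionRequest:
    requester_id: str
    target_address: str


@dataclass
class PoiPredictionResponse:
    requester_id: str
    target_address: str
    point_of_interest: str
    scores: Dict[str, float]


@dataclass
class PoiPredictionApparatus:
    # Each field plays the role of one claimed unit (names are hypothetical).
    find_candidates: Callable[[str], List[str]]  # candidate determining unit
    build_address: Callable[[str, str], str]     # address data set generating unit
    score_address: Callable[[str], float]        # candidate scoring unit

    def handle(self, request: PoiPredictionRequest) -> PoiPredictionResponse:
        # Request receiving unit: the request arrives as a plain object.
        candidates = self.find_candidates(request.target_address)
        if not candidates:
            raise ValueError("no candidate point of interest for this address")
        # One generated address per candidate point of interest.
        address_set = {c: self.build_address(request.target_address, c) for c in candidates}
        # Language model score used as the address context correlation score.
        scores = {c: self.score_address(a) for c, a in address_set.items()}
        # Determining unit: keep the highest-scoring candidate.
        best = max(scores, key=scores.get)
        # Returning unit: hand the result back to the requester.
        return PoiPredictionResponse(request.requester_id, request.target_address, best, scores)


# Usage with stand-in units; the fixed scores below are made up for illustration.
STUB_SCORES = {
    "wenyi west road 969 alibaba xixi campus": -1.2,
    "wenyi west road 969 huaxing times plaza": -3.4,
}
apparatus = PoiPredictionApparatus(
    find_candidates=lambda address: ["alibaba xixi campus", "huaxing times plaza"],
    build_address=lambda address, poi: f"{address} {poi}",
    score_address=lambda address: STUB_SCORES.get(address, float("-inf")),
)
response = apparatus.handle(PoiPredictionRequest("client-1", "wenyi west road 969"))
print(response.point_of_interest)  # prints: alibaba xixi campus
```

Keeping the units as injected callables mirrors the way the claim separates them, so a different candidate lookup or language model could be swapped in without touching the pipeline.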
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910205506.9A CN111723165B (en) | 2019-03-18 | 2019-03-18 | Address interest point determination method, device and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910205506.9A CN111723165B (en) | 2019-03-18 | 2019-03-18 | Address interest point determination method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111723165A (en) | 2020-09-29 |
CN111723165B (en) | 2024-06-11 |
Family
ID=72562830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910205506.9A Active CN111723165B (en) | 2019-03-18 | 2019-03-18 | Address interest point determination method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111723165B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112732779A (en) * | 2020-12-29 | 2021-04-30 | 合肥市智享亿云信息科技有限公司 | Method for analyzing address text by big data based on site POI |
CN113438280A (en) * | 2021-06-03 | 2021-09-24 | 多点生活(成都)科技有限公司 | Vehicle starting control method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101128819A (en) * | 2004-12-30 | 2008-02-20 | 谷歌公司 | Local item extraction |
CN103609144A (en) * | 2011-06-16 | 2014-02-26 | 诺基亚公司 | Method and apparatus for resolving geo-identity |
CN104050205A (en) * | 2013-09-24 | 2014-09-17 | 腾讯科技(深圳)有限公司 | Address information input method, address information acquisition method, address information input device, address information acquisition device, equipment, and address information input system |
CN104933171A (en) * | 2015-06-30 | 2015-09-23 | 百度在线网络技术(北京)有限公司 | Method and device for associating data of interest point |
WO2016165538A1 (en) * | 2015-04-13 | 2016-10-20 | 阿里巴巴集团控股有限公司 | Address data management method and device |
WO2017173783A1 (en) * | 2016-04-07 | 2017-10-12 | 中兴通讯股份有限公司 | Method of displaying point of interest, and terminal |
CN107580069A (en) * | 2017-09-22 | 2018-01-12 | 挖财网络技术有限公司 | The determination method and device of station address |
CN107622061A (en) * | 2016-07-13 | 2018-01-23 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and system for determining address uniqueness |
CN107656913A (en) * | 2017-09-30 | 2018-02-02 | 百度在线网络技术(北京)有限公司 | Map point of interest address extraction method, apparatus, server and storage medium |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101128819A (en) * | 2004-12-30 | 2008-02-20 | 谷歌公司 | Local item extraction |
CN103609144A (en) * | 2011-06-16 | 2014-02-26 | 诺基亚公司 | Method and apparatus for resolving geo-identity |
CN104050205A (en) * | 2013-09-24 | 2014-09-17 | 腾讯科技(深圳)有限公司 | Address information input method, address information acquisition method, address information input device, address information acquisition device, equipment, and address information input system |
WO2016165538A1 (en) * | 2015-04-13 | 2016-10-20 | 阿里巴巴集团控股有限公司 | Address data management method and device |
CN104933171A (en) * | 2015-06-30 | 2015-09-23 | 百度在线网络技术(北京)有限公司 | Method and device for associating data of interest point |
WO2017173783A1 (en) * | 2016-04-07 | 2017-10-12 | 中兴通讯股份有限公司 | Method of displaying point of interest, and terminal |
CN107622061A (en) * | 2016-07-13 | 2018-01-23 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and system for determining address uniqueness |
CN107580069A (en) * | 2017-09-22 | 2018-01-12 | 挖财网络技术有限公司 | The determination method and device of station address |
CN107656913A (en) * | 2017-09-30 | 2018-02-02 | 百度在线网络技术(北京)有限公司 | Map point of interest address extraction method, apparatus, server and storage medium |
Non-Patent Citations (1)
Title |
---|
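Wang Yong; Liu Jiping; Guo Qingsheng; Luo An: "A standardization method for web POI address information that takes positional relationships into account", Acta Geodaetica et Cartographica Sinica (测绘学报), no. 05, 15 May 2016 (2016-05-15) * |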
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112732779A (en) * | 2020-12-29 | 2021-04-30 | 合肥市智享亿云信息科技有限公司 | Method for analyzing address text by big data based on site POI |
CN113438280A (en) * | 2021-06-03 | 2021-09-24 | 多点生活(成都)科技有限公司 | Vehicle starting control method and device |
CN113438280B (en) * | 2021-06-03 | 2023-02-17 | 多点生活(成都)科技有限公司 | Vehicle starting control method and device |
Also Published As
Publication number | Publication date |
---|---|
CN111723165B (en) | 2024-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390054B (en) | Interest point recall method, device, server and storage medium | |
CN110008300B (en) | Method and device for determining alias of POI (Point of interest), computer equipment and storage medium | |
Gong et al. | A survey on dataset quality in machine learning | |
CN111488426A (en) | Query intention determining method and device and processing equipment | |
CN110968654B (en) | Address category determining method, equipment and system for text data | |
CN109637000B (en) | Invoice detection method and device, storage medium and electronic terminal | |
CN103514199A (en) | Method and device for POI data processing and method and device for POI searching | |
CN113505204B (en) | Recall model training method, search recall device and computer equipment | |
CN107194412A (en) | A kind of method of processing data, device, equipment and computer-readable storage medium | |
CN110990520B (en) | Address coding method and device, electronic equipment and storage medium | |
CN111666425B (en) | Automobile accessory searching method based on semantic knowledge | |
CN110688434B (en) | Method, device, equipment and medium for processing interest points | |
CN111310065A (en) | Social contact recommendation method and device, server and storage medium | |
CA2906767A1 (en) | Non-deterministic disambiguation and matching of business locale data | |
CN115017425B (en) | Location search method, location search device, electronic device, and storage medium | |
CN111723165A (en) | Address interest point determining method, device and system | |
CN111414357A (en) | Address data processing method, device, system and storage medium | |
CN111931077A (en) | Data processing method and device, electronic equipment and storage medium | |
CN113239173A (en) | Method and device for processing question and answer data, storage medium and electronic equipment | |
CN111460044B (en) | Geographic position data processing method and device | |
CN111126422B (en) | Method, device, equipment and medium for establishing industry model and determining industry | |
CN114911999A (en) | Name matching method and device | |
CN114328808A (en) | Address fuzzy matching method, address processing method, address fuzzy matching device and electronic equipment | |
CN110598122B (en) | Social group mining method, device, equipment and storage medium | |
CN110807082B (en) | Quality selective examination item determining method, system, electronic equipment and readable storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||