Heterogeneous address standard conversion and matching method
Technical Field
The invention belongs to the field of geographic information, and relates to a heterogeneous address standard conversion and matching method.
Background
With the development of big data, address cloud platforms have emerged as external application and service platforms built on spatial standard addresses. Such a platform extracts, cleans, matches and maps business data in real time. It places the full set of business data on the map through forward and reverse matching, then produces and publishes thematic maps from the mapped data, thereby meeting the personalized map requirements of various industries and creating industry-specific maps. Address matching is the process of establishing a correspondence between a textual address description and spatial geographic coordinates. An address matching service looks up matching objects for an address in specific steps: first, the address is standardized; the server then searches the address matching reference data for candidate locations; each candidate location is assigned a score according to how closely it matches the address; finally, the address is matched to the candidate with the highest score.
CN 105005577 discloses an address matching method that adopts layered progressive matching in four steps: quick matching, longitude and latitude matching, fuzzy matching and manual judgment. Quick matching accurately matches high-quality target addresses, with a chain-type supplement mechanism providing supplementary matches where appropriate; longitude and latitude matching matches the target address against adjacent cells according to the longitude and latitude information provided by a map service provider; fuzzy matching matches the target address against similar cells using a fuzzy index; and a manual judgment mechanism is used to check the matching results. That invention also comprises an address word segmentation technique and a confidence index mechanism for address matching accuracy. Although that invention addresses the combined application of multiple address matching techniques, its handling of address standardization makes it difficult to adapt to the requirements that different fields place on addresses.
Disclosure of Invention
The invention provides a heterogeneous address standard conversion and matching method, which aims to overcome the defects of difficult address standardization, low adaptability and low accuracy.
In order to achieve the above object, the present invention provides a heterogeneous address standard conversion and matching method, which comprises the following steps:
step 1: verifying the validity of the data: if the data is legal, executing step 2; otherwise, directly throwing an exception and ending the execution;
step 2: judging whether the data meets the conditions for forward matching; if so, obtaining the matched administrative division information identifier and performing forward matching; if not, performing reverse matching;
step 3: judging whether the result of the forward matching or the reverse matching in step 2 is empty; if so, throwing an exception and ending the execution; if not, obtaining matching result information data;
step 4: judging, according to the matching result information data of step 3, whether the management jurisdiction is to be matched; if so, performing attribution matching and returning the result; otherwise, directly returning the result and ending the execution.
Preferably, the specific steps of the forward matching are sequentially as follows:
S11: verifying the validity of the data according to the incoming detailed address information and the province, city, district and county information; if the data is legal, performing step S12; if the data is illegal, directly throwing an exception and ending the execution;
S12: dividing the data into information points, detailed address information, and other condition information such as province, city, district and county; for the province, city, district and county information, performing administrative region completion and standardized field processing;
S13: judging, according to step S12, whether the incoming data contains neither an information point nor a detailed address; if it contains neither, executing the processing for data without a detailed address and information point and returning the result; if it contains an information point or detailed address information, executing information point matching or detailed address matching accordingly.
Further, the steps of completion of the administrative region and processing of the standardized fields are as follows in sequence:
S121: concatenating the incoming data into a query address, wherein the address comprises province, city, district and county, township, residents' committee, and street or lane;
S122: splitting the concatenated address to obtain the corresponding phrases;
S123: resolving conflicts between administrative divisions by taking the selected administrative division as the standard;
S124: executing a Solr address query, completing the province, city, district and county fields, and obtaining a standardized phrase list after word segmentation.
Further, the administrative region completion and standardized field processing further comprises re-standardizing the address, the steps of which are as follows in sequence:
firstly, acquiring a standard address SolrBean;
then standardizing the administrative division address according to the phrase types and returning a standard address Solr object; if no Solr object exists or several exist, querying the upper-level address, and repeating this until the Solr object is unique;
finally, judging whether the standard address exists and whether a nearest address is to be matched; if so, matching the nearest address, with special matching for house numbers, units and rooms; if there is no nearest address, returning the obtained standard address SolrBean.
Preferably, the specific steps of the processing in step S13 for data without a detailed address and information point are:
S131: re-standardizing the address, and setting and returning the standard address of the minimum-level administrative division;
S132: judging whether the returned standard address exists: if so, setting a matching rate score, and then acquiring the standard address information of the minimum-level administrative division to obtain the final standard address information; if not, throwing an exception and ending the execution.
Preferably, the information point matching step sequentially comprises:
firstly, judging whether a standard address of the information point exists; if so, obtaining the standard address information according to the information point; if not, executing detailed address matching;
then, setting administrative division codes and names according to the obtained standard address information;
and finally, carrying out matching rate conversion to obtain the final returned result object data.
The steps of detailed address matching are as follows in sequence:
firstly, resolving administrative division conflicts;
completing the phrase list according to the address extension object;
then re-standardizing the address, and setting and returning the standard address of the minimum-level administrative division;
then judging whether the returned standard address exists; if no standard address of the administrative division is found, throwing an exception; if it exists, judging whether the longitude and latitude coordinates are all zero: if they are zero, searching upward through the upper-level standard addresses until standard address data is obtained; if they are not zero, obtaining the standard address data directly;
judging whether the standard address data needs to match only one record; if so, setting a matching rate and setting a return result object to obtain matching result object data; if not, executing a Solr query, matching a plurality of results, and obtaining the matching result object data;
and finally, setting the administrative division codes and names according to the matching result object data, and returning the result.
Preferably, the step of reverse matching is:
firstly, inputting the longitude and latitude XY coordinates and a search radius;
then searching for the nearest information point according to the longitude and latitude XY coordinates and the search radius;
finally, judging whether an information point exists within the search radius; if so, returning the address information of the nearest information point as the result; if not, setting a result object whose matching state is "no suitable address matched" and returning the result.
Preferably, the specific steps of attribution matching are as follows in sequence:
firstly, passing in the related information of an address fragment object, including the administrative division codes, the management jurisdiction code, and the longitude and latitude;
then judging whether attribution matching is needed; if not, setting a return result object and returning the result; if so, judging whether a management jurisdiction code exists: if it exists, searching for the related management jurisdiction information according to the code, setting a return result object and returning the result; if it does not exist or no management jurisdiction information is found, performing a management jurisdiction information query.
Further, the specific steps of the management jurisdiction information query are as follows:
firstly, assembling an administrative division code list from the address fragment object, ordered from the smallest administrative division to the largest, and judging whether the size of the list is larger than 1: if the list size is not larger than 1, performing a Solr spatial query with the incoming longitude and latitude coordinates to obtain the matching minimum management jurisdiction information, setting it as the return result object and returning the result; if the list size is larger than 1, taking the first code in the list and querying the management jurisdiction association relation;
then judging whether the management jurisdiction association relation query has a result: if there is no result, removing the current administrative division code from the administrative division code list to obtain a new list, and returning to the step of judging whether the size of the list is larger than 1; if there is a result, judging whether it is unique;
if the query result is unique, acquiring the management jurisdiction information, setting it as the return result object and returning the result; if the query result is not unique, judging whether the management jurisdiction association relation is at the street or lane level: if so, successively judging the administrative division codes downward until the minimum management jurisdiction information is matched, setting it as the return result object and returning the result; if not, returning to the step of removing the current administrative division code from the administrative division code list.
The invention has the beneficial effects that:
1. Service data (a database, table or data in another form) can be conveniently configured through a tool, and the matching items and matching output items can be specified.
2. Forward and reverse matching of addresses can be performed through the service, and the matching result is evaluated; attribution matching can improve the accuracy of conventional forward matching.
3. The administrative region completion and standardized field processing can meet the requirements of different fields on addresses, improve the adaptability of the addresses, and allow matching results to be output as thematic maps.
4. A multi-stage matching method with fallback flows is provided, results with a single association are effectively distinguished from those with multiple associations, fault tolerance is higher, and matching accuracy is improved.
Drawings
FIG. 1 is a general flow diagram of the address matching method of the present invention;
FIG. 2 is a flow chart of a forward matching method of the present invention;
FIG. 3 is a flow chart of administrative region completion and standardized field processing of the present invention;
FIG. 4 is a flow chart of re-standardizing addresses of the present invention;
FIG. 5 is a flow chart of the processing of the present invention for data without a detailed address and information point;
FIG. 6 is a flow chart of information point matching of the present invention;
FIG. 7 is a flow chart of detailed address matching of the present invention;
FIG. 8 is a flow chart of the reverse matching of the present invention;
FIG. 9 is a flow chart of attribution matching of the present invention.
Detailed Description
In order to better understand the technical solution proposed by the present invention, the following further explains the present invention with reference to the drawings and specific embodiments.
As shown in FIG. 1, a heterogeneous address standard conversion and matching method comprises the following steps in sequence:
step 1: verifying the validity of the data, namely checking that the incoming information contains address information; if the data is legal, executing step 2; otherwise, directly throwing an exception and ending the execution;
step 2: judging whether the data meets the conditions for forward matching; if so, obtaining the matched administrative division information identifier and performing forward matching; if not, performing reverse matching;
step 3: judging whether the result of the forward matching or the reverse matching in step 2 is empty; if so, throwing an exception and ending the execution; if not, obtaining matching result information data;
step 4: judging, according to the matching result information data of step 3, whether the management jurisdiction is to be matched; if so, performing attribution matching and returning the result; otherwise, directly returning the result and ending the execution.
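By way of illustration only, the four-step flow of FIG. 1 may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `MatchRequest`, `AddressMatchError`, `run_match` and the result key `match_jurisdiction` are hypothetical, and the rule "use forward matching whenever address text is present" is an assumed reading of step 2.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AddressMatchError(Exception):
    """Raised when validation fails or no match is found (hypothetical name)."""


@dataclass
class MatchRequest:
    address_text: str = ""        # textual address, used for forward matching
    lon: Optional[float] = None   # coordinates, used for reverse matching
    lat: Optional[float] = None


def run_match(req: MatchRequest,
              forward: Callable[[MatchRequest], Optional[dict]],
              reverse: Callable[[MatchRequest], Optional[dict]],
              attribute: Callable[[dict], dict]) -> dict:
    # Step 1: validity check. Assumption: "legal" means text or coordinates exist.
    has_text = bool(req.address_text.strip())
    has_xy = req.lon is not None and req.lat is not None
    if not (has_text or has_xy):
        raise AddressMatchError("illegal input data")
    # Step 2: forward matching when text is available, otherwise reverse matching.
    result = forward(req) if has_text else reverse(req)
    # Step 3: an empty result ends the execution with an exception.
    if not result:
        raise AddressMatchError("empty matching result")
    # Step 4: attribution matching only when the result asks for it
    # (the key "match_jurisdiction" is an assumption of this sketch).
    if result.get("match_jurisdiction"):
        return attribute(result)
    return result
```

The forward, reverse and attribution matchers are passed in as callables so that the control flow can be read separately from the matching logic described in the following sections.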
1. Forward matching
As shown in FIG. 2, the specific steps of the forward matching are as follows in sequence:
S11: verifying the validity of the data according to the incoming detailed address information and other condition information such as province, city, district and county; if the data is legal, performing step S12; if the data is illegal, directly throwing an exception and ending the execution;
S12: dividing the data into information points, detailed address information, and other condition information such as province, city, district and county; for the other condition information such as province, city, district and county, performing administrative region completion and standardized field processing;
S13: judging, according to step S12, whether the incoming data contains neither an information point nor a detailed address; if it contains neither, executing the processing for data without a detailed address and information point and returning the result; if it contains an information point or detailed address information, executing information point matching or detailed address matching accordingly.
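The branch selection of steps S11 to S13 may be illustrated as follows. The function name `forward_branch` and the returned labels are hypothetical, and the validity rule (at least one kind of input must be present) is an assumption of this sketch; the administrative region completion of step S12 is omitted here.

```python
from typing import Optional


def forward_branch(poi: Optional[str], detail: Optional[str], admin_fields: dict) -> str:
    """Return the label of the forward-matching branch selected in S13."""
    # S11: assumed validity rule, at least one kind of input must be present.
    if not (poi or detail or any(admin_fields.values())):
        raise ValueError("illegal input data")
    # S12 (completion of admin_fields) would run here; it is not shown.
    # S13: choose the branch according to which parts are present.
    if not poi and not detail:
        return "no_detail_no_poi"
    if poi:
        return "poi_matching"
    return "detailed_address_matching"
```

When both an information point and a detailed address are present, this sketch tries the information point first, which is consistent with information point matching falling back to detailed address matching when no standard address of the information point exists.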
1.1 Administrative region completion and standardized field processing
As shown in FIG. 3, the steps of administrative region completion and standardized field processing are as follows:
S121: concatenating the incoming data into a query address, wherein the address comprises province, city, district and county, township, residents' committee, and street or lane;
S122: splitting the concatenated address to obtain the corresponding phrases;
S123: resolving conflicts between administrative divisions by taking the selected administrative division as the standard;
S124: executing a Solr address query, completing the province, city, district and county fields, and obtaining a standardized phrase list after word segmentation.
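Steps S121 to S124 may be illustrated by the following sketch. It rests on several assumptions: the in-memory table `GAZETTEER` stands in for the Solr address index, whitespace splitting stands in for word segmentation, and the function name `complete_admin` and the level names are hypothetical.

```python
LEVELS = ["province", "city", "county", "town", "committee", "street"]

# Hypothetical reference data standing in for the Solr index:
# name -> (level, parent name).
GAZETTEER = {
    "Zhejiang": ("province", None),
    "Hangzhou": ("city", "Zhejiang"),
    "Ningbo": ("city", "Zhejiang"),
    "Xihu": ("county", "Hangzhou"),
}


def complete_admin(selected: dict, free_text: str) -> dict:
    # S121: concatenate the selected fields and the free text into one query address.
    spliced = " ".join([selected.get(lvl, "") for lvl in LEVELS] + [free_text])
    # S122: split into phrases (whitespace split stands in for word segmentation).
    phrases = [p for p in spliced.split() if p]
    result = {}
    for phrase in phrases:
        if phrase not in GAZETTEER:
            continue
        level = GAZETTEER[phrase][0]
        chosen = selected.get(level)
        # S123: on conflict, the explicitly selected division is the standard.
        if chosen and chosen != phrase:
            continue
        result.setdefault(level, phrase)
    # S124: complete the missing upper levels by following parent links.
    for name in list(result.values()):
        parent = GAZETTEER[name][1]
        while parent is not None:
            result.setdefault(GAZETTEER[parent][0], parent)
            parent = GAZETTEER[parent][1]
    return result
```

For example, with the city "Hangzhou" selected and the free text "Ningbo Xihu", the conflicting city "Ningbo" is discarded and the missing province is completed from the parent links.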
1.2 Re-standardizing addresses
As shown in FIG. 4, the steps of re-standardizing the address are:
firstly, acquiring a standard address SolrBean;
then standardizing the administrative division address according to the phrase types and returning a standard address Solr object; if no Solr object exists or several exist, querying the upper-level address, and repeating this until the Solr object is unique;
finally, judging whether the standard address exists and whether a nearest address is to be matched; if so, matching the nearest address, with special matching for house numbers, units and rooms; if there is no nearest address, returning the obtained standard address SolrBean.
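The upward search for a unique standard address may be sketched as follows. The function `restandardize` and its `query` callable are hypothetical; `query` stands in for the Solr lookup and is assumed to return a list of candidate records for a tuple of phrases ordered from coarse to fine. The nearest-address matching of house numbers, units and rooms is not covered by this sketch.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def restandardize(phrases: Sequence[str],
                  query: Callable[[Tuple[str, ...]], List[dict]]) -> Optional[dict]:
    """Drop the finest phrase until the query returns exactly one record."""
    current = tuple(phrases)  # ordered from coarse to fine
    while current:
        hits = query(current)
        if len(hits) == 1:
            return hits[0]
        # Zero or several hits: retry with the upper-level address.
        current = current[:-1]
    return None
```

Returning `None` when every level is exhausted corresponds to the case in which no standard address exists; the caller decides whether to throw an exception.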
1.3 Processing without detailed address and information point
As shown in FIG. 5, the specific steps of the processing for data without a detailed address and information point are:
S131: re-standardizing the address, and setting and returning the standard address of the minimum-level administrative division;
S132: judging whether the returned standard address exists: if so, setting a matching rate score, and then acquiring the standard address information of the minimum-level administrative division to obtain the final standard address information; if not, throwing an exception and ending the execution.
1.4 Information point matching
As shown in FIG. 6, the specific steps of information point matching are as follows:
firstly, judging whether a standard address of the information point exists; if so, obtaining the standard address information according to the information point; if not, executing detailed address matching;
then, setting the administrative division codes and names according to the obtained standard address information;
and finally, carrying out matching rate conversion to obtain the final returned result object data.
1.5 Detailed address matching
As shown in FIG. 7, the steps of detailed address matching are:
firstly, resolving administrative division conflicts;
completing the phrase list according to the address extension object;
then re-standardizing the address, and setting and returning the standard address of the minimum-level administrative division;
then judging whether the returned standard address exists; if no standard address of the administrative division is found, throwing an exception; if it exists, judging whether the longitude and latitude coordinates are all zero: if they are zero, searching upward through the upper-level standard addresses until standard address data is obtained; if they are not zero, obtaining the standard address data directly;
judging whether the standard address data needs to match only one record; if so, setting a matching rate and setting a return result object to obtain matching result object data; if not, executing a Solr query, matching a plurality of results, and obtaining the matching result object data;
and finally, setting the administrative division codes and names according to the matching result object data, and returning the result.
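The handling of all-zero coordinates may be illustrated as follows. The record layout (`lon`, `lat`, `parent`), the sample data `RECORDS` and the function `first_with_coordinates` are assumptions of this sketch; in the method itself the records would come from the standard address data.

```python
from typing import Dict, Optional

# Hypothetical standard-address records; "parent" links to the upper-level address.
RECORDS: Dict[str, dict] = {
    "room": {"lon": 0.0, "lat": 0.0, "parent": "building"},
    "building": {"lon": 0.0, "lat": 0.0, "parent": "street"},
    "street": {"lon": 120.1, "lat": 30.2, "parent": None},
}


def first_with_coordinates(record_id: str, records: Dict[str, dict]) -> Optional[str]:
    """Walk up the parent links until a record has coordinates that are not all zero."""
    current: Optional[str] = record_id
    while current is not None:
        rec = records[current]
        if rec["lon"] != 0.0 or rec["lat"] != 0.0:
            return current
        current = rec["parent"]  # coordinates all zero: try the upper-level address
    return None
```

With the sample data, a lookup starting at "room" climbs through "building" and stops at "street", the first record that carries usable coordinates.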
2. Reverse matching
As shown in FIG. 8, the specific steps of reverse matching are:
firstly, inputting the longitude and latitude XY coordinates and a search radius;
then searching for the nearest information point within the buffer area defined by the coordinates and the search radius, using a Solr spatial query;
finally, judging whether an information point exists within the search radius; if so, returning the address information of the nearest information point as the result; if not, setting a result object whose matching state is "no suitable address matched" and returning the result.
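Reverse matching may be sketched without a search server as follows. This sketch replaces the Solr spatial query with a linear scan using the haversine great-circle distance, which is a substitution made for illustration only; the names `reverse_match`, `haversine_m` and the status strings are hypothetical.

```python
import math
from typing import Iterable, Tuple

Poi = Tuple[str, float, float]  # (address, lon, lat)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def reverse_match(lon: float, lat: float, radius_m: float, pois: Iterable[Poi]) -> dict:
    best = None  # (distance, address) of the nearest point inside the radius
    for address, plon, plat in pois:
        d = haversine_m(lon, lat, plon, plat)
        if d <= radius_m and (best is None or d < best[0]):
            best = (d, address)
    if best is None:
        # No information point inside the radius: report the unmatched state.
        return {"status": "NO_SUITABLE_ADDRESS", "address": None}
    return {"status": "MATCHED", "address": best[1]}
```

A spatial index, as provided by a search server, avoids the linear scan for large reference data; the decision logic after the search stays the same.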
3. Attribution matching
As shown in FIG. 9, the specific steps of attribution matching are as follows:
firstly, passing in the related information of an address fragment object, including the administrative division codes, the management jurisdiction code, and the longitude and latitude;
then judging whether attribution matching is needed; if not, setting a return result object and returning the result; if so, judging whether a management jurisdiction code exists: if it exists, searching for the related management jurisdiction information according to the code, setting a return result object and returning the result; if it does not exist or no management jurisdiction information is found, performing a management jurisdiction information query.
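The decision logic of attribution matching may be illustrated as follows. The fragment keys (`need_attribution`, `jurisdiction_code`, `address`), the lookup table `jurisdictions` and the callable `query_jurisdiction` are all hypothetical; falling back to the query when the code is present but unknown is an assumed reading of the text.

```python
from typing import Callable, Dict, Optional


def attribution_match(fragment: dict,
                      jurisdictions: Dict[str, dict],
                      query_jurisdiction: Callable[[dict], Optional[dict]]) -> dict:
    result = {"address": fragment.get("address"), "jurisdiction": None}
    # No attribution requested: return the result object unchanged.
    if not fragment.get("need_attribution"):
        return result
    code = fragment.get("jurisdiction_code")
    if code and code in jurisdictions:
        # A known jurisdiction code resolves directly.
        result["jurisdiction"] = jurisdictions[code]
    else:
        # Missing or unknown code: fall back to the jurisdiction information query.
        result["jurisdiction"] = query_jurisdiction(fragment)
    return result
```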
3.1 Management jurisdiction information query
The specific steps are as follows:
firstly, assembling an administrative division code list from the address fragment object, ordered from the smallest administrative division to the largest, and judging whether the size of the list is larger than 1: if the list size is not larger than 1, performing a Solr spatial query with the incoming longitude and latitude coordinates to obtain the matching minimum management jurisdiction information, setting it as the return result object and returning the result; if the list size is larger than 1, taking the first code in the list and querying the management jurisdiction association relation;
then judging whether the management jurisdiction association relation query has a result: if there is no result, removing the current administrative division code from the administrative division code list to obtain a new list, and returning to the step of judging whether the size of the list is larger than 1; if there is a result, judging whether it is unique;
if the query result is unique, acquiring the management jurisdiction information, setting it as the return result object and returning the result; if the query result is not unique, judging whether the management jurisdiction association relation is at the street or lane level: if so, successively judging the administrative division codes downward until the minimum management jurisdiction information is matched, setting it as the return result object and returning the result; if not, returning to the step of removing the current administrative division code from the administrative division code list.
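The loop over the administrative division code list may be sketched as follows. All names are hypothetical: `relations` stands in for the management jurisdiction association relation query, `spatial_lookup` stands in for the Solr spatial query, and `street_level_codes` marks which codes are at the street or lane level. The "successive downward judgment" is modeled here, as an assumption, by picking the candidate with the smallest `rank` value.

```python
from typing import Callable, Dict, List, Optional


def query_jurisdiction(codes: List[str],
                       relations: Dict[str, List[dict]],
                       spatial_lookup: Callable[[], Optional[dict]],
                       street_level_codes: frozenset = frozenset()) -> Optional[dict]:
    remaining = list(codes)  # ordered from the smallest division to the largest
    while len(remaining) > 1:
        current = remaining[0]
        hits = relations.get(current, [])
        if len(hits) == 1:
            return hits[0]  # unique association: done
        if len(hits) > 1 and current in street_level_codes:
            # Assumption: the smallest jurisdiction is the hit with the lowest rank.
            return min(hits, key=lambda h: h["rank"])
        # No result, or ambiguous above street level: drop this code and retry.
        remaining.pop(0)
    # At most one code left: fall back to the spatial query on the coordinates.
    return spatial_lookup()
```

Each pass through the loop either returns a result or removes one code, so the loop terminates after at most as many passes as there are codes before the spatial fallback is reached.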
Those skilled in the art may also adapt and modify the relevant modules and software architectures of the above-described embodiments in light of the disclosure and teachings of the above description. Therefore, the present invention is not limited to the specific embodiments disclosed and described above, and modifications and variations of the present invention should also fall within the scope of the claims of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.