CN107491525A - Distributed address comparison method and device - Google Patents

Info

Publication number
CN107491525A
Authority
CN
China
Prior art keywords
address
standard
multiple levels
standardization
participle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710709020.XA
Other languages
Chinese (zh)
Inventor
王思睿
秦锋剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Green Bay Network Technology Co., Ltd.
Original Assignee
Grass Count Language (beijing) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Grass Count Language (beijing) Technology Co Ltd filed Critical Grass Count Language (beijing) Technology Co Ltd
Priority to CN201710709020.XA priority Critical patent/CN107491525A/en
Publication of CN107491525A publication Critical patent/CN107491525A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a distributed address comparison method and device. The method includes: performing address normalization on an address to be compared, to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments according to a preset shard key value, and determining a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with multiple standard levels to the standard address node of a target standard level among the standard address nodes of multiple standard levels for address comparison, and obtaining a comparison result. This effectively solves the performance problem of comparing massive address data when address expressions are non-standard, without degrading the conversion result, and improves address comparison efficiency.

Description

Distributed address comparison method and device
Technical field
The present invention relates to the field of computer data processing, and in particular to a distributed address comparison method and device.
Background
At present, applications that convert text addresses to GIS data face a conversion demand on the order of hundreds of millions of records per day. A user-input address must be converted into a coordinate point mapped onto a map, so that the user can perform visual analysis on the map.
In the related art, address comparison schemes are all single-machine solutions. For the comparison of massive address data they have the following main shortcomings: 1) Address library storage. Before comparison, an address library carrying GIS information must be built. Its size is typically on the order of hundreds of millions to billions of records, and single-machine storage cannot meet the space requirements of such data volumes. 2) Comparison time. When user input is compared against the candidate address library, a full comparison takes too long: each address must be compared against data on the order of hundreds of millions of records, so when massive numbers of addresses must be processed, neither real-time nor large-batch demands can be met.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, a first object of the invention is to propose a distributed address comparison method that effectively solves the performance problem of comparing massive address data when address expressions are non-standard, without degrading the conversion result, and that improves address comparison efficiency.
A second object of the invention is to propose a distributed address comparison device.
A third object of the invention is to propose a computer device.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the invention proposes a distributed address comparison method, including: performing address normalization on an address to be compared, to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with multiple standard levels according to a preset shard key value, and determining a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with multiple standard levels to the standard address node of a target standard level among the standard address nodes of multiple standard levels for address comparison, and obtaining a comparison result.
In the distributed address comparison method of the embodiment of the invention, address normalization is performed on the address to be compared to obtain normalized address segments labeled with multiple standard levels; a preset algorithm is applied to those segments according to a preset shard key value, and a shard address database is determined from the result; and the normalized address is distributed to the standard address node of the target standard level among the standard address nodes of multiple standard levels for address comparison, yielding a comparison result. This effectively solves the performance problem of comparing massive address data when address expressions are non-standard, without degrading the conversion result, and improves address comparison efficiency.
In addition, the distributed address comparison method according to the above embodiment of the invention may also have the following additional technical features:
Optionally, before the address normalization is performed on the address to be compared to obtain the normalized address segments labeled with multiple standard levels, the method further includes: performing a preprocessing operation on the address to be compared, wherein the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and cleaning of preset characters.
Optionally, performing address normalization on the address to be compared to obtain the normalized address segments labeled with multiple standard levels includes: performing word segmentation on the address to be compared to obtain address segments; labeling the address segments according to preset address levels; and, according to a preset address normalization strategy, normalizing the labeled address segments and labeling the normalized address segments with standard levels, to obtain the normalized address segments labeled with multiple standard levels.
Optionally, normalizing the labeled address segments according to the preset address normalization strategy and labeling the normalized address segments with standard levels further includes: supplementing the address levels of the standard-level labels according to the context of the address segments.
Optionally, when the normalized address labeled with multiple standard levels is distributed to the standard address node of the target standard level among the standard address nodes of multiple standard levels for address comparison, if no address segment matches the standard address node of the target standard level, the normalized address is distributed to all nodes in all shard address databases for comparison, and multiple comparison results are obtained; the optimal comparison result is then determined among the multiple comparison results according to a preset screening strategy.
Optionally, before the shard address database is determined from the result of the operation, the method further includes: performing address normalization on the standard addresses, and storing the normalized standard addresses in shards across multiple shard address databases according to a preset storage strategy.
To achieve the above objects, an embodiment of the second aspect of the invention proposes a distributed address comparison device, including: an acquisition module, for performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; a computing module, for applying a preset algorithm to the normalized address segments labeled with multiple standard levels according to a preset shard key value; a determining module, for determining a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels; and a comparing module, for distributing the normalized address labeled with multiple standard levels to the standard address node of a target standard level among the standard address nodes of multiple standard levels for address comparison, and obtaining a comparison result.
In the distributed address comparison device of the embodiment of the invention, address normalization is performed on the address to be compared to obtain normalized address segments labeled with multiple standard levels; a preset algorithm is applied to those segments according to a preset shard key value, and a shard address database is determined from the result; and the normalized address is distributed to the standard address node of the target standard level among the standard address nodes of multiple standard levels for address comparison, yielding a comparison result. This effectively solves the performance problem of comparing massive address data when address expressions are non-standard, without degrading the conversion result, and improves address comparison efficiency.
In addition, the distributed address comparison device according to the above embodiment of the invention may also have the following additional technical features:
Optionally, the device further includes a preprocessing module, for performing a preprocessing operation on the address to be compared, wherein the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and cleaning of preset characters.
Optionally, the acquisition module includes: a segmentation unit, for performing word segmentation on the address to be compared to obtain address segments; a first labeling unit, for labeling the address segments according to preset address levels; and a second labeling unit, for normalizing the labeled address segments according to a preset address normalization strategy and labeling the normalized address segments with standard levels, to obtain the normalized address segments labeled with multiple standard levels.
Optionally, the device further includes: a storage module, for performing address normalization on the standard addresses and storing the normalized standard addresses in shards across multiple shard address databases according to a preset storage strategy.
To achieve the above objects, an embodiment of the third aspect of the invention proposes a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. When the processor executes the program, the method described in the embodiment of the first aspect is implemented, the method including: performing address normalization on an address to be compared, to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with multiple standard levels according to a preset shard key value, and determining a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with multiple standard levels to the standard address node of a target standard level among the standard address nodes of multiple standard levels for address comparison, and obtaining a comparison result.
To achieve the above objects, an embodiment of the fourth aspect of the invention proposes a non-transitory computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method described in the embodiment of the first aspect is implemented, the method including: performing address normalization on an address to be compared, to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with multiple standard levels according to a preset shard key value, and determining a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with multiple standard levels to the standard address node of a target standard level among the standard address nodes of multiple standard levels for address comparison, and obtaining a comparison result.
To achieve the above objects, an embodiment of the fifth aspect of the invention proposes a computer program product. When the instructions in the computer program product are executed by a processor, the method described in the embodiment of the first aspect is performed, the method including: performing address normalization on an address to be compared, to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with multiple standard levels according to a preset shard key value, and determining a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with multiple standard levels to the standard address node of a target standard level among the standard address nodes of multiple standard levels for address comparison, and obtaining a comparison result.
Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flow chart of a distributed address comparison method according to one embodiment of the invention;
Fig. 2 is a flow chart of a distributed address comparison method according to another embodiment of the invention;
Fig. 3 is a flow chart of a distributed address comparison method according to a specific embodiment of the invention;
Fig. 4 is a structural diagram of a distributed address comparison device according to one embodiment of the invention;
Fig. 5 is a structural diagram of a distributed address comparison device according to another embodiment of the invention;
Fig. 6 is a structural diagram of a distributed address comparison device according to yet another embodiment of the invention; and
Fig. 7 is a structural diagram of a distributed address comparison device according to a further embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and should not be construed as limiting it.
The distributed address comparison method and device of the embodiments of the invention are described below with reference to the drawings.
Fig. 1 is a flow chart of a distributed address comparison method according to one embodiment of the invention. As shown in Fig. 1, the method includes:
Step 101: perform address normalization on the address to be compared, to obtain normalized address segments labeled with multiple standard levels.
Understandably, in many application scenarios the addresses that users input are complex and varied, so the acquired addresses to be compared are not sufficiently standardized. For example, non-standard expressions such as "Beijing Zhichun Road" and "No. 180 Zhichun Road, Haidian District, Beijing City" occur, which makes the address expressions difficult to compare directly. The address to be compared therefore needs address normalization.
Specifically, address normalization is performed on the address to be compared to obtain normalized address segments labeled with multiple standard levels, in preparation for the comparison.
It should be noted that, depending on the specific application scenario, the normalized address segments labeled with multiple standard levels may be obtained in different ways:
As one possible implementation, as shown in Fig. 2, step 101 further includes:
Step 201: perform word segmentation on the address to be compared, to obtain address segments.
Specifically, word segmentation is performed on the address to be compared according to, for example, the parts of speech in the address and the correlations between words, to obtain address segments. Alternatively, a neural network model or similar may be trained in advance, and the address to be compared is input into the model to obtain the corresponding address segments.
Step 202: label the address segments according to preset address levels.
Specifically, address levels such as the "province" level, "city" level and "county" level are preset. The address segments are given a preliminary label according to these preset address levels, and this label preliminarily identifies the address attribute of each segment.
Step 203: according to a preset address normalization strategy, normalize the labeled address segments, and label the normalized address segments with standard levels, to obtain the normalized address segments labeled with multiple standard levels.
Specifically, because the address to be compared is not expressed in a standard way, the address segments derived from it may not be standard either. The labeled address segments are therefore normalized according to the preset address normalization strategy.
For example, the address segments "Beijing" and "Haidian" are normalized to "Beijing City" and "Haidian District". Then, to facilitate the subsequent address comparison, the normalized address segments are labeled with standard levels to obtain the normalized address segments labeled with multiple standard levels; for example, the above segments are labeled as {"Beijing City": "province"}, {"Haidian District": "district"}.
As another example, the labeling result "Heilongjiang [province], Suihua [city], Beilin [district], Central Street 180 [other]" becomes "Heilongjiang Province [province], Suihua City [city], Beilin District [district], Central Street 180 [other]" after normalization.
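The normalization of labeled segments in step 203 can be sketched as a table lookup. This is a minimal illustration, not the patent's implementation: the `CANONICAL` table, the `normalize` function and the English spellings of the place names are all assumptions made for the example, and a real normalization strategy would presumably be driven by a full administrative-division dictionary.

```python
# Hypothetical normalization table keyed by (level, raw segment).
# The entries mirror the Heilongjiang example in the text.
CANONICAL = {
    ("province", "Heilongjiang"): "Heilongjiang Province",
    ("city", "Suihua"): "Suihua City",
    ("district", "Beilin"): "Beilin District",
}


def normalize(labeled_segments):
    """Map each (segment, level) pair to its canonical form.

    Segments with no entry in the table (for example the street part,
    labeled "other") are passed through unchanged.
    """
    return [
        (CANONICAL.get((level, segment), segment), level)
        for segment, level in labeled_segments
    ]


labeled = [
    ("Heilongjiang", "province"),
    ("Suihua", "city"),
    ("Beilin", "district"),
    ("Central Street 180", "other"),
]
normalized = normalize(labeled)
```

Keying the table by level as well as by text keeps a name that is valid at two levels from being rewritten to the wrong canonical form.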
It should be noted that the standard levels in this example can be updated and supplemented over time. Alternatively, the standard levels may include supplementary standard levels, which can be used to label address segments with certain special attributes, so that the standard-level labeling of the normalized address segments is comprehensive.
Specifically, in one embodiment of the invention, the address levels of the standard-level labels are supplemented according to the context of the address segments. For example, after "Beijing" is normalized to "Beijing City", the supplementary standard level "Municipal District" can be added before "Haidian District" according to the context.
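The context-based supplement can be sketched as follows. The rule used here, inserting a "Municipal District" segment at the "city" level when a district directly follows a provincial-level city, is an assumption inferred from the Beijing example; the `MUNICIPALITIES` set and the `supplement_levels` function are hypothetical names.

```python
# Assumption: provincial-level cities that have no separate "city" segment.
MUNICIPALITIES = {"Beijing City"}


def supplement_levels(segments):
    """Insert a placeholder "city"-level segment where the context shows
    that a district follows a provincial-level city directly."""
    result = []
    for segment, level in segments:
        previous = result[-1] if result else None
        if (
            level == "district"
            and previous is not None
            and previous[1] == "province"
            and previous[0] in MUNICIPALITIES
        ):
            result.append(("Municipal District", "city"))
        result.append((segment, level))
    return result


supplemented = supplement_levels(
    [("Beijing City", "province"), ("Haidian District", "district")]
)
```

Filling the missing level gives every normalized address the same sequence of levels, which is what lets a shard key such as "city" be evaluated for addresses that never mention a city.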
As another possible implementation, a comparison library containing multiple standard address segments is set up in advance, and the address to be compared is matched against it. If a run of consecutive characters in the address to be compared has the highest matching degree with some standard address segment in the comparison library, that run of characters is taken as an address segment.
For example, for the address to be compared "Beijing Haidian", the candidate "Beijing" has a higher matching degree with "Beijing City" in the comparison library than other candidates such as the leading character alone or "Beijing" plus the following character, so "Beijing" is cut out as an address segment. By the same principle, "Haidian" has a higher matching degree with "Haidian District" in the comparison library than the other combinations, so "Haidian" is cut out as an address segment.
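A minimal sketch of this library-based segmentation is given below. The patent does not define "matching degree"; the sketch substitutes the similarity ratio of Python's `difflib.SequenceMatcher`, which is an assumption, as are the function names, the `max_len` bound and the Chinese strings (back-translations of "Beijing Haidian", "Beijing City" and "Haidian District").

```python
from difflib import SequenceMatcher


def best_score(candidate, library):
    """Highest similarity between a candidate and any library entry."""
    return max(SequenceMatcher(None, candidate, entry).ratio() for entry in library)


def segment(address, library, max_len=6):
    """Greedy segmentation: at each position, keep the run of consecutive
    characters that is most similar to some standard segment."""
    segments = []
    pos = 0
    while pos < len(address):
        candidates = [
            address[pos:pos + n]
            for n in range(1, max_len + 1)
            if pos + n <= len(address)
        ]
        # On ties, max() keeps the first (shortest) candidate.
        best = max(candidates, key=lambda c: best_score(c, library))
        segments.append(best)
        pos += len(best)
    return segments


library = ["北京市", "海淀区"]  # assumed standard segments
result = segment("北京海淀", library)
```

With this scoring, "北京" scores 0.8 against "北京市" while the single character "北" scores 0.5 and "北京海" about 0.67, so the run "北京" is kept, in line with the example in the text.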
It should be noted that, in some application scenarios, the acquired address to be compared not only is expressed in a non-standard way but also contains irrelevant noise elements. To further improve comparison efficiency, the address to be compared may therefore be denoised before comparison.
As one possible implementation, a preprocessing operation is performed on the address to be compared, wherein the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and cleaning of preset characters. The preset characters may be recognizable special characters, such as "*".
In this embodiment, letters that should be lower case are converted to lower case, letters that should be upper case are converted to upper case, characters that should be half-width are converted to half-width, and so on.
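The preprocessing step can be sketched with the Python standard library. The choice to unify letters to upper case, the contents of `PRESET_CHARS` and the decision to drop whitespace are assumptions for illustration; the text only names "*" as an example of a preset character.

```python
import unicodedata

# Hypothetical set of preset characters to clean.
PRESET_CHARS = set("*#@")


def preprocess(address: str) -> str:
    """Unify character width and case, then drop preset characters."""
    # NFKC folds full-width letters, digits and punctuation into their
    # half-width (ASCII) forms.
    text = unicodedata.normalize("NFKC", address)
    # Assumption: letters are unified to upper case.
    text = text.upper()
    # Remove preset special characters and whitespace.
    return "".join(
        ch for ch in text if ch not in PRESET_CHARS and not ch.isspace()
    )


cleaned = preprocess("Ｂｌｄｇ＊ １８０a")
```

Width folding runs before character cleaning so that a full-width "＊" is also caught by the half-width entry in `PRESET_CHARS`.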
Step 102: apply a preset algorithm to the normalized address segments labeled with multiple standard levels according to a preset shard key value, and determine a shard address database from the result of the operation, wherein the shard address database contains standard address nodes of multiple standard levels.
Understandably, address normalization is performed on all standard addresses in advance, and the normalized standard addresses are stored in a distributed manner across multiple shard address databases according to a preset storage strategy. Storing the large volume of standard addresses in a distributed manner greatly improves comparison efficiency.
The preset storage strategy may include the address level at which the distributed storage is partitioned. For example, if storage is partitioned at the province level, a hash algorithm or similar can be used to perform an operation such as a modulo on a number derived from the province of each standard address, so that the standard addresses of each province form one shard address database. Likewise, if storage is partitioned at the city level, a hash algorithm or similar can be used to perform an operation such as a modulo on a number derived from the city of each standard address, so that the standard addresses of each city form one shard address database. It should be understood that the lower the partition level, the more shard address databases there are and the higher the comparison efficiency.
Each shard address database contains standard address nodes of multiple standard levels. An index may be set on the highest-level node, or any other specified node may be used as the index, wherein the index corresponds to the preset shard key value.
Then, according to a preset algorithm such as a hash algorithm, an operation is performed on the address segments of multiple standard levels according to the preset shard key value, and the shard address database is determined from the result of the operation. The preset algorithm corresponds to the preset shard key value, that is, to the index setting of the shard address database. For example, if the highest-level node of the shard address database is "province" and the index is set on that node, the preset algorithm operates on the address segments of multiple standard levels based on the "province" address level, identifies the province to which the address belongs, and distributes the address to the shard address database of that province for comparison.
The standard addresses in a shard address database may be stored as a tree, or in a further multi-level distributed manner.
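Tree-shaped storage inside one shard can be sketched with nested dictionaries, one dictionary level per standard level. The patent does not specify the data structure, so the `insert` and `match_depth` helpers below, and the English place names, are illustrative assumptions only.

```python
def insert(tree, path):
    """Store one standard address as a path of nodes, highest level first."""
    node = tree
    for segment in path:
        node = node.setdefault(segment, {})


def match_depth(tree, path):
    """Count how many leading levels of a normalized address match nodes
    in the tree, starting from the highest-level node."""
    node = tree
    depth = 0
    for segment in path:
        if segment not in node:
            break
        node = node[segment]
        depth += 1
    return depth


shard = {}
insert(shard, ["Beijing City", "Municipal District", "Haidian District", "Xixiaokou Road"])
depth = match_depth(
    shard, ["Beijing City", "Municipal District", "Haidian District", "Zhichun Road"]
)
```

Here `depth` is 3: the province, city and district levels match, and the walk stops at the road level, where the stored address differs.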
For example, with the city as the shard key value, the standard addresses are split into 10 parts. Each normalized address to be compared is then hashed by its city and the hash is taken modulo 10, which gives the shard address database corresponding to that address. The sharded address data is deployed to the corresponding machines according to the specified configuration, and the address library loading and indexing programs are started.
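The hash-then-modulo routing in this example can be sketched as below. The patent names no specific hash function; MD5 is used here only because it gives the same value on every machine (Python's built-in `hash()` is randomized per process for strings). The names `shard_id` and `route` are hypothetical.

```python
import hashlib

NUM_SHARDS = 10  # the example in the text splits the data into 10 parts


def shard_id(shard_key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash the shard key value and reduce it modulo the shard count."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(segments, shard_level="city"):
    """Return the shard for a normalized address, or None when the address
    has no segment at the shard-key level."""
    for segment, level in segments:
        if level == shard_level:
            return shard_id(segment)
    return None


target = route([("Heilongjiang Province", "province"), ("Suihua City", "city")])
```

Because the same function is applied when the standard addresses are stored and when a query arrives, an address to be compared lands on the shard that holds the standard addresses of its city. A `None` result signals that the caller has to fall back to comparing against all shards.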
Step 103, the standardization address that multiple levels of the standard mark is distributed to the study plot of multiple levels of the standard The normal address node of target criteria rank in the node of location enters row address comparison, and obtains comparison result.
Specifically, it is determined that in burst address database, the standardization address that multiple levels of the standard are marked segments distribution The normal address node of target criteria rank into the normal address node of multiple levels of the standard enters row address comparison, and obtains Comparison result.
That is, in actual mechanical process, the normal address section of the indexing criterion rank in burst address database Point, it can be used only for determining which burst address database address to be compared belongs to, can be with base in specific be compared (the normal address node of target criteria rank is highest level to the addressed nodes of highest level in burst address database Addressed nodes) start to compare, the normal address node for other target criteria ranks that can also be directly distributed to specify is compared It is right.
Of course, in practice, when the normalized address labeled with multiple standard levels is dispatched to the standard address node of the target standard level for address comparison, it is possible that no address segment matches that node. In that case, the normalized address is dispatched to all nodes in all shard address databases for comparison, multiple comparison results are obtained, and the optimal comparison result is determined among them according to a preset screening strategy.
The preset screening strategy may be, for example, selecting the comparison result with the highest matching degree.
Of course, in a specific implementation, to improve comparison efficiency, if no address segment matches the standard address node of the target standard level, matching may instead continue at the next node corresponding to that standard address node. Only when a preset number of nodes have been tried without a match is the normalized address dispatched to all nodes in all shard address databases for comparison.
For example, suppose the standard address node of the target standard level is the highest-level address node, and the normalized address contains no address nodes for the first three standard levels. This is the case for "Xixiaokou Road Dongsheng Science Park", whose normalized result is {"Xixiaokou Road": "road"}, {"Dongsheng Science Park": "community"}. The address is then dispatched to all address nodes for computation, and the optimal result is chosen after the results are collected.
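The fallback described above (try the target-level nodes first, broadcast to all nodes when nothing matches, then keep the best-scoring result) could look roughly like the following. The scoring function is an assumption made for illustration; the patent only says that the screening strategy may select the result with the highest matching degree and does not define how that degree is computed.

```python
def overlap_score(segments, node):
    """Hypothetical matching degree: fraction of input segments found in the node.

    `node` is modeled as a tuple of standard address component names.
    """
    if not segments:
        return 0.0
    return sum(1 for s in segments if s in node) / len(segments)


def compare_with_fallback(segments, target_nodes, all_nodes, score=overlap_score):
    """Compare against target-level nodes first; broadcast if none of them match."""
    results = [(score(segments, n), n) for n in target_nodes]
    results = [r for r in results if r[0] > 0]
    if not results:
        # No target-level match: dispatch to every node and collect all results.
        results = [(score(segments, n), n) for n in all_nodes]
    # Screening strategy: keep the result with the highest matching degree.
    return max(results, key=lambda r: r[0], default=None)
```

With an input such as `["Xixiaokou Road", "Dongsheng Science Park"]`, none of the province, city, or district nodes match, so the call falls through to the broadcast branch.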
To give those skilled in the art a clearer understanding of the distributed address comparison method of the embodiments of the present invention, a specific example is described below:
In this example, the address to be compared is "Building 5, Shihua Longyue Erli, Xixiaokou Road, Haidian District, Beijing City", and the shard address database is sharded by the "city" address level.
As shown in Fig. 3, the standard addresses in the standard address library are first pre-cleaned and preprocessed, including case conversion, half-width/full-width conversion, special character cleaning, and so on. The standard addresses are then segmented, labeled, and normalized, after which they are sharded to form the shard address databases. Each shard address database contains standard address nodes of multiple standard levels, over which a distributed index is built.
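The pre-cleaning steps named above (case conversion, half-width/full-width conversion, special character cleaning) can be sketched with the Python standard library. Which characters count as "special" is an assumption here; the patent leaves the character set to the implementation.

```python
import re
import unicodedata

# Assumed definition of "special characters": anything that is neither a
# word character (letters, digits, underscore, CJK ideographs) nor a hyphen.
_SPECIAL = re.compile(r"[^\w\-]+")


def preprocess(address: str) -> str:
    """Normalize width and case, then strip special characters."""
    # NFKC folds full-width forms to their half-width equivalents.
    text = unicodedata.normalize("NFKC", address)
    text = text.lower()
    return _SPECIAL.sub("", text)
```

Applying the same function to both the standard address library and the incoming address keeps the two sides comparable.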
The address to be compared entered by the user, "Building 5, Shihua Longyue Erli, Xixiaokou Road, Haidian District, Beijing City", is obtained and preprocessed. The address "Beijing Haidian District Xixiaokou Road Shihua Longyue Erli 5" is then segmented into the address segments "Beijing", "Haidian District", "Xixiaokou Road", "Shihua Longyue Erli", and "5".
The segmentation result is then labeled with address levels, giving {"Beijing": "province"}, {"Haidian District": "district"}, {"Xixiaokou Road": "road"}, {"Shihua Longyue Erli": "community"}, {"5": "building number"}. Address normalization is then applied, and the labeled address becomes {"Beijing City": "province"}, {"municipal district": "city"}, {"Haidian District": "district"}, {"Xixiaokou Road": "road"}, {"Shihua Longyue Erli": "community"}, {"5": "building number"}. Here "Beijing" is normalized to "Beijing City", and, based on context, "municipal district" is inserted before "Haidian District" to supply the address level that was missing from the standard-level labels.
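The two normalization effects in this example, rewriting a segment to its canonical name and supplying a missing level from context, can be sketched as below. The lookup tables `CANONICAL` and `IMPLIED` are hypothetical and hold only the values from the worked example; a real system would presumably derive them from the standard address library.

```python
# Hypothetical lookup tables, populated only with the worked example's values.
CANONICAL = {"Beijing": "Beijing City"}
# Canonical parent name -> (name, level) to insert when that level is absent.
IMPLIED = {"Beijing City": ("municipal district", "city")}


def normalize(tagged):
    """Normalize a list of (segment, level) pairs given in address order."""
    present_levels = {level for _, level in tagged}
    out = []
    for name, level in tagged:
        name = CANONICAL.get(name, name)
        out.append((name, level))
        implied = IMPLIED.get(name)
        if implied is not None and implied[1] not in present_levels:
            # Supply the missing standard level right after its parent.
            out.append(implied)
    return out
```

For the example address, the output has six labeled segments, with the inserted "city" level sitting between the province and the district.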
Further, the input address is dispatched to a specified node according to the "city" level: "municipal district" is hashed and the result is taken modulo the number of nodes to obtain the corresponding shard address database. The normalized address labeled with multiple standard levels is then dispatched, within that shard address database, to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, and a comparison result is obtained.
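The hash-then-modulo step can be sketched as follows. The choice of SHA-256 is an assumption; the patent only refers to a preset algorithm. A digest from `hashlib` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same key to different shards on different workers.

```python
import hashlib


def shard_index(shard_key: str, num_shards: int) -> int:
    """Map a shard key value to a shard number by hashing and taking the modulus."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards
```

In the worked example the shard key value would be the "city"-level segment, for instance `shard_index("municipal district", 8)` for a hypothetical cluster of eight shards.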
Thus, by searching an address library, the distributed address comparison method of the embodiments of the present invention effectively reduces the number of address comparisons. Using distributed storage together with distributed computing, addresses are normalized and distributed according to a fixed level. This effectively solves the performance problem of comparing massive address data without degrading conversion quality, and can meet GIS address-conversion demand on the order of a hundred million addresses per day.
In summary, the distributed address comparison method of the embodiments of the present invention performs address normalization on the address to be compared to obtain normalized address segments labeled with multiple standard levels; performs an operation, according to a preset algorithm, on those segments according to a preset shard key value; determines a shard address database according to the operation result; and dispatches the normalized address labeled with multiple standard levels to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, obtaining a comparison result. This effectively solves the performance problem of comparing massive address data when addresses are expressed in non-standard ways, does not degrade conversion quality, and improves address comparison efficiency.
To implement the above embodiments, the present invention further provides a distributed address comparison device. Fig. 4 is a schematic structural diagram of a distributed address comparison device according to one embodiment of the present invention. As shown in Fig. 4, the device includes an acquisition module 100, a computing module 200, a determining module 300, and a comparison module 400.
The acquisition module 100 is configured to perform address normalization on the address to be compared and obtain normalized address segments labeled with multiple standard levels.
It should be noted that, depending on the application scenario, the acquisition module 100 may obtain the normalized address segments labeled with multiple standard levels in different ways. In one possible implementation, as shown in Fig. 5, the acquisition module 100 includes a word segmentation unit 110, a first labeling unit 120, and a second labeling unit 130.
The word segmentation unit 110 is configured to segment the address to be compared and obtain address segments.
The first labeling unit 120 is configured to label the address segments according to preset address levels.
The second labeling unit 130 is configured to normalize the labeled address segments according to a preset address normalization strategy and to label the normalized address segments with standard levels, so as to obtain the normalized address segments labeled with multiple standard levels.
In one embodiment of the present invention, as shown in Fig. 6 and building on Fig. 4, the distributed address comparison device further includes a preprocessing module 500 configured to perform a preprocessing operation on the address to be compared, where the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and preset character cleaning.
The computing module 200 is configured to perform an operation, according to a preset algorithm, on the normalized address segments labeled with multiple standard levels according to a preset shard key value.
The determining module 300 is configured to determine a shard address database according to the operation result, where the shard address database contains standard address nodes of multiple standard levels.
It will be appreciated that the shard address databases are built in advance. As shown in Fig. 7 and building on Fig. 4, the distributed address comparison device further includes a storage module 600 configured to perform address normalization on the standard addresses and, according to a preset storage strategy, store the normalized standard addresses in shards across multiple shard address databases.
The comparison module 400 is configured to dispatch the normalized address labeled with multiple standard levels to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, and to obtain a comparison result.
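As a rough illustration of how the storage module 600 might populate the shard address databases that the comparison module 400 later searches, the sketch below groups normalized standard addresses by the hash of one level. The dictionary-based address representation, the `build_shards` name, and the use of SHA-256 are all assumptions; the patent speaks only of a preset storage strategy.

```python
import hashlib
from collections import defaultdict


def _shard_index(key: str, num_shards: int) -> int:
    """Stable hash of the shard key value, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def build_shards(standard_addresses, shard_level, num_shards):
    """Group normalized standard addresses into shards.

    Each address is modeled as a dict mapping level name -> segment name,
    and `shard_level` names the level whose value serves as the shard key.
    """
    shards = defaultdict(list)
    for address in standard_addresses:
        shards[_shard_index(address[shard_level], num_shards)].append(address)
    return dict(shards)
```

Because the same key function is used at query time, every standard address that shares a shard key value with the incoming address is guaranteed to sit in the shard that the query is routed to.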
It should be noted that the foregoing explanation of the distributed address comparison method also applies to the distributed address comparison device of the embodiments of the present invention. The implementation principle is similar and is not repeated here.
In summary, the distributed address comparison device of the embodiments of the present invention performs address normalization on the address to be compared to obtain normalized address segments labeled with multiple standard levels; performs an operation, according to a preset algorithm, on those segments according to a preset shard key value; determines a shard address database according to the operation result; and dispatches the normalized address labeled with multiple standard levels to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, obtaining a comparison result. This effectively solves the performance problem of comparing massive address data when addresses are expressed in non-standard ways, does not degrade conversion quality, and improves address comparison efficiency.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logic functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and is sold or used as a standalone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (10)

1. A distributed address comparison method, characterized by comprising the following steps:
performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels;
performing an operation, according to a preset algorithm, on the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determining a shard address database according to the operation result, wherein the shard address database contains standard address nodes of multiple standard levels;
dispatching the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result.
2. The method of claim 1, characterized in that, before performing address normalization on the address to be compared to obtain the normalized address segments labeled with multiple standard levels, the method further comprises:
performing a preprocessing operation on the address to be compared, wherein the preprocessing operation comprises one or more of case conversion, half-width/full-width conversion, and preset character cleaning.
3. The method of claim 1, characterized in that performing address normalization on the address to be compared to obtain the normalized address segments labeled with multiple standard levels comprises:
segmenting the address to be compared to obtain address segments;
labeling the address segments according to preset address levels;
normalizing the labeled address segments according to a preset address normalization strategy, and labeling the normalized address segments with standard levels, to obtain the normalized address segments labeled with multiple standard levels.
4. The method of claim 3, characterized in that normalizing the labeled address segments according to the preset address normalization strategy and labeling the normalized address segments with standard levels further comprises:
supplementing, according to the context of the address segments, the address level missing from the standard-level labels.
5. The method of claim 1, characterized in that, when the normalized address segments labeled with the multiple standard levels are dispatched to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison,
if no address segment matches the standard address node of the target standard level, the normalized address is dispatched to all nodes in all shard address databases for comparison, and multiple comparison results are obtained;
an optimal comparison result is determined among the multiple comparison results according to a preset screening strategy.
6. The method of claim 1, characterized in that, before determining the shard address database according to the operation result, the method further comprises:
performing address normalization on standard addresses, and storing the normalized standard addresses in shards across multiple shard address databases according to a preset storage strategy.
7. A distributed address comparison device, characterized by comprising:
an acquisition module, configured to perform address normalization on an address to be compared and obtain normalized address segments labeled with multiple standard levels;
a computing module, configured to perform an operation, according to a preset algorithm, on the normalized address segments labeled with the multiple standard levels according to a preset shard key value;
a determining module, configured to determine a shard address database according to the operation result, wherein the shard address database contains standard address nodes of multiple standard levels;
a comparison module, configured to dispatch the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and to obtain a comparison result.
8. The device of claim 7, characterized by further comprising:
a preprocessing module, configured to perform a preprocessing operation on the address to be compared, wherein the preprocessing operation comprises one or more of case conversion, half-width/full-width conversion, and preset character cleaning.
9. The device of claim 7, characterized in that the acquisition module comprises:
a word segmentation unit, configured to segment the address to be compared and obtain address segments;
a first labeling unit, configured to label the address segments according to preset address levels;
a second labeling unit, configured to normalize the labeled address segments according to a preset address normalization strategy and to label the normalized address segments with standard levels, so as to obtain the normalized address segments labeled with multiple standard levels.
10. The device of claim 7, characterized by further comprising:
a storage module, configured to perform address normalization on standard addresses and, according to a preset storage strategy, store the normalized standard addresses in shards across multiple shard address databases.
CN201710709020.XA 2017-08-17 2017-08-17 Distributed address comparison method and device Withdrawn CN107491525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710709020.XA CN107491525A (en) 2017-08-17 2017-08-17 Distributed address comparison method and device

Publications (1)

Publication Number Publication Date
CN107491525A true CN107491525A (en) 2017-12-19

Family

ID=60646511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710709020.XA Withdrawn CN107491525A (en) 2017-08-17 2017-08-17 Distributed address comparison method and device

Country Status (1)

Country Link
CN (1) CN107491525A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008073502A2 (en) * 2006-12-11 2008-06-19 Google Inc. Viewport-relative scoring for location search queries
CN101350013A (en) * 2007-07-18 2009-01-21 北京灵图软件技术有限公司 Method and system for searching geographical information
CN104199860A (en) * 2014-08-15 2014-12-10 浙江大学 Dataset fragmentation method based on two-dimensional geographic position information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨宗亮: "基于P2P的地理空间信息服务的架构及相关算法研究", 《中国博士学位论文全文数据库》 *
洪莹: "城市地名地址匹配方法研究与实验", 《中国优秀硕士学位论文全文数据库》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111414357A (en) * 2019-01-07 2020-07-14 阿里巴巴集团控股有限公司 Address data processing method, device, system and storage medium
CN111177719A (en) * 2019-08-13 2020-05-19 腾讯科技(深圳)有限公司 Address category determination method, device, computer-readable storage medium and equipment
CN111008625A (en) * 2019-12-06 2020-04-14 中国建设银行股份有限公司 Address correction method, device, equipment and storage medium
CN112925922A (en) * 2019-12-06 2021-06-08 农业农村部信息中心 Method, device, electronic equipment and medium for obtaining address
CN112287671A (en) * 2020-09-29 2021-01-29 深圳市跨越新科技有限公司 Simhash-based address resolution method and system
CN114970518A (en) * 2022-02-15 2022-08-30 北京青萌数海科技有限公司 Method and device for correcting address data
CN114970518B (en) * 2022-02-15 2022-12-16 北京青萌数海科技有限公司 Method and device for correcting address data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190903

Address after: 100192 Dongsheng Science Park, Zhongguancun, 66 Xixiaokou Road, Haidian District, Beijing

Applicant after: Green Bay Network Technology Co., Ltd.

Address before: 100089 Beijing Haidian District Xixiaokou Road 66 Zhongguancun Dongsheng Science Park B-6 Building B 5 floors

Applicant before: Grass count language (Beijing) Technology Co., Ltd.

WW01 Invention patent application withdrawn after publication

Application publication date: 20171219