CN107491525A - Distributed address comparison method and device - Google Patents
Distributed address comparison method and device
- Publication number
- CN107491525A (application number CN201710709020.XA)
- Authority
- CN
- China
- Prior art keywords
- address
- standard
- multiple levels
- standardization
- word segmentation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Remote Sensing (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention proposes a distributed address comparison method and device. The method includes: performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determining a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result. This effectively addresses the performance problem of comparing massive volumes of address data when addresses are expressed in non-standard ways, without degrading conversion quality, and improves address comparison efficiency.
Description
Technical field
The present invention relates to the field of computer data processing, and in particular to a distributed address comparison method and device.
Background
At present, applications that convert text addresses into GIS data face a daily conversion demand on the order of hundreds of millions of records: each address entered by a user must be converted into a coordinate point mapped onto a map, so that users can perform visual analysis on the map.
In the related art, address comparison schemes are all single-machine solutions, which have the following main shortcomings when comparing massive numbers of addresses: 1) Address base storage. Before comparison, an address base with GIS information must be established. Its size is often on the order of hundreds of millions to more than a billion records, and single-machine storage can hardly meet the space requirements of such a data volume. 2) Computation time. When user input data is compared with the candidate address base, a full comparison takes too long: every address comparison requires computing against hundreds of millions of records, so neither real-time nor large-batch demands can be met when massive numbers of addresses must be processed.
Summary of the invention
The present invention aims to solve at least one of the above technical problems, at least to some extent.
To this end, a first object of the present invention is to propose a distributed address comparison method that effectively addresses the performance problem of comparing massive volumes of address data when addresses are expressed in non-standard ways, without degrading conversion quality, and that improves address comparison efficiency.
A second object of the present invention is to propose a distributed address comparison device.
A third object of the present invention is to propose a computer device.
A fourth object of the present invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the present invention is to propose a computer program product.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a distributed address comparison method, including: performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determining a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result.
In the distributed address comparison method of the embodiment of the present invention, address normalization is performed on the address to be compared to obtain normalized address segments labeled with multiple standard levels; a preset algorithm is applied to these segments according to a preset shard key value, and a sharded address database is determined according to the operation result; and the normalized address labeled with the multiple standard levels is distributed to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, and a comparison result is obtained. This effectively addresses the performance problem of comparing massive volumes of address data when addresses are expressed in non-standard ways, without degrading conversion quality, and improves address comparison efficiency.
In addition, the distributed address comparison method according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, before performing address normalization on the address to be compared to obtain the normalized address segments labeled with multiple standard levels, the method further includes: performing a preprocessing operation on the address to be compared, where the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and cleaning of preset characters.
Optionally, performing address normalization on the address to be compared to obtain the normalized address segments labeled with multiple standard levels includes: performing word segmentation on the address to be compared to obtain address segments; labeling the address segments according to preset address levels; and, according to a preset address normalization strategy, normalizing the labeled address segments and labeling the normalized address segments with standard levels, to obtain the normalized address segments labeled with multiple standard levels.
Optionally, normalizing the labeled address segments according to the preset address normalization strategy and labeling the normalized address segments with standard levels further includes: supplementing the address levels of the standard-level labels according to the context of the address segments.
Optionally, when the normalized address labeled with the multiple standard levels is distributed to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, if no address segment matches the standard address node of the target standard level, the normalized address is distributed to all nodes in all sharded address databases for comparison to obtain multiple comparison results; and an optimal comparison result is determined among the multiple comparison results according to a preset screening strategy.
Optionally, before determining the sharded address database according to the operation result, the method further includes: performing address normalization on standard addresses, and storing the normalized standard addresses in shards across multiple sharded address databases according to a preset storage strategy.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a distributed address comparison device, including: an acquisition module, configured to perform address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; a computing module, configured to apply a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value; a determining module, configured to determine a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels; and a comparison module, configured to distribute the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and to obtain a comparison result.
In the distributed address comparison device of the embodiment of the present invention, address normalization is performed on the address to be compared to obtain normalized address segments labeled with multiple standard levels; a preset algorithm is applied to these segments according to a preset shard key value, and a sharded address database is determined according to the operation result; and the normalized address labeled with the multiple standard levels is distributed to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, and a comparison result is obtained. This effectively addresses the performance problem of comparing massive volumes of address data when addresses are expressed in non-standard ways, without degrading conversion quality, and improves address comparison efficiency.
In addition, the distributed address comparison device according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, the device further includes a preprocessing module, configured to perform a preprocessing operation on the address to be compared, where the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and cleaning of preset characters.
Optionally, the acquisition module includes: a word segmentation unit, configured to perform word segmentation on the address to be compared to obtain address segments; a first labeling unit, configured to label the address segments according to preset address levels; and a second labeling unit, configured to normalize the labeled address segments according to a preset address normalization strategy and to label the normalized address segments with standard levels, to obtain the normalized address segments labeled with multiple standard levels.
Optionally, the device further includes: a storage module, configured to perform address normalization on standard addresses and to store the normalized standard addresses in shards across multiple sharded address databases according to a preset storage strategy.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method described in the embodiment of the first aspect is implemented, the method including: performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determining a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method described in the embodiment of the first aspect is implemented, the method including: performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determining a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result.
To achieve the above objects, an embodiment of the fifth aspect of the present invention proposes a computer program product. When the instructions in the computer program product are executed by a processor, the method described in the embodiment of the first aspect is performed, the method including: performing address normalization on an address to be compared to obtain normalized address segments labeled with multiple standard levels; applying a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determining a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels; and distributing the normalized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from that description, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a distributed address comparison method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a distributed address comparison method according to another embodiment of the present invention;
Fig. 3 is a flowchart of a distributed address comparison method according to a specific embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a distributed address comparison device according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a distributed address comparison device according to another embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a distributed address comparison device according to yet another embodiment of the present invention; and
Fig. 7 is a schematic structural diagram of a distributed address comparison device according to still another embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The distributed address comparison method and device of the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a distributed address comparison method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 101: perform address normalization on the address to be compared to obtain normalized address segments labeled with multiple standard levels.
It can be understood that in many application scenarios the addresses entered by users vary widely, so the address to be compared is often not well-formed. For example, the same place may appear as "Beijing Zhichun Road" or as "No. 180 Zhichun Road, Haidian District, Beijing City". Such non-standard expressions are hard to compare directly, so address normalization must be performed on the address to be compared.
Specifically, address normalization is performed on the address to be compared to obtain normalized address segments labeled with multiple standard levels, which are kept ready for later use.
It should be noted that, depending on the specific application scenario, different processing approaches can be used to obtain the normalized address segments labeled with multiple standard levels:
In one possible implementation, as shown in Fig. 2, step 101 further includes:
Step 201: perform word segmentation on the address to be compared to obtain address segments.
Specifically, word segmentation is performed on the address to be compared according to, for example, the parts of speech in the address and the correlations between words, to obtain address segments. Alternatively, a neural network model or similar model may be trained in advance, and the address to be compared is input into the model to obtain the corresponding address segments.
Step 202: label the address segments according to preset address levels.
Specifically, address levels such as the "province" level, the "city" level, and the "county" level are set in advance. The address segments are given a preliminary label according to these preset address levels; this preliminary label identifies the address attribute of each segment.
Step 203: according to a preset address normalization strategy, normalize the labeled address segments, and label the normalized address segments with standard levels to obtain normalized address segments labeled with multiple standard levels.
Specifically, because the address to be compared is not expressed in a standard way, the address segments obtained from it may not be standard either. The labeled address segments are therefore normalized according to the preset address normalization strategy.
For example, the address segments "Beijing" and "Haidian" are normalized to "Beijing City" and "Haidian District". Then, to facilitate subsequent address comparison, the normalized address segments are labeled with standard levels to obtain normalized address segments labeled with multiple standard levels; for example, "Beijing City" and "Haidian District" above are labeled {"Beijing City": "province"} and {"Haidian District": "district"}.
As another example, if the labeling result is "Heilongjiang [province], Suihua [city], Beilin [district], Central Street 180 [other]", the result after normalization is "Heilongjiang Province [province], Suihua City [city], Beilin District [district], Central Street 180 [other]".
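The labeling and normalization steps (steps 202 and 203) can be sketched as a dictionary lookup. This is only a minimal illustration under stated assumptions, not the implementation described by the patent: the `GAZETTEER` table, the function name `label_and_normalize`, and the fallback label "other" for unknown segments are all hypothetical.

```python
# Hypothetical gazetteer: raw segment -> (normalized name, standard level).
# A real system would derive this table from the standard address base.
GAZETTEER = {
    "Heilongjiang": ("Heilongjiang Province", "province"),
    "Suihua": ("Suihua City", "city"),
    "Beilin": ("Beilin District", "district"),
}


def label_and_normalize(segments):
    """Return (normalized_name, level) pairs for a list of raw address segments."""
    result = []
    for segment in segments:
        # Assumption: segments missing from the gazetteer keep their text
        # and receive the catch-all level "other".
        normalized, level = GAZETTEER.get(segment, (segment, "other"))
        result.append((normalized, level))
    return result


if __name__ == "__main__":
    print(label_and_normalize(["Heilongjiang", "Suihua", "Beilin", "Central Street 180"]))
```

A lookup table keeps the sketch short; the text also allows model-based approaches, which this example does not cover.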
It should be noted that the standard levels in this example can be updated and supplemented over time. The standard levels may also include supplementary standard levels, which can be used to label address segments with certain special attributes, so that the labeling of normalized address segments with standard levels is comprehensive.
Specifically, in one embodiment of the present invention, the address levels of the standard-level labels are supplemented according to the context of the address segments. For example, after "Beijing" is normalized to "Beijing City", the supplementary standard-level segment "Municipal District" can be added before "Haidian District" according to the context.
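Supplementing a missing level from context can be sketched as follows. The level ordering `LEVEL_ORDER`, the filler table `SUPPLEMENT`, and the function name are assumptions introduced for this example; the source only states that a "Municipal District" segment is added before "Haidian District" once "Beijing" has been normalized.

```python
# Assumed ordering of standard levels, from coarsest to finest.
LEVEL_ORDER = ["province", "city", "district", "road", "community", "building number"]

# Hypothetical filler segments, keyed by (preceding segment name, missing level).
SUPPLEMENT = {("Beijing City", "city"): "Municipal District"}


def supplement_levels(segments):
    """Insert filler segments for levels skipped between consecutive segments."""
    result = []
    prev_name, prev_rank = None, -1
    for name, level in segments:
        rank = LEVEL_ORDER.index(level) if level in LEVEL_ORDER else None
        if rank is not None and prev_name is not None:
            # Visit every level skipped between the previous segment and this one.
            for missing in LEVEL_ORDER[prev_rank + 1:rank]:
                filler = SUPPLEMENT.get((prev_name, missing))
                if filler:
                    result.append((filler, missing))
        result.append((name, level))
        if rank is not None:
            prev_name, prev_rank = name, rank
    return result


if __name__ == "__main__":
    print(supplement_levels([("Beijing City", "province"), ("Haidian District", "district")]))
```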
In another possible implementation, a comparison library containing multiple standard address segments is set up in advance, and the address to be compared is input into the comparison library. If a run of consecutive characters in the address to be compared has the highest matching degree with a standard address segment in the comparison library, that run of consecutive characters is taken as an address segment.
For example, for the address to be compared "Beijing Haidian", the candidate "Beijing" has a higher matching degree with "Beijing City" in the comparison library than candidates such as the single character "Bei" or the three-character run "Beijing Hai", so "Beijing" is cut out as an address segment. On the same principle, "Haidian" has a higher matching degree with "Haidian District" in the comparison library than other character combinations, so "Haidian" is cut out as an address segment.
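The library-based segmentation above can be sketched with a greedy best-prefix search. The source does not specify how the matching degree is computed; here it is approximated with `difflib.SequenceMatcher.ratio()`, which is a substitute chosen for illustration. The library contents, the Chinese spellings of the example, and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical comparison library of standard address segments
# (assumed spellings of "Beijing City" and "Haidian District").
COMPARISON_LIBRARY = ["北京市", "海淀区"]


def best_prefix(text, library, max_len=6):
    """Return the prefix of `text` with the highest similarity to any library entry."""
    best, best_score = text[:1], 0.0
    for end in range(1, min(len(text), max_len) + 1):
        candidate = text[:end]
        score = max(SequenceMatcher(None, candidate, entry).ratio() for entry in library)
        # Strict comparison: on ties the shorter prefix is kept.
        if score > best_score:
            best, best_score = candidate, score
    return best


def segment(text, library=COMPARISON_LIBRARY):
    """Greedily cut `text` into segments, taking the best-matching prefix each time."""
    tokens = []
    while text:
        token = best_prefix(text, library)
        tokens.append(token)
        text = text[len(token):]
    return tokens


if __name__ == "__main__":
    print(segment("北京海淀"))  # "Beijing Haidian"
```

The greedy strategy is a simplification: it never revisits an earlier cut, and it requires a non-empty library.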
It should be noted that in some application scenarios the obtained address to be compared is not only expressed in a non-standard way but also contains irrelevant noise elements. To further improve comparison efficiency, the address to be compared can therefore be denoised before comparison.
In one possible implementation, a preprocessing operation is performed on the address to be compared, where the preprocessing operation includes one or more of case conversion, half-width/full-width conversion, and cleaning of preset characters. The preset characters may be recognizable special characters such as "*".
In this embodiment, letters that should be lowercase are converted to lowercase, letters that should be uppercase are converted to uppercase, characters are converted between half-width and full-width forms as required, and so on.
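The preprocessing operations can be sketched with Python's standard `unicodedata` module. The direction of each conversion (uppercase, half-width) and the contents of `PRESET_NOISE_CHARS` are assumptions made for this example; the source names only "*" as a sample special character.

```python
import unicodedata

# Hypothetical set of characters to strip; only "*" is named in the text.
PRESET_NOISE_CHARS = set("*#@~^")


def preprocess_address(raw: str) -> str:
    """Apply width conversion, case conversion, and preset-character cleaning."""
    # NFKC maps full-width letters, digits, and punctuation to their half-width forms.
    text = unicodedata.normalize("NFKC", raw)
    # Assumption: Latin letters are unified to uppercase.
    text = text.upper()
    # Assumption: whitespace is treated as noise, as is typical for Chinese addresses.
    return "".join(
        ch for ch in text if ch not in PRESET_NOISE_CHARS and not ch.isspace()
    )


if __name__ == "__main__":
    print(preprocess_address("ａｂｃ　１２３＊"))
```

Normalizing width before cleaning means a full-width "＊" is also caught by the half-width entry in the noise set.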
Step 102: apply a preset algorithm to the normalized address segments labeled with the multiple standard levels according to a preset shard key value, and determine a sharded address database according to the operation result, where the sharded address database contains standard address nodes of multiple standard levels.
It can be understood that address normalization is performed on all standard addresses in advance, and the normalized standard addresses are stored in a distributed manner across multiple sharded address databases according to a preset storage strategy. Storing the large volume of standard addresses in a distributed manner greatly improves comparison efficiency.
The preset storage strategy may include the address level at which distributed storage is partitioned. For example, if distributed storage is partitioned at the province address level, a hash algorithm or similar can be applied to the province of each standard address, followed by an operation such as taking the modulus, so that the standard addresses of each province form one sharded address database. Likewise, if distributed storage is partitioned at the city address level, a hash algorithm or similar can be applied to the city of each standard address, followed by an operation such as taking the modulus, so that the standard addresses of each city form one sharded address database. It should be understood that the lower the address level used for partitioning, the more sharded address databases there are and the higher the comparison efficiency.
Each sharded address database contains standard address nodes of multiple standard levels. An index can be set on the node of the highest level, or any other specific node can be used as the index, where the index corresponds to the preset shard key value.
Then, according to a preset algorithm such as a hash algorithm, an operation is performed on the address segments of the multiple standard levels according to the preset shard key value, and the sharded address database is determined according to the operation result. The preset algorithm corresponds to the preset shard key value, and thus to the index set on the sharded address databases. For example, if the highest-level node of the sharded address database is "province" and the index is set on that node, then the preset algorithm operates on the address segments of the multiple standard levels at the "province" address level to identify the province to which the address belongs, and the address is distributed to the sharded address database of that province for comparison.
The standard addresses in a sharded address database may be stored as a tree, or stored with further multi-level distribution.
For example, the standard addresses are split into 10 parts using the city as the shard key value. The normalized address to be compared is then hashed by city and taken modulo 10 to obtain the sharded address database corresponding to each address to be compared. The sharded address data is deployed on the corresponding machines according to the specified configuration, and the address base loading and indexing programs are started.
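The hash-and-modulus routing in this example can be sketched as below. The source does not name a hash function; MD5 from `hashlib` is used here only because it is stable across processes, and the function names and the in-memory dictionary standing in for the sharded address databases are assumptions.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 10  # the example in the text splits the standard addresses into 10 parts


def shard_index(key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash a shard key value (here the city-level segment) and take the modulus."""
    # A stable digest is used because Python's built-in hash() of str is salted per process.
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def key_of(labeled_segments, shard_level="city"):
    """Return the segment text labeled with the shard level, or None if absent."""
    for text, level in labeled_segments:
        if level == shard_level:
            return text
    return None


def build_shards(standard_addresses, num_shards=NUM_SHARDS):
    """Group normalized standard addresses into shards by their city-level key."""
    shards = defaultdict(list)
    for address in standard_addresses:
        key = key_of(address)
        if key is not None:
            shards[shard_index(key, num_shards)].append(address)
    return shards


def route(query_segments, num_shards=NUM_SHARDS):
    """Pick the shard for an address to be compared; None means no shard key was found."""
    key = key_of(query_segments)
    return None if key is None else shard_index(key, num_shards)
```

Because the same function routes both standard addresses and queries, a query always lands on the shard that holds the standard addresses sharing its city-level key. A `None` result corresponds to the case where the address has no segment at the shard level.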
Step 103: distribute the normalized address labeled with the multiple standard levels to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtain a comparison result.
Specifically, within the determined sharded address database, the normalized address segments labeled with the multiple standard levels are distributed to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, and a comparison result is obtained.
That is, in practice, the standard address node of the indexed standard level in a sharded address database may be used only to determine which sharded address database the address to be compared belongs to. The actual comparison may start from the highest-level address node in the sharded address database (in which case the standard address node of the target standard level is the highest-level address node), or the address may be distributed directly to a specified standard address node of another target standard level for comparison.
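Comparison that starts from the highest-level node can be sketched as a walk down a tree of standard address nodes, tree storage being one of the options the text mentions. The `AddressNode` class, exact-name matching, and the function name are assumptions for this example; the source does not specify how a segment is matched against a node.

```python
class AddressNode:
    """One standard address node; children are keyed by normalized segment name."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self.children = {}

    def add_path(self, path):
        """Insert a chain of (name, level) pairs below this node."""
        node = self
        for name, level in path:
            if name not in node.children:
                node.children[name] = AddressNode(name, level)
            node = node.children[name]
        return node


def match_from_root(root, segments):
    """Follow the segments down from `root`; return the matched (name, level) pairs."""
    node, matched = root, []
    for name, _level in segments:
        child = node.children.get(name)
        if child is None:
            break  # Assumption: matching stops at the first segment with no node.
        matched.append((child.name, child.level))
        node = child
    return matched
```

For instance, after `root.add_path(...)` has loaded a standard address, `match_from_root(root, query_segments)` returns the prefix of the query that the shard recognizes; an empty list signals that not even the highest level matched.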
Of course, in practice it may happen that, when the normalized address labeled with the multiple standard levels is distributed to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison, no address segment matches the standard address node of the target standard level. In that case the normalized address is distributed to all nodes in all sharded address databases for comparison to obtain multiple comparison results, and an optimal comparison result is determined among them according to a preset screening strategy.
The preset screening strategy may be, for example, selecting the comparison result with the highest matching degree.
Of course, in a specific implementation, to improve comparison efficiency, if no address segment matches the standard address node of the target standard level, matching may instead continue at the next node corresponding to the standard address node of the target standard level. Only if a preset number of nodes have been tried without a match is the normalized address distributed to all nodes in all sharded address databases for comparison.
For example, suppose the standard address node of the target standard level is the highest-level address node, and the normalized address does not match any address node of the first three levels. Take "Xixiaokou Road Dongsheng Science Park": the result after normalization is {"Xixiaokou Road": "road"}, {"Dongsheng Science Park": "community"}. This address is distributed to all address nodes for calculation, and the optimal result is selected after the results are collected.
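The fallback described above, fanning out to every shard and keeping the best result, can be sketched as follows. The similarity measure (`difflib.SequenceMatcher.ratio()`), the `threshold` parameter, the flat lists of strings standing in for shards, and the function names are all assumptions for illustration; the source only says that the result with the highest matching degree may be selected.

```python
from difflib import SequenceMatcher


def score(query, candidate):
    """Assumed matching degree: similarity ratio between two address strings."""
    return SequenceMatcher(None, query, candidate).ratio()


def compare_in_shard(query, shard):
    """Return (best_candidate, score) within one shard, or (None, 0.0) if it is empty."""
    best = max(shard, key=lambda candidate: score(query, candidate), default=None)
    if best is None:
        return None, 0.0
    return best, score(query, best)


def compare_with_fallback(query, shards, target_index=None, threshold=0.6):
    """Try the target shard first; otherwise compare against all shards and keep the best."""
    if target_index is not None:
        candidate, value = compare_in_shard(query, shards[target_index])
        if value >= threshold:
            return candidate, value
    # Fallback: collect one result per shard, then apply the screening strategy
    # (here: highest matching degree).
    results = [compare_in_shard(query, shard) for shard in shards]
    return max(results, key=lambda result: result[1])
```

In a deployed system each `compare_in_shard` call would run on the machine that hosts the shard, and only the per-shard winners would be collected for the final selection.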
To give those skilled in the art a clearer understanding of the distributed address comparison method of the embodiments of the present invention, it is illustrated below with a specific embodiment:
In this example, the address to be compared is "No. 5, Shihua Longyue Erli, Xixiaokou Road, Haidian District, Beijing City", and the sharded address databases are sharded at the "city" address level.
As shown in Fig. 3, the standard addresses in the standard address library are first pre-cleaned and preprocessed, including upper/lower-case conversion, half-width/full-width conversion, special-character cleaning, and so on. Word segmentation, labeling, and normalization are then applied to the standard addresses. Finally, the addresses are sharded to form the shard address databases, and the standard address nodes of multiple standard levels contained in the shard address databases are indexed in a distributed manner.
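The preprocessing step can be illustrated with a short sketch. The text does not say which direction the case conversion takes or which characters count as special, so lower-casing, Unicode NFKC normalization for the full-width to half-width conversion, and removal of all non-word characters are assumptions chosen for the example.

```python
import re
import unicodedata


def preprocess_address(raw: str) -> str:
    """Clean a raw address string before word segmentation.

    Assumed rules (not specified by the source):
    - NFKC normalization maps full-width letters, digits, and the
      ideographic space to their half-width equivalents;
    - letters are lower-cased;
    - whitespace and punctuation are removed.
    """
    text = unicodedata.normalize("NFKC", raw)
    text = text.lower()
    # \w matches Unicode letters and digits (including CJK) plus underscore.
    return re.sub(r"[^\w]", "", text)
```

The same function would be applied both to the standard addresses when the shard address databases are built and to each address to be compared, so that the two sides are cleaned identically.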
The address to be compared that the user entered, "Beijing Haidian District Xixiaokou Road Shihua Longyue Erli 5", is obtained and preprocessed. Word segmentation of "Beijing Haidian District Xixiaokou Road Shihua Longyue Erli 5" yields the address segments "Beijing", "Haidian District", "Xixiaokou Road", "Shihua Longyue Erli", and "5".
Address-level labeling is applied to the segmentation result, giving {"Beijing": "province"}, {"Haidian District": "district"}, {"Xixiaokou Road": "road"}, {"Shihua Longyue Erli": "community"}, {"5": "building number"}. Address normalization is then performed, after which the labeled address becomes {"Beijing City": "province"}, {"municipal district": "city"}, {"Haidian District": "district"}, {"Xixiaokou Road": "road"}, {"Shihua Longyue Erli": "community"}, {"5": "building number"}. Here, "Beijing" is normalized to "Beijing City", and "municipal district" is added before "Haidian District" according to context, which supplements the address level missing from the standard level labels.
Further, the input address is distributed to a specified node according to the "city" level: "municipal district" is hashed and the hash value is taken modulo the number of nodes, which gives the corresponding shard address database. The standardized address labeled with multiple standard levels is distributed to that shard address database, address comparison is performed against the standard address node of the target standard level among the standard address nodes of multiple standard levels, and a comparison result is obtained.
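The routing step (hash the shard key, then take the result modulo the number of nodes) can be sketched as follows. The source does not name a hash function; MD5 is used here only as an assumption because it is stable across processes, unlike Python's built-in `hash` for strings. The function names are likewise hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple

LabeledSegment = Tuple[str, str]  # (address segment, address level)


def shard_key(labeled: List[LabeledSegment], level: str = "city") -> Optional[str]:
    """Return the segment labeled with the sharding level, or None if absent."""
    for name, seg_level in labeled:
        if seg_level == level:
            return name
    return None


def shard_index(key: str, node_count: int) -> int:
    """Map a shard key to a node index: stable hash modulo the node count."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count
```

Because the same function is used when the standard addresses are stored and when a query is routed, a query lands on the shard that holds the standard addresses sharing its key. When `shard_key` returns `None`, the broadcast fallback to all shards described earlier would apply.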
Thus, the distributed address comparison method of the embodiment of the present invention searches the address library and can effectively reduce the number of address comparisons. By combining distributed storage with distributed computing, normalizing addresses, and distributing addresses according to a fixed level, it effectively solves the performance problem of comparing massive address data without loss of conversion quality, and can meet daily GIS address conversion demand on the order of hundreds of millions of addresses.
In summary, the distributed address comparison method of the embodiment of the present invention performs address normalization on the address to be compared to obtain standardized address segments labeled with multiple standard levels; computes on these segments according to a preset algorithm and a preset shard key value, and determines a shard address database according to the computation result; and distributes the standardized address labeled with multiple standard levels to the standard address node of the target standard level among the standard address nodes of multiple standard levels for address comparison, obtaining a comparison result. This effectively solves the performance problem of comparing massive address data when addresses are expressed in a non-standard way, without loss of conversion quality, and improves address comparison efficiency.
To implement the above embodiment, the present invention further proposes a distributed address comparison device. Fig. 4 is a schematic structural diagram of the distributed address comparison device according to one embodiment of the present invention. As shown in Fig. 4, the distributed address comparison device includes an acquisition module 100, a computing module 200, a determining module 300, and a comparison module 400.
The acquisition module 100 is configured to perform address normalization on the address to be compared and obtain standardized address segments labeled with multiple standard levels.
It should be noted that, depending on the application scenario, the acquisition module 100 may obtain the standardized address segments labeled with multiple standard levels in different ways. In one possible implementation, as shown in Fig. 5, the acquisition module 100 includes a word segmentation unit 110, a first labeling unit 120, and a second labeling unit 130.
The word segmentation unit 110 is configured to perform word segmentation on the address to be compared and obtain address segments.
The first labeling unit 120 is configured to label the address segments according to preset address levels.
The second labeling unit 130 is configured to normalize the labeled address segments according to a preset address normalization strategy and to label the normalized address segments with standard levels, so as to obtain the standardized address segments labeled with multiple standard levels.
In one embodiment of the present invention, as shown in Fig. 6, on the basis of Fig. 4, the distributed address comparison device further includes a preprocessing module 500 configured to preprocess the address to be compared, where the preprocessing includes one or more of upper/lower-case conversion, half-width/full-width conversion, and cleaning of preset characters.
The computing module 200 is configured to compute on the standardized address segments labeled with multiple standard levels according to a preset algorithm and a preset shard key value.
The determining module 300 is configured to determine a shard address database according to the computation result, where the shard address database contains standard address nodes of multiple standard levels.
It can be understood that the shard address databases are established in advance. As shown in Fig. 7, on the basis of Fig. 4, the distributed address comparison device further includes a storage module 600 configured to perform address normalization on standard addresses and, according to a preset storage strategy, store the normalized standard addresses in shards across multiple shard address databases.
The comparison module 400 is configured to distribute the standardized address labeled with multiple standard levels to the standard address node of the target standard level among the standard address nodes of multiple standard levels for address comparison, and to obtain a comparison result.
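How the four modules fit together can be sketched as a simple composition. The class name, field names, and callable signatures are assumptions made for illustration; the source describes the modules only functionally, so each one is modeled here as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, str]  # (standardized address segment, standard level)


@dataclass
class AddressComparisonDevice:
    """Hypothetical wiring of modules 100 to 400 described in the text."""

    acquire: Callable[[str], List[Segment]]                 # acquisition module 100
    compute: Callable[[List[Segment]], int]                 # computing module 200
    shards: Sequence[Sequence[str]]                         # shard address databases
    compare: Callable[[Sequence[str], List[Segment]], str]  # comparison module 400

    def determine(self, value: int) -> Sequence[str]:
        """Determining module 300: pick a shard from the computation result."""
        return self.shards[value % len(self.shards)]

    def run(self, raw_address: str) -> str:
        segments = self.acquire(raw_address)
        shard = self.determine(self.compute(segments))
        return self.compare(shard, segments)
```

Keeping each module as an injected callable mirrors the statement that the acquisition module may work in different ways depending on the application scenario: an implementation can be swapped without touching the others.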
It should be noted that the foregoing explanation of the distributed address comparison method also applies to the distributed address comparison device of the embodiment of the present invention. The implementation principle is similar and is not repeated here.
In summary, the distributed address comparison device of the embodiment of the present invention performs address normalization on the address to be compared to obtain standardized address segments labeled with multiple standard levels; computes on these segments according to a preset algorithm and a preset shard key value, and determines a shard address database according to the computation result; and distributes the standardized address labeled with multiple standard levels to the standard address node of the target standard level among the standard address nodes of multiple standard levels for address comparison, obtaining a comparison result. This effectively solves the performance problem of comparing massive address data when addresses are expressed in a non-standard way, without loss of conversion quality, and improves address comparison efficiency.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following technologies well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art can understand that all or part of the steps of the above embodiment methods may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software function module. If the integrated module is implemented in the form of a software function module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.
Claims (10)
1. A distributed address comparison method, characterized by comprising the following steps:
performing address normalization on an address to be compared to obtain standardized address segments labeled with multiple standard levels;
computing on the standardized address segments labeled with the multiple standard levels according to a preset algorithm and a preset shard key value, and determining a shard address database according to the computation result, wherein the shard address database contains standard address nodes of multiple standard levels;
distributing the standardized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and obtaining a comparison result.
2. The method according to claim 1, characterized in that, before performing address normalization on the address to be compared to obtain the standardized address segments labeled with multiple standard levels, the method further comprises:
preprocessing the address to be compared, wherein the preprocessing comprises one or more of upper/lower-case conversion, half-width/full-width conversion, and cleaning of preset characters.
3. The method according to claim 1, characterized in that performing address normalization on the address to be compared to obtain the standardized address segments labeled with multiple standard levels comprises:
performing word segmentation on the address to be compared to obtain address segments;
labeling the address segments according to preset address levels;
normalizing the labeled address segments according to a preset address normalization strategy, and labeling the normalized address segments with standard levels, so as to obtain the standardized address segments labeled with multiple standard levels.
4. The method according to claim 3, characterized in that normalizing the labeled address segments according to the preset address normalization strategy and labeling the normalized address segments with standard levels further comprises:
supplementing the address level of the standard level labels according to the context of the address segments.
5. The method according to claim 1, characterized in that, when the standardized address segments labeled with the multiple standard levels are distributed to the standard address node of the target standard level among the standard address nodes of the multiple standard levels for address comparison,
if there is no address segment matching the standard address node of the target standard level, the normalized address is distributed to all nodes in all shard address databases for comparison, and multiple comparison results are obtained;
an optimal comparison result is determined among the multiple comparison results according to a preset screening strategy.
6. The method according to claim 1, characterized in that, before determining the shard address database according to the computation result, the method further comprises:
performing address normalization on standard addresses, and storing the normalized standard addresses in shards across multiple shard address databases according to a preset storage strategy.
7. A distributed address comparison device, characterized by comprising:
an acquisition module, configured to perform address normalization on an address to be compared and obtain standardized address segments labeled with multiple standard levels;
a computing module, configured to compute on the standardized address segments labeled with the multiple standard levels according to a preset algorithm and a preset shard key value;
a determining module, configured to determine a shard address database according to the computation result, wherein the shard address database contains standard address nodes of multiple standard levels;
a comparison module, configured to distribute the standardized address labeled with the multiple standard levels to the standard address node of a target standard level among the standard address nodes of the multiple standard levels for address comparison, and to obtain a comparison result.
8. The device according to claim 7, characterized by further comprising:
a preprocessing module, configured to preprocess the address to be compared, wherein the preprocessing comprises one or more of upper/lower-case conversion, half-width/full-width conversion, and cleaning of preset characters.
9. The device according to claim 7, characterized in that the acquisition module comprises:
a word segmentation unit, configured to perform word segmentation on the address to be compared and obtain address segments;
a first labeling unit, configured to label the address segments according to preset address levels;
a second labeling unit, configured to normalize the labeled address segments according to a preset address normalization strategy and to label the normalized address segments with standard levels, so as to obtain the standardized address segments labeled with multiple standard levels.
10. The device according to claim 7, characterized by further comprising:
a storage module, configured to perform address normalization on standard addresses and, according to a preset storage strategy, store the normalized standard addresses in shards across multiple shard address databases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710709020.XA CN107491525A (en) | 2017-08-17 | 2017-08-17 | Distributed address comparison method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107491525A true CN107491525A (en) | 2017-12-19 |
Family
ID=60646511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710709020.XA Withdrawn CN107491525A (en) | 2017-08-17 | 2017-08-17 | Distributed address comparison method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107491525A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111008625A (en) * | 2019-12-06 | 2020-04-14 | 中国建设银行股份有限公司 | Address correction method, device, equipment and storage medium |
CN111177719A (en) * | 2019-08-13 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Address category determination method, device, computer-readable storage medium and equipment |
CN111414357A (en) * | 2019-01-07 | 2020-07-14 | 阿里巴巴集团控股有限公司 | Address data processing method, device, system and storage medium |
CN112287671A (en) * | 2020-09-29 | 2021-01-29 | 深圳市跨越新科技有限公司 | Simhash-based address resolution method and system |
CN112925922A (en) * | 2019-12-06 | 2021-06-08 | 农业农村部信息中心 | Method, device, electronic equipment and medium for obtaining address |
CN114970518A (en) * | 2022-02-15 | 2022-08-30 | 北京青萌数海科技有限公司 | Method and device for correcting address data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008073502A2 (en) * | 2006-12-11 | 2008-06-19 | Google Inc. | Viewport-relative scoring for location search queries |
CN101350013A (en) * | 2007-07-18 | 2009-01-21 | 北京灵图软件技术有限公司 | Method and system for searching geographical information |
CN104199860A (en) * | 2014-08-15 | 2014-12-10 | 浙江大学 | Dataset fragmentation method based on two-dimensional geographic position information |
Non-Patent Citations (2)
Title |
---|
杨宗亮: "基于P2P的地理空间信息服务的架构及相关算法研究", 《中国博士学位论文全文数据库》 * |
洪莹: "城市地名地址匹配方法研究与实验", 《中国优秀硕士学位论文全文数据库》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111414357A (en) * | 2019-01-07 | 2020-07-14 | 阿里巴巴集团控股有限公司 | Address data processing method, device, system and storage medium |
CN111177719A (en) * | 2019-08-13 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Address category determination method, device, computer-readable storage medium and equipment |
CN111008625A (en) * | 2019-12-06 | 2020-04-14 | 中国建设银行股份有限公司 | Address correction method, device, equipment and storage medium |
CN112925922A (en) * | 2019-12-06 | 2021-06-08 | 农业农村部信息中心 | Method, device, electronic equipment and medium for obtaining address |
CN112287671A (en) * | 2020-09-29 | 2021-01-29 | 深圳市跨越新科技有限公司 | Simhash-based address resolution method and system |
CN114970518A (en) * | 2022-02-15 | 2022-08-30 | 北京青萌数海科技有限公司 | Method and device for correcting address data |
CN114970518B (en) * | 2022-02-15 | 2022-12-16 | 北京青萌数海科技有限公司 | Method and device for correcting address data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | TA01 | Transfer of patent application right | Effective date of registration: 20190903 Address after: 100192 Dongsheng Science Park, Zhongguancun, 66 Xixiaokou Road, Haidian District, Beijing Applicant after: Green Bay Network Technology Co., Ltd. Address before: 100089 Beijing Haidian District Xixiaokou Road 66 Zhongguancun Dongsheng Science Park B-6 Building B 5 floors Applicant before: Grass count language (Beijing) Technology Co., Ltd. |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20171219 |