CN1879104A - Data structure and management system for a superset of relational databases - Google Patents

Publication number
CN1879104A
Authority
CN
China
Prior art keywords
data
preferred
database
another name
workpiece
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2003801108259A
Other languages
Chinese (zh)
Other versions
CN100421107C (en)
Inventor
蒂莫西·C·欧文斯
布鲁斯·E·哈里森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
United Parcel Service of America Inc
Original Assignee
United Parcel Service of America Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by United Parcel Service of America Inc filed Critical United Parcel Service of America Inc
Publication of CN1879104A publication Critical patent/CN1879104A/en
Application granted granted Critical
Publication of CN100421107C publication Critical patent/CN100421107C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data structure, database management system, and methods of validating data are disclosed. A data structure is described that includes a superset of interconnected relational databases containing multiple tables having a common data structure. The tables may be stored as a sparse matrix linked list. A method is disclosed for ordering records in hierarchical order, in a series of levels from general to specific. An example use with address databases is described, including a method for converting an input address having a subject representation into an output address having a preferred representation. Preferred artifacts may be marked with a token. Alias tables may be included. This abstract is provided to comply with the rules, which require an abstract to quickly inform a searcher or other reader about the subject matter of the application. This abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.

Description

Data structure and management system for a superset of relational databases
Technical field
The following disclosure relates generally to relational database management systems and, more particularly, to methods and apparatus for using sparse matrix linked lists to process integrated data across multiple relational databases in a computer network environment.
Background
Databases have been a subject of computing since the beginning of the digital age. A database generally refers to one or more large, structured sets of persistent data, usually associated with a software system for creating, updating, and querying the data. In a database, each data value is stored in a field; a set of fields together forms a record; and a group of records may be stored together in a file.
The earliest databases were flat, meaning that all the data was stored in a single text file called a delimited file. In a delimited file, each field is separated by a special character, such as a comma. Each record is separated by a different character, such as a caret (^) or a tab character. A simple delimited file might look like this:
Last,First,Age^Doe,John,26^Smith,Jane,43^Jones,David,34
Each field may be assigned a name or category called an attribute. In the example file above, the attributes are Last, First, and Age. An attribute indicates the type of data to be stored in each field. For large amounts of data, a delimited file can grow very long. Accessing particular data generally requires a sequential search of the entire list. As the capacity of computers and databases increased, the demand for faster and more efficient search techniques led to the development of new data structures.
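As a rough illustration of the flat-file layout described above, the following Python sketch splits the sample file into records and fields and then performs the sequential search that such a file requires. The function names are hypothetical, and the sketch assumes that the first record holds the attribute names.

```python
def parse_delimited(text, field_sep=",", record_sep="^"):
    """Split a flat delimited file into a list of records keyed by attribute."""
    rows = [record.split(field_sep) for record in text.split(record_sep)]
    # Assumption: the first record names the attributes (Last, First, Age).
    attributes, data = rows[0], rows[1:]
    return [dict(zip(attributes, row)) for row in data]


def find_by_last(records, last):
    """Sequential search: records are examined one by one until a match is found."""
    for record in records:
        if record["Last"] == last:
            return record
    return None


sample = "Last,First,Age^Doe,John,26^Smith,Jane,43^Jones,David,34"
records = parse_delimited(sample)
```

Because every lookup walks the list from the start, search time grows with the length of the file, which is the limitation the paragraph above points to.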
The relational database model was first described in the early 1970s. In a relational database, data is stored in tables. A table organizes data into rows and columns, giving each field a specific position (for example, row x, column y). Each row contains a single record. The columns are ordered by attribute, so all the fields in each column contain the same type of data. The delimited file above can be expressed in table format as follows:
Last    First   Age
Doe     John    26
Smith   Jane    43
Jones   David   34
The set of attributes, or column headings, is sometimes called the schema of the table. For example, the table above can be described as having the schema (Last, First, Age).
The table format makes searching and accessing the data in a database file faster and more efficient. Records (rows) can also be sorted into a new order based on any one or more of the columns (fields). Sorting is often used to order records so that the most-needed data appears earlier in the file, making searches faster.
As computing speed and capacity increased, database tables could store greater amounts of data. Additional records (rows) can be added to describe additional instances. Additional attributes (columns) can be added to accommodate more types of data about each instance. As the number of fields grows, the task of changing the table structure (adding or deleting rows and columns) becomes more complex and increases the likelihood of error. In addition, for large tables, the task of sorting data based on one or more columns becomes more complex and time-consuming. Adding different types of data to a single large two-dimensional table eventually produces problems such as redundancy, inconsistency, increased storage requirements, and slower sorting and computation.
Relational databases with multiple tables
To accommodate different types of fields containing related data, a relational database model may include multiple tables. Multiple tables containing related data can be linked together using key fields. A key field contains a unique identifier for each record (or row of data). A key field may contain actual data, such as a part number or social security number, as long as it is unique to the record. This is sometimes called a logical key. A key field may also be a surrogate key, such as a record number, which is a unique identifier unrelated to the actual data. In addition, a key may be defined by a single field or by a combination of fields. A simple key is based on a single field; a composite key is based on multiple fields.
In a relational database, related data can be stored in multiple tables. A key field called the "primary key" serves as the unique reference point for locating a specific record in a table. For example, the attributes (or column headings) in an example "Table A" might be (Name, Age, Social Security Number, Employee Number). The primary key of Table A is the Social Security Number field.
In a relational database where data is stored in multiple tables, another key field called a "foreign key" serves as a reference point for connecting tables. For example, consider another example table, "Table B," which has the schema (Employee Number, Department Number, Hire Date, Salary). The primary key of Table B is the unique Employee Number field. Referring back to the attributes in Table A, the foreign key of Table A is the Employee Number field, because it links records in Table A with records in Table B. This relationship between tables can be illustrated with an Entity Relationship Diagram, in which each table contains data for a single entity or category, such as "Age" or "Department."
Relational database
Table A (Age)         Table B (Department)
  + Name                + EmployeeNr
  + Age                 + DepartmentName
  + SSN                 + HireDate
  + EmployeeNr          + Salary
The shared "EmployeeNr" field is common to both tables and provides the link between the data in the two tables. The "EmployeeNr" field is a foreign key in Table A, but it is the primary key in Table B.
Table A and Table B need not contain the same number of records. For example, the records in Table A might include the name, age, social security number, and employee number of everyone in an organization; the records in Table B might be limited to those in a particular department or branch.
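The link through the shared employee number can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names table_a and table_b, the social security numbers, the department name, the hire dates, and the salaries are made-up placeholders, not data from the disclosure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (Name TEXT, Age INTEGER, SSN TEXT PRIMARY KEY,
                          EmployeeNr INTEGER);
    CREATE TABLE table_b (EmployeeNr INTEGER PRIMARY KEY, DepartmentName TEXT,
                          HireDate TEXT, Salary INTEGER);
""")

# Table A lists everyone in the organization (placeholder values).
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?)", [
    ("John Doe", 26, "000-00-0001", 101),
    ("Jane Smith", 43, "000-00-0002", 102),
    ("David Jones", 34, "000-00-0003", 103),
])
# Table B holds only the members of one department, so it has fewer records.
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)", [
    (101, "Shipping", "2001-03-15", 40000),
    (102, "Shipping", "1998-07-01", 52000),
])

# EmployeeNr is the foreign key in table_a and the primary key in table_b.
rows = conn.execute("""
    SELECT a.Name, b.DepartmentName
    FROM table_a AS a
    JOIN table_b AS b ON a.EmployeeNr = b.EmployeeNr
    ORDER BY a.Name
""").fetchall()
```

The join returns only the two people who appear in both tables, which reflects the point that the tables need not contain the same number of records.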
By placing discrete data sets in separate tables, a relational database can access selected tables for a variety of purposes. A single relational database can contain any number of tables, from a few to thousands.
A query language allows a user to interact with a database and analyze the data in its tables. A query is a set of instructions for extracting a set of data from a database. Queries do not change the information in the tables; they only display the information to the user. The result of a query is sometimes called a view.
The best-known query language is Structured Query Language (SQL), pronounced "sequel." SQL is a standard language for database interoperability. Queries are probably the most frequently used aspect of SQL, but SQL commands can also be used as programming tools for creating and maintaining databases.
Database management systems
A database management system (sometimes abbreviated DBMS) generally refers to an interface and one or more computer software programs specifically designed to manage and manipulate the information in a database. A DBMS may include a complex suite of software programs that control the organization, storage, and retrieval of data, as well as the security and integrity of the database. A DBMS may also include an interface for accepting requests for data from external applications.
An interface is a computer program designed to provide an operative connection between a user and an application such as a DBMS. The interface of a DBMS may provide a series of commands that allow a user to create, read, update, and delete the data values stored in the database tables. These functions are sometimes referred to by the acronym CRUD, so an interface with these commands may be called a CRUD interface. A database interface that includes a query function may be called a CRUDQ interface.
A COM-based interface refers to software based on the Component Object Model. The Component Object Model is an open software architecture developed by Digital Equipment Corporation and Microsoft that allows interoperability between the various components of a database system.
In a relational database containing multiple tables, the database management system (DBMS) is generally responsible for maintaining all of the links between the key fields in the various tables. This is called maintaining the "referential integrity" of the database.
Maintaining referential integrity is often a challenge in relational databases containing a very large number of tables. The linked nature of relational database tables has many advantages, but it also allows errors to propagate between tables and throughout the database, especially when records or key fields are changed or deleted. For systems in which a variety of users can access the database through a CRUD interface, the likelihood of error is compounded.
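One common way a DBMS guards referential integrity is to reject a change that would leave a foreign key pointing at nothing. The sketch below shows this with SQLite through Python's sqlite3 module; it is an illustration of the general idea, not the mechanism of the disclosed system, and the table names and values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE table_b (EmployeeNr INTEGER PRIMARY KEY, DepartmentName TEXT)"
)
conn.execute("""
    CREATE TABLE table_a (
        Name       TEXT,
        SSN        TEXT PRIMARY KEY,
        EmployeeNr INTEGER REFERENCES table_b (EmployeeNr)
    )
""")
conn.execute("INSERT INTO table_b VALUES (101, 'Shipping')")
conn.execute("INSERT INTO table_a VALUES ('John Doe', '000-00-0001', 101)")


def delete_from_table_b(employee_nr):
    """Try to delete a table_b record; report whether the DBMS allowed it."""
    try:
        conn.execute("DELETE FROM table_b WHERE EmployeeNr = ?", (employee_nr,))
        return True
    except sqlite3.IntegrityError:
        # The record is still referenced from table_a, so the delete is refused.
        return False


deleted = delete_from_table_b(101)
```

Without such a check, the delete would succeed and the record in table_a would carry an employee number that no longer exists in table_b, which is the kind of propagated error described above.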
In a computer network environment, a large database may be hosted on a central server, with many users or subscribers accessing the data from remote locations over communication links. Access speed is usually limited by the type and capacity of the communication link. Distributing a copy of the entire database to remote locations is generally impractical, especially for applications in which the data must be current to be useful. In addition, storing a large database locally places a considerable burden on the local user, because remote systems are generally smaller than the central server. Storing a large database on a local system that lacks sufficient capacity often causes an unacceptable increase in computing time. The cost of upgrading all the hardware at each remote location may be prohibitive, especially for very large user networks.
Updating the data in a large relational database can be technically challenging and time-consuming, especially in a network environment where the data must be updated frequently. Transmitting an updated copy of the entire database is usually impractical and cost-prohibitive. Moreover, the cost and delay of distribution can become obstacles to frequent updates.
Thus, there is a need in the art for an improved database management system that can maintain and protect large amounts of data, distribute frequent updates in a cost-effective manner, and process requests for data quickly and efficiently at all locations in a network.
Address databases
The United States contains more than 145,000,000 deliverable addresses. A database containing information about all of these street addresses is an example of a very large database. Address databases can be obtained from private sources or from government sources, such as the United States Postal Service (USPS).
The USPS offers a variety of address databases to the public, including the City-State file, the Five-Digit ZIP file, and the ZIP+4 file. The City-State file is a comprehensive list of ZIP codes with the corresponding city and county names. The Five-Digit ZIP file, when used in combination with the City-State file, allows users to verify existing five-digit ZIP code assignments. The ZIP+4 file provides a comprehensive list of ZIP+4 codes.
The Delivery Sequence File (DSF) is a computerized database developed by the USPS that contains the complete, standardized address of every delivery point served by the USPS, with each address stored in a discrete record. Each individual record contains the street address, ZIP+4 code, carrier route code, delivery sequence number (walk sequence number), delivery type code, and seasonal delivery indicator. The DSF contains enough data to accomplish address validation and standardization. The DSF is provided to licensees who develop certified address hygiene software. The USPS has recently developed a new Delivery Point Validation (DPV) database to replace the DSF. The DPV database is available in a basic format or in an enhanced format, called DSF2, which includes additional address attributes.
Address standardization
The need for standardized mailing addresses is a relatively modern development. In the early 1960s, tremendous growth in the volume of mail (most of it business mail) caused a serious crisis for the postal service. The computer was the single greatest force behind the explosion in mail volume. Computers allowed businesses to automate a variety of mailing functions, but the postal service was not prepared for the surge in mail volume. In response to this crisis, the Zone Improvement Plan (ZIP) was developed. By July 1963, five-digit ZIP codes had been assigned to all deliverable addresses in the United States. The ZIP code marked the beginning of the era of standardized, modern addresses.
Twenty years later, the ZIP+4 code was introduced, adding a hyphen and four extra digits to the ZIP code. Today, mail is typically sorted by a multi-line optical character reader, which scans the entire address, prints an eleven-digit delivery point barcode (DPBC) on the envelope, and sorts the mail into trays in the established walk sequence for each delivery route.
Address standardization converts a given address into the best format for meeting government guidelines, such as those established by the USPS. Standardization affects all the components of a delivery address, including format, typeface, spacing, wording, punctuation, and ZIP code or DPBC. For example, a non-standard address such as the following:
John Doe
123 East Main Street,N.W.
Oakland Center,Suite A-4
Atlanta,Georgia 30030
might look quite different after standardization:
JOHN DOE
123 E MAIN ST NW STE A4
DECATUR GA 30030-1549
An address can be broken down, or parsed, into its components, which are sometimes called artifacts. For example, the individual artifacts in the address above include the occupant or addressee (John Doe), number (123), pre-directional (E), primary name (Main), type (St), post-directional (NW), secondary name (STE), secondary number (A4), and the city, state, and ZIP+4 code (Decatur GA 30030-1549). Breaking an address into its individual artifacts is useful in many contexts, including postal sorting and address validation.
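To make the idea of parsing an address into artifacts concrete, the following Python sketch splits the standardized street line from the example into its components. It is a minimal sketch under stated assumptions: the function name parse_street_line is hypothetical, and the lookup sets are small illustrative samples rather than complete postal abbreviation lists.

```python
# Small illustrative lookup sets (assumptions, not complete postal lists).
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
STREET_TYPES = {"ST", "AVE", "RD", "BLVD", "DR", "LN"}
SECONDARY_NAMES = {"STE", "APT", "UNIT"}


def parse_street_line(line):
    """Split a standardized street line into named artifacts."""
    tokens = line.upper().split()
    artifacts = {"number": tokens.pop(0)}
    if tokens and tokens[0] in DIRECTIONS:
        artifacts["pre_directional"] = tokens.pop(0)
    # Everything from the secondary name onward describes the unit.
    for i, token in enumerate(tokens):
        if token in SECONDARY_NAMES:
            artifacts["secondary_name"] = token
            artifacts["secondary_number"] = " ".join(tokens[i + 1:])
            tokens = tokens[:i]
            break
    # Peel the post-directional and street type off the end, if present.
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        artifacts["post_directional"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        artifacts["type"] = tokens.pop()
    artifacts["primary_name"] = " ".join(tokens)
    return artifacts


parsed = parse_street_line("123 E MAIN ST NW STE A4")
```

A production parser would need far larger tables and would have to handle ambiguous cases, such as a street whose primary name is itself a direction.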
Address validation
While standardization refers to the way an address is formatted, the process of address validation confirms whether a given address is a valid and current address. Address databases from private or government sources are commonly used to validate addresses. For example, the USPS databases described above can be used for comparison purposes to validate an address.
In addition to government postal services, private companies such as commercial package delivery companies often develop and maintain address databases for storing unique and valuable customer information. Private databases developed independently of government postal service data may represent the next generation in accurate addressing and data storage. In the future, more kinds of government and private address databases will become available.
The USPS address databases are updated regularly with new data. In addition to regular, periodic updates, the USPS has developed several correction databases, including NCOA and LACS. The National Change of Address (NCOA) database contains change-of-address records. The Locatable Address Conversion System (LACS) contains new addresses for areas undergoing conversion from rural-route to city-style addresses.
Because of population growth and change, address databases generally require frequent updates. As with any other large database, updating the data in a very large address database is often technically challenging and time-consuming. Thus, in the context of address databases, there is a need in the art for an improved database management system that can maintain and protect large amounts of address data, distribute frequent updates to users or subscribers in a cost-effective manner, and process requests for address data quickly and efficiently.
Summary of the invention
The following summary is a broad overview and is not intended to identify key or critical elements of the apparatus, methods, systems, processes, and the like, or to delineate the scope of such elements. This summary provides a conceptual introduction in simplified form as a prelude to the more detailed description that follows.
Certain illustrative example apparatus, methods, systems, processes, and the like are described in connection with the following description and drawings. These examples represent only a few of the various ways in which the principles supporting the apparatus, methods, systems, processes, and the like may be employed, and thus are intended to include equivalents. Other advantages and novel features will become apparent from the following detailed description when considered in conjunction with the drawings.
In view of the broad teachings of the present invention, data structures, database management systems, processing apparatus, and related methods having advantageous constructions are provided. The example apparatus, methods, and systems described herein facilitate the prompt and efficient validation of input data presented in a subject representation and produce output data having a preferred representation.
In one aspect of the invention, a data structure may include a superset comprising a primary database operatively connected to one or more secondary databases, wherein each of the primary database and the one or more secondary databases includes a first table operatively linked to one or more other tables, and wherein the first table and each of the one or more other tables share a common data structure. The databases may be relational databases. The common data structure may include a sparse matrix linked list. The common data structure may include data records arranged in hierarchical order based on the data, in a series of levels from general to specific.
In the data structure, the primary database may include a source table, a first secondary database may include an alias table, a second secondary database may include a standardization table, and a third secondary database may be configured to accept and store input data. The source table may include data records obtained from public or private sources, the alias table may include one or more equivalent representations of a record, and the standardization table may include one or more standardized representations of a record. In another aspect of the data structure, the source table may include address records obtained from government postal services and commercial sources.
In the data structure, the first table includes preferred records, a first other table may include primary canonical names, and a second other table may include secondary canonical names. The preferred records may include one or more preferred representations, the primary canonical names may include one or more equivalent representations of a primary artifact, and the secondary canonical names may include one or more equivalent representations of a secondary artifact. In a related aspect, the preferred records may include one or more preferred representations of an address.
In another aspect of the invention, a method is provided for preparing data for optimal searching, the data being stored in one or more databases comprising a plurality of linked tables of records. The method may include: arranging the records in each of the tables in hierarchical order based on the data, in a series of levels from general to specific; and converting each of the tables into one or more sparse matrix linked list tables. When the databases reside in a client-server network environment, the method may also include distributing copies of the one or more sparse matrix linked list tables from the server to one or more clients. The databases may be relational databases interconnected to form a data superset. In one aspect, the data may include address artifacts.
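The two preparation steps named above, ordering records from general to specific and converting them to a linked structure, can be pictured with the following Python sketch. It is only an assumption about what such a structure might look like: this summary does not specify the node layout of the sparse matrix linked list, and the Node class, its next and down links, and the sample records are all hypothetical. The sketch stores each shared general value once and links it to the more specific values beneath it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    next: Optional["Node"] = None  # next value at the same level
    down: Optional["Node"] = None  # first value at the next, more specific level


def _insert(head, values):
    """Insert one record's values, reusing nodes that already hold a value."""
    if not values:
        return head
    if head is None:
        head = node = Node(values[0])
    else:
        node = head
        while node.value != values[0] and node.next is not None:
            node = node.next
        if node.value != values[0]:
            node.next = Node(values[0])
            node = node.next
    node.down = _insert(node.down, values[1:])
    return head


def build(records, levels):
    """Order records from general to specific, then link them level by level."""
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in levels))
    head = None
    for record in ordered:
        head = _insert(head, [record[k] for k in levels])
    return head


def contains(head, values):
    """Search one level at a time, descending only below the matching node."""
    node = head
    for i, value in enumerate(values):
        while node is not None and node.value != value:
            node = node.next
        if node is None:
            return False
        if i < len(values) - 1:
            node = node.down
    return True


def count_nodes(head):
    total = 0
    while head is not None:
        total += 1 + count_nodes(head.down)
        head = head.next
    return total


records = [
    {"state": "GA", "city": "DECATUR", "street": "MAIN"},
    {"state": "GA", "city": "DECATUR", "street": "OAK"},
    {"state": "GA", "city": "ATLANTA", "street": "MAIN"},
]
head = build(records, ["state", "city", "street"])
```

In this toy example the three records occupy nine cells in a flat table but only six nodes in the linked form, because the shared state and city values are stored once. A search also skips every branch whose general value does not match.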
In another aspect of the invention, an apparatus is provided for preparing data for optimal searching, the data being stored in one or more databases comprising a plurality of linked tables of records. The apparatus may include a central processing unit, memory, a basic input/output system, and a program storage device containing a program module executable by the central processing unit. The program module may include: means for arranging the records in each of the tables in hierarchical order based on the data, in a series of levels from general to specific; and means for converting each of the tables into one or more sparse matrix linked list tables. The apparatus may also include one or more clients remote from the central processing unit. The program module may also include means for distributing copies of the one or more sparse matrix linked list tables from the server to the one or more clients.
In another aspect of the invention, a method is provided for converting a subject representation into a preferred representation using a database of linked tables. The method may include: capturing the subject representation and storing it in a first of the linked tables; storing source data in a second of the linked tables; locating one or more candidate representations in the source data by comparing the subject representation with the source data; selecting a preferred representation from the one or more candidate representations, the preferred representation being the one most similar to the subject representation; and delivering the preferred representation.
The method may also include: examining the source data to identify one or more select records containing preferred data; and adding a preferred token to the one or more select records.
The step of selecting a preferred representation may include identifying a preferred token associated with one of the one or more candidate representations.
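The locate, select, and deliver steps can be sketched in Python as follows. This is an illustration under several assumptions: similarity is measured with the standard-library difflib.SequenceMatcher, which the disclosure does not name; the 0.5 cutoff is arbitrary; the source records are made up; and the sketch assumes that a record carrying the preferred token outranks a more similar record that lacks it.

```python
from difflib import SequenceMatcher

# Hypothetical source table: each record carries a preferred token (True/False).
SOURCE = [
    {"text": "123 E MAIN ST NW", "preferred": True},
    {"text": "123 EAST MAIN STREET NORTHWEST", "preferred": False},
    {"text": "456 OAK AVE", "preferred": True},
]


def similarity(a, b):
    """Score two representations from 0.0 (unrelated) to 1.0 (identical)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def locate_candidates(subject, source, cutoff=0.5):
    """Compare the subject representation with every source record."""
    return [r for r in source if similarity(subject, r["text"]) >= cutoff]


def select_preferred(subject, candidates):
    """Prefer tokened records, then the record most similar to the subject."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r["preferred"], similarity(subject, r["text"])),
    )


subject = "123 East Main Street N.W."
candidates = locate_candidates(subject, SOURCE)
result = select_preferred(subject, candidates)
```

Here the unrelated OAK AVE record falls below the cutoff, and of the two remaining candidates the one marked with the preferred token is delivered.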
The step of locating one or more candidate representations may also include: (a) parsing the subject representation into one or more discrete artifacts; (b) selecting one of the one or more discrete artifacts and: (1) locating one or more candidate artifacts in the source data by comparing the discrete artifact with the source data; (2) selecting a preferred artifact from the one or more candidate artifacts, the preferred artifact being the one most similar to the discrete artifact; (3) storing the preferred artifact; (c) repeating step (b) for each of the one or more discrete artifacts; and (d) combining the preferred artifacts to form the preferred representation.
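Steps (a) through (d) can be sketched as a loop over the parsed components (artifacts) of the input. The sketch below is hypothetical: it parses by splitting on whitespace, uses the standard-library difflib.get_close_matches as the comparison, treats a short made-up list as the source data, and keeps an artifact unchanged when no candidate is found, none of which is specified by the disclosure.

```python
from difflib import get_close_matches

# Hypothetical source data: artifacts known to the source table.
SOURCE_ARTIFACTS = ["123", "E", "MAIN", "ST", "NW", "OAK", "AVE"]


def to_preferred(subject, source_artifacts=SOURCE_ARTIFACTS, cutoff=0.6):
    preferred = []
    # (a) parse the subject representation into discrete artifacts
    for artifact in subject.upper().split():
        # (b)(1) locate candidate artifacts by comparison with the source data
        candidates = get_close_matches(artifact, source_artifacts, n=3, cutoff=cutoff)
        # (b)(2) select the most similar candidate; results are sorted best first
        best = candidates[0] if candidates else artifact
        # (b)(3) store it, then (c) repeat for the next artifact
        preferred.append(best)
    # (d) combine the preferred artifacts into the preferred representation
    return " ".join(preferred)


corrected = to_preferred("123 E MIAN ST NW")
```

In this example the misspelled artifact MIAN is matched to MAIN, while the artifacts that already appear in the source data are returned unchanged.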
The step of locating one or more candidate representations may also include: storing alias data in a third of the linked tables; examining the alias data to identify one or more select canonical names containing a preferred alias representation; adding a preferred alias token to the one or more select canonical names; locating one or more candidate aliases in the alias data by comparing the subject representation with the alias data; selecting a preferred alias from the one or more candidate aliases, the preferred alias being the one most closely associated with the preferred alias token; and delivering the preferred alias as a candidate representation.
The step of locating one or more candidate aliases may also include: (a) parsing the subject representation into one or more discrete artifacts; (b) selecting one of the one or more discrete artifacts and: (1) locating one or more candidate alias artifacts from the source data by comparing the discrete artifact with the alias data; (2) selecting a preferred alias artifact from the one or more candidate alias artifacts, the preferred alias artifact being the one most closely associated with the preferred alias token; (3) storing the preferred alias artifact; (c) repeating step (b) for each of the one or more discrete artifacts; and (d) combining the preferred alias artifacts to form the preferred alias.
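An alias table with a preferred alias token can be pictured as groups of equivalent representations in which one member of each group is marked as preferred. The Python sketch below is a simplified, hypothetical layout: the group names, the sample aliases, and the exact-match comparison are assumptions made for illustration only.

```python
# Hypothetical alias table rows: (alias, group, preferred alias token).
ALIAS_TABLE = [
    ("STREET", "street_type_st", False),
    ("STR", "street_type_st", False),
    ("ST", "street_type_st", True),
    ("SUITE", "unit_ste", False),
    ("STE", "unit_ste", True),
]


def preferred_alias(artifact, alias_table=ALIAS_TABLE):
    """Return the tokened alias from the group that contains the artifact."""
    groups = {group for alias, group, _ in alias_table if alias == artifact.upper()}
    for alias, group, preferred in alias_table:
        if group in groups and preferred:
            return alias
    return None


def apply_aliases(subject):
    """Replace each component (artifact) with its preferred alias, if it has one."""
    result = []
    for artifact in subject.replace(",", " ").split():
        result.append(preferred_alias(artifact) or artifact.upper())
    return " ".join(result)


converted = apply_aliases("123 Main Street, Suite A4")
```

Artifacts that have no entry in the alias table, such as the house number, pass through unchanged in this sketch.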
In another aspect of the invention, an apparatus is provided for carrying out the method steps just described. The apparatus may include: a central processing unit; memory; a basic input/output system; and a program storage device containing a program module executable by the central processing unit, wherein the program module may include means for performing each step of the method described above.
In another aspect of the invention, a method is provided for controlling access to a database by one or more external applications. The method may include: establishing and storing a plurality of rule sets, each of which relates to one of the one or more external applications; receiving a request from a first application; retrieving a first rule set related to the first application; and applying the first rule set to control the interaction between the first application and the database.
In another aspect of the invention, a method is provided for controlling the depth of data retrieved from a database in response to requests from one or more external applications. The method may include: establishing and storing a plurality of rule sets, each of which relates to one of the one or more external applications, each of the plurality of rule sets including a list of the data to be retrieved from the database; receiving a request from a first application; retrieving a first rule set related to the first application; and applying the first rule set to limit the data that the first application obtains from the database.
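The rule-set methods above can be sketched as a lookup keyed by application name. Everything in this Python sketch is hypothetical: the application names, the rule format (a set of allowed operations plus a list of retrievable fields), and the in-memory dictionary standing in for the database are assumptions made for illustration.

```python
# Hypothetical rule sets, one per external application.
RULE_SETS = {
    "billing_app": {
        "operations": {"read"},
        "fields": ["street", "city", "state", "zip"],
    },
    "dispatch_app": {
        "operations": {"read"},
        "fields": ["zip"],
    },
}

# Stand-in for the database: one address record keyed by record id.
DATABASE = {
    1: {"street": "123 E MAIN ST NW", "city": "DECATUR", "state": "GA",
        "zip": "30030-1549"},
}


def handle_request(app, operation, record_id):
    """Retrieve the caller's rule set and apply it to the request."""
    rules = RULE_SETS.get(app)
    if rules is None or operation not in rules["operations"]:
        raise PermissionError(f"{app} may not {operation}")
    record = DATABASE[record_id]
    # Depth of retrieval: return only the fields listed in the rule set.
    return {name: record[name] for name in rules["fields"]}


dispatch_view = handle_request("dispatch_app", "read", 1)
billing_view = handle_request("billing_app", "read", 1)
```

The same request returns a different depth of data depending on which application makes it, and an application with no rule set is refused.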
In another aspect of the invention, a data structure is provided that may include a database linking a primary table and one or more secondary tables, each of the tables sharing a common data structure. The database is controlled by a database management system configured to convert the primary table and the one or more secondary tables into one or more sparse matrix linked lists. The database may include one or more interconnected relational databases. The database management system may include an interface and a validation module. The interface may control access to the database by one or more external applications. The database management system may be configured to convert data from a subject representation into a preferred representation.
These and other objects are achieved by the disclosed apparatus, methods, and systems and will become apparent from the following detailed description of the preferred embodiments, taken in conjunction with the accompanying drawings, in which like numerals indicate like elements.
Description of drawings
Understand following the description in conjunction with the drawings, can more easily understand the present invention, in the accompanying drawing:
Fig. 1 is a block diagram of an address superset according to one embodiment of the invention.
Fig. 2 is a block diagram of a generic data superset according to one embodiment of the invention.
Fig. 3 is a diagram of a system architecture according to one embodiment of the invention.
Fig. 4 is a block diagram of a stand-alone service mode according to one embodiment of the invention.
Fig. 5 is a diagram of a data table according to one embodiment of the invention.
Fig. 6 is a diagram of the values in a table according to one embodiment of the invention.
Fig. 7 is a block diagram of a link according to one embodiment of the invention.
Fig. 8 is a block diagram of a linked list according to one embodiment of the invention.
Fig. 9 is a table of address data according to one embodiment of the invention.
Figure 10 is a diagram comprising levels and nodes according to one embodiment of the invention.
Figure 11 is a table of address data with contexts of use according to one embodiment of the invention.
Figure 12 is a flow chart of a matching module according to one embodiment of the invention.
Figure 13 is a table of alias data according to one embodiment of the invention.
Detailed description
Referring now to the drawings, like numerals refer to like elements throughout the several views.
1. Introduction
As used in this application, the term "computer module" refers to a computer-related entity, whether hardware, firmware, software, a combination thereof, or software in execution. For example, a computer module can be, but is not limited to being, a process running on a processor, the processor itself, an object, an executable, a thread of execution, a program, a server, and a computer. By way of illustration, both an application running on a server and the server itself can be referred to as computer modules. One or more computer modules can reside within a process and/or thread of execution, and a computer module can be localized on a single computer and/or distributed between two or more computers.
"Computer communication", as used herein, refers to communication between two or more computer modules and can be, for example, a network transfer, a file transfer, an applet transfer, an email, a Hypertext Transfer Protocol (HTTP) message, a datagram, an object transfer, a binary large object (BLOB) transfer, and so on. A computer communication can occur across, for example, a wireless system (e.g., IEEE 802.11), an Ethernet system (e.g., IEEE 802.3), a token ring system (e.g., IEEE 802.5), a local area network (LAN), a wide area network (WAN), a point-to-point system, a circuit switching system, a packet switching system, and so on.
"Logic", as used herein, includes but is not limited to hardware, firmware, software, and/or combinations of each, to perform one or more functions or actions. For example, based on a desired application or need, logic may include a software-controlled microprocessor, discrete logic such as an application-specific integrated circuit (ASIC), or another programmed logic device. Logic may also be fully embodied as software.
"Signal", as used herein, includes but is not limited to one or more electrical or optical signals, analog or digital, one or more computer instructions, a bit or bit stream, and the like.
"Software", as used herein, includes but is not limited to one or more computer-readable and/or executable instructions that cause a computer, computer module, and/or other electronic device to perform functions, actions, and/or behave in a desired manner. The instructions may be embodied in various forms, such as routines, algorithms, stored procedures, modules, methods, threads, and/or programs. Software may also be implemented in a variety of executable and/or loadable forms, including but not limited to a stand-alone program, a function call (local and/or remote), a servlet, an applet, instructions stored in a memory, part of an operating system or browser, and the like. It is to be appreciated that the computer-readable and/or executable instructions can be located in one computer module and/or distributed between two or more communicating, cooperating, and/or parallel-processing computer modules, and thus can be loaded and/or executed in serial, parallel, massively parallel, and other manners. One of ordinary skill in the art will appreciate that the form of software may depend on, for example, the requirements of a desired application, the environment in which it runs, and/or the desires of a designer or programmer.
An "operable connection" (or a connection by which entities are "operably connected") is one in which signals, physical communication flow, and/or logical communication flow may be sent and/or received. Usually, an operable connection includes a physical interface, an electrical interface, and/or a data interface, but it is to be noted that an operable connection may consist of differing combinations of these or other types of connections sufficient to allow operable control.
"Database", as used herein, refers to a physical and/or logical entity that can store data. A database may be, for example, one or more of the following: a data store, a relational database, a table, a field, a list, a queue, a heap, and so on. A database may reside in one logical and/or physical entity and/or may be distributed between two or more logical and/or physical entities.
The terms "fuzzy" and "blurry" refer to a superset of Boolean logic that deals with partial truth; in other words, truth values between "completely true" and "completely false". Any particular theory or system may be generalized from a discrete, or crisp, form to a continuous, or fuzzy, form. A system based on fuzzy logic or fuzzy matching may use truth values that have varying degrees, similar to probabilities, except that the degrees of truth do not necessarily need to sum to one. When fuzzy matching is applied to alphanumeric character strings, the truth value may be expressed, for example, as the number of matching characters in the string.
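As a minimal sketch of the last point, the truth value of a fuzzy match between two strings can be taken as the count of positions where the characters agree. The function names and the choice to normalize by the longer string's length are assumptions made for illustration; the specification only says the truth value may be expressed as the number of matching characters.

```python
def matching_characters(a: str, b: str) -> int:
    """Count the positions at which both strings hold the same character."""
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)


def truth_value(a: str, b: str) -> float:
    """Scale the match count to the range 0.0 (no match) to 1.0 (identical).

    Normalizing by the longer length is an assumption of this sketch.
    """
    longest = max(len(a), len(b))
    return matching_characters(a, b) / longest if longest else 1.0
```

A position-by-position comparison is the simplest reading of "number of matching characters"; an edit-distance measure would tolerate inserted or dropped characters better, at the cost of more computation.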
The systems, methods, and objects described herein may be stored, for example, on a computer-readable medium. Media may include, but are not limited to, an ASIC, a CD, a DVD, a RAM, a ROM, a PROM, a disk, a carrier wave, a memory stick, and the like. Thus, a computer-readable medium may store computer-executable instructions for a method of managing a transmission resource. The method comprises computing a route for the transmission resource based on analysis data retrieved from an experience-based propagation database. The method also comprises receiving real-time data from the transmission resource and updating the route of the transmission resource based on a combination of the real-time data and the analysis data.
It will be appreciated that some or all of the processes of the systems and methods involve electronic and/or software applications that may be dynamic or flexible processes, so that they may be performed in sequences other than those described herein. One of ordinary skill in the art will also appreciate that elements embodied as software may be implemented using various programming approaches, such as machine language, procedural, object-oriented, and/or artificial intelligence techniques.
The processing, analyses, and/or other functions described herein may also be implemented by functionally equivalent circuits, such as a digital signal processor circuit, a software-controlled microprocessor, or an application-specific integrated circuit. Components implemented as software are not limited to any particular programming language. Rather, the description herein provides the information one skilled in the art may use to fabricate circuits or to generate computer software to perform the processing of the invention. It will be appreciated that some or all of the functions and/or behaviors of the present systems and methods may be implemented as logic, as defined above.
In addition, to the extent that the term "includes" is used in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term "comprising", as that term is interpreted when used as a transitional word in a claim. Further, to the extent that the term "or" is used in the claims (for example, A or B), it is intended to mean "A or B or both". When the author intends to indicate "only A or B but not both", the author will use the phrase "A or B but not both". Thus, use of the term "or" herein is the inclusive use, not the exclusive use. See Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d ed. 1995).
2. Exemplary embodiments
The system of the present invention is generally described herein by way of example, in the context of its use as an address management system. Although the address-related examples may be described in considerable detail, it is not the intention of the applicants to restrict or in any way limit the scope of the invention to such detail. Additional uses, applications, advantages, and modifications of the inventive system will be readily apparent to those skilled in the art. Therefore, the invention, in its broader aspects, is not limited to the specific details, the representative apparatus, and the illustrative examples shown and described. Accordingly, departures may be made from such details without departing from the spirit or scope of the general inventive concept.
Exemplary apparatus, methods, systems, processes, and so on are now described with reference to the drawings, in which like numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth to facilitate a thorough understanding of the apparatus, methods, systems, processes, and so on. It will be evident, however, that the apparatus, methods, systems, processes, and so on can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to simplify the description.
3. Data structure: the superset
3.1 Data superset
In one embodiment, as shown in Fig. 2, the system of the present invention may comprise a data superset 30. The data superset 30 may comprise four or more discrete relational databases 31-35 (including databases 1, 2, 3, 4, ..., N, as shown). Each of the databases 31-35 may be connected to the other databases in a network of database links 36. In one embodiment, one of the databases 31-35 may be designated the primary database, and the others may be designated secondary databases. Together, the several relational databases 31-35 may be controlled by a database management system (DBMS) to create a single data superset 30 capable of storing large amounts of data in an orderly manner across multiple relational database tables and of executing complex queries.
The relational databases 31-35 may comprise a set of tables 40 (including tables A, B, C, ..., N, as shown). A table 40 may comprise a set of data fields 44 (including field 1, field 2, field 3, ..., field n, as shown). The tables 40 may be linked together using one or more keys 48, in a manner known in the field of relational databases.
In one embodiment, each database 31-35 may have a common data structure. In this aspect, each relational database 31-35 may comprise the same number of tables 40, and each table may comprise the same number of fields 44. The common data structure shared by the various tables 40 in the data superset 30 may provide a degree of flexibility that allows the storage and processing of any type of data.
In one embodiment, the common data structure may include arranging the records in one or more tables 40 in hierarchical order, through a series of levels from general to specific, based on the values of the stored data, as described in more detail below. The common data structure may also include storing the tables 40 as sparse matrix linked lists.
3.2 Address superset
One exemplary embodiment of a data superset is shown in Fig. 1. The address superset 130 may comprise several discrete relational databases, which in one embodiment include a postal database 131, a carrier database 132, a standard database 133, and a plan database 134. As shown, each of the databases 131-134 may be connected to the other databases in a network of database links 36 to form the address superset 130. The relational databases 131-134 may be controlled by an address database management system.
The databases 131-134 may comprise a set of data tables 140, which in one embodiment include a preferred table 141, a street alias table 142, and a consignee alias table 143, as described in more detail below. The preferred table 141 may also comprise one or more fields for storing a token that serves as a unique identifier for a particular record. The tables 141, 142, 143 may comprise a set of data fields 44 (including field 1, field 2, field 3, ..., field n, as shown). The tables 141, 142, 143 may be linked together using one or more keys 48, in a manner known in the field of relational databases.
In one embodiment, each database 131-134 may have a common data structure. In this aspect, each relational database 131-134 may comprise the same number of tables 141-143, and each table may comprise the same number of fields 44. The common data structure shared by the various tables in the address superset 130 may provide a degree of flexibility that allows the storage and processing of any type of data. In one embodiment, the common data structure may include arranging the records in one or more tables in hierarchical order, through a series of levels from general to specific, based on the values of the stored address data, as described in more detail below. The common data structure may also include storing the tables as, or reformatting them into, sparse matrix linked lists.
4. System architecture
Fig. 3 is a representation of a system 10 according to one embodiment of the invention. The system 10 may comprise an infrastructure server 25, one or more computer networks, an application server 200, and one or more clients 255 distributed in a multi-tier client-server relationship. The one or more computer networks facilitate communication between the infrastructure server 25, the application server 200, and the one or more clients 255. The one or more computer networks may include many types of computer networks, such as the Internet, a private intranet, a private network, a public switched telephone network (PSTN), a wide area network (WAN), a local area network (LAN), or any other type of network known in the art.
As shown in Fig. 3, a primary AMS server 510 may reside on the infrastructure server 25. A graphical user interface such as AMS GUI 324 may communicate with the primary AMS server 510, as shown.
In one embodiment, the next tier in the system 10 may comprise several AMS clients 655 and a secondary AMS server 520. Some of the AMS clients 655 may include a data capture workstation 155 and a GUI 26 for one or more users 28. In one embodiment, the application server 200 may reside on an AMS client 655.
In one embodiment, below the secondary AMS server 520, the next tier may comprise several AMS clients 655, each of which includes a data capture workstation 155 and a GUI 26 for one or more users 28.
In an exemplary embodiment, the infrastructure server 25 may include a central processor that communicates with other elements within the infrastructure server 25 via a system interface or bus. The infrastructure server 25 also includes input and display devices for receiving and displaying data. The input and display devices may be, for example, a keyboard or pointing device used in combination with a monitor. The infrastructure server 25 may further include memory, which may include read-only memory (ROM) and random access memory (RAM). The ROM may be used to store a basic input/output system (BIOS), which contains the basic routines that help transfer information between elements within the infrastructure server 25.
In addition, the infrastructure server 25 may include at least one storage device, such as a hard disk drive, a floppy disk drive, a CD-ROM drive, or an optical disk drive, for storing information on various computer-readable media, such as a hard disk, a removable magnetic disk, or a CD-ROM disk. Each of these types of storage devices may be connected to the system bus by an appropriate interface. The storage devices and their associated computer-readable media may provide nonvolatile storage. It is important to note that the computer-readable media described above could be replaced by any other type of computer-readable media known in the art. Such media include, for example, magnetic cassettes, flash memory cards, digital video disks, and Bernoulli cartridges.
A number of program modules may be stored by the various storage devices and within the RAM. Such program modules include an operating system and one or more applications. Also located within the infrastructure server 25 may be a network interface for interfacing and communicating with other elements of a computer network. One or more components of the infrastructure server 25 may be located geographically remote from other processing components. Furthermore, one or more of the components may be combined. The infrastructure server 25 may include additional components for performing the functions described herein.
4.1 Database management system (DBMS)
Referring again to Fig. 3, according to one embodiment of the invention, a database management system (DBMS) may reside on the primary AMS server 510 (on the infrastructure server 25), on the application server 200, or on the secondary AMS server 520. The DBMS may comprise an interface 600 and a program suite 500, similar to the AMS 110 shown in Fig. 4.
By way of example, the DBMS of the present invention may be described in the context of its use as an address management system (AMS) 110. Like the DBMS, the AMS 110 may reside on the primary AMS server 510 (on the infrastructure server 25), on the application server 200, or on the secondary AMS server 520. In one embodiment, the AMS 110 may comprise an interface 600 and a program suite 500, as shown in Fig. 4.
Fig. 4 is a block diagram of the system 10 according to one embodiment of the invention, showing the AMS 110 operating in a stand-alone service mode 640. The system 10 as shown includes a computer 15, which provides access to one or more users 28 through the AMS GUI 324.
4.2 Address management system (AMS)
The address management system (AMS) 110 may be specifically designed to control the organization, storage, and retrieval of the data in the address superset 130, and to control the security and integrity of the address superset 130 and its component databases. The interface 600 may be configured to receive and process requests for data received from applications (not shown). In one embodiment, the interface 600 may be a COM-based interface with the ability to create, read, update, and delete records. The interface 600 may also include query functions for performing operations on the data stored in the address superset 130.
5. Finding the preferred representation
In one embodiment, the system 10 of the present invention may comprise a database management system (DBMS) for the data superset 30. The DBMS may serve as a database management system for any type of data, including address data. In the context of address data, the DBMS may be referred to as the address management system (AMS) 110. In either case, the management system 110 may comprise an interface 600 and a program suite 500.
In one embodiment, the program suite 500 may comprise one or more computer software programs for receiving raw data in a "subjective representation", analyzing the values stored in the database by executing one or more queries through the interface 600, and producing output data in a "preferred representation".
The term "subjective representation", as used herein, refers to raw data entered or submitted by a person, who may have a personal understanding of the data. Subjective representations are often ambiguous or incomplete, which can be a problem when a computing process needs the raw data in order to execute. For example, someone may enter a birthday using the subjective representation "12-4-63". In the United States, this date may mean December 4, whereas in Europe it may mean April 12. A computer module might interpret the year as 1963 or as the year 63. Such ambiguities can seriously affect the accuracy of the raw data. To remove ambiguity and incompleteness, the program suite 500 may be designed to convert a subjective representation into a "preferred representation". The program suite 500 may include, for example, a system or query for determining whether the user entered the date in U.S. format or in European format. The program suite 500 may also include a rule or logic routine that always applies a default century to every year entered, unless the user enters a four-digit year. Designing and building the program suite 500 requires advance knowledge of, and planning for, the types and formats of raw data expected in a particular system.
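The date example can be sketched as a small conversion routine. This is an illustration under stated assumptions, not the specification's method: the function name `preferred_date`, the `day_first` flag (standing in for whatever system or query determines U.S. versus European entry), the choice of 1900 as the default century, and ISO 8601 as the preferred representation are all hypothetical.

```python
from datetime import date


def preferred_date(raw: str, day_first: bool, century_base: int = 1900) -> str:
    """Convert a subjective 'a-b-yy' date into an unambiguous ISO 8601 string.

    day_first    -- True for European entry (day-month), False for U.S. entry.
    century_base -- assumed default century, applied only to short years.
    """
    first_text, second_text, year_text = raw.strip().split("-")
    first, second, year = int(first_text), int(second_text), int(year_text)
    day, month = (first, second) if day_first else (second, first)
    if len(year_text) < 4:          # a four-digit year is taken as entered
        year += century_base
    return date(year, month, day).isoformat()
```

Because `datetime.date` validates its arguments, an entry such as "13-13-63" raises `ValueError` under either convention rather than silently producing a wrong date.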
A subjective representation may be processed by the program suite 500 into a preferred representation that is generally unrelated to the raw data. For example, a customer may order a printer ink cartridge using the subjective representation "Acme LX-709 Color", where Acme is the printer manufacturer, LX-709 is the printer model, and color ink is wanted. In a system for processing ink cartridge orders, the cartridges may, for example, be cataloged and stored by a ten-digit cartridge serial number. The serial number is not directly related to the text and numbers in the raw data; rather, the serial number is the "preferred representation" to be printed on the purchase order, so that the distributor can locate and ship the desired cartridge. To match the subjective raw data to the correct serial number, the program suite 500 may be written to interpret any of the possible indicators a customer might submit. Suppose the first four digits of each cartridge serial number correspond to a list of the printer manufacturers that build machines able to use that type of cartridge. The program suite 500 may include a stored procedure for comparing the manufacturer name entered against the names in the list and finding the corresponding first four digits of the cartridge serial number. This represents the first step in finding the ten-digit serial number to be printed on the purchase order.
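The first step of the cartridge example, resolving the manufacturer name to the leading four digits of the serial number, might look like the sketch below. The lookup table, its prefix values, and the function name `serial_prefix` are invented for illustration; the specification does not give actual prefixes or a matching algorithm.

```python
from typing import Optional

# Hypothetical manufacturer list; the four-digit prefixes are made up.
MANUFACTURER_PREFIXES = {
    "ACME": "0417",
    "GLOBEX": "0532",
}


def serial_prefix(subjective: str) -> Optional[str]:
    """Return the serial-number prefix for the first recognized manufacturer."""
    for token in subjective.upper().split():
        if token in MANUFACTURER_PREFIXES:
            return MANUFACTURER_PREFIXES[token]
    return None  # no manufacturer recognized; a later step would have to resolve it
```

An exact, case-insensitive token match is the simplest comparison; a fuzzy comparison could be substituted where misspelled manufacturer names are expected.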
Another example of a subjective representation is a common street address. On a mail piece, someone may write the subjective representation "Doe, 123 East Main Street N.W. Suite A-4, Atl 30030". Several parts of the address are ambiguous or incomplete, including the addressee "Doe", the abbreviation "Atl", and the missing state name. If this data is to be processed by a computer or sorting equipment, these ambiguities can cause the mail piece to be lost, delayed, or delivered incorrectly. To remove ambiguity and incompleteness, the program suite 500 may be designed to convert the subjective representation into a preferred representation. The program suite 500 may include, for example, a program or stored procedure for comparing the address as written against a commercially available computer database of street addresses and ZIP codes.
The examples above refer to attributes or parameters: a date, a part number, an address. A parameter may be characterized in a variety of forms, including the subjective representations shown above and other representations that depend on the context of use. In one embodiment, the system of the present invention uses tabular data to process and modify the manner in which a parameter is characterized, as described in more detail below.
In one embodiment, the database management system (DBMS) of the present invention may comprise a program suite 500, which may include one or more of the following general procedures: (1) an enhancement module; (2) a publish and subscribe module; and (3) a matching module. The program suite 500 may, of course, include additional components and procedures for performing the other functions described in this application.
5.1 Enhancement module
In one embodiment, the program suite 500 of the present invention may comprise an enhancement module that optimizes the structure and order of the data stored in the databases 31-35 of the data superset 30. The databases 31-35 in the data superset 30 may contain millions of records. In one embodiment, the tasks of reading, updating, and searching all or most of the records in each database 31-35 can be improved and accelerated by optimizing the data structure.
Database tables containing large numbers of records occupy a large amount of memory and require lengthy computing time for sorting, searching, and other analytical operations. A simple example of enhancing or optimizing data is sorting the records based on one or more attributes (columns), placing the records in ascending or descending order. For large tables with many attributes, however, simple record sorting may not produce significant time savings or search efficiency.
In one embodiment, an enhancement module in the program suite 500 includes a procedure for transforming a database into a sparse matrix linked list. A linked list includes links designed to direct a query from one field to the next, sometimes using the links to bypass or skip irrelevant fields. A sparse matrix does not contain repeated field values in subsequent records. Rather than repeating the first value, the subsequent fields are left blank, and the subsequent values are assumed to equal the first value unless and until a different value appears.
For example, in Fig. 9, the ZIP code field contains a repeated entry (ZIP code 20001) in each of the thirteen records. In one aspect, the system 10 of the present invention uses the concept of a sparse matrix to eliminate the repeated entries, thereby saving memory and shortening computing time. In Fig. 9, for example, the ZIP code field of node 1 may be populated with the five-digit ZIP code 20001. In a system 10 of the present invention in which the table may be transformed into a sparse matrix, the subsequent ZIP code fields may be empty or zero. In Fig. 9, the ZIP code fields of node 2 through node 13 may be empty or zero; the value in these fields may be assumed to be 20001.
In a sparse matrix, a value encountered in a sequence of records is assumed to remain the same until a different value appears. Because many repeated values can be eliminated in this way, the table or matrix is described as sparse. By applying the rules for creating a sparse matrix, any attribute in a table can be made sparse.
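The sparse-column rule described above (a value persists until a different value appears) can be sketched as a pair of functions, one that blanks repeated values and one that restores them. The function names are hypothetical, and the sketch assumes `None` marks an empty field and never occurs as a genuine value in the original column.

```python
def to_sparse(column: list) -> list:
    """Blank out every value that merely repeats the value before it."""
    sparse = []
    previous = object()  # sentinel that compares unequal to any real value
    for value in column:
        sparse.append(None if value == previous else value)
        previous = value
    return sparse


def from_sparse(sparse: list) -> list:
    """Restore the dense column: an empty field takes the last value seen."""
    dense, current = [], None
    for value in sparse:
        if value is not None:
            current = value
        dense.append(current)
    return dense
```

For a column like the ZIP code field discussed above, thirteen identical entries collapse to one stored value followed by twelve empty fields, and `from_sparse` recovers the original column exactly.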
A small portion of a model database table 40 is shown in Fig. 5. Each row contains a single record 42. Each field 44 can be located by reference to its row number and column number. For example, the field located in the second column of the third row may be described as field (3,2), or simply (3,2). This field naming convention is valuable in the many database operations that need to point to a specific field.
The table 40 of Fig. 6 is an example of a sparse matrix. For example, column 2 begins with the value "Smith" in row 1, followed by null values in the subsequent records (rows). The value of column 2 is therefore taken to be "Smith" in the subsequent rows 2, 3, and 4.
The row-and-column naming convention for fields is helpful when a table is organized into a linked list. In one type of linked list, a link 340 may include a field 44, a value 46, and one or more pointers, as shown in Fig. 7 and Fig. 8. The type of link 340 shown in Fig. 7 includes a next-in-column pointer 344 and a next-in-row pointer 342. The pointers 344, 342 contain instructions pointing to the next field that contains a nonzero value. Because they point to the next field (rather than the previous field), the pointers 344, 342 are called forward pointers. Some types of linked lists also include backward pointers, which contain instructions pointing to the last or previous nonzero field value. In one aspect, the system 10 of the present invention may include forward pointers only.
Fig. 8 is a representation of the links 340 between the sparse matrix values shown in Fig. 6. For example, the instructions in the link 340 at column 1, row 4 direct the analysis to the next nonzero value, which is located in column 3, row 4. The instructions contained in a link 340 allow an analytical process such as a search query to bypass or skip the empty fields in the sparse matrix. Skipping the empty fields significantly reduces search time, so that query results are produced more quickly.
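A sparse matrix linked list with forward pointers only can be sketched as follows. This is a minimal illustration, assuming a rectangular table in which `None` marks an empty field and fields are named (row, column) with 1-based numbers; the class `Link`, the functions `build_links` and `walk_row`, and the sample values are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Field = Tuple[int, int]  # (row, column), 1-based


@dataclass
class Link:
    field: Field
    value: str
    next_in_row: Optional[Field] = None     # next non-empty field in the same row
    next_in_column: Optional[Field] = None  # next non-empty field in the same column


def build_links(matrix: List[List[Optional[str]]]) -> Dict[Field, Link]:
    """Create one link per non-empty field, then wire the forward pointers."""
    links = {(r, c): Link((r, c), value)
             for r, row in enumerate(matrix, start=1)
             for c, value in enumerate(row, start=1)
             if value is not None}
    n_rows = len(matrix)
    for (r, c), link in links.items():
        n_cols = len(matrix[r - 1])
        link.next_in_row = next(
            ((r, k) for k in range(c + 1, n_cols + 1) if (r, k) in links), None)
        link.next_in_column = next(
            ((k, c) for k in range(r + 1, n_rows + 1) if (k, c) in links), None)
    return links


def walk_row(links: Dict[Field, Link], start: Field) -> List[str]:
    """Follow next-in-row pointers from a starting field, skipping empty fields."""
    values, position = [], start
    while position is not None:
        link = links[position]
        values.append(link.value)
        position = link.next_in_row
    return values
```

With a sample table whose fourth row holds values only in columns 1 and 3, the link at (4,1) points directly to (4,3), so a traversal never visits the empty field (4,2). No backward pointers are stored, matching the forward-only aspect described above.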
In one embodiment, the program suite 500 including the enhancement module can be used to transform any table of the data superset 30 into a sparse matrix linked list. A data superset 30 stored as a sparse matrix linked list may consume far less memory and may therefore be better suited for distribution to subscriber clients 255 as a replica superset 330. Once a data table has been transformed into the sparse matrix linked list (SMLL) format, the enhancement module may finalize the SMLL table or otherwise "package" it, so that it is ready for distribution and for use elsewhere by other system components.
As shown in Figs. 5-8, the replica superset 330 may reside on one or more clients 255 in the system 10. The transmission, or "publication", of the replica superset 330 throughout the system 10 may be accomplished by the publish and subscribe module, as described below.
The enhancement module may also, in one embodiment, monitor the state of a table as new data is added, maintain the table in an optimal state by repeating the transformation procedure when necessary, and communicate with other system components regarding the state of the table and its availability to be shared with or distributed to subscriber clients 255. In this aspect, the enhancement module of the program suite 500 may be configured to interact and communicate with other system components in order to keep the data tables in an optimal state for fast and efficient searching.
5.2 Publish and subscribe module
In one embodiment, the program suite 500 of the present invention may include publish and subscribe processes or procedures to control and facilitate the transfer of data between the components of the system 10 of the present invention. As shown in Fig. 3, the system 10 may comprise an infrastructure server 25, one or more computer networks 230, an application server 200, and one or more clients 255 distributed in a client-server relationship.
In a client-server network environment, such as the environment shown in Figs. 5-9, the replica superset 330 may reside on one or more subscriber clients 255 of the system 10. The publish and subscribe module may be configured to monitor and control the publication of the replica superset 330 throughout the system 10 to the clients 255 that are subscribers.
5.3 Matching module
In one embodiment, the program suite 500 of the present invention can comprise a matching module 85 configured to receive raw data in the form of a subjective expression 80, to execute one or more queries using an interface 600 to analyze the values stored in the data superset 30, and to produce output data in the form of a preferred expression 90. The general steps of an exemplary matching module 85 are illustrated in the flowchart of Fig. 12.
In one embodiment, the steps of searching for and displaying data in its preferred expression 90 based on a subjective expression 80 can comprise the following general functions: catching 300, parsing 305, standardizing 310, verifying 320, updating 380, combining 390, and delivering 395. Those skilled in the art will appreciate that, depending on the particular algorithm or algorithms used, these general steps need not occur in this order, and some steps can be repeated as necessary.
5.3.1. Catch
In one embodiment, the step referred to as catching 300 can comprise capturing or otherwise receiving the subjective expression 80 (the input data).
5.3.2. Parse
In one embodiment, the step referred to as parsing 305 can comprise resolving the subjective expression 80 into its component parts. The task of parsing generally comprises dividing a sentence or field string into its components. For example, in the context of street addresses, an address written on an envelope represents a subjective expression 80 that can be divided by the parsing process into a number of distinct components, or workpieces. A parsing algorithm or program generally receives a string or sequence of characters as input and then applies a set of rules to divide it into categories.
One example of a subjective expression 80 is a street address. A U.S. street address such as "123 East Main Street N.W., Suite A-4", for example, can comprise a number of discrete workpieces, including the number (123), the pre-directional (East), the primary name (Main), the type (Street), the post-directional (NW), the secondary name (Suite), and the secondary number (A-4). A street address can also be parsed into components based on administrative subdivisions such as city, county, and state, or it can be parsed to a finer level of detail or granularity based, for example, on the ZIP+4 code.
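The division of the example address into workpieces can be sketched with a single regular expression. This is only an illustrative sketch, not the parsing procedure of the matching module 85: the pattern, the function name `parse_street_address`, and the field names are assumptions, and the pattern covers just one simple U.S. layout.

```python
import re

# Hypothetical pattern for one simple U.S. street-address layout. Real
# addresses vary far more than this sketch allows for.
ADDRESS_RE = re.compile(
    r"""^\s*
    (?P<number>\d+)\s+
    (?:(?P<pre_directional>North|South|East|West|[NSEW])\s+)?
    (?P<name>.+?)\s+
    (?P<type>Street|St|Avenue|Ave|Road|Rd|Drive|Dr)\b\.?
    (?:\s+(?P<post_directional>[NS]\.?[EW]\.?|[NSEW]\.?))?
    (?:\s*,\s*(?P<secondary_name>Suite|Ste|Apartment|Apt)\.?\s+(?P<secondary_number>[\w-]+))?
    \s*$""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_street_address(text):
    """Split one address line into named workpieces, or return None."""
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    # Keep only the workpieces that were actually present in the input.
    return {key: value for key, value in match.groupdict().items() if value is not None}


parts = parse_street_address("123 East Main Street N.W., Suite A-4")
```

Each workpiece ends up under its own key, which mirrors the idea of storing the parsed components in separate fields of a table.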
For example, by parsing the subjective expression 80 and storing its components in separate fields of a table, the matching module 85 of the present invention can allow users to access and summarize (or "abstract") the data in many ways, according to need. A user may, for instance, request a summary of the address data in a particular state based on the five-digit ZIP code. If the address data has been parsed and the ZIP codes have been stored in a discrete field, the step of abstracting the data by ZIP code involves a relatively simple search and retrieval. Storing the workpieces in separate fields allows users to search and retrieve data at any other level of abstraction. In this aspect, the present invention offers great flexibility to a variety of users with a variety of needs.
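Once the ZIP code sits in its own field, a per-state summary by ZIP code reduces to a filter and a count. The sketch below assumes hypothetical records and field names (`state`, `zip5`, `consignee`); none of them come from the tables of the superset 30.

```python
from collections import Counter

# Hypothetical parsed records; the field names are assumptions for illustration.
RECORDS = [
    {"consignee": "JOHN DOE", "state": "GA", "zip5": "30030"},
    {"consignee": "JANE ROE", "state": "GA", "zip5": "30030"},
    {"consignee": "ACME SHOES", "state": "GA", "zip5": "30303"},
    {"consignee": "APC", "state": "DC", "zip5": "20001"},
]


def count_by_zip(records, state):
    """Count address records per five-digit ZIP code within one state."""
    return Counter(r["zip5"] for r in records if r["state"] == state)


summary = count_by_zip(RECORDS, "GA")
```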
5.3.3. Standardization
In one embodiment, the step referred to as standardizing 310 can comprise reformatting the subjective expression 80 according to a set of standardization rules. Standardization can generally involve many characteristics of the subjective expression 80, including font, spacing, typeface, punctuation, whether a field may contain alphabetic characters, numeric characters, or both, field length, field size or capacity, and other aspects.
For example, in the context of street addresses, a subjective expression 80 might be written:
John Doe
123 East Main Street,N.W.
Oakland Center,Suite A-4
Atlanta,Georgia 30030
The step referred to as standardizing 310 can change the font, spacing, punctuation, and other aspects of the above subjective expression 80, so that after standardization it appears as follows:
JOHN DOE
123 E MAIN ST NW STE A4
DECATUR GA 30030-1549
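The purely textual part of such a standardization (upper-casing, removing punctuation, collapsing spaces, abbreviating common words) can be sketched as follows. The abbreviation table and the function name `standardize_line` are assumptions for illustration. The sketch does not drop a building name or correct the city and ZIP+4 code as in the example above, since those changes would depend on verification against stored address data.

```python
import re

# Hypothetical abbreviation rules; an actual rule set would be far larger
# and would vary by address type and country.
ABBREVIATIONS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "STREET": "ST", "AVENUE": "AVE", "SUITE": "STE",
}


def standardize_line(line):
    """Upper-case a line, drop periods, commas and hyphens, and abbreviate."""
    text = re.sub(r"[.,\-]", "", line.upper())
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


street_line = standardize_line("123 East Main Street, N.W.")
```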
In one embodiment, the standardization step 310 can comprise a variable set of rules, depending on the address type and the region or country. Foreign addresses, for example, may be governed by very different rules for the standard expression of the various address workpieces. For example, the following subjective expressions 80 can be standardized:
Subjective expression 80:        After standardization:
Prielle Kelia U.19-15            BUDAPEST XI
Budapest H-2100                  PRIELLE KELIA U.19-35
                                 1117
Hungary                          HUNGARY
Subjective expression 80:        After standardization:
V.Delle Terme                    LARGO DELLE TERME
Rome 00100                       00153-ROMA RM
Italy                            ITALY
Subjective expression 80:        After standardization:
103 New Oxford                   103 NEW OXFORD ST
London WC1A 1 PG                 LONDON
Great Britain                    WC1A 1PG
                                 UNITED KINGDOM
The standardization step 310 can be performed in conjunction with the parsing step 305, so that the workpieces are stored in the table in their parsed and standardized form. In one embodiment, the standardization step 310 can be performed on each separate workpiece after parsing; in another embodiment, the parsing step 305 can take place first. As with the other general steps in the matching module 85, the standardization 310 and parsing 305 steps can occur in any order and can be repeated.
5.3.4. Verification
In one embodiment, the step referred to as verifying 320 can comprise a complex series of steps taken to verify the subjective expression 80, as described in more detail below. Verification 320 generally comprises checking the accuracy and recency of the subjective expression 80. Verification 320 can also comprise comparing the subjective expression 80 with the values stored in the tables of the superset 30 in order to find the preferred expression 90.
5.3.5. Update
In one embodiment, the step referred to as updating 380 can comprise adding newly acquired data to one of the relational databases in the superset 30. In this aspect, the superset 30 can be continually updated on the basis of new data through the operation of the program suite 500. The updating step 380 can occur at any time during the procedures executed by the matching module 85.
In one embodiment, the updating step 380 can add new data to one of the tables in the superset. The data can be placed in a record near the end of the table. In one aspect of the invention, the table may or may not be recompiled before the tasks of the enhancement module are next executed. Ideally, the table does not require frequent compilation.
5.3.6. Combination
In one embodiment, the step referred to as combining 390 can comprise the reverse of the parsing step 305, in that the separate workpieces of the subjective expression 80 are reassembled. In one embodiment, the combining step 390 is performed after the verification step 320 has produced the workpieces of the preferred expression 90.
5.3.7. Delivery and display
In one embodiment, the step referred to as delivering 395 can comprise transmitting or sending the preferred expression 90 (or preferred token) to one or more components of the system 10 of the present invention. In this aspect, the delivering step 395 can be described as returning or publishing the result of a search query. The delivering step 395 can also comprise, or be followed by, a display step, in which the preferred expression 90 can be shown on a monitor or other type of user display. The delivering step 395 can also comprise, or be followed by, a printing step, in which the preferred expression 90 can be printed on a label, printed in a list, printed as part of a report, or otherwise output in a readable text format according to system instructions.
5.4 Verification module
In one embodiment, the verification step 320 can generally comprise comparing the subjective expression 80 with the values stored in the tables of the superset 30 in order to find the preferred expression 90. In the context of the address management system 110, address verification 320 generally comprises comparing the subjective expression 80 of an input address with the values stored in the address databases 131, 132, 133 of the address superset 130 (as shown in Fig. 1) and identifying the preferred expression 90 of the address.
As shown in Fig. 1, in one embodiment, the address superset 130 can comprise a postal database 131, a carrier database 132, a standard database 133, and a plan database 134. In one embodiment, each of the linked databases 131-134 can comprise a preferred table 141, a street alias table 142, and a consignee alias table 143. The preferred table 141 can also comprise one or more fields for storing a token that serves as the unique identifier of a particular record.
Postal database 131 can comprise, in one embodiment, address data from a postal service such as the United States Postal Service (USPS). The United States contains more than 145,000,000 delivery addresses. The USPS makes available to the public a variety of regularly updated address databases, including the Delivery Sequence File (DSF). The DSF is a computerized database developed by the USPS that contains the complete, standardized address of every delivery point served by the USPS, each stored in a discrete record. Each separate record contains the street address, the ZIP+4 code, the carrier route code, the delivery sequence number (walk sequence number), a delivery type code, and a seasonal delivery indicator. The USPS has recently developed a new Delivery Point Validation (DPV) database to replace the DSF. The DPV database is available in a basic format or an enhanced format; the enhanced format is referred to as DSF2 and includes additional address attributes. Many foreign countries and regions provide similar postal address databases, including data standardized according to the particular needs and rules of the country. The postal database 131 of the present invention can be configured to receive and store any of a variety of databases containing postal addresses.
In the postal database 131, the preferred table 141.1 can be configured to accept and store the preferred expressions of the delivery points served by the postal service. A preferred expression can be stored as a whole, as separate workpieces, or both. The postal preferred table 141.1 can be one of the main sources of the preferred expression 90 of an address.
The postal service can also provide street alias data, which can be accepted and stored in the street alias table 142.1. An alias, as the term suggests, refers to a situation in which several different identifiers refer to the same object. A common example of a street alias occurs when a road has more than one name: a local street name, a state route number, and a federal highway number. For example, U.S. Highway 1 may be called State Route 16 in a particular state, and may also be called Maple Street as it passes through a particular town. In the area where all three names apply, the street names Maple Street, State Route 16, and U.S. Highway 1 are street aliases. Street aliases can further include, for example, S.R. 16, Route 16, U.S. 1, Route 1, or Maple Drive, if those names are in use. USPS databases generally include street alias data. The street alias table 142.1 can be configured to accept and store the street alias data provided by the postal service.
Other features and workpieces can also be subject to aliasing. For example, a formal company name may include terms that the public generally omits. Acme Shoe Corporation, for example, may be called Acme Shoes, or simply Acme, in everyday speech. The problems created by different names or aliases for a value stored in a database arise particularly when a user of the database wishes to retrieve that value. For example, a search for Acme Shoe Corporation may fail to find a record that indicates only Acme Shoes.
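The retrieval problem described above can be eased by mapping every known alias to one stored name before searching. The following is a minimal sketch under assumed names (`CONSIGNEE_ALIASES`, `normalize`, `resolve_alias`); it is not the layout of the alias tables 142 or 143.

```python
def normalize(name):
    """Upper-case a name, drop periods, and collapse runs of spaces."""
    return " ".join(name.upper().replace(".", "").split())


# Hypothetical alias table: each known alias maps to one stored name.
CONSIGNEE_ALIASES = {
    "ACME SHOE CORPORATION": "ACME SHOE CORPORATION",
    "ACME SHOES": "ACME SHOE CORPORATION",
    "ACME": "ACME SHOE CORPORATION",
}


def resolve_alias(name, alias_table):
    """Return the stored name for an alias, or None when the alias is unknown."""
    return alias_table.get(normalize(name))


stored_name = resolve_alias("Acme Shoes", CONSIGNEE_ALIASES)
```

With such a lookup, a search for any of the three names reaches the same record.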
The consignee alias table 143.1 can be configured to accept and store the consignee alias data provided by the postal service, when it is available. A postal service may or may not provide consignee alias data. In some jurisdictions, such as the United States, the postal service may not publish data that reveals the identity of the occupant (consignee) together with the street address. The data fields shown for the consignee alias table 143.1 (field 1, field 2, field 3, ..., field n) are preceded by a hyphen rather than a plus sign to indicate that these fields can be empty.
The tables 141.1, 142.1, 143.1 in the postal database 131 can be linked by one or more key fields or otherwise interconnected in a manner known in the relational database art.
Carrier database 132 can comprise, in one embodiment, address data from private sources, such as shipping companies, parcel services, or private database providers. Some delivery companies and other service providers develop and maintain address data, some of which may be available. The carrier database 132 of the present invention can be configured to receive and store any of a variety of private databases containing address information.
In the carrier database 132, the preferred table 141.2 can be configured to accept and store the preferred expressions of the delivery points contained in the private source database. A preferred expression can be stored as a whole, as separate workpieces, or both.
A private source can also provide street alias data, which can be accepted and stored in the street alias table 142.2. Some delivery companies and other service providers develop and maintain lists of the street aliases in the areas they serve. The street alias table 142.2 can be configured to accept and store the street alias data provided by any private source.
The consignee alias table 143.2 can be configured to accept and store the consignee alias data provided by a private source. In addition to street aliases, many delivery companies and other service providers develop and maintain lists of users or customers (consignees) that may include aliases. The consignee alias table 143.2 can be configured to accept and store the consignee alias data provided by any private source.
The tables 141.2, 142.2, 143.2 of the carrier database 132 can be linked by one or more key fields or otherwise interconnected in a manner known in the relational database art. Similarly, the carrier database 132 can also be linked to or otherwise interconnected with the postal database 131.
Standard database 133 can, in one embodiment, generally comprise alias data. During the loading and installation of the postal database 131 and the carrier database 132, the system 10 of the present invention can include tools for collecting street alias and consignee alias information and storing it in the standard database 133. The standard street alias table 142.3 can be configured to accept and store street alias data. The standard consignee alias table 143.3 can be configured to accept and store consignee alias data. In this aspect, the standard database 133 can, in one embodiment, serve as a repository for alias data.
Because the standard database 133 is generally used for alias data, it may or may not contain any preferred data in the table 141.3. The data fields of the standard preferred table 141.3 (field 1, field 2, field 3, ..., field n) can be preceded by a hyphen rather than a plus sign to indicate that these fields can be empty.
The tables 141.3, 142.3, 143.3 of the standard database 133 can be linked by one or more key fields or otherwise interconnected in a manner known in the relational database art. Similarly, the standard database 133 can also be linked to or otherwise interconnected with the carrier database 132 and the postal database 131.
The data stored in the standard database 133 can be used in a process referred to as fuzzy matching. A literal match requires an exact match, such as Acme and Acme. A fuzzy match represents a partial match, such as Acme, ACM, Acmed, and Ch2Acme. Alias data can generally be used in systems that permit or require fuzzy matching because aliases, by their nature, contain subtle differences yet represent the same object. For example, the consignee aliases discussed above (Acme Shoe Corporation, Acme Shoes, Acme) also represent fuzzy matches of one another.
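The difference between a literal match and a fuzzy match can be illustrated with the similarity ratio in Python's standard `difflib` module. This is a swapped-in technique used only for illustration; the text does not say which fuzzy matching method the system uses, and the function name `fuzzy_candidates` and the 0.6 cutoff are assumptions.

```python
from difflib import get_close_matches


def fuzzy_candidates(name, known_names, cutoff=0.6):
    """Return the known names whose similarity to `name` meets the cutoff."""
    # Compare in upper case so that letter case does not affect the score.
    upper_to_original = {known.upper(): known for known in known_names}
    hits = get_close_matches(name.upper(), list(upper_to_original), n=5, cutoff=cutoff)
    return [upper_to_original[hit] for hit in hits]


candidates = fuzzy_candidates("Acme", ["ACM", "Acmed", "Ch2Acme", "Zenith"])
```

Here the three partial matches from the text pass the cutoff while the unrelated name does not. A stricter or looser cutoff would change which candidates are returned.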
Fuzzy matching can be useful in the context of address standardization because the subjective expression 80 of an address can contain one or more ambiguous or incorrect address workpieces. For example, the subjective expression 80 "Doe, 123 East Main Street N.W., Suite A-4, Atl 30030" is incomplete and contains several ambiguous parts. Using the consignee alias data stored in the table 143.3 of the standard database 133, the consignee "Doe" can be matched with the preferred consignee "John W. Doe" through a fuzzy matching process. This example shows how the databases 131-134 of the address superset 130 work together, because the standard database 133 may not contain any preferred data in its table 141.3. Therefore, to complete address verification 320, the address management system 110 can be configured to access the related data stored in the tables of the other databases 131, 132, 134 in order to find the preferred expression 90 of the address. Because the tables 141, 142, 143 are linked, the search for a match can use the ZIP code "30030", alone or together with the street primary name "Main", to find records similar to the subjective expression 80. In this aspect, the address management system 110 of the present invention can, in one embodiment, be configured to include programs or Structured Query Language (SQL) queries for finding matches among any of the data stored in the address superset 130.
Another tool that can be useful in the context of address standardization and verification is called Soundex. Soundex provides a method of finding words that sound alike. Soundex began as a filing system that used a simple phonetic algorithm to reduce proper names and other words to a four-character alphanumeric code. In one class of Soundex algorithms, the first character of the code can correspond to the first letter of the word or proper name, and the remainder of the code can consist of three digits derived from the sounds of the remaining syllables. In this way, the sound of a word or name is quantified. The Soundex function is useful because computers are generally better at comparing numbers than at comparing letters. In one embodiment, the verification step 320 of the present invention can include a Soundex algorithm.
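A letter followed by three digits is the shape of the widely used American Soundex code. The sketch below implements that common variant as an illustration; the text does not specify which Soundex variant the verification step 320 would use, so the choice of variant and the function name `soundex` are assumptions.

```python
# Consonant groups of American Soundex; vowels, H, W and Y carry no digit.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit:
            # Adjacent letters with the same digit are coded only once.
            if digit != previous:
                code += digit
            previous = digit
        elif c not in "HW":
            # A vowel (or Y) separates repeated digits; H and W do not.
            previous = ""
        if len(code) == 4:
            break
    return code.ljust(4, "0")


same_sound = soundex("Robert") == soundex("Rupert")
```

Two spellings that sound alike, such as Robert and Rupert, reduce to the same code and can then be compared as short fixed-length keys.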
Plan database 134 can, in one embodiment, comprise input data, including one or more subjective expressions 80. In this aspect, the process of adding subjective expression data to the plan tables 141.4, 142.4, 143.4 can comprise the catching, parsing, and standardizing steps described herein, so that the input data is properly divided and standardized in preparation for verification.
In one embodiment, the input data can be stored mainly in the plan preferred table 141.4. Because the plan database 134 is generally used for input data, it may or may not contain any data in the street alias and consignee alias tables 142.4, 143.4. The data fields of these tables are preceded by a hyphen rather than a plus sign to indicate that these fields can be empty.
5.4.1 Arranging data by hierarchy
In one aspect, the address management system 110 of the present invention can exploit the hierarchical nature of address data to locate records similar to the subjective expression 80 quickly and efficiently. In this aspect, the address management system 110 can include a method of preparing or arranging the stored data according to its inherent hierarchy. The data can be arranged in a series of levels from general to specific (as described below), or in any order particularly suited to the application. In use, the address management system 110 can be configured to include programs or stored query procedures capable of finding matches among the data stored in the address superset 130.
Generally speaking, a query can be used to extract the desired data from a database without altering or changing the data itself. Because a query generally finds the desired data and displays it to the user, the result of a query is sometimes called a view. A query can also be used to create a result (view) without displaying it to the user. In this aspect, a query can be used to arrange the data (usually temporarily) into a new structure that differs from the table structure. A query can be used to create a new data structure with particular advantages, such as improved sorting logic, faster sorting and searching, or moving particular data fields to a primary position. In one embodiment, the verification step 320 of the present invention can include one or more queries to arrange the data in the superset. One such arrangement includes a process referred to as tokenization.
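The idea of a query that rearranges data without changing the underlying table can be sketched with a view in SQLite. The table name, column names, and rows below are hypothetical and are not taken from the preferred table 141.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of street number ranges; rows are stored in arrival order.
conn.execute("CREATE TABLE preferred (zip TEXT, street TEXT, low INTEGER, high INTEGER)")
conn.executemany(
    "INSERT INTO preferred VALUES (?, ?, ?, ?)",
    [("30030", "MAIN", 100, 198), ("20001", "FIRST", 440, 498), ("30030", "CHURCH", 1, 99)],
)

# The view is a stored query: it exposes the data without copying or altering it.
conn.execute("CREATE VIEW street_ranges AS SELECT zip, street, low, high FROM preferred")

# Reading the view in general-to-specific order (ZIP, then street, then number).
rows = conn.execute(
    "SELECT zip, street, low, high FROM street_ranges ORDER BY zip, street, low"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM preferred").fetchone()[0]
```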
5.4.2. Tokenization
An example of the postal preferred table 141.1 is shown in Fig. 9. Each row represents a single record and contains a number of fields. Each separate field is stored in a separate column containing like attributes. The attributes of the table are shown as column headings at the top. The preferred table 141.1 shown in Fig. 9 can be described by the schema (ZIP, token, street, type, low, high, odd/even, consignee, secondary, low, high, +4).
The token column, as shown, contains the postal token 71, which serves as the unique identifier of each unique address. Note that the two records containing the address "440 First Street, Suite 600" are both assigned the postal token T6. The street address records in the other rows of the table represent different addresses and therefore have different tokens.
Address data is, by its nature, hierarchical. The various workpieces of an address range from general to specific. For example, a five-digit ZIP code by itself provides a general notion of the location of an address, whereas a complete address, which is usually understood to include the occupant or consignee, all the street data, and the ZIP code or ZIP+4 code, provides a very specific address location.
In one embodiment, the verification step 320 of the present invention can include a query or algorithm for placing the City-State-ZIP combination at the top of the hierarchy of address data. A City-State combination can, of course, include multiple ZIP codes. At the next level of specificity are the street workpieces, including the pre-directional, street name, street type, and post-directional. Such a street address might resemble 100 East Main Street, SW. The street workpieces can be further subdivided using one or more street address ranges, which can be purely numeric, such as the range 240-298, or can be alphabetic, depending on the range field. Beyond the regular street workpieces are the secondary workpieces, including the secondary number, such as Suite 100 or Apartment 1C. The four additional digits of the ZIP+4 code can provide another level of specificity. Some databases can also include two additional delivery sequence numbers.
In one embodiment, the verification step 320 of the present invention can include a method of sorting the records in the tables of the superset into a hierarchy from general to specific. In the verification step 320, the resulting relationships and record groupings can be defined in terms of the concepts of containing and being contained. A node number has been assigned to each record of the table 141.1, as shown in Fig. 9. The node numbers can help to demonstrate the concepts of containing and being contained among the address records.
5.4.3 Containment levels
After the verification step 320 has re-sorted the records of the table 141.1, the new hierarchical arrangement of the records can be as shown in Fig. 10. The node numbers in Fig. 10 are distributed among the levels according to the specificity shown in the data. For example, level 1 in Fig. 10 includes node 1, which represents the record containing the address range "440-498 First Street". Of all the records shown in Fig. 9, the record at node 1 is the most general and is therefore placed in level 1. The next level of specificity, level 2, includes node 2. The record at node 2 contains a single street address (440 First Street) but no secondary workpiece (no suite number).
Level 3 in Fig. 10 includes the addresses that have a suite number or range but no consignee name. These records include nodes 3, 11, 4, 12, 5, and 13. The nodes in level 3 are arranged from left to right in order of increasing suite number. In this aspect, the system 10 can be configured to sort the address data from left to right in addition to placing the data in different levels of specificity.
Level 4 includes the records that have a name in the consignee field.
The concepts of containing and being contained are demonstrated by the connections between the various nodes in Fig. 10. Node 10 is connected to node 3 because "Suite 310" is a subset of the range "Suite 100-400". Similarly, nodes 6, 7, and 8 are connected to node 5 because their suite numbers (500 or 600) are a subset of the range (Suite 500-600) in node 5. Finally, node 9 is a subset of node 13 because the addresses are identical and node 9 includes a consignee name.
The nodes shown in Fig. 10 illustrate the concepts of containing and being contained as they can be implemented in one embodiment of the verification step 320 of the present invention. Node 1 on level 1 "contains" all the nodes beneath it, because all the other addresses fall within the range stated for node 1. Conversely, all the nodes beneath level 1 are "contained" in (or contained by) node 1. Similarly, node 2 on level 2 contains all the nodes beneath it, and node 3 contains node 10. Node 5 contains nodes 8, 6, and 7 because they are subsets of the range stated in node 5. Node 13 contains node 9.
In one embodiment, the verification step 320 of the present invention can assign a token to each unique record. The tokens also demonstrate the concepts of containing and being contained. Fig. 11 is a tabular representation of the hierarchical table shown in Fig. 10. The table in Fig. 11 shows the nodes and tokens on each level, beginning with level 1. Token T1 can be described as containing every other token in the hierarchical table. Note, however, that the token numbers may differ from the node numbers. Token T3 contains token T9. Token T5 contains tokens T6 and T7. Note that token T6 is used for both nodes 6 and 7 because the addresses are equivalent.
The concepts of containing and being contained are easy to see in Fig. 11. For example, comparing the data at node 3 and node 10, the reader will note that "Suite 310" in node 10 lies within the range of suite numbers (100-400) stored in node 3. This relationship, also shown in Fig. 10, illustrates the concepts of containing and being contained.
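The containment relationship between a general record and a more specific one can be expressed as a simple range test. The sketch below is only an illustration: the class `AddressNode`, its fields, and the sample values are assumptions loosely modeled on the nodes described for Fig. 10, not a reproduction of that figure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AddressNode:
    street: str
    numbers: Tuple[int, int]                  # low and high street number
    suites: Optional[Tuple[int, int]] = None  # low and high suite number
    consignee: Optional[str] = None


def contains(outer, inner):
    """True when `inner` lies within `outer` (a node also contains itself)."""
    if outer.street != inner.street:
        return False
    if not (outer.numbers[0] <= inner.numbers[0] and inner.numbers[1] <= outer.numbers[1]):
        return False
    if outer.suites is not None:
        if inner.suites is None:
            return False
        if not (outer.suites[0] <= inner.suites[0] and inner.suites[1] <= outer.suites[1]):
            return False
    if outer.consignee is not None and outer.consignee != inner.consignee:
        return False
    return True


# Hypothetical nodes, from general to specific.
node1 = AddressNode("FIRST ST", (440, 498))
node2 = AddressNode("FIRST ST", (440, 440))
node3 = AddressNode("FIRST ST", (440, 440), suites=(100, 400))
node5 = AddressNode("FIRST ST", (440, 440), suites=(500, 600))
node6 = AddressNode("FIRST ST", (440, 440), suites=(600, 600), consignee="APC")
node10 = AddressNode("FIRST ST", (440, 440), suites=(310, 310))
```

A search can then descend from the most general node and follow only the branches whose ranges contain the address being verified.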
In one embodiment, there is no limit on the number of containment levels that can be used during the verification step 320 of the present invention. An address record can contain a large number of workpieces. A table can contain a large number of records. Considering the large number of records a table can contain, the hierarchical organization of the records can greatly increase the speed of accessing and analyzing the data. The containment levels and token numbers described for the 13 nodes shown in Figs. 14, 15, and 16 can be applied to the millions of address records and ranges in any of the tables in the address superset 130. The other tables 141, 142, 143 in the address superset 130 can also be organized using nodes and containment levels, in the same way that the preferred table 141.1 in Fig. 9 was sorted according to its hierarchy.
In addition to rearranging the data using containment levels, each table can be transformed into a sparse matrix linked list, as described herein, to further increase processing speed.
5.4.4. Preferred token
Referring again to the table 141.1 in Fig. 9, nodes 6 and 7 have both been given the same token, T6, because they represent the same physical location. Note that the consignee names in nodes 6 and 7 are "APC" and "AM POLLING CMTE", respectively. These alternative names for the addressee are consignee aliases. In other words, APC is an alias of AM POLLING CMTE. As discussed herein, such consignee aliases can be stored in one or more of the consignee alias tables 143 in the address superset 130.
Similarly, street alias data can be stored in one or more of the street alias tables 142 of the address superset 130. For example, the fields in a street alias table 142 can be arranged as shown in Fig. 13. The exemplary street alias table 142 in Fig. 13 contains several street aliases for Sixth Avenue in New York City, a street also known as Avenue of the Americas. A street alias table 142 can take the form of such a list, which is easy to access when comparing street address records.
In one aspect of the invention, the address database management system 10 can be instructed to mark one of the alias expressions as the "preferred expression". When various street aliases and consignee aliases apply to the data stored in the address data superset 130, one of the records bearing the token T4081 (for example) can be marked as the preferred expression. A preferred token 70 can thus include a flag, such as "p" for preferred, so that the preferred token 70 appears as T4081p. The system 10 of the present invention can recognize that all records of the address bearing the token T4081 are equivalent. In one embodiment, identifying and marking the preferred token 70 (for example, T4081p) can help to ensure that the preferred workpieces of a particular street address (those marked T4081p) are always returned in response to a query.
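The way a shared token and a preferred flag lead from any alias to the preferred record can be sketched as follows. The rows, the field names, and the choice of which alias is preferred are assumptions made for illustration; the token T5000 is likewise hypothetical.

```python
# Hypothetical rows: alias records for one location share a token, and one
# row per token carries the preferred flag. Which alias is preferred here is
# an arbitrary choice for illustration.
STREET_RECORDS = [
    {"token": "T4081", "street": "SIXTH AVENUE", "preferred": False},
    {"token": "T4081", "street": "AVENUE OF THE AMERICAS", "preferred": True},
    {"token": "T5000", "street": "MAPLE STREET", "preferred": True},
]


def preferred_record(street, records):
    """Find the token of a street name, then return that token's preferred row."""
    token = next((r["token"] for r in records if r["street"] == street), None)
    if token is None:
        return None
    return next((r for r in records if r["token"] == token and r["preferred"]), None)


def display_token(record):
    """Render the token with a trailing 'p' when the record is preferred."""
    return record["token"] + ("p" if record["preferred"] else "")


match = preferred_record("SIXTH AVENUE", STREET_RECORDS)
```

Whichever alias the query supplies, the same preferred row is returned, which is the behavior the preferred token 70 is meant to guarantee.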
In this aspect of the invention, the verification step 320 can, in one embodiment, be configured to use queries to arrange the stored data into a new hierarchical data structure. In one embodiment, one or more tokens can be marked or otherwise identified as preferred tokens 70 in order to locate the preferred expression of an address or of a particular workpiece.
In a related aspect, the management system of the present invention can be configured to pass tokens (rather than text) between the various components of the system 10 of the present invention. Exchanging tokens is more efficient and less error-prone than exchanging long address text. In this aspect, the use of tokens as unique identifiers further speeds the processing of queries, reporting, and other types of analysis of the data stored in the superset.
In one embodiment, the verification step 320 can be performed as part of the program suite 500 of the address management system 110 (see, for example, Fig. 7). The verification step 320 can be performed against the replicated superset 330, and the result delivered to the AMS client 655. In the address management system 110, using one or more of the techniques described herein, the elapsed time from the catching step 300 to the delivering step 395 can be in the range of 100 to 200 milliseconds.
5.4.5. Comparison
In one embodiment, the verification step 320 generally includes comparing the subjective representation 80 with values stored in the tables of the superset 30 in order to find the preferred representation 90. In the context of the address management system 110, address verification 320 generally involves comparing the subjective representation 80 of an input address with the values stored in the address databases 131, 132, 133 of the address superset 130 (as shown in Fig. 1) and identifying the preferred representation 90 of the address.
In the block diagram shown in Fig. 2, the verification step 320 occupies a single block. As described herein, however, the verification step 320 may involve numerous steps and procedures for verifying an address. Earlier sections outlined several data manipulation routines and search methods and described in general terms the process of comparing input data with stored data. More specifically, in one embodiment, the comparison process of the verification step 320 may include the numbered steps listed below.
(1) Store the input data (80) in the preferred table 141.4 in the plan database 134 (see Fig. 1).
(2) Compare the input data stored in the preferred table 141.4 with the data values stored in the other preferred tables 141.1, 141.2, and 141.3 (if any). Recall that, in one embodiment, each table in the superset may have been converted into a sparse matrix linked list as described above, rearranged using nodes and hierarchical containment levels, and/or tokenized, to facilitate fast and efficient searching within each table. The comparison process may include locating one or more candidate representations among the data values stored in the other preferred tables 141.1, 141.2, 141.3. Finding a match may generally include selecting the candidate representation most similar to the subjective representation 80 being searched.
(a) If a match is found between the input data and the preferred table data, locate the corresponding preferred token 70 and proceed to the update 380, combine 390, and deliver 395 steps shown in Fig. 12.
(b) If no match is found, proceed to step (3) below.
(3) Compare the street name input data stored in the preferred table 141.4 with the street alias data values stored in the street alias tables 142.1, 142.2, and 142.3. The comparison process may include locating one or more candidate street aliases among the data values stored in the street alias tables 142.1, 142.2, 142.3. Finding a match may generally include selecting the candidate street alias most closely associated with a preferred token.
(a) If a match is found between the street name input data and the street alias table data, locate the preferred token 70 that marks the preferred street alias, replace the street name in the preferred table 141.4 with the corresponding street alias, and repeat step (1) above using that street alias.
(b) If no match is found, proceed to step (4) below.
(4) Compare the consignee name input data stored in the preferred table 141.4 with the consignee alias data values stored in the consignee alias tables 143.1 (if any), 143.2, and 143.3. The comparison process may include locating one or more candidate consignee aliases among the data values stored in the consignee alias tables 143.1, 143.2, 143.3. Finding a match may generally include selecting the candidate consignee alias most closely associated with a preferred token.
(a) If a match is found between the consignee name input data and the consignee alias table data, locate the preferred token 70 that marks the preferred consignee alias, replace the consignee name in the preferred table 141.4 with the corresponding consignee alias, and repeat step (1) above using that consignee alias.
(b) If no match is found, proceed to step (5) below.
(5) Return an exception code 400 to the user 28 or to the application.
(6) In one embodiment, the verification step 320 may include displaying a list of possible matches (addresses, street aliases, consignee aliases) and allowing the user 28 to compare them visually and, if appropriate, manually select one of the possible matches as the preferred representation.
(a) If a manual selection is made, the comparison process proceeds to the update 380, combine 390, and deliver 395 steps shown in Fig. 12.
(b) If no manual selection is made, the input data and the exception code 400 are passed out of the verification system for further processing.
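The cascade in steps (1) through (5) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the three dictionaries are hypothetical stand-ins for the preferred, street alias, and consignee alias tables, lookups are exact rather than "most similar," the alias tables are assumed to contain no cycles, and the manual selection of step (6) is omitted.

```python
from typing import Dict, Optional, Tuple

EXCEPTION_CODE = 400  # the text labels the exception code with reference numeral 400

# Hypothetical stand-ins for the preferred and alias tables (made-up sample data).
PREFERRED: Dict[Tuple[str, str], str] = {
    ("ACME CORP", "1 MAIN STREET"): "T4081p",
}
STREET_ALIASES: Dict[str, str] = {"1 MAIN ST": "1 MAIN STREET"}
CONSIGNEE_ALIASES: Dict[str, str] = {"ACME": "ACME CORP"}


def verify(consignee: str, street: str) -> Tuple[Optional[str], int]:
    """Return (preferred token, 0) on a match, or (None, EXCEPTION_CODE)."""
    while True:
        # Steps (1)-(2): look for the current input in the preferred table.
        token = PREFERRED.get((consignee, street))
        if token is not None:
            return token, 0
        # Step (3): substitute a known street alias and start over.
        if street in STREET_ALIASES:
            street = STREET_ALIASES[street]
            continue
        # Step (4): substitute a known consignee alias and start over.
        if consignee in CONSIGNEE_ALIASES:
            consignee = CONSIGNEE_ALIASES[consignee]
            continue
        # Step (5): nothing matched, so report an exception.
        return None, EXCEPTION_CODE
```

Each substitution replaces an alias with a value that is not itself a key of the alias dictionary, so the loop ends after at most one pass per alias table in this sketch.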
The method described in step (2) above for finding a preferred address representation may include the following additional steps:
(a) parsing the subjective representation into one or more discrete workpieces;
(b) selecting one of the one or more discrete workpieces and:
(1) locating one or more candidate workpieces in the source data by comparing the discrete workpiece with the source data;
(2) selecting a preferred workpiece from the one or more candidate workpieces, the preferred workpiece being the one most similar to the discrete workpiece;
(3) storing the preferred workpiece;
(c) repeating step (b) for each of the one or more discrete workpieces;
(d) combining the preferred workpieces to form the preferred representation.
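Steps (a) through (d) can be sketched in Python using the standard library's `difflib` as the similarity measure. The patent does not specify how similarity is computed, so the use of `difflib`, the two-field "street, city" parser, and the sample source data are all assumptions made for illustration.

```python
import difflib
from typing import Dict, List

# Hypothetical source data: known values for each kind of workpiece.
SOURCE: Dict[str, List[str]] = {
    "street": ["MAIN STREET", "MAPLE STREET", "OAK AVENUE"],
    "city": ["ATLANTA", "ATHENS", "AUGUSTA"],
}


def parse(subjective: str) -> Dict[str, str]:
    """Step (a): split 'street, city' into discrete workpieces (toy parser)."""
    street, city = (part.strip().upper() for part in subjective.split(","))
    return {"street": street, "city": city}


def preferred_workpiece(kind: str, workpiece: str) -> str:
    """Steps (b)(1)-(2): locate candidates, then keep the most similar one."""
    candidates = difflib.get_close_matches(workpiece, SOURCE[kind], n=3, cutoff=0.6)
    if not candidates:
        raise LookupError(f"no candidate for {kind}: {workpiece!r}")
    return candidates[0]  # get_close_matches returns the best match first


def preferred_representation(subjective: str) -> str:
    """Steps (c)-(d): repeat for every workpiece, then combine the results."""
    pieces = parse(subjective)
    chosen = {kind: preferred_workpiece(kind, value) for kind, value in pieces.items()}
    return f"{chosen['street']}, {chosen['city']}"
```

In this sketch a misspelled input such as "Main Stret, Atlanta" resolves workpiece by workpiece, so an error in one field does not prevent the other field from matching exactly.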
Similarly, the method described in steps (3) and (4) above for finding a preferred alias representation may include the following additional steps:
(a) parsing the subjective representation into one or more discrete workpieces;
(b) selecting one of the one or more discrete workpieces and:
(1) locating one or more candidate workpieces in the source data by comparing the discrete workpiece with the alias data;
(2) selecting a preferred alias workpiece from the one or more candidate workpieces, the preferred alias workpiece being the one most closely associated with the preferred alias token;
(3) storing the preferred alias workpiece;
(c) repeating step (b) for each of the one or more discrete workpieces;
(d) adding the preferred alias workpieces to the preferred alias.
In one embodiment, the term "match" as used in the comparison steps above may involve analyzing one or more workpieces of an address to determine whether the similarity between the data is sufficient to constitute a "match." For example, the following criteria may apply:
1. An exact match is required for the primary address, which includes the street number and street name.
2. An exact match is required for the secondary address (a suite number, for example) only when the secondary address is present in the carrier database 132 and is associated with the primary address.
3. An exact match is required for the consignee name only when the consignee is present in the plan database 134 (the input data).
It should be understood that other matching criteria may be established depending on the application and the processing objectives.
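The three example criteria can be expressed as a small predicate. The sketch below is one possible reading, not the patent's code: the `Address` class, the `known_secondaries` set (standing in for the carrier database 132), and the sample values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Address:
    street_number: str
    street_name: str
    secondary: Optional[str] = None  # e.g. a suite number
    consignee: Optional[str] = None


def is_match(
    inp: Address,
    stored: Address,
    known_secondaries: Set[Tuple[str, str, str]],
) -> bool:
    """Apply the three example criteria to an input and a stored address."""
    # Criterion 1: the primary address must match exactly.
    if (inp.street_number, inp.street_name) != (stored.street_number, stored.street_name):
        return False
    # Criterion 2: compare the secondary address only when it is on file
    # (assumed carrier data) and associated with this primary address.
    key = (stored.street_number, stored.street_name, stored.secondary or "")
    if stored.secondary is not None and key in known_secondaries:
        if inp.secondary != stored.secondary:
            return False
    # Criterion 3: compare the consignee only when the input supplies one.
    if inp.consignee is not None and inp.consignee != stored.consignee:
        return False
    return True
```

For example, with a stored address `Address("1", "MAIN STREET", "STE 200", "ACME CORP")`, an input with suite "STE 300" fails only if the suite "STE 200" is on file for that primary address; otherwise the secondary address is not compared at all.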
5.5. Interface
In one embodiment, the database management system 110 of the present invention may include an interface 600 and a program suite 500, as shown in Fig. 3 and Figs. 5-9. In one embodiment, the interface 600 may comprise a computer program designed to provide an operative connection, or interface, between an application (the program suite 500, for example) and a user (or another application). The interface 600 may provide a set of commands that allow a user to create, read, update, and delete the data values stored in database tables. These functions are sometimes referred to by the acronym CRUD, so an interface that provides these commands may be called a CRUD interface. A database interface that also includes a query function may be called a CRUDQ interface.
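As a rough illustration of what a CRUDQ interface offers, the following Python sketch wraps an in-memory table behind the five operations. The class name, method signatures, and dictionary-based rows are assumptions made for this example; the patent does not prescribe them.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]


class CrudqInterface:
    """Toy in-memory table exposing create, read, update, delete, and query."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}
        self._next_id = 1

    def create(self, row: Row) -> int:
        """Store a copy of the row and return its new identifier."""
        row_id = self._next_id
        self._rows[row_id] = dict(row)
        self._next_id += 1
        return row_id

    def read(self, row_id: int) -> Optional[Row]:
        """Return a copy of the row, or None if it does not exist."""
        row = self._rows.get(row_id)
        return dict(row) if row is not None else None

    def update(self, row_id: int, changes: Row) -> bool:
        """Apply changes to an existing row; report whether it existed."""
        if row_id not in self._rows:
            return False
        self._rows[row_id].update(changes)
        return True

    def delete(self, row_id: int) -> bool:
        """Remove a row; report whether it existed."""
        return self._rows.pop(row_id, None) is not None

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        """Return copies of all rows that satisfy the predicate."""
        return [dict(row) for row in self._rows.values() if predicate(row)]
```

The query method is what distinguishes a CRUDQ interface from a plain CRUD interface in this sketch: it selects rows by content rather than by identifier.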
In one embodiment, the interface 600 may be configured as a COM-based interface, meaning that it is based on the Component Object Model. The Component Object Model is an open software architecture that can facilitate interoperability between the interface 600 and the various other components of the system 10 of the present invention. Although a COM-based interface 600 may be provided, other software models may also be used to accomplish the required functions.
According to one embodiment of the present invention, a query function may be included in the interface 600. A query is a command or instruction for extracting a desired set of data from a database. The best-known query language is Structured Query Language (SQL, pronounced "sequel"), although other query languages may also be used. A query may comprise a single command or a complex series of commands. SQL includes a wide variety of query commands. A set of query commands that can be reused may be stored in SQL as a stored procedure. Much like running a program, calling a stored procedure in SQL is more efficient than sending query commands one at a time. In addition, stored procedures are generally precompiled and may be cached by the database management system. In this respect, query commands can serve as a powerful programming tool.
5.5.1 Application identifier
In one embodiment, the interface 600 may be configured to operate and interact with a variety of different programs and applications, both inside and outside the database management system 110 in use. The interface 600 may be configured to operate with each component of the internal program suite 500. The interface 600 may also be configured to operate with one or more programs or applications external to the database management system, such as related database applications, ancillary reporting applications, separate business applications, or any of a variety of other programs that wish or need to interact with the data stored in the superset 30, 130.
In one embodiment, the interface 600 of the present invention may include one or more application identifiers, each having a corresponding rule set. An application identifier may be used to identify an application requesting access to the database management system of the present invention. An application identifier may be a single command or a complex algorithm. In general, an application identifier operates to identify an application requesting interaction with the database.
Each application identifier may have a corresponding rule set that can be used to govern the interactions between a particular application 270 and the database management system. Such interactions may include query requests, subscription updates, data transfers or other communications, output format instructions, or any other activity. The application identifiers and rule sets may be stored in a database or otherwise kept in an accessible form.
For example, in the context of the address management system 110, a particular application 270 may request access to the address superset 130 by sending a query. In response, the interface 600 may be configured to identify the application 270, retrieve the appropriate application identifier, and in turn retrieve the corresponding rule set. The interface 600 may then pass the rule set to the address management system 110 for use in processing the query or other interaction with the application 270. The address management system 110 may process the query or take other action related to the application 270, producing output data. The output data may be returned to the interface 600, where the rule set is used to confirm that the output data is in a form accessible to the application 270. In this respect, the address management system 110 and its interface 600 may work together to process requests from the application 270 by using the rule set.
In this respect, the interface 600 of the present invention is universal, meaning that the interface 600 may be configured to operate and interact with any application 270. Because the rule sets are maintained separately from the interface itself, the programming in the interface 600 need not include the rules for every one of the various applications 270. Instead, by using application identifiers, the interface 600 may include only the relatively simple commands used to look up and retrieve the corresponding rule set.
When the management system 110 is required to interact with a new application 270, the interface 600 may not need to be modified. The only action required is to add an application identifier and a corresponding rule set for the new application 270. The interface 600 may provide a facility for entering this new information.
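The separation between a generic interface and per-application rule sets might look like the following Python sketch. The `RuleSet` fields, the `Interface` class, and the application names are hypothetical; the point illustrated is only that supporting a new application means registering data, not changing interface code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class RuleSet:
    allowed_actions: FrozenSet[str]  # e.g. {"query", "update"} (assumed rule content)
    output_format: str               # e.g. "csv" (assumed rule content)


class Interface:
    """Generic interface: it only looks up and applies the caller's rule set."""

    def __init__(self) -> None:
        self._rules: Dict[str, RuleSet] = {}

    def register(self, app_id: str, rules: RuleSet) -> None:
        """Add an application identifier and its rule set; no code changes needed."""
        self._rules[app_id] = rules

    def handle(self, app_id: str, action: str) -> str:
        """Identify the application, retrieve its rule set, and apply it."""
        rules = self._rules.get(app_id)
        if rules is None:
            raise PermissionError(f"unknown application: {app_id}")
        if action not in rules.allowed_actions:
            raise PermissionError(f"{app_id} may not perform {action}")
        return f"{action} permitted; output as {rules.output_format}"


interface = Interface()
interface.register("reporting", RuleSet(frozenset({"query"}), "csv"))
interface.register("billing", RuleSet(frozenset({"query", "update"}), "json"))
```

Adding a third application later would be a single `register` call, which mirrors the point that the interface itself need not be modified.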
5.5.2 Data capture depth
In one embodiment, the rule set for a particular application 270 may be configured to control which particular workpieces are captured from the data superset 30. In use, for example, a first application may require only ZIP code data, while a second application may require the ZIP+4 code, the city, and the state. The rule set of the present invention may include stored information about the data requirements of the particular application 270 in use. By controlling the extent or depth of data capture, the rule set can increase the efficiency and speed with which the interface 600 accesses the data in the system 10.
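A minimal Python sketch of capture depth follows, using the ZIP code example from the text. The application names, field names, and the sample record are made up for illustration; only the idea that each rule set lists the fields an application may capture comes from the description.

```python
from typing import Dict, Tuple

# Hypothetical rule sets: the fields each application is allowed to capture.
CAPTURE_RULES: Dict[str, Tuple[str, ...]] = {
    "app_one": ("zip",),
    "app_two": ("zip_plus4", "city", "state"),
}

# Made-up sample record standing in for one entry of the superset.
RECORD: Dict[str, str] = {
    "street": "1 MAIN STREET",
    "city": "ATLANTA",
    "state": "GA",
    "zip": "30301",
    "zip_plus4": "30301-0001",
}


def capture(app_id: str, record: Dict[str, str]) -> Dict[str, str]:
    """Return only the fields that the application's rule set lists."""
    return {field: record[field] for field in CAPTURE_RULES[app_id]}
```

Restricting the capture to the listed fields means that the first application never receives, and the system never has to assemble, the street, city, or state values it does not need.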
6. Conclusion
The described embodiments of the invention are intended only as examples. Many variations and modifications will be apparent to those skilled in the art. All such variations and modifications are intended to fall within the scope of the invention as defined by the appended claims.
What has been described above includes several examples. It is, of course, not possible to describe every conceivable combination of components or methods for purposes of describing the systems, methods, computer-readable media, and so on employed in a database management system. One of ordinary skill in the art will recognize, however, that further combinations and permutations are possible. Accordingly, the present invention is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims. Furthermore, the above description is not intended to limit the scope of the invention. Rather, the scope of the invention is to be determined only by the appended claims and their equivalents.
While the systems, methods, and apparatus herein have been illustrated by describing examples, and while those examples have been described in considerable detail, the applicants do not intend to restrict or in any way limit the scope of the appended claims to such detail. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, the representative systems and methods, or the illustrative examples shown and described. Accordingly, departures may be made from such details without departing from the spirit or scope of the applicants' general inventive concept.

Claims (45)

1. A data structure, comprising:
a superset comprising a primary database operatively connected to one or more secondary databases, wherein
each of the primary database and the one or more secondary databases comprises a first table operatively linked to one or more other tables, and
each of the first table and the one or more other tables shares a common data structure.
2. The data structure as claimed in claim 1, wherein each of the primary database and the one or more secondary databases is a relational database.
3. The data structure as claimed in claim 1, wherein the common data structure comprises a sparse matrix linked list.
4. The data structure as claimed in claim 1, wherein the common data structure comprises a plurality of records containing data, the records being arranged in hierarchical order, based on the data, in a series of levels from general to specific.
5. The data structure as claimed in claim 1, wherein:
the primary database comprises a source table,
a first secondary database comprises an alias table,
a second secondary database comprises a standardization table, and
a third secondary database is configured to accept and store input data.
6. The data structure as claimed in claim 5, wherein:
the source table comprises data records obtained from public or private sources,
the alias table comprises one or more equivalent representations of a record, and
the standardization table comprises one or more standardized representations of a record.
7. The data structure as claimed in claim 6, wherein the source table comprises address records obtained from a government postal service and from commercial sources.
8. The data structure as claimed in claim 1, for storing tables comprising one or more workpieces, wherein:
the first table comprises preferred records,
a first other table comprises primary aliases, and
a second other table comprises secondary aliases.
9. The data structure as claimed in claim 8, wherein:
the preferred records comprise one or more preferred representations,
the primary aliases comprise one or more equivalent representations of a primary workpiece, and
the secondary aliases comprise one or more equivalent representations of a secondary workpiece.
10. The data structure as claimed in claim 9, wherein the preferred records comprise one or more preferred representations of an address.
11. A method of preparing data for optimized searching, the data being stored in one or more databases comprising a plurality of linked tables of records, the method comprising:
arranging the records in each of the tables in hierarchical order, based on the data, in a series of levels from general to specific; and
converting each of the tables into one or more sparse matrix linked list tables.
12. The method as claimed in claim 11, wherein the one or more databases reside in a client-server network environment, the method further comprising:
distributing copies of the one or more sparse matrix linked list tables from a server to one or more clients.
13. The method as claimed in claim 11, wherein the one or more databases are relational databases interconnected to form a data superset.
14. The method as claimed in claim 11, wherein the data comprises address workpieces.
15. An apparatus for preparing data for optimized searching, the data being stored in one or more databases comprising a plurality of linked tables of records, the apparatus comprising:
a central processing unit;
a memory;
a basic input/output system; and
a program storage device comprising program modules executable by the central processing unit, the program modules comprising:
means for arranging the records in each of the tables in hierarchical order, based on the data, in a series of levels from general to specific; and
means for converting each of the tables into one or more sparse matrix linked list tables.
16. The apparatus as claimed in claim 15, further comprising:
one or more clients remote from the central processing unit, the program modules further comprising:
means for distributing copies of the one or more sparse matrix linked list tables from a server to the one or more clients.
17. A method of converting a subjective representation into a preferred representation using a database of linked tables, comprising:
capturing the subjective representation and storing it in a first linked table of the linked tables;
storing source data in a second linked table of the linked tables;
locating one or more candidate representations in the source data by comparing the subjective representation with the source data;
selecting a preferred representation from the one or more candidate representations, the preferred representation being the one most similar to the subjective representation; and
delivering the preferred representation.
18. The method as claimed in claim 17, further comprising:
examining the source data to identify one or more selected records comprising preferred data; and
adding a preferred token to the one or more selected records.
19. The method as claimed in claim 17, wherein the step of selecting a preferred representation comprises identifying a preferred token associated with one of the one or more candidate representations.
20. The method as claimed in claim 17, wherein the step of locating one or more candidate representations further comprises:
(a) parsing the subjective representation into one or more discrete workpieces;
(b) selecting one of the one or more discrete workpieces and:
(1) locating one or more candidate workpieces in the source data by comparing the discrete workpiece with the source data;
(2) selecting a preferred workpiece from the one or more candidate workpieces, the preferred workpiece being the one most similar to the discrete workpiece;
(3) storing the preferred workpiece;
(c) repeating step (b) for each of the one or more discrete workpieces;
(d) combining the preferred workpieces to form the preferred representation.
21. The method as claimed in claim 17, wherein the step of locating one or more candidate representations further comprises:
storing alias data in a third linked table of the linked tables;
examining the alias data to identify one or more selected aliases comprising a preferred alias representation;
adding a preferred alias token to the one or more selected aliases;
locating one or more candidate aliases in the alias data by comparing the subjective representation with the alias data;
selecting a preferred alias from the one or more candidate aliases, the preferred alias being the one most closely associated with the preferred alias token; and
delivering the preferred alias as a candidate representation.
22. The method as claimed in claim 21, wherein the step of locating one or more candidate aliases further comprises:
(a) parsing the subjective representation into one or more discrete workpieces;
(b) selecting one of the one or more discrete workpieces and:
(1) locating one or more candidate workpieces in the source data by comparing the discrete workpiece with the alias data;
(2) selecting a preferred alias workpiece from the one or more candidate workpieces, the preferred alias workpiece being the one most closely associated with the preferred alias token;
(3) storing the preferred alias workpiece;
(c) repeating step (b) for each of the one or more discrete workpieces;
(d) adding the preferred alias workpiece to the preferred alias.
23. An apparatus for converting a subjective representation into a preferred representation using a database of linked tables, comprising:
a central processing unit;
a memory;
a basic input/output system; and
a program storage device comprising program modules executable by the central processing unit, the program modules comprising:
means for capturing the subjective representation and storing it in a first linked table of the linked tables;
means for storing source data in a second linked table of the linked tables;
means for locating one or more candidate representations in the source data by comparing the subjective representation with the source data;
means for selecting a preferred representation from the one or more candidate representations, the preferred representation being the one most similar to the subjective representation; and
means for delivering the preferred representation.
24. The apparatus as claimed in claim 23, the program modules further comprising:
means for examining the source data to identify one or more selected records comprising preferred data; and
means for adding a preferred token to the one or more selected records.
25. The apparatus as claimed in claim 23, wherein the program modules further comprise means for identifying a preferred token associated with one of the one or more candidate representations.
26. The apparatus as claimed in claim 23, wherein the means for locating one or more candidate representations further comprises:
(a) means for parsing the subjective representation into one or more discrete workpieces;
(b) means for selecting one of the one or more discrete workpieces:
(1) means for locating one or more candidate workpieces in the source data by comparing the discrete workpiece with the source data;
(2) means for selecting a preferred workpiece from the one or more candidate workpieces, the preferred workpiece being the one most similar to the discrete workpiece;
(3) means for storing the preferred workpiece;
(c) means for repeating step (b) for each of the one or more discrete workpieces;
(d) means for combining the preferred workpieces to form the preferred representation.
27. The apparatus as claimed in claim 23, wherein the means for locating one or more candidate representations further comprises:
means for storing alias data in a third linked table of the linked tables;
means for examining the alias data to identify one or more selected aliases comprising a preferred alias representation;
means for adding a preferred alias token to the one or more selected aliases;
means for locating one or more candidate aliases in the alias data by comparing the subjective representation with the alias data;
means for selecting a preferred alias from the one or more candidate aliases, the preferred alias being the one most closely associated with the preferred alias token; and
means for delivering the preferred alias as a candidate representation.
28. The apparatus as claimed in claim 27, wherein the means for locating one or more candidate aliases further comprises:
(a) means for parsing the subjective representation into one or more discrete workpieces;
(b) means for selecting one of the one or more discrete workpieces:
(1) means for locating one or more candidate workpieces in the source data by comparing the discrete workpiece with the alias data;
(2) means for selecting a preferred alias workpiece from the one or more candidate workpieces, the preferred alias workpiece being the one most closely associated with the preferred alias token;
(3) means for storing the preferred alias workpiece;
(c) means for repeating step (b) for each of the one or more discrete workpieces;
(d) means for adding the preferred alias workpiece to the preferred alias.
29. A method of controlling access to a database by one or more applications, comprising:
establishing and storing a plurality of rule sets, each of which is associated with one of the one or more applications;
receiving a request from a first application;
retrieving a first rule set associated with the first application; and
using the first rule set to govern the interaction between the first application and the database.
30. The method as claimed in claim 29, wherein the first rule set comprises a list of the data that may be captured from the database for use by the first application.
31. A method of controlling the depth of data capture by a database in response to one or more applications, comprising:
establishing and storing a plurality of rule sets, each of which is associated with one of the one or more applications,
each of the plurality of rule sets comprising a list of the data to be captured from the database;
receiving a request from a first application;
retrieving a first rule set associated with the first application; and
using the first rule set to limit the data that the first application can obtain from the database.
32. A data structure, comprising:
a database linking a primary table and one or more secondary tables, each of the tables sharing a common data structure;
the database being controlled by a database management system configured to convert one or more of the tables into a sparse matrix linked list.
33. The data structure as claimed in claim 32, wherein the database comprises one or more interconnected relational databases.
34. The data structure as claimed in claim 32, wherein the database management system comprises an interface and a verification module.
35. The data structure as claimed in claim 34, wherein the interface controls access to the database by one or more applications.
36. The data structure as claimed in claim 32, wherein the database management system is further configured to convert data from a subjective representation into a preferred representation.
37. A data structure for a database management system, comprising:
a first table of values representing preferred characterizations of a parameter;
a second table of values representing input data characterizing the parameter;
a third table of values arranged in a hierarchical structure to facilitate the process of matching the input data with a corresponding preferred characterization,
wherein each of the tables comprises a sparse matrix linked list.
38. A method for characterizing a parameter, comprising:
receiving input data characterizing the parameter in a first table;
modifying the input data according to the form of an alias characterization stored in a second table; and
matching the modified input data with a preferred characterization stored in a third table.
39. an address management system comprises:
Superset, it comprises the major database that functionally is connected to one or more low priority datas storehouse, and each in the described database comprises the form of a plurality of links, and each in the described form is shared common data structure;
Strengthen module, it is configured to the one or more sparse matrix linked list that are transformed in the described form;
Announce and subscription module, be used for the distribution of Control Server-client network environment data;
Coupling and authentication module are used for the subjectivity of address is represented to convert to the preferred expression of described address; And
Interface is used to control the visit of one or more applications to described superset.
40. system as claimed in claim 39, wherein said enhancing module also are configured to based on described data by from generally arranging record one or more described forms to concrete a series of ranks with hierarchical sequence.
41. The system as claimed in claim 39, wherein:
said primary database comprises source tables,
a first secondary database comprises alias tables,
a second secondary database comprises standards tables, and
a third secondary database is configured to accept and store input data.
42. The system as claimed in claim 41, wherein:
said source tables comprise data records obtained from public or private sources,
said alias tables comprise one or more equivalent representations of a record, and
said standards tables comprise one or more standardized representations of a record.
43. The system as claimed in claim 42, wherein said source tables comprise address records obtained from government postal services and commercial sources.
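The division of roles among the databases in claims 41 to 43 can be laid out as a small container type. This is a rough sketch only: the `Database` and `Superset` classes, the table names, and the `accept_input` helper are hypothetical and say nothing about how the databases are actually linked or stored.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """A named database holding tables (table name -> list of records)."""
    name: str
    tables: dict = field(default_factory=dict)


@dataclass
class Superset:
    """One primary database plus its secondary databases."""
    primary: Database
    secondaries: list


def build_superset() -> Superset:
    # The primary database holds the source tables; the three secondary
    # databases hold aliases, standards, and incoming input data.
    return Superset(
        primary=Database("primary", {"source": []}),
        secondaries=[
            Database("first_secondary", {"alias": []}),
            Database("second_secondary", {"standards": []}),
            Database("third_secondary", {"input": []}),
        ],
    )


def accept_input(superset: Superset, record: dict) -> None:
    """Store an incoming record in the third secondary database."""
    superset.secondaries[2].tables["input"].append(record)
```

Keeping raw input in its own database, as in this sketch, leaves the source, alias and standards tables untouched until the input has been matched.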
44. The system as claimed in claim 40, for storing records comprising one or more address artifacts, wherein:
said first table comprises preferred records,
a second table comprises primary canonical names, and
a third table comprises secondary canonical names.
45. The system as claimed in claim 44, wherein:
said preferred records comprise one or more preferred representations,
said primary canonical names comprise one or more equivalent representations of primary address artifacts, and
said secondary canonical names comprise one or more equivalent representations of secondary address artifacts.
CNB2003801108259A 2003-10-21 2003-10-21 Data structure and management system for a superset of relational databases Expired - Lifetime CN100421107C (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2003/033349 WO2005050481A1 (en) 2003-10-21 2003-10-21 Data structure and management system for a superset of relational databases

Publications (2)

Publication Number Publication Date
CN1879104A true CN1879104A (en) 2006-12-13
CN100421107C CN100421107C (en) 2008-09-24

Family

ID=34618841

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003801108259A Expired - Lifetime CN100421107C (en) 2003-10-21 2003-10-21 Data structure and management system for a superset of relational databases

Country Status (7)

Country Link
EP (1) EP1687741A1 (en)
JP (1) JP2007535009A (en)
CN (1) CN100421107C (en)
AU (1) AU2003284305A1 (en)
CA (1) CA2543159C (en)
MX (1) MXPA06004481A (en)
WO (1) WO2005050481A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7548935B2 (en) * 2002-05-09 2009-06-16 Robert Pecherer Method of recursive objects for representing hierarchies in relational database systems
CN102841916B (en) 2005-01-28 2016-12-14 美国联合包裹服务公司 The registration of address date and the method and system of maintenance about each service point in area
CN100367280C (en) * 2005-11-07 2008-02-06 西安工程科技学院 System for sharing 3D data of measuring human body on Internet, and method of data fusion
US8204856B2 (en) 2007-03-15 2012-06-19 Google Inc. Database replication
US7822729B2 (en) 2007-08-15 2010-10-26 International Business Machines Corporation Swapping multiple object aliases in a database system
US7788305B2 (en) * 2007-11-13 2010-08-31 Oracle International Corporation Hierarchy nodes derived based on parent/child foreign key and/or range values on parent node
US8538934B2 (en) * 2011-10-28 2013-09-17 Microsoft Corporation Contextual gravitation of datasets and data services
CN103093218B (en) * 2013-01-14 2016-04-06 西南大学 The method of automatic identification form types and device
US10223637B1 (en) 2013-05-30 2019-03-05 Google Llc Predicting accuracy of submitted data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5387783A (en) * 1992-04-30 1995-02-07 Postalsoft, Inc. Method and apparatus for inserting and printing barcoded zip codes
WO1996034354A1 (en) * 1995-04-28 1996-10-31 United Parcel Service Of America, Inc. System and method for validating and geocoding addresses
US5881169A (en) * 1996-09-13 1999-03-09 Ericsson Inc. Apparatus and method for presenting and gathering text entries in a pen-based input device
US6542896B1 (en) * 1999-07-20 2003-04-01 Primentia, Inc. System and method for organizing data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105373613A (en) * 2009-04-16 2016-03-02 泰必高软件公司 Policy-based storage structure distribution
CN105373613B (en) * 2009-04-16 2019-05-14 泰必高软件公司 Memory structure based on strategy is distributed
CN110998542A (en) * 2017-05-24 2020-04-10 东新软件开发株式会社 Data exchange system, data exchange method, and data exchange program
CN110998542B (en) * 2017-05-24 2023-12-29 东新软件开发株式会社 Data exchange system, data exchange method, and data exchange program
CN107609406A (en) * 2017-08-09 2018-01-19 南京邮电大学 A kind of express delivery address encryption method based on geocoding

Also Published As

Publication number Publication date
EP1687741A1 (en) 2006-08-09
CA2543159A1 (en) 2005-06-02
JP2007535009A (en) 2007-11-29
CN100421107C (en) 2008-09-24
CA2543159C (en) 2010-08-10
AU2003284305A1 (en) 2005-06-08
WO2005050481A1 (en) 2005-06-02
MXPA06004481A (en) 2006-07-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20080924
