WO1998047081A1

WO1998047081A1 - Digital data selection method

Info

Publication number: WO1998047081A1
Application number: PCT/EP1998/000932
Authority: WO
Inventors: Michael Buchstaller; Michael Mohr
Original assignee: Dci Datenbank Für Wirtschaftsinformation Gmbh
Priority date: 1997-04-15
Filing date: 1998-02-18
Publication date: 1998-10-22
Also published as: AU6297198A; DE19715723A1

Abstract

The invention relates to a method which can be carried out on a digital computer in order to optimize the access speed of digital data which is filed in a storage medium. The data is filed in a structured manner in a data field. Initially, search vectors are linked logically to each other. Subsequently, they are linked to the data field.

Description

Selection process for digital data

The invention relates to a selection method for execution on a digital computer to optimize the selection and access speed of digital data stored on a storage medium, in particular address data.

The current era is often referred to as the "Information Age". This reflects the ability of electronic data processing to handle large amounts of information within a comparatively short time. One symptom is the constantly increasing number of databases intended to give users the information they need or want.

In this context, address databases are particularly noteworthy. They are becoming increasingly important, especially in business life, when it comes to establishing contacts with potential suppliers, customers, sales and cooperation partners or the like that have a very specific requirement profile, i.e. meet specific criteria.

A high level of flexibility in the database system is particularly important in the area of address databases. The reason is that, especially with extensive address databases, changes to the entries have to be made practically continuously. For example, a registered company can expand its product range, merge with another company (registered or unregistered), spin off the part of the business relating to a certain product group, relocate, change its company name, go bankrupt and much more. The address database is of interest to the user only if it is kept up to date by reacting to such changes at short notice. In addition to the up-to-dateness of the database, other important criteria are the number of data records entered, the specificity of the information or selection criteria provided for each data record, and the selection and access speed.

The majority of the databases used today are based on the hierarchical data model, the network data model or the relational data model. While the first two data models have a tree or network structure, the relational data model is based on tables. All information in a database, i.e. both the objects and their relationships, is represented in the same way by tables. Relationships between objects exist when a value occurs in several relations. Existing relationships are only activated when the database is queried. The execution of queries is relatively complex because the relations do not support a fast search algorithm. Either the individual tuples have to be searched sequentially for a feature, or the algorithm itself has to generate efficiency-enhancing auxiliary data structures.

Existing address databases work with indexed fields. The dynamic maintenance of the data held in such databases is difficult and complex. Changes within an entry, for example the expansion of the product range of a registered manufacturer, often require the creation of new fields; the same often applies to new entries. This makes it necessary to change the programming in order to be able to query the new fields. The up-to-dateness that users demand cannot be achieved with such address databases, especially not with a large number of entries.

As far as the selection and access speed is concerned, when using previous database systems, the computing time is generally reduced by using powerful high-performance computers which, however, have a natural performance limit and can only be used in a few cases for economic reasons. In particular, when querying database information online in network databases, it is essential to minimize the computing time in order to avoid unnecessary online waiting times and thus increased query costs.

It is therefore an object of the present invention to provide an improved selection method for digital data, in particular for address data, which places relatively low demands on the hardware even with large amounts of data and which, through particularly short selection and access times, is suitable in particular for online operation in network databases.

This object is achieved with the features of claim 1.

This provides a selection method, suitable for execution on a digital computer, for optimizing the selection and access speed of digital data stored on a storage medium, in which the data is stored in a structured manner in a data field and in which search vectors relating to individual search criteria are first logically linked to one another, before the resulting result vector is linked to the data field in order to select the data records that cumulatively meet the desired search criteria. In this way, initially only search vectors, each of which is assigned to a specific search criterion or search term and records which data records fulfill this criterion, are logically linked to one another, in particular via the AND, OR and NOT functions. The amount of data to be moved is initially minimal, in particular if only those search vectors are moved that are actually relevant to the query or selection in question. However, even if all search vectors available for selection are transmitted to the user of the database at the beginning of a query in online operation, this requires only a minimal transmission time because of the comparatively small amount of data. Only at the end of the linking operation between the search vectors is the result vector linked to the data field, and the corresponding data records, each of which exists only once, are read from the stored data field. Because only small amounts of data are moved and only logical operations rather than sequential searches take place, a desired query result can be obtained in a very short time, even with large amounts of data. In this way, complex data sets can be managed and queried even with less powerful computer systems. The data field can have a row-based structure, with each row containing all the desired information about an entry.

It is advantageous if the data fields are static, i.e. the maximum number of data records that can be entered is specified in advance. For a data field that has not yet reached its final state, the remaining free rows are marked with zeros. Search vectors are created by grouping data, for example by country affiliation. The search vectors can, but do not have to, be created in advance, possibly automatically, and are then already available when a query is made.
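The two-stage procedure described above (first link the search vectors logically, then link the result vector to the data field) can be illustrated with a minimal Python sketch. The sample names, the bit values and the helper names `v_and`, `v_or`, `v_not` and `link_to_data_field` are assumptions made for illustration only and do not come from the patent text.

```python
from typing import List, Optional

Vector = List[int]  # one 0/1 entry per row of the data field


def v_and(a: Vector, b: Vector) -> Vector:
    """Element-wise AND of two search vectors."""
    return [x & y for x, y in zip(a, b)]


def v_or(a: Vector, b: Vector) -> Vector:
    """Element-wise OR of two search vectors."""
    return [x | y for x, y in zip(a, b)]


def v_not(a: Vector) -> Vector:
    """Element-wise NOT of a search vector."""
    return [1 - x for x in a]


def link_to_data_field(result: Vector, data_field: List[Optional[str]]) -> List[str]:
    """Read only those rows of the data field whose result bit is one."""
    return [row for bit, row in zip(result, data_field) if bit and row is not None]


# Hypothetical static data field with capacity 5; the last row is still free.
data_field: List[Optional[str]] = [
    "Hans Mustermann", "Paul Meyer", "Karl Müller", "Rudi Hörig", None,
]

# Hypothetical search vectors (illustrative values only).
subscriber = [1, 1, 0, 1, 0]
supplier = [0, 1, 1, 0, 0]
direct_debit = [1, 0, 1, 1, 0]

# Stage 1: combine only the small search vectors.
result = v_and(v_and(subscriber, direct_debit), v_not(supplier))

# Stage 2: touch the data field once, at the very end.
selected = link_to_data_field(result, data_field)
print(selected)
```

In this sketch the data field is read only in the final step, so the cost of the logical combination depends on the length of the vectors and not on the size of the stored records.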

In line with the above, a preferred development of the invention is characterized by the regular automatic creation of search vectors. All data records are read automatically, primarily at night, and all data records that meet a specific search criterion are noted in the search vector corresponding to this criterion. In the case of an address database, for example, the search vector "country affiliation Taiwan" is given a one at every position where the assigned data record gives some indication of Taiwan, for example the dialing code ++886 in the telephone or fax number, the country code ROC or TW in the address, or other characteristics. According to a further example, the search vector “association” is given a one at every position where the word “association” appears in the description of the corresponding data record. Several lists (or search vectors) can be created or updated during the same run. With a search vector routine that runs automatically every day, the most up-to-date search vectors are therefore always available for carrying out any queries. The time required for each query remains minimal. Instead of carrying out the corresponding searches during the query, the search vectors produced by the automatically running search vector routine are stored separately and are available at the start of the query, so that they can be loaded into the main memory of the requester's data processing system with minimal effort. This applies even if thousands of lists or search vectors have been prepared.
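A batch routine of the kind described above could look roughly like the following Python sketch, which fills several search vectors in a single pass over all data records. The record layout (`phone`, `country`, `description`), the sample records and the function name `build_search_vectors` are hypothetical; the patent does not specify how records are stored or tested.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def build_search_vectors(
    records: List[Optional[Record]],
    criteria: Dict[str, Callable[[Record], bool]],
) -> Dict[str, List[int]]:
    """Read every record once and note in each search vector whether it matches."""
    vectors = {name: [0] * len(records) for name in criteria}
    for pos, record in enumerate(records):
        if record is None:  # free row: stays zero in every vector
            continue
        for name, matches in criteria.items():
            if matches(record):
                vectors[name][pos] = 1
    return vectors


# Hypothetical criteria modelled on the two examples in the text.
criteria: Dict[str, Callable[[Record], bool]] = {
    "country_taiwan": lambda r: (
        r.get("phone", "").replace(" ", "").startswith("++886")
        or r.get("country", "") in {"ROC", "TW"}
    ),
    "association": lambda r: "association" in r.get("description", "").lower(),
}

# Hypothetical records; None marks a free row of the static data field.
records: List[Optional[Record]] = [
    {"phone": "++ 886 2 555", "country": "", "description": "Bicycle parts"},
    {"phone": "++ 49 89 555", "country": "DE", "description": "Trade association"},
    {"phone": "", "country": "TW", "description": "Electronics Association"},
    None,
]

vectors = build_search_vectors(records, criteria)
print(vectors)
```

Because the vectors are built outside of any query, a scheduled run of such a routine would leave the data field itself untouched.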

The advantage of the described method is that a conventional digital computer can be used as a fast online database. No "supercomputers" are required, since the access speed is very high even with complex queries and the memory requirement is very small compared to conventional databases. This is due, among other things, to the fact that all data appear only once, which is why the amount of data moved is small. Thus, using the invention, standard hardware can be used, for example, to operate an address database which, with 3 million addresses entered and 1000 search criteria available, of which ten are combined in a query, delivers the query result within seconds. Nothing comparable is possible with known database systems, even with many times the hardware expenditure.

An address database working according to the method according to the invention can also respond extremely flexibly to any change; it is not necessary to change the programming even when search vectors are added. This even allows the user of an address database to create his own, self-defined search vectors that can be taken into account in his queries. For example, a search vector "black sheep", in which a one is assigned to all entries with which bad experiences have been made in the past, can be linked to the other search vectors via the NOT function when a query is made, as a result of which the "black sheep" are excluded from the outset. The result vectors of previous queries can also be saved as user-specific search vectors for future queries. All of this demonstrates the exceptional flexibility of the method according to the invention.

Another advantage is that search vectors can be created without blocking the database. Whereas in conventional databases at least some of the data is blocked for a certain period of time when key fields are created for data retrieval, in the present method normal multi-user operation continues while new search vectors are created or updated. The method is therefore ideally suited for operation in a multi-user network database. In addition, the method according to the invention can be used to carry out queries that are not possible in a conventional database that works with key fields. For example, all exhibitors at a trade fair can be identified in a very short time using the corresponding search vector, without this information necessarily being available in the data field itself.

An advantageous development of the method according to the invention provides that a search vector consisting of zeros and ones is assigned to each specific search criterion or search term. For example, in the case of an address database, the search vector “headquarters in Germany” has ones where the corresponding entries are present in the data field, whereas other country entries lead to a zero in the corresponding search vector. Given a processor word width of b bits, b entries can be evaluated with a single CPU instruction. The data volumes of the search vectors are on average a factor of nx 10 ° bit smaller than the data volumes needed for sequential queries of complex terms in conventional databases. The logical combination of zeros and ones is much closer to the machine than the use of complex search terms. Finally, the very good ability to coordinate the search vectors is advantageous for network queries. Thus, the use of such search vectors increases the computing and transmission speed and reduces the memory requirement.
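The remark about the processor word width can be illustrated by packing a search vector into fixed-width words, so that one AND per word covers b entries at once. The following Python sketch assumes b = 64 and uses hypothetical helper names (`pack`, `and_words`, `positions`); Python integers only emulate machine words here, so the sketch shows the data layout rather than real instruction counts.

```python
from typing import List

WORD_BITS = 64  # assumed processor word width b


def pack(bits: List[int]) -> List[int]:
    """Pack a 0/1 search vector into words of WORD_BITS entries each."""
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for offset, bit in enumerate(bits[start:start + WORD_BITS]):
            if bit:
                word |= 1 << offset
        words.append(word)
    return words


def and_words(a: List[int], b: List[int]) -> List[int]:
    """One AND per word combines WORD_BITS entries of two search vectors."""
    return [x & y for x, y in zip(a, b)]


def positions(words: List[int]) -> List[int]:
    """Return the row positions whose bit is set in the packed vector."""
    found = []
    for index, word in enumerate(words):
        while word:
            lowest = word & -word  # isolate the lowest set bit
            found.append(index * WORD_BITS + lowest.bit_length() - 1)
            word ^= lowest
    return found


# Two hypothetical search vectors over 130 rows.
vector_a = [1 if i % 2 == 0 else 0 for i in range(130)]
vector_b = [1 if i % 3 == 0 else 0 for i in range(130)]

packed_a, packed_b = pack(vector_a), pack(vector_b)
hits = positions(and_words(packed_a, packed_b))
print(len(packed_a), hits[:5])
```

With this layout, 130 entries occupy three words, and combining two vectors takes three AND operations instead of 130 element-wise comparisons.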

Furthermore, an advantageous embodiment provides that when the search vectors are logically linked, a validation vector is placed first. The validation vector contains ones only at positions with valid entries and zeros everywhere else. This is necessary, for example, if old entries in a data record are to be invalidated; the validation vector then has a zero at the position of this data record. By changing the validation vector accordingly, a data record can also be made invalid only temporarily, for example if, in the case of a supplier database, the company concerned has temporary delivery difficulties. Here again the particularly high flexibility of the method according to the invention becomes clear.
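A validation vector placed first in the chain can be sketched as follows. The function name `run_query` and all bit values are illustrative assumptions; the point is only that flipping one bit in the validation vector removes or restores an entry without touching the data field or the other search vectors.

```python
from typing import List


def run_query(validation: List[int], *search_vectors: List[int]) -> List[int]:
    """AND the validation vector first, then every search vector in turn."""
    result = list(validation)
    for vector in search_vectors:
        result = [r & v for r, v in zip(result, vector)]
    return result


# Hypothetical vectors for a supplier database with four rows; row 3 is invalid.
validation = [1, 1, 1, 0]
supplier = [1, 1, 0, 1]

before = run_query(validation, supplier)

# Temporarily invalidate row 1, e.g. because of delivery difficulties.
validation[1] = 0
during = run_query(validation, supplier)

# Restore the entry once the difficulties are over.
validation[1] = 1
after = run_query(validation, supplier)

print(before, during, after)
```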

It is also advantageous if all the search vectors that are currently required are loaded into the main memory. This increases the calculation speed and thus shortens the calculation time. The method can even be carried out with standard hardware equipment, since the search vectors require only a very small storage capacity.

In addition, an advantageous development of the method according to the invention provides that the search vectors are automatically adapted when new entries are made in the database. This means that a check routine is run with each new entry, which tests this one entry against the search criteria or search terms corresponding to the search vectors and, depending on the outcome, assigns a one or a zero to the respective search vector at the position corresponding to this entry. Result vectors obtained by combining search vectors in an earlier query, e.g. "Mailing list '97", can also be saved as new search vectors.
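The check routine for new entries might be sketched as below: the new record is written to the first free row, and only that one position is updated in every search vector. The record fields, the criteria and the names `add_entry` and `mailing_list_97` are hypothetical and serve only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


def add_entry(
    record: Record,
    data_field: List[Optional[Record]],
    validation: List[int],
    vectors: Dict[str, List[int]],
    criteria: Dict[str, Callable[[Record], bool]],
) -> int:
    """Store a new record in the first free row and update each vector there."""
    pos = data_field.index(None)  # raises ValueError if the static field is full
    data_field[pos] = record
    validation[pos] = 1
    for name, matches in criteria.items():
        vectors[name][pos] = 1 if matches(record) else 0
    return pos


# Hypothetical criteria and a static data field with capacity 3.
criteria: Dict[str, Callable[[Record], bool]] = {
    "dentist": lambda r: r["profession"] == "dentist",
    "postcode_area_8": lambda r: r["postcode"].startswith("8"),
}
data_field: List[Optional[Record]] = [
    {"name": "A.B.", "profession": "dentist", "postcode": "80331"},
    None,
    None,
]
validation = [1, 0, 0]
vectors = {"dentist": [1, 0, 0], "postcode_area_8": [1, 0, 0]}

new_pos = add_entry(
    {"name": "C.D.", "profession": "dental laboratory", "postcode": "81667"},
    data_field, validation, vectors, criteria,
)

# A result vector from a query can be kept as a new, user-specific search vector.
vectors["mailing_list_97"] = [
    v & d & p
    for v, d, p in zip(validation, vectors["dentist"], vectors["postcode_area_8"])
]
print(new_pos, vectors)
```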

Furthermore, an advantageous embodiment of the method according to the invention provides that the digital data are stored on a permanent, non-volatile storage medium. This can be, for example, a hard disk, a data tape, a CD-ROM, an EPROM or a floppy disk.

Finally, the use of such a method on a digital computer (server) in a network for online queries of a network database is advantageously provided, since high access and query speeds are required especially for online queries in order to avoid unnecessary costs due to computing time or hardware use. This also ensures efficient multi-user operation. The search vectors can be created without intervention in the data field, which means that downtimes are low.

The invention is explained in more detail below with the aid of two examples. An address field with n data records is assumed in each case. The query is preceded by the validation vector with n lines, which has a one for each valid entry.

In example 1, the address field is fully populated with n entries. When a query is made, the validation vector is placed first and logically linked by AND to a first search vector that marks the subscribers listed in the address field. A further search vector, which marks the participants in the direct debit procedure listed in the address field, is also linked by AND. The resulting vector is then linked to the address field, and the result is the desired data records.

Example

Address field:

1 Hans Mustermann, ...
2 Paul Meyer, ...
3 Karl Müller, ...
4 Rudi Hörig, ...
...
n Egon Zecher, ...

Search vectors (entries not reproduced here):

validation vector "valid entry"
search vector "subscriber"
search vector "supplier"
search vector "direct debit procedure"

Query:

valid entry AND subscriber AND direct debit procedure

Result vector   Address field           Result
1               Hans Mustermann, ...    Hans Mustermann, ...
0               Paul Meyer, ...         0
0               Karl Müller, ...        0
1               Rudi Hörig, ...         Rudi Hörig, ...
                Egon Zecher, ...

If the various search vectors are already filled with zeros at "invalid" places, no validation vector is required.

In example 2, the address field prepared for n entries is not fully occupied. The validation vector has the entry 0 at the vacant positions. The example illustrates the query of a dental laboratory to an address database to prepare a mailing that is to be used to establish new business relationships. The mailing is intended for all dentists whose practice is in the German postcode area 8, unless it is a dentist with an attached laboratory; furthermore, existing own customers should not receive the mailing. To perform this query the validation vector, the search vector "dentist" and the search vector "German postcode area 8" are linked to one another via the AND function and via the NOT function to the search vector "dental laboratory" and the search vector "own customer". The result vector is then linked to the address field and the result is the desired data records.

Example

Address field:

1 A.B.

2 C.D.

3 E.F.

4 G.H.

5 I.K.

6 L.M.

7 N.O.

8 P.Q.

n-1 empty

n empty

Search vectors:

validation vector "valid entry"
search vector "dentist"
search vector "dental laboratory"
search vector "DE postcode area 8"
search vector "own customer"

Each vector has one entry for each of the rows 1 to 8, n-1 and n of the address field (entries not reproduced here).

Query:

Valid entry of a dentist without his own laboratory in postcode area 8 who is not his own customer

Result vector and address field (entries not reproduced here)

The mailing is sent to A.B. and P.Q.
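The query of example 2 can be written out as a short Python sketch. Since the entries of the search vectors are not reproduced in this text, all bit values below are invented for illustration and were chosen so that the result matches the two recipients named above; the helper names `v_and` and `v_not` are likewise assumptions.

```python
from typing import List, Optional


def v_and(*vectors: List[int]) -> List[int]:
    """Element-wise AND over any number of vectors."""
    return [int(all(bits)) for bits in zip(*vectors)]


def v_not(vector: List[int]) -> List[int]:
    """Element-wise NOT."""
    return [1 - bit for bit in vector]


# Address field prepared for n = 10 entries; rows n-1 and n are empty.
address_field: List[Optional[str]] = [
    "A.B.", "C.D.", "E.F.", "G.H.", "I.K.", "L.M.", "N.O.", "P.Q.", None, None,
]

# Illustrative bit values only (the original table is not available).
valid = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
dentist = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
dental_laboratory = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
postcode_area_8 = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
own_customer = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

# valid AND dentist AND postcode area 8 AND NOT dental laboratory AND NOT own customer
result_vector = v_and(
    valid, dentist, postcode_area_8, v_not(dental_laboratory), v_not(own_customer)
)

recipients = [name for bit, name in zip(result_vector, address_field) if bit]
print(recipients)
```

Placing the validation vector in the chain matters here because the NOT function would otherwise turn the zeros of the empty rows into ones.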

Claims

1. Selection process for execution on a digital computer to optimize the selection and access speed of digital data stored on a storage medium, in particular address data, which are stored in a structured manner in a data field, with search vectors relating to individual search criteria initially being logically linked to one another in the context of a query, before the resulting result vector for selecting data records that meet the desired search criteria cumulatively is linked to the data field.

2. The method according to claim 1, characterized in that a search vector consisting of zeros and ones is assigned to a search criterion or a search term.

3. The method according to claim 1 or claim 2, characterized in that a validation vector specifies whether a data set is valid or not.

4. The method according to claim 3, characterized in that the validation vector is placed first, upstream of the logical linking of the search vectors.

5. The method according to any one of claims 1 to 4, characterized in that all search vectors are loaded into the working memory.

6. The method according to any one of claims 1 to 5, characterized in that the search vectors are automatically adjusted for new entries in the database.

7. The method according to any one of claims 1 to 6, characterized in that the digital data are stored on a permanent, non-volatile memory as a storage medium.

8. The method according to any one of claims 1 to 7, characterized in that at the end of the method, the result vector is stored as a new search vector.

9. Use of a method according to claim 1 on a digital computer in a network for online query in a network database.