WO2001001345A1 - Data processors - Google Patents
- Publication number
- WO2001001345A1 (application PCT/GB2000/002303)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- separator
- tuples
- correlation matrix
- input data
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90339—Query processing by using parallel associative memories or content-addressable memories
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
Definitions
- This invention relates to data processors, and is concerned particularly although not exclusively with the matching of and subsequent processing of data.
- In known data matching processors, one enters query data to be matched with existing known data. To find a match, the query data is compared with all known existing data until a match is found. This can be a slow process, even with substantial processors.
- Known processors organise stored data in fields (as in a database), so that in order to find a stored record, the data in the query must also be organised into fields (e.g. street name, town, postcode, etc.). In many cases, the field names may be unknown. For example, in entering as query data a postal address to be matched, one may not know if a word belongs to a "street" field or a "town" field.
- Preferred embodiments of the invention aim to provide data processors that provide rapid matching of query data that can be of much greater length, and which can remove the need for additional identifiers such as field names.
- query data can be entered, largely irrespective of order.
- a data processor comprising: a) a correlation matrix memory arranged to store data;
- a sampler arranged to derive, from each set of input data, a respective set of tuples
- a separator generator arranged to generate for each set of input data a respective, associated, unique separator
- addressing means arranged to apply to the correlation matrix memory, for each set of input data, the respective combined coded tuples as a row address and the respective unique separator as a column address, or vice-versa.
- the combined coded tuples for each set of input data are in the form of a binary coded word; the data processor further comprises a translator arranged to translate each such binary coded word into a translated word comprising index values representing which bits of the binary coded word are set; and said addressing means is arranged to apply the translated word to the correlation matrix memory.
- said separator generator is arranged to generate separators in a random manner.
- each tuple comprises three successive elements of a respective set of input data, and each successive tuple is offset by one such element from the preceding tuple.
- said coder is arranged to code said tuples by tensoring.
- said combiner is arranged to combine the coded tuples for a respective set of input data, by superimposition.
- At least some of the rows (or columns) of the correlation matrix memory are represented by binary words, each of which represents the positions of each bit in the respective row (or column) which is set.
- said correlation matrix memory comprises a plurality of sub-correlation matrix memories; said addressing means is arranged to access a first one of said sub-correlation matrix memories and apply the combined coded tuples of a respective set of input data to that sub-correlation matrix memory unless a respective row (or column) of that sub-correlation matrix memory will become saturated by application of those tuples; and in the event of such prospective saturation, access successive ones of the sub-correlation matrix memories until those tuples can be applied to a respective one of the sub-correlation matrix memories without such saturation.
- a data processor may be arranged to receive sets of query data to be matched with sets of input data stored in the correlation matrix memory, and to derive, for each set of query data, a respective set of coded tuples analogous to those derived for the original input data, and to apply to the correlation matrix memory, for each set of query data, the respective combined coded tuples as a row (or column) address: the data processor further comprising:
- output means for outputting a raw superimposed separator which represents, for a respective set of query data, the number of rows (or columns) having a bit set by the applied combined coded tuples in each column (or row) represented by the raw superimposed separator;
- threshold means arranged to convert the raw superimposed separator into a binary superimposed separator
- said thresholding means sets an absolute threshold value, and provides said binary superimposed separator as a word in which bits represent respective columns (or rows) of the correlation matrix memory, and each of those bits is set if the number of rows (or columns) having a bit set by the applied combined coded tuples in the respective column (or row) equals or is greater than said absolute threshold value.
- Said thresholding means may determine a value k, and provide said binary superimposed separator as a word in which bits represent respective columns (or rows) of the correlation matrix memory, and are set for the k respective columns (or rows) having the highest number of rows (or columns) which have a bit set by the applied combined coded tuples in the correlation matrix memory.
- a data processor as above may further comprise back-checking means arranged to compare sets of recalled data, identified by respective separators extracted by said extractor, with original query data, in order to identify the set or sets of recalled data which matches best the original query data.
- a data processor may be arranged to process sets of input data and query data in the form of postal addresses.
- a raw superimposed separator which represents, for a respective set of query data, the number of rows (or columns) having a bit set by the applied combined coded tuples in each column (or row) represented by the raw superimposed separator; e) converting the raw superimposed separator into a binary superimposed separator;
- Figure 1 is an illustration of a correlation matrix memory (referred to herein for convenience as a "CMM");
- Figure 2 illustrates tupling, tensoring and coding of a simple input word
- Figure 3 is an illustration similar to that of Figure 1, but showing an output array of summed values;
- Figure 4 is a block diagram of a data processor comprising one example of an embodiment of the invention.
- Figure 5 illustrates an example of steps carried out by the data processor of Figure 4, to locate data stored in the correlation matrix memory of the data processor;
- Figure 6a shows a conventional implementation of a CMM line
- Figure 6b shows an alternative implementation of a CMM line
- Figure 7 shows line usage thresholds for compressed CMMs
- Figure 8 shows memory requirement for various sizes of CMM
- Figure 9 shows relative memory savings for various sized CMMs, using compression
- Figure 10 is an illustration of an MBI ("Middle-Bit-Indexing") extractor.
- the basic principle of operation of a correlation matrix memory is illustrated in Figure 1.
- the CMM is illustrated as having a plurality of rows and columns.
- the intersection of each row and column in Figure 1 represents one bit of memory so that, conceptually, the CMM comprises many one-bit cells, each initially set to '0'.
- every cell at the intersection of an active row and an active column is set to '1', regardless of its current state. Other cells are left unchanged.
- the intersections with a solid dot • have previously been set to '1', whilst additional intersections with a hollow dot O are newly set by the illustrated row address pattern 001000010 and column address pattern 010000100.
- a row address pattern is applied, and a row is addressed only when a '1' appears in the row address pattern.
- a sum is generated of all intersections set at '1' in all rows addressed by the applied row address pattern, and the sum is subsequently processed.
- Figure 1 is a very simple example to illustrate the mode of operation. In practice, one would expect the numbers of intersections set at '1' to be rather higher, for at least some of the columns.
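- The storage and recall operations described above can be sketched as follows. This is a minimal illustration only, assuming a CMM held as a plain list of rows of 0/1 cells; the class name `CMM`, the helper `bits` and the method names are hypothetical and do not come from the specification. The example stores only the row and column address patterns quoted for Figure 1, so its sums do not include the previously set cells shown in the figure.

```python
class CMM:
    """Binary correlation matrix memory held as a list of rows of 0/1 cells."""

    def __init__(self, n_rows, n_cols):
        # Every one-bit cell starts at 0.
        self.cells = [[0] * n_cols for _ in range(n_rows)]

    def train(self, row_pattern, col_pattern):
        # Set every cell at the intersection of an active row and an
        # active column to 1, regardless of its current state.
        for r, row_bit in enumerate(row_pattern):
            if row_bit:
                for c, col_bit in enumerate(col_pattern):
                    if col_bit:
                        self.cells[r][c] = 1

    def recall(self, row_pattern):
        # For each column, count the set cells in the addressed rows.
        sums = [0] * len(self.cells[0])
        for r, row_bit in enumerate(row_pattern):
            if row_bit:
                for c, cell in enumerate(self.cells[r]):
                    sums[c] += cell
        return sums


def bits(text):
    """Turn a string such as '0100' into a list of integer bits."""
    return [int(ch) for ch in text]


cmm = CMM(9, 9)
cmm.train(bits("001000010"), bits("010000100"))
sums = cmm.recall(bits("001000010"))
```

- Here two rows are addressed and each has cells set in the two separator columns, so those two columns each sum to 2 and all others to 0.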
- the address data used in this example which has been the subject of our experimental research, is in a data file containing postal addresses.
- the first step is to pre-process the address data file to combine some redundant elements representing small variations in basically the same address, and to remove exact duplications so that each record is unique.
- the main objective here is to reduce the size of file needed to represent the data, so that smaller CMMs can be used.
- the current implementation of preprocessing reduces the number of records that must be stored from 26 million in the address data file to just over 4.3 million records.
- the address data file already contains some groups of multiple addresses in abbreviated form, stored in a single record.
- the even or odd-numbered houses in a street may appear as "N-M Argyle Street" (say) in a single record.
- the procedures described here additionally allow consecutive numbers to be combined into an address range in a single record as well.
- If N records are identical, (N-1) of those records are removed. If a consecutive range of two or more records differs in only one word, and that word is not the postcode, then the records are replaced by a new combined record using the following syntax:
- Alphanumeric strings (but not postcode strings) are replaced by a new combined ("folded") record but separated by whitespace, and delimited by ">" and "<" characters. Thus, for example.
- records are transferred to the input database file in a pseudorandom order.
- the intended purpose of this re-ordering is to reduce the occurrence of clusters of similar text strings, by distributing these more uniformly through the input database. That is, the original address database is supplied in what is called 'Postcode Area Order', where the file is sorted according to the postcode. Addresses which share the same postcode are then sorted according to other fields in the address database such as street name, building name and locality. This means that, for example, the first 3000 or so addresses in the address data file all belong to the AB10 postcode area - somewhere in Aberdeen. All of these records will therefore have a much higher degree of similarity than 3000 addresses taken at random from the database. By taking the records in random order, there will tend to be a much wider variation of data presented to any particular CMM before it begins to get saturated. Saturation of CMMs will be discussed again below.
- each character is assigned a unique binary code.
- Figure 2(a) shows an example of this, where six letters are each assigned a respective six-bit binary code, in each of which only a single bit is set to '1'.
- n-tuple sampling: by this it is meant that each string of characters is divided into a succession of samples of n characters, each sample being one character on from the previous sample. Another way to look at this is as a "sliding window" n characters wide, which moves across a stream of input characters, such that the "window" advances one character at a time.
- a unique binary code is then assigned to each 2-tuple or pair of characters, as the result of combining the binary codes of the individual characters of the pair, using a binary tensor product operation.
- An example of this is shown in Figure 2(c), where the tensor product of the first 2-tuple of Figure 2(b) is formed as the cross product of the two binary values of the letters 'S' and 'P'.
- the resulting matrix is then converted back into a single binary number simply by taking each row of the matrix in turn and concatenating them together. The order in which the rows are taken is not important, provided that it is consistent for all tensor products. It is important to note that the final single binary number resulting from the tensoring of each 2-tuple depends upon the order of the letters in the tuple - i.e. the result from "PS" is different to that from "SP".
- Figure 2(d) illustrates the final binary pattern or number for the input word "SPOTTER".
- Figure 2(e) shows the corresponding final binary pattern or number for the slightly mis-spelt input word "SPOTER". Note the similarity between the binary patterns of Figures 2(d) and 2(e).
- Figure 2(f) shows the final binary pattern or number for the input word "PROTEST". Note the marked difference between the binary patterns of Figures 2(d) and 2(f), even though the word PROTEST is an anagram of the word SPOTTER, and therefore comprises exactly the same characters.
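- The sampling, tensoring and superimposition steps described above can be sketched as follows. This is an illustrative sketch only: it assumes one-hot character codes over a hypothetical six-letter alphabet "EOPRST" (the actual code assignment of Figure 2(a) is not reproduced here), and all function names are invented for the example.

```python
def one_hot_codes(alphabet):
    """Give each character a binary code with exactly one bit set."""
    size = len(alphabet)
    return {ch: [1 if i == j else 0 for j in range(size)]
            for i, ch in enumerate(alphabet)}


def sample_tuples(text, n=2):
    """Sliding window n characters wide, advancing one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tensor(code_a, code_b):
    """Tensor (outer) product of two binary codes, rows concatenated in order."""
    return [a & b for a in code_a for b in code_b]


def encode(text, codes):
    """Superimpose (bitwise OR) the tensored codes of every 2-tuple in text."""
    size = len(next(iter(codes.values())))
    pattern = [0] * (size * size)
    for first, second in sample_tuples(text, 2):
        for i, bit in enumerate(tensor(codes[first], codes[second])):
            pattern[i] |= bit
    return pattern


def overlap(p, q):
    """Number of bit positions set in both patterns."""
    return sum(a & b for a, b in zip(p, q))


codes = one_hot_codes("EOPRST")  # assumed alphabet, for illustration only
spotter = encode("SPOTTER", codes)
spoter = encode("SPOTER", codes)
protest = encode("PROTEST", codes)
```

- Under these assumptions "SPOTTER" sets six bits, of which "SPOTER" shares five, whereas the anagram "PROTEST" shares only two (the pairs "OT" and "TE"), which mirrors the similarity and difference noted for Figures 2(d) to 2(f).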
- each record i.e. a complete postal address
- each record is sampled (typically in 3-tuples or "triples") and tensored to give a resultant binary number or pattern for that record.
- a second binary pattern which we refer to as a separator, which is associated with the binary pattern of the record.
- the purpose of the separator is to represent the record which may later be found by a search operation, and so a record is maintained of the original data record represented by each separator.
- the binary pattern for each record is then entered into the CMM with its respective separator. That is, the binary pattern for the record is applied as a row address and its respective separator as a column address (or vice-versa), and the intersections of the CMM which have both row and column addressed as '1' are set as '1'. This is generally in the manner as described above with reference to Figure 1.
- query data - that is, as much of an original record as is available, to identify the full record.
- the query data is then processed in the same way as original data was entered in the CMM in the first place - that is, it is sampled in tuples, the tuples are binary coded and tensored, and the tensored products combined to form a final binary pattern of the query data.
- the final binary pattern thus formed is then applied as a row address to the CMM. Then, for each column of the CMM, for each row of the address set to '1', the number of intersections of that row and column which are set to '1' are counted, to give a sum for that column.
- the sequence of sums for all of the columns gives a 1-dimensional output array of summed values.
- Figure 3 shows the CMM of Figure 1, with the same row address as in Figure 1 applied, and showing the array of summed values of the columns as 020100210. This represents a number of combined separators that may match the input query represented by the row address applied to the CMM.
- a thresholding step is applied. For example, referring to Figure 3, if a threshold value of "2" is applied to the output array of summed values, each bit of the array which is equal to or greater than "2" is set to '1', and all other bits are set to '0'. This then reduces the original output array of summed values to a binary output array.
- the binary output array is 010000100. Due to the simplicity of this example, this is identical to the original separator code applied in Figure 1, which then links to the original record. However, as a general rule, the binary output array will be more complicated, representing a number of superimposed separators, and will require further processing by an extractor to extract the correct separator for the query data and therefore link to the original record that is sought.
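- The absolute thresholding step can be sketched as follows, using the array of summed values 020100210 quoted for Figure 3. The function name `absolute_threshold` is hypothetical.

```python
def absolute_threshold(sums, threshold):
    """Set a bit for every column whose sum equals or exceeds the threshold."""
    return [1 if s >= threshold else 0 for s in sums]


sums = [0, 2, 0, 1, 0, 0, 2, 1, 0]  # the Figure 3 output array 020100210
binary = absolute_threshold(sums, 2)
```

- With a threshold of 2 the result is 010000100, which is the column address pattern (separator) applied in Figure 1.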
- the extractor will extract a number of individual separators, which are then linked to their respective records, which in turn can be listed, preferably in a ranking order.
- The above description outlines a data processor as one example of an embodiment of the invention.
- the essential parts of such a data processor 1 are illustrated diagrammatically in Figure 4.
- query data is input at 10, having been subject to preprocessing as outlined above.
- the data is then passed to a sampler 2, which forms n-tuples.
- the tuples are then coded.
- a tensoring means 3 forms tensor patterns from the binary codes of the letters of the tuples, and the tensor patterns are combined by a superimposing coder 4.
- the output of the coder 4 is translated by an index-value coder 5, which outputs a series of values representing the positions of each respective bit which is set to '1' in the output of the superimposing coder 4.
- the output of the index-value coder 5 is then applied as a row address pattern to the CMM 6, which already has data stored in it.
- the CMM outputs a column address pattern, the value of each column representing the number of row intersections with that column, that are set to '1'.
- the CMM output is fed to a threshold device 7, which provides an output in binary form, indicating the columns that meet the threshold value.
- the output is then fed to an MBI processor 8 (an example of which is described below), which extracts all separator codes that match the output of the threshold device 7.
- the extracted separators are then matched with their respective input data, to provide a result list 20.
- This result list 20 can then be subjected to a Back-Check operation (also described below), to match final results more closely with the original query data.
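- The translation performed by the index-value coder 5 in the above pipeline can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, and bit positions are counted from 0 at the left, which the specification does not fix for this stage.

```python
def to_index_values(word):
    """Translate a binary coded word into the positions of its set bits."""
    return [i for i, bit in enumerate(word) if bit]


def from_index_values(indices, width):
    """Rebuild the binary coded word from its index values."""
    chosen = set(indices)
    return [1 if i in chosen else 0 for i in range(width)]


word = [0, 0, 1, 0, 0, 0, 0, 1, 0]  # the row address pattern 001000010
indices = to_index_values(word)
```

- Only the rows named by the index values need to be visited when the pattern is applied to the CMM, which is helpful when the coded word is long and sparsely set.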
- Figure 4 also shows a separator generator 9, which generates a separator code for each record that is entered into the CMM for storage, as mentioned above.
- the separator generator 9 generates separator codes each having M bits, and each having the same number N (N > 1 and N < M) of the M bits set to '1'.
- the N set bits of each separator code are randomly chosen (although no two separator codes are allowed to be identical), so that the various separator codes share a minimum number of set bits in common. (In some other applications, two or more separator codes may be allowed to be identical.)
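- A separator generator of the kind just described might be sketched as follows. The function name, the `seed` parameter and the use of Python's `random` module are assumptions made for the illustration; the specification states only that the N set bits are randomly chosen and that no two separators are identical.

```python
import math
import random


def generate_separators(count, m_bits, n_set, seed=0):
    """Generate `count` distinct M-bit separators, each with N bits set."""
    if not 1 < n_set < m_bits:
        raise ValueError("require 1 < N < M")
    if count > math.comb(m_bits, n_set):
        raise ValueError("not enough distinct separators of this shape")
    rng = random.Random(seed)
    seen = set()
    separators = []
    while len(separators) < count:
        # Choose N distinct bit positions at random.
        positions = frozenset(rng.sample(range(m_bits), n_set))
        if positions in seen:
            continue  # reject an exact duplicate of an earlier separator
        seen.add(positions)
        separators.append([1 if i in positions else 0 for i in range(m_bits)])
    return separators


separators = generate_separators(count=20, m_bits=16, n_set=3)
```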
- An alternative method to extract matches from a CMM query is "k- point" thresholding. Instead of selecting from the CMM output array of summed values those bits representing column values that are equal to or greater than a predetermined numerical value ('2' in the above simple example), the k highest-value bits are selected, whatever those numerical values might be.
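- The "k-point" thresholding just described can be sketched as follows. The function name is hypothetical, and breaking ties in favour of the lower column index is an assumption; the specification does not say how ties are resolved.

```python
def k_point_threshold(sums, k):
    """Set a bit for the k columns with the highest summed values."""
    # Rank columns by descending sum; ties go to the lower column index
    # (an assumption made for this sketch).
    ranked = sorted(range(len(sums)), key=lambda col: (-sums[col], col))
    chosen = set(ranked[:k])
    return [1 if col in chosen else 0 for col in range(len(sums))]


sums = [0, 2, 0, 1, 0, 0, 2, 1, 0]
top_two = k_point_threshold(sums, 2)
```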
- An improvement to assist finding an exact match of an input query is as follows. When binary codes representing tensored tuples are OR-ed (as in the above example for SPOTTER - Figure 2), the system can "lose" bits, for example:
- the summed separator values that are output as an array in a subsequent recall or search process are a result of the above SIB being used to teach the CMM.
- the summed values are thresholded to obtain the separators of the possible matching data.
- the threshold can be set to the number of bits set in the input. But because of the "loss" of bits shown above, the system will give a lower threshold than "should" be given. In this case 3 instead of 4. This results in many more false hits from the memory.
- a solution to this is to "multiple activate" in the teaching stage bits which have two bits (or more) OR-ed on top of each other - i.e. those lines are counted more than once in the CMM teaching or access stage.
- the threshold count then includes these multiple counts in the number of bits set in the input.
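- One possible reading of this "multiple activation" scheme is sketched below. It is an illustration under stated assumptions, not the specification's implementation: 2-tuples are coded with one-hot character codes over a hypothetical alphabet "ABN", the multiple counts are applied at the access stage, and all names are invented. The word "BANANA" is chosen because two of its 2-tuples recur, so OR-ing would lose bits.

```python
from collections import Counter


def tuple_bit_counts(text, alphabet):
    """Count how many 2-tuples of the text fall on each bit of the tensored code."""
    size = len(alphabet)
    counts = Counter()
    for i in range(len(text) - 1):
        a = alphabet.index(text[i])
        b = alphabet.index(text[i + 1])
        counts[a * size + b] += 1  # bit set by the tensor of a one-hot pair
    return counts


def weighted_recall(cells, counts):
    """Column sums in which each addressed row is counted `count` times."""
    sums = [0] * len(cells[0])
    for row, count in counts.items():
        for col, cell in enumerate(cells[row]):
            sums[col] += count * cell
    return sums


counts = tuple_bit_counts("BANANA", "ABN")
or_threshold = len(counts)             # distinct bits left after OR-ing
full_threshold = sum(counts.values())  # threshold including multiple counts

# Teach a tiny 9-row, 2-column CMM: the record's rows are linked to column 0.
cells = [[0, 0] for _ in range(9)]
for row in counts:
    cells[row][0] = 1
sums = weighted_recall(cells, counts)
```

- In this example the OR-ed pattern has only 3 bits set although the word contains 5 tuples; counting the repeated lines more than once restores a threshold of 5.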
- skewed data: data in which certain items recur with a very high frequency, as compared to other items which occur very rarely. Storage of items of data with a high frequency of recurrence can cause saturation of the CMM. This will be explained in more detail below.
- One solution to this problem is to split the address data file into a number of smaller files according to some criteria and put each small file in its own smaller CMM.
- the total number of different address lengths is 313 (319-6), and by dividing this by the number of CMMs that it was planned to use, we arrive at a set of bands into which each address can be placed based on its length. For example, if we decide to use 3 CMMs, we get 313/3 ≈ 104. Therefore the bands will be 6-110, 111-215 and 216-319 triples. An address is placed in one of these bands according to how many triples it contains.
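- The banding arithmetic above can be reproduced with the short sketch below. The function names are hypothetical, and giving the last band whatever remains up to the longest address is an assumption chosen so that the result matches the bands quoted in the text.

```python
def length_bands(shortest, longest, n_cmms):
    """Split the range of address lengths (in triples) into one band per CMM."""
    width = (longest - shortest) // n_cmms  # 313 // 3 == 104 in the example
    bands = []
    start = shortest
    for i in range(n_cmms):
        # The final band absorbs any remainder up to the longest address.
        end = longest if i == n_cmms - 1 else start + width
        bands.append((start, end))
        start = end + 1
    return bands


def band_for(n_triples, bands):
    """Return the index of the CMM whose band contains this address length."""
    for index, (start, end) in enumerate(bands):
        if start <= n_triples <= end:
            return index
    raise ValueError("address length outside all bands")


bands = length_bands(6, 319, 3)
```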
- This process will ensure that no particular CMM line exceeds a chosen level of saturation, and that each address is stored in the first available CMM without exceeding this saturation level.
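- A placement rule with this property, in which each record goes into the first sub-CMM that can take it without any addressed line exceeding a chosen saturation level, might be sketched as follows. The function name, the data layout (each sub-CMM as a list of rows of 0/1 cells) and the measure of saturation as a count of set bits per line are all assumptions for the illustration.

```python
def store_with_saturation_limit(sub_cmms, rows, separator, limit):
    """Store a record in the first sub-CMM where no addressed line exceeds `limit`.

    Returns the index of the sub-CMM used.
    """
    for number, cells in enumerate(sub_cmms):
        # Check how many bits each addressed line would hold after storage.
        fits = all(
            sum(cell | bit for cell, bit in zip(cells[r], separator)) <= limit
            for r in rows
        )
        if fits:
            for r in rows:
                cells[r] = [cell | bit for cell, bit in zip(cells[r], separator)]
            return number
    raise RuntimeError("every sub-CMM would exceed the saturation level")


# Two small sub-CMMs, each with 2 lines of 4 bits, all initially clear.
sub_cmms = [[[0] * 4 for _ in range(2)] for _ in range(2)]
first = store_with_saturation_limit(sub_cmms, [0], [1, 1, 0, 0], limit=2)
second = store_with_saturation_limit(sub_cmms, [0], [0, 0, 1, 1], limit=2)
```

- The second record would raise line 0 of the first sub-CMM to four set bits, above the limit of two, so it is stored in the second sub-CMM instead.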
- the conventional method for implementing a CMM line has been to hold the binary pattern for that line as an array of words, as shown in Figure 6a.
- An alternative representation is shown in Figure 6b.
- This new method of implementing a CMM line is to store a list of the positions of the bits set to 1.
- the CMM is 24 bits wide. This means the highest bit position is 23 (starting from 0), and the number 23 can be represented in 5 binary bits. So in this case, a 24-bit CMM line with a single 2-bit separator stored in it can be represented using only 10 bits - the two 5-bit numbers indicating the positions of the set bits in the CMM line.
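- The size calculation in this example can be checked with the sketch below. The function names are hypothetical, and the sketch counts only the bits for the stored positions; any overhead for recording how many positions a line holds is ignored, as it is in the example above.

```python
def position_bits(width):
    """Bits needed to represent any bit position in a line `width` bits wide."""
    return max(1, (width - 1).bit_length())


def compress_line(line):
    """Represent a CMM line as the list of positions of its set bits."""
    return [i for i, bit in enumerate(line) if bit]


def compressed_size(width, n_set):
    """Bits needed to store a line as a list of set-bit positions."""
    return n_set * position_bits(width)


line = [0] * 24
line[3] = line[17] = 1  # a single 2-bit separator stored in the line
size = compressed_size(len(line), len(compress_line(line)))
```

- For the 24-bit line holding one 2-bit separator this gives two 5-bit positions, i.e. 10 bits instead of 24. A line is only worth compressing while `compressed_size` stays below the line width.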
- the amount of memory required by the compressed CMM lines in this example is constant, no matter what the width of the CMM. This is the main reason that the 10000-bit wide CMM shows such a large memory saving. In effect, only 10% of the CMM has grown to 10000 bits wide, while the other 90% is still using the same amount of memory as it was for a 1000 bit wide CMM.
- the number of times each input line is activated is unlikely to fall into 2 groups as in the example here.
- the usage count for each input line can be seen to climb very slowly until it suddenly grows very rapidly indeed.
- the CMM width should be chosen to be as large as is practical, so that most of the lines will be stored compressed, and only the very commonly used lines will be stored as binary CMM lines.
- the total memory requirement for this configuration is 291.2Mb.
- the total memory requirement would be 2.4Gb.
- the compression technique has therefore achieved over an eight-fold memory saving in this particular case.
- Further pre-processing of the text may be carried out, to help in reducing the incidence of saturation of CMM lines.
- this may be achieved by removing hyphen and number characters from an input data string, except for those characters forming part of the postcode. That is, the string is processed to remove unwanted characters and generate a set of tokens (i.e. sub-strings deemed to be valid inputs for the purpose of generating CMM input codes). The potential loss of information can then be made up for by more accurate matching which takes place in a Back-Check function.
- a unique binary code is then assigned to each unique input character. It is useful to provide access to the mapping between characters and binary codes so that applications can "look-up" the character/string for a given code, and vice-versa.
- the actual form of the code is determined by three parameters. One parameter specifies the width of the bit field (in bits) for all codes to be used in a particular CMM. A second parameter specifies the number of bits which are to be set to logical '1' in the code, which is normally a small, fixed number. A third parameter may be provided which specifies the minimum permitted Hamming distance between any pair of codes used in a particular CMM. This provides some control over the amount of "overlap" between codes used and helps to minimise spurious outputs during subsequent recall. The codes may then be processed and stored in the CMM, as described above.
- Embodiments of the invention may use Middle-Bit-Indexing (MBI) extraction to extract separators from a superimposed separator (conveniently referred to as an SIS) as may be produced as the binary output from the CMM, as described above.
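- The three code parameters described above (bit-field width, number of set bits, and minimum Hamming distance) could drive a code assignment along the following lines. This is a sketch under assumptions: the greedy random search, the `seed` and `max_tries` parameters and all names are invented for the illustration and are not taken from the specification. The reverse dictionary provides the "look-up" from code back to character.

```python
import random


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_codes(symbols, width, n_set, min_distance, seed=0, max_tries=10000):
    """Assign each symbol a unique code honouring the three parameters."""
    rng = random.Random(seed)
    codes = {}
    for symbol in symbols:
        for _ in range(max_tries):
            positions = set(rng.sample(range(width), n_set))
            candidate = tuple(1 if i in positions else 0 for i in range(width))
            # Accept only if far enough from every code already assigned.
            if all(hamming(candidate, other) >= min_distance
                   for other in codes.values()):
                codes[symbol] = candidate
                break
        else:
            raise RuntimeError("could not find a code for " + repr(symbol))
    lookup = {code: symbol for symbol, code in codes.items()}
    return codes, lookup


codes, lookup = assign_codes("ABCDEF", width=16, n_set=2, min_distance=4)
```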
- An example of an MBI extractor is illustrated in Figure 10.
- the MBI extractor illustrated in Figure 10 is very efficient in maintaining mappings between separators and the records they represent and, most importantly, in processing an SIS by extracting any individual separators that are present and retrieving the corresponding records.
- the MBI extractor uses the middle bits of the SIS to determine which buckets of a separator database to access during search. Each separator is stored in a bucket corresponding to the location of the middle bit of the bits that are set (to '1') in that separator.
- the "middle bits" of a separator are used to identify the bucket where that separator would be stored in the separator database if the separator in question existed.
- the problem is that, during a recall operation, an SIS is obtained which contains a number of possible separator codes, of which some may not exist in the system. (Recall that a separator is only brought into existence if it is created when entering a record into the CMM.)
- MBI uses an index into the separator database based on the position of the middle bit of existing separator codes, to find the genuine separators in an SIS and hence the records represented by those genuine separators.
- Figure 10 shows the type of data structure used in MBI extractors.
- the buckets are enumerated such that the ordinal number of the bucket corresponds with the position of the middle set bit for those separators stored within that bucket (counting from left to right) in the separator database.
- Buckets 0 and 6 are not shown, since they are not used in this MBI-based example. This is simply because the separators are seven bits wide and bits 0 and 6 could not be "middle bits".
- Consider bucket 5, for example. This bucket has just one entry for the separator 1000011, which has the 5th bit set. If the separator 0010011 existed in this system, it too would be stored in bucket 5. Note that the separator is not stored explicitly in this extractor (that is, as the full binary number such as 1100001), but rather as an array of integers representing the set bit positions in the separator (that is, the corresponding shorter array such as [0,6]). Since the bucket number is the same as the bit position of the middle bit for all separators in a particular bucket, the integer representing the fact that the middle bit is set is omitted from the array of bits set to avoid redundancy.
- the SIS generated from the CMM is inspected to identify each bit position which represents a potential middle bit in a separator.
- a group of one or more set bits on the extreme left and right are discounted immediately, because they cannot assume a middle position. For example, if the separators are fixed to have always 5 bits set, then the 2 bits on the extreme left and right of the SIS can never be middle bits.
- Every other set bit is a potential middle bit and the corresponding buckets must be checked.
- One implementation of this uses AND separator checking. This uses a bitwise logical AND between each separator stored in each selected bucket and the SIS. If a stored separator is unchanged by the AND operation, the SIS must contain that separator. The number (identifier) of each found separator is added to a list so that the records represented by the separators can be subsequently recovered.
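- The bucket structure and the AND-style check described above can be sketched as follows. This is a simplified illustration, assuming separators with an odd, fixed number of set bits and positions counted from 0 at the left; the class and method names are hypothetical. Testing that every stored position is set in the SIS is equivalent to checking that the separator is unchanged by a bitwise AND with the SIS.

```python
class MBIExtractor:
    """Separator database indexed by the position of each separator's middle set bit."""

    def __init__(self):
        self.buckets = {}

    def add(self, separator, record_id):
        positions = [i for i, bit in enumerate(separator) if bit]
        middle = positions[len(positions) // 2]
        # Store the other set-bit positions only; the bucket number
        # already implies the middle bit.
        others = [p for p in positions if p != middle]
        self.buckets.setdefault(middle, []).append((others, record_id))

    def extract(self, sis, n_set):
        set_positions = [i for i, bit in enumerate(sis) if bit]
        edge = n_set // 2
        # Set bits at the extreme left and right cannot be middle bits.
        candidates = set_positions[edge:len(set_positions) - edge]
        found = []
        for middle in candidates:
            for others, record_id in self.buckets.get(middle, []):
                if all(sis[p] for p in others):
                    found.append(record_id)
        return found


extractor = MBIExtractor()
extractor.add([1, 0, 0, 0, 0, 1, 1], "record A")  # 1000011, middle bit 5
extractor.add([0, 1, 1, 0, 1, 0, 0], "record B")  # 0110100, middle bit 2
sis = [1, 1, 1, 0, 1, 1, 1]                        # the two separators OR-ed
found = extractor.extract(sis, n_set=3)
```

- Separators that happen to be contained in the SIS but were never created (for example 1100001 here) are not reported, because no entry for them exists in any bucket.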
- a Back-Checker is a device which aims to verify whether each result record is a true possible match with the input or whether it is a spurious result (non-match).
- One implementation of a Back-Checker operates by counting the number of words in every result record which match a word in the input query.
- the score is modified by including the result of comparing the soundex code of words in the address.
- a soundex code represents the sound of a word such that similar sounding words are meant to have the same soundex code.
- Increasing the score for a matching soundex code is intended to improve tolerance to minor spelling errors in the query.
- the resulting count is used to rank the results according to how well they match the input.
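- A Back-Checker of the kind described above might be sketched as follows. The scoring weights are assumptions (one point for an exactly matching word, half a point for a word that matches only by soundex code), as are the function names; the soundex routine follows the commonly used American Soundex rules rather than any variant named in the specification.

```python
SOUNDEX_DIGITS = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_DIGITS[letter] = digit


def soundex(word):
    """Four-character soundex code, or '' if the word has no letters."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    code = word[0]
    previous = SOUNDEX_DIGITS.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated digits
            previous = digit
    return (code + "000")[:4]


def back_check(query, records, soundex_bonus=0.5):
    """Score each result record against the query and rank best first."""
    query_words = set(query.upper().split())
    query_sounds = {soundex(w) for w in query_words} - {""}
    scored = []
    for record in records:
        score = 0.0
        for word in record.upper().split():
            if word in query_words:
                score += 1.0
            elif soundex(word) in query_sounds:
                score += soundex_bonus
        scored.append((score, record))
    scored.sort(key=lambda item: -item[0])
    return scored


ranked = back_check("ARGYL STREET YORK",
                    ["ARGYLE ROAD LEEDS", "12 ARGYLE STREET YORK"])
```

- In this made-up example the mis-spelt query word "ARGYL" still earns a partial score against "ARGYLE" through its soundex code, and the record sharing the street and town words is ranked first.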
- a particular advantage of the illustrated data processors is that a significant amount of data compression takes place in the storage of indexes in a CMM. This is a consequence of using multiple set bits in the separator code (i.e. more than 1 bit is set in each separator code), and the columnwise summation of bits from selected rows.
- the former means that separator codes may be overlapped during the training phase (i.e. more than one separator code can have the same bit set), and the latter means that any aliasing introduced by this overlapping can be resolved during recall. It may be noted that summation results in a kind of voting system, where selected rows cast a vote for a particular separator/record at the places where a bit is set.
- rows and columns of a matrix can readily be interchanged.
- binary patterns and separators that are illustrated as being entered respectively as rows and columns could equally be entered respectively as columns and rows, provided that all data entry and recall is consistent in the convention chosen.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Mobile Radio Communication Systems (AREA)
- Complex Calculations (AREA)
- Hardware Redundancy (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002377765A CA2377765C (en) | 1999-06-26 | 2000-06-26 | Data processors |
EP00940541A EP1196890B1 (en) | 1999-06-26 | 2000-06-26 | Data processor and method therefor |
AT00940541T ATE233924T1 (en) | 1999-06-26 | 2000-06-26 | DATA PROCESSING DEVICE AND PROCESS |
US10/019,172 US7065517B1 (en) | 1999-06-26 | 2000-06-26 | Data processors |
AU55465/00A AU763064B2 (en) | 1999-06-26 | 2000-06-26 | Data processors |
DE60001585T DE60001585T2 (en) | 1999-06-26 | 2000-06-26 | DATA PROCESSING DEVICE AND METHOD |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9914876.9 | 1999-06-26 | ||
GB9914876A GB2351572B (en) | 1999-06-26 | 1999-06-26 | Data procesors |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001001345A1 (en) | 2001-01-04 |
Family
ID=10856056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2000/002303 WO2001001345A1 (en) | 1999-06-26 | 2000-06-26 | Data processors |
Country Status (8)
Country | Link |
---|---|
US (1) | US7065517B1 (en) |
EP (1) | EP1196890B1 (en) |
AT (1) | ATE233924T1 (en) |
AU (1) | AU763064B2 (en) |
CA (1) | CA2377765C (en) |
DE (1) | DE60001585T2 (en) |
GB (1) | GB2351572B (en) |
WO (1) | WO2001001345A1 (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7426520B2 (en) * | 2003-09-10 | 2008-09-16 | Exeros, Inc. | Method and apparatus for semantic discovery and mapping between data sources |
US7610397B2 (en) * | 2005-02-28 | 2009-10-27 | International Business Machines Corporation | Method and apparatus for adaptive load shedding |
US9720971B2 (en) | 2008-06-30 | 2017-08-01 | International Business Machines Corporation | Discovering transformations applied to a source table to generate a target table |
US7882143B2 (en) * | 2008-08-15 | 2011-02-01 | Athena Ann Smyros | Systems and methods for indexing information for a search engine |
US20100042589A1 (en) * | 2008-08-15 | 2010-02-18 | Smyros Athena A | Systems and methods for topical searching |
US8965881B2 (en) * | 2008-08-15 | 2015-02-24 | Athena A. Smyros | Systems and methods for searching an index |
US7996383B2 (en) * | 2008-08-15 | 2011-08-09 | Athena A. Smyros | Systems and methods for a search engine having runtime components |
US9424339B2 (en) | 2008-08-15 | 2016-08-23 | Athena A. Smyros | Systems and methods utilizing a search engine |
US11062001B2 (en) * | 2019-04-02 | 2021-07-13 | International Business Machines Corporation | Matrix transformation-based authentication |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0295876A2 (en) * | 1987-06-15 | 1988-12-21 | Digital Equipment Corporation | Parallel associative memory |
US4958377A (en) * | 1987-01-20 | 1990-09-18 | Nec Corporation | Character string identification device with a memory comprising selectively accessible memory areas |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6014661A (en) * | 1996-05-06 | 2000-01-11 | Ivee Development Ab | System and method for automatic analysis of data bases and for user-controlled dynamic querying |
JP2001519070A (en) * | 1997-03-24 | 2001-10-16 | クイーンズ ユニバーシティー アット キングストン | Method, product and device for match detection |
WO2002073230A2 (en) * | 2001-03-14 | 2002-09-19 | Mercury Computer Systems, Inc. | Wireless communications methods and systems for short-code and other spread spectrum waveform processing |
1999
- 1999-06-26 GB GB9914876A patent/GB2351572B/en not_active Expired - Fee Related
2000
- 2000-06-26 US US10/019,172 patent/US7065517B1/en not_active Expired - Fee Related
- 2000-06-26 DE DE60001585T patent/DE60001585T2/en not_active Expired - Lifetime
- 2000-06-26 AT AT00940541T patent/ATE233924T1/en not_active IP Right Cessation
- 2000-06-26 EP EP00940541A patent/EP1196890B1/en not_active Expired - Lifetime
- 2000-06-26 WO PCT/GB2000/002303 patent/WO2001001345A1/en active IP Right Grant
- 2000-06-26 AU AU55465/00A patent/AU763064B2/en not_active Ceased
- 2000-06-26 CA CA002377765A patent/CA2377765C/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
YANG GUOQING ET AL: "Multilayer parallel distributed pattern recognition system model using sparse RAM nets", IEE PROCEEDINGS E (COMPUTERS AND DIGITAL TECHNIQUES), MARCH 1992, UK, vol. 139, no. 2, pages 144 - 146, XP002146690, ISSN: 0143-7062 * |
Also Published As
Publication number | Publication date |
---|---|
US7065517B1 (en) | 2006-06-20 |
ATE233924T1 (en) | 2003-03-15 |
EP1196890A1 (en) | 2002-04-17 |
CA2377765A1 (en) | 2001-01-04 |
GB9914876D0 (en) | 1999-08-25 |
EP1196890B1 (en) | 2003-03-05 |
AU5546500A (en) | 2001-01-31 |
DE60001585T2 (en) | 2004-01-08 |
AU763064B2 (en) | 2003-07-10 |
GB2351572B (en) | 2002-02-06 |
DE60001585D1 (en) | 2003-04-10 |
CA2377765C (en) | 2009-08-18 |
GB2351572A (en) | 2001-01-03 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP0834138B1 (en) | System and method for reducing the search scope in a lexicon | |
Hull | Document image matching and retrieval with multiple distortion-invariant descriptors | |
US5752051A (en) | Language-independent method of generating index terms | |
US5692177A (en) | Method and system for data set storage by iteratively searching for perfect hashing functions | |
CN102142038B (en) | Multi-stage query processing system and method for use with tokenspace repository | |
EP2014054B1 (en) | Method and apparatus for approximate pattern matching | |
US5748953A (en) | Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols | |
US5319779A (en) | System for searching information using combinatorial signature derived from bits sets of a base signature | |
KR101153033B1 (en) | Method for duplicate detection and suppression | |
US20100174741A1 (en) | Bit string search apparatus, search method, and program | |
AU763064B2 (en) | Data processors | |
US20030220771A1 (en) | Method of discovering patterns in symbol sequences | |
CN1252584A (en) | On-line hand writing Chinese character distinguishing device | |
Thorup | Faster deterministic sorting and priority queues in linear space | |
Zhou et al. | Enhanced locality-sensitive hashing for fingerprint forensics over large multi-sensor databases | |
CN114491594A (en) | Multi-encryption data encryption system | |
CN116821053B (en) | Data reporting method, device, computer equipment and storage medium | |
JPH0869476A (en) | Retrieval system | |
JP2000231559A (en) | Information processor | |
US5913216A (en) | Sequential pattern memory searching and storage management technique | |
Campbell | The Design of Text Signatures for Text Retrieval Systems | |
Shishibori et al. | An efficient compression method for Patricia tries | |
EP0595539A1 (en) | A sequential pattern memory searching and storage management technique | |
JPH10177582A (en) | Method and device for retrieving longest match | |
JPH0335697B2 (en) |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: 2377765; Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 55465/00; Country of ref document: AU |
WWE | Wipo information: entry into national phase | Ref document number: 2000940541; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 10019172; Country of ref document: US |
WWP | Wipo information: published in national office | Ref document number: 2000940541; Country of ref document: EP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWG | Wipo information: grant in national office | Ref document number: 2000940541; Country of ref document: EP |
WWG | Wipo information: grant in national office | Ref document number: 55465/00; Country of ref document: AU |
NENP | Non-entry into the national phase | Ref country code: JP |