EP0633537B1 - Method and system for searching compressed data - Google Patents
- Publication number
- EP0633537B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- compressed
- document
- variable length
- query request
- code
- Prior art date
- Legal status
- Expired - Lifetime
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
Description
- This invention relates generally to a computer method and system for searching data in a computer.
- The electronic collection and storage of information presents problems in many environments. For example, when data is downloaded from an information service to a remote computer, vast amounts of data are transferred in a short interval. The remote computer typically stores this data in a computer memory for later processing. To reduce the amount of data transferred by the information service, the amount of data stored by the remote computer, and the time required for transmission of the data, the information service typically compresses the data before transmitting the data to the remote computer.
- Compression methods generally fall into two categories: fixed length compression methods and variable length compression methods. Fixed length compression methods represent each symbol by the same number of bits. For example, the LZ 78 type of Ziv-Lempel coding is a fixed length compression method. Variable length compression methods represent each symbol by a varying number of bits. Variable length compression methods typically assign shorter codes to more frequently used symbols. For example, the well-known Morse code is a variable length compression method.
- Existing systems that need to search compressed data first convert the compressed code back into its uncompressed form and then search the uncompressed data. Techniques for searching uncompressed data are known from "The Art of Computer Programming, Volume 3: Sorting and Searching" by Donald E. Knuth (ISBN 0-201-03803-X). This approach requires a time-consuming decompression step.
- It would be desirable to have a search method that allows compressed data to be searched more efficiently.
- The present invention provides a method and system for searching a compressed document as defined in claims 1 and 2.
- Figure 1 is a block diagram of a system embodying the present invention for searching for character strings in a compressed document while the character strings are stored in a compressed code.
- Figure 2 is a block diagram of a sample document.
- Figure 3 is a block diagram of the uncompressed codes of the sample document.
- Figure 4 is a block diagram of a query request displayed on a display device for the system of Figure 1.
- Figure 5 is a block diagram of the character string parsed from the query request.
- Figure 6 is a block diagram of a variable length compressed code for the system of Figure 1.
- Figure 7A is a block diagram of a dictionary with variable length compressed codes for a host computer of the system of Figure 1.
- Figure 7B is a block diagram of a dictionary with variable length compressed codes for a remote computer of the system of Figure 1.
- Figure 8 is a block diagram of a compressed document of the variable length compressed codes as it is stored in the memory of the system of Figure 1.
- Figure 9 is a block diagram of variable length compressed codes representing the character string from the query request.
- Figures 10, 11, 12, and 13 are block diagrams illustrating searching for the query request in the variable length compressed code.
- As shown in Figure 1, the present invention is embodied in a computer system 100 and a method executed in the computer system which converts character strings in an uncompressed document 101 from an uncompressed code to a compressed code, stores the character strings in their compressed code in a compressed document 103 in response to a user request, and searches for character strings in the compressed document 103 while the character strings remain stored in their compressed code.
- The computer system 100 comprises a host computer keyboard 105, a host computer mouse 107, a host computer 109, a remote computer keyboard 111, a remote computer mouse 113, a remote computer 115, and a communications channel 117 to transmit data between the host computer 109 and the remote computer 115. In the host computer 109, documents can be stored in their uncompressed form and then, when requested by the remote computer 115, compressed and transmitted via the communications channel 117 to the remote computer 115. At the remote computer 115, the documents need not be decompressed before a query can be processed against them.
- The host computer 109 further includes a central processing unit 119, an input/output unit 121, and a host computer memory 123. In addition, the host computer memory 123 stores a dictionary 125, the uncompressed document 101, a parser 127, a compression engine 129, and the compressed document 103. The parser 127, when executing on the central processing unit 119, retrieves character strings from the uncompressed document 101. The retrieved character strings are in the uncompressed code. A typical uncompressed code is the ASCII code, which represents each character as an 8-bit string of zeros and ones. The compression engine 129, when executing on the central processing unit 119, converts the uncompressed character string to a compressed character string using the dictionary 125. In an embodiment, the compressed character string is represented as a variable length compressed code. The compression engine 129 then stores the compressed character string in the compressed document 103.
- An example of the general operation of the computer system 100 (Figure 1) will illustrate how the present invention searches for character strings in the compressed document 149 on the remote computer 115 while the character strings are stored in the compressed code of the present invention.
- An embodiment of the present invention converts character strings in an uncompressed document 101 from an uncompressed code to a variable length compressed code, stores the character strings in their variable length compressed code in a compressed document 103, and searches for character strings in the compressed document 103 while the character strings remain in their variable length compressed code. Figure 6 shows the preferred variable length compressed code of the present invention, which includes a length indicator 2601 and a compressed code 2603. The length indicator 2601 indicates the number of bits in the compressed code 2603. For example, the length indicator 2601 shown in Figure 6 contains a binary number equivalent to the decimal value three, indicating that the compressed code 2603 illustrated in Figure 6 is three bits long.
- A specific example using Figures 2-3, 4-5, and 6-13 will help illustrate the preferred method and system for searching variable length compressed codes of the compressed document 103. Typically, a user of the remote computer 115 inputs a request to transfer the uncompressed document 101 from the host computer 109 to the remote computer 115. Figure 2 shows the uncompressed document 101 displayed on the host computer display device 163. Figure 3 shows the uncompressed document 101 as it is stored in the host computer memory 123.
- In response to the transfer request, the host computer 109 invokes the compression engine 129, which converts the uncompressed document 101 to the compressed document 103 containing variable length compressed codes. To do so, the compression engine 129 invokes the parser 127, which parses the uncompressed character strings stored in the uncompressed document 101 and sends the parsed character strings back to the compression engine 129.
- The compression engine 129 receives from the parser 127 each parsed character string in its uncompressed code. The compression engine 129 then finds a match for the uncompressed code in the dictionary 125 (Figure 7A) and retrieves from the dictionary 125 the variable length compressed code associated with the matched uncompressed code. Finally, the compression engine 129 stores the retrieved variable length compressed code in the compressed document 103.
- After every character string from the uncompressed document 101 has been converted to the variable length compressed code of the present invention and stored in the compressed document 103, the host computer 109 transfers the compressed document 103 over the communications channel 117 to the remote computer 115, where it is stored as the compressed document 149. Figure 8 shows the compressed document 149 as it is stored in the remote computer memory 143.
- A user of the remote computer 115 typically inputs a query in order to retrieve information about the compressed document 149. Figure 4 illustrates a typical query request 165, which was entered on the remote computer keyboard 111 and then stored in the remote computer memory 143. The query request 165 asks "what documents contain the word 'ready'?" In response to receiving the query request 165, the remote computer 115 invokes the query engine 147. The query engine 147 retrieves the word to search for from the query request 165; in this case it retrieves the word "ready" (see Figure 5).
- Next, the query engine 147 invokes the convert program 161, which converts the word "ready" to the variable length compressed code of the present invention. The convert program 161 first searches the dictionary 145 for a match with the whole word "ready". When the convert program 161 determines that the dictionary 145 does not contain the word "ready", it searches the dictionary 145 for the first two characters of the word, i.e., the character string "re". When the convert program 161 determines that "re" is stored in the pairs section 153 of the dictionary 145, it retrieves the compressed code 0110000010 associated with "re" and stores it in the remote computer memory 143 (see Figure 9).
- Next, the convert program 161 takes the next two unprocessed characters from the word "ready", i.e., the character string "ad". When the convert program 161 determines that "ad" is stored in the pairs section 153 of the dictionary 145, it retrieves the compressed code 10100000000010 associated with "ad" and stores it in the remote computer memory 143 (see Figure 9).
- Finally, the convert program 161 determines that only one unprocessed character of the word "ready" remains, i.e., the character "y". It retrieves the compressed code 1111000000000000100 for "y" from the ASCII section 155 of the dictionary 145 and stores it in the remote computer memory 143 (see Figure 9).
- Upon completion of the conversion of the word "ready" to the variable length compressed code of the present invention, the query engine 147 invokes the search program 159, which searches the compressed document 149 for occurrences of the compressed codes representing the word "ready". The search program 159 first retrieves the compressed code for the character string "re" (Figure 9). It then determines the length of the compressed code 2901 for "re" from the value of the length indicator 2903 for "re"; in this case the length indicator 2903 indicates that the compressed code 2901 is six bits long. The search program 159 then sets a pointer named Match-ptr to the first variable length compressed code in the compressed document 149 (see Figure 10).
- Because the length of the compressed code 2901 is known, the search program 159 need only compare the compressed code 2901 against those compressed codes in the compressed document 149 that have the same length. The search program 159 compares the value of the length indicator 3001 for the character string "Firstcap" with the value of the length indicator 2903 for "re". Because the length indicator 2903 equals six and the length indicator 3001 equals twelve, the search program 159 determines that the compressed code 2901 does not match the compressed code 3003. The search program 159 therefore increments Match-ptr and compares the length indicator 2903 for "re" with the length indicator of each code pointed to by Match-ptr until a code with an equal length indicator is found. Figure 11 illustrates the position of Match-ptr within the compressed document 149 when a match with the length indicator 2903 for "re" occurs.
- The search program 159 then compares the compressed code 2901 for "re" with the compressed code 3103 pointed to by Match-ptr and determines that the two codes match.
- Now that a match for the character string "re" has been found, the search program 159 determines whether the next two variable length compressed codes stored in the compressed document 149 match the character strings "ad" and "y", respectively. The search program 159 increments Match-ptr and compares the length indicator 2905 (Figure 9) for "ad" with the length indicator 3201 (Figure 12) of the code pointed to by Match-ptr. Because the two values are equal, the search program 159 compares the compressed code 2907 for "ad" with the compressed code 3203 pointed to by Match-ptr.
- Because these compressed codes match, the search program 159 increments Match-ptr and compares the length indicator 2909 for "y" with the length indicator 3301 of the code pointed to by Match-ptr (Figure 13). Because the two values are equal, the search program 159 compares the compressed code 2911 for "y" with the compressed code 3303 pointed to by Match-ptr. Because these codes also match, the query engine 147 informs the initiator of the request that the compressed document 149 contains the word "ready".
- Those of ordinary skill in the art will understand that other system architectures can be used to implement the method of the present invention described above, including, but not limited to, a stand-alone computer which compresses an uncompressed document and performs a search on the compressed document.
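The conversion of the query word "ready" described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's implementation. The dictionary contents are hypothetical and hold only the three codes quoted in the walkthrough. `WORDS`, `PAIRS`, `ASCII`, `convert` and `split_code` are illustrative names. The 4-bit width of the length indicator is inferred from the quoted codes: 0110 = 6, 1010 = 10 and 1111 = 15, each followed by exactly that many code bits. Falling back to a single-character code when a pair is missing from the pairs section is also an assumption, because the description only covers the case where one character is left over.

```python
LENGTH_BITS = 4  # assumed width of the length indicator (inferred from examples)

# Hypothetical contents of dictionary 145: only the codes quoted in the text.
WORDS: dict[str, str] = {}  # whole-word section (no entry for "ready")
PAIRS = {"re": "0110000010", "ad": "10100000000010"}  # pairs section 153
ASCII = {"y": "1111000000000000100"}  # single-character (ASCII) section 155


def split_code(bits: str) -> tuple[int, str]:
    """Split one code into (length indicator value, compressed code bits)."""
    length = int(bits[:LENGTH_BITS], 2)
    code = bits[LENGTH_BITS:]
    if len(code) != length:
        raise ValueError(f"indicator says {length} bits, found {len(code)}")
    return length, code


def convert(word: str) -> list[str]:
    """Convert an uncompressed query word into a list of variable length codes."""
    if word in WORDS:  # 1. try the whole word first
        return [WORDS[word]]
    codes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in PAIRS:  # 2. next two unprocessed characters
            codes.append(PAIRS[pair])
            i += 2
        else:  # 3. single character (also the assumed fallback for a missing pair)
            codes.append(ASCII[word[i]])
            i += 1
    return codes
```

With these entries, `convert("ready")` returns the three codes of Figure 9 in order. `split_code("0110000010")` returns `(6, "000010")`, which agrees with the six-bit length stated for the length indicator 2903.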
- Those of ordinary skill in the art will also understand that the method and apparatus of the present invention can be used in conjunction with other compression methods.
- It will be appreciated that, although a specific embodiment of the invention has been described herein for purposes of illustration, various modifications may be made without departing from the scope of the invention. Accordingly, the invention is not limited except as by the appended claims.
Claims (2)
- An apparatus for searching a compressed document comprised of a plurality of variable length compressed codes which represent an uncompressed document, each variable length compressed code having a length indicator which indicates the length of the variable length compressed code, the apparatus comprising: means for receiving a query request in an uncompressed code; means for converting the query request from the uncompressed code into a converted query request of one of the plurality of variable length compressed codes; means for determining the length of the converted query request; means for comparing the converted query request with the variable length compressed codes of the compressed document, to determine if variable length compressed codes of the compressed document match the converted query request, wherein the converted query request is only compared with those variable length compressed codes whose length is equal to the length of the converted query request, and means for responding to the determination that there is a match between the variable length compressed codes of the document and the converted query request.
- A method for searching a compressed document comprised of a plurality of variable length compressed codes which represent an uncompressed document, each variable length compressed code having a length indicator which indicates the length of the variable length compressed code, the method comprising the steps of:
  receiving a query request in an uncompressed code;
  converting the query request from the uncompressed code into a converted query request of one of the plurality of variable length compressed codes;
  means for determining the length of the converted query request;
  comparing the converted query request with the variable length compressed codes of the compressed document, to determine which variable length compressed codes of the compressed document match the converted query request, wherein the converted query request is only compared with those variable length compressed codes whose length is equal to the length of the converted query request; and
  responding to the determination that there is a match between the variable length compressed codes of the compressed document and the converted query request.
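The claims rely on each variable length compressed code carrying a length indicator, which lets a searcher step through the compressed data and skip codes of the wrong length without decoding them. The sketch below assumes a layout the claims do not specify: a fixed 4-bit length field (the hypothetical `LEN_FIELD_BITS`) written before each code's bits, with the stream held as a Python string of '0'/'1' characters for readability. The names `pack`, `iter_codes` and `search` are illustrative only.

```python
from typing import Iterator, List, Tuple

# Assumed width of the length indicator; the claims do not fix a bit layout.
LEN_FIELD_BITS = 4

def pack(codes: List[Tuple[int, int]]) -> str:
    """Serialize (length, value) codes as: length indicator bits, then code bits."""
    out = []
    for length, value in codes:
        # The length must fit in the indicator, and the value must fit in `length` bits.
        assert 1 <= length < (1 << LEN_FIELD_BITS) and 0 <= value < (1 << length)
        out.append(format(length, f"0{LEN_FIELD_BITS}b"))
        out.append(format(value, f"0{length}b"))
    return "".join(out)

def iter_codes(bits: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (bit offset, code length, code bits) without decoding back to text."""
    pos = 0
    while pos < len(bits):
        length = int(bits[pos:pos + LEN_FIELD_BITS], 2)
        start = pos + LEN_FIELD_BITS
        yield pos, length, bits[start:start + length]
        pos = start + length

def search(bits: str, query_code: Tuple[int, int]) -> Tuple[List[int], int]:
    """Return offsets of codes equal to the converted query, plus how many codes were compared."""
    q_len, q_val = query_code
    q_bits = format(q_val, f"0{q_len}b")
    matches: List[int] = []
    compared = 0
    for offset, length, code_bits in iter_codes(bits):
        if length != q_len:
            continue  # only codes whose length equals the query's length are compared
        compared += 1
        if code_bits == q_bits:
            matches.append(offset)
    return matches, compared
```

For `stream = pack([(2, 0b11), (4, 0b1011), (3, 0b010), (4, 0b1011)])`, `search(stream, (4, 0b1011))` returns `([6, 21], 2)`: the 4-bit query is compared against only two of the four codes, and the other two are skipped on the length check alone.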
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US8548193A | 1993-06-30 | 1993-06-30 | |
US85481 | 1993-06-30 | | |
Publications (3)
Publication Number | Publication Date |
---|---|
EP0633537A2 (en) | 1995-01-11 |
EP0633537A3 (en) | 1995-08-23 |
EP0633537B1 (en) | 1999-12-08 |
Family ID: 22191893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP94110014A (Expired - Lifetime; EP0633537B1) | Method and system for searching compressed data | 1993-06-30 | 1994-06-28 |
Country Status (5)
Country | Link |
---|---|
US (1) | US5737733A (en) |
EP (1) | EP0633537B1 (en) |
JP (1) | JP3234104B2 (en) |
CA (1) | CA2125337A1 (en) |
DE (1) | DE69421966T2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011095345A1 (en) | 2010-02-04 | 2011-08-11 | Bienert Joerg | Method and system for compressing data records and for processing compressed data records |
Families Citing this family (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5625711A (en) * | 1994-08-31 | 1997-04-29 | Adobe Systems Incorporated | Method and apparatus for producing a hybrid data structure for displaying a raster image |
US7362775B1 (en) * | 1996-07-02 | 2008-04-22 | Wistaria Trading, Inc. | Exchange mechanisms for digital information packages with bandwidth securitization, multichannel digital watermarks, and key management |
US5613004A (en) * | 1995-06-07 | 1997-03-18 | The Dice Company | Steganographic method and device |
US6205249B1 (en) * | 1998-04-02 | 2001-03-20 | Scott A. Moskowitz | Multiple transform utilization and applications for secure digital watermarking |
US7664263B2 (en) | 1998-03-24 | 2010-02-16 | Moskowitz Scott A | Method for combining transfer functions with predetermined key creation |
US5889868A (en) * | 1996-07-02 | 1999-03-30 | The Dice Company | Optimization methods for the insertion, protection, and detection of digital watermarks in digitized data |
US7177429B2 (en) * | 2000-12-07 | 2007-02-13 | Blue Spike, Inc. | System and methods for permitting open access to data objects and for securing data within the data objects |
US7346472B1 (en) | 2000-09-07 | 2008-03-18 | Blue Spike, Inc. | Method and device for monitoring and analyzing signals |
US7457962B2 (en) * | 1996-07-02 | 2008-11-25 | Wistaria Trading, Inc | Optimization methods for the insertion, protection, and detection of digital watermarks in digitized data |
US7159116B2 (en) | 1999-12-07 | 2007-01-02 | Blue Spike, Inc. | Systems, methods and devices for trusted transactions |
US7095874B2 (en) | 1996-07-02 | 2006-08-22 | Wistaria Trading, Inc. | Optimization methods for the insertion, protection, and detection of digital watermarks in digitized data |
US5893102A (en) * | 1996-12-06 | 1999-04-06 | Unisys Corporation | Textual database management, storage and retrieval system utilizing word-oriented, dictionary-based data compression/decompression |
US7730317B2 (en) * | 1996-12-20 | 2010-06-01 | Wistaria Trading, Inc. | Linear predictive coding implementation of digital watermarks |
US5946692A (en) * | 1997-05-08 | 1999-08-31 | At & T Corp | Compressed representation of a data base that permits AD HOC querying |
JP3666541B2 (en) * | 1997-09-11 | 2005-06-29 | 富士電機システムズ株式会社 | Data transfer device |
US6105021A (en) * | 1997-11-21 | 2000-08-15 | International Business Machines Corporation | Thorough search of document database containing compressed and noncompressed documents |
JP2000201080A (en) * | 1999-01-07 | 2000-07-18 | Fujitsu Ltd | Data compressing/restoring device and method using additional code |
US7664264B2 (en) | 1999-03-24 | 2010-02-16 | Blue Spike, Inc. | Utilizing data reduction in steganographic and cryptographic systems |
US6665838B1 (en) | 1999-07-30 | 2003-12-16 | International Business Machines Corporation | Web page thumbnails and user configured complementary information provided from a server |
US6405192B1 (en) | 1999-07-30 | 2002-06-11 | International Business Machines Corporation | Navigation assistant-method and apparatus for providing user configured complementary information for data browsing in a viewer context |
US6356908B1 (en) | 1999-07-30 | 2002-03-12 | International Business Machines Corporation | Automatic web page thumbnail generation |
US7475246B1 (en) | 1999-08-04 | 2009-01-06 | Blue Spike, Inc. | Secure personal content server |
US20040102197A1 (en) * | 1999-09-30 | 2004-05-27 | Dietz Timothy Alan | Dynamic web page construction based on determination of client device location |
US20040243540A1 (en) * | 2000-09-07 | 2004-12-02 | Moskowitz Scott A. | Method and device for monitoring and analyzing signals |
US7127615B2 (en) | 2000-09-20 | 2006-10-24 | Blue Spike, Inc. | Security based on subliminal and supraliminal channels for data objects |
US6649567B2 (en) * | 2001-10-11 | 2003-11-18 | Isp Investments Inc. | Controlled release microbiocide for porous surfaces |
EP1340351B1 (en) * | 2000-10-11 | 2007-12-12 | Broadcom Corporation | Dynamic delta encoding for cable modem header suppression |
JP3729759B2 (en) * | 2001-08-07 | 2005-12-21 | 株式会社ルネサステクノロジ | Microcontroller that reads compressed instruction code, program memory that compresses and stores instruction code |
US6909384B2 (en) * | 2002-01-31 | 2005-06-21 | Microsoft Corporation | Generating and searching compressed data |
US7287275B2 (en) | 2002-04-17 | 2007-10-23 | Moskowitz Scott A | Methods, systems and devices for packet watermarking and efficient provisioning of bandwidth |
AU2003284118A1 (en) * | 2002-10-14 | 2004-05-04 | Battelle Memorial Institute | Information reservoir |
US7310769B1 (en) | 2003-03-12 | 2007-12-18 | Adobe Systems Incorporated | Text encoding using dummy font |
US6980949B2 (en) * | 2003-03-14 | 2005-12-27 | Sonum Technologies, Inc. | Natural language processor |
US7430560B1 (en) * | 2005-07-22 | 2008-09-30 | X-Engines, Inc. | Multi-level compressed lock-up tables formed by logical operations to compress selected index bits |
US20070067155A1 (en) * | 2005-09-20 | 2007-03-22 | Sonum Technologies, Inc. | Surface structure generation |
WO2008016742A1 (en) * | 2006-08-01 | 2008-02-07 | Topix Llc | Cap-sensitive text search for documents |
US7730088B2 (en) * | 2006-09-14 | 2010-06-01 | International Business Machines Corporation | Queriable hierarchical text data |
US7827218B1 (en) | 2006-11-18 | 2010-11-02 | X-Engines, Inc. | Deterministic lookup using hashed key in a multi-stride compressed trie structure |
US8166041B2 (en) * | 2008-06-13 | 2012-04-24 | Microsoft Corporation | Search index format optimizations |
CN105893337B (en) * | 2015-01-04 | 2020-07-10 | 伊姆西Ip控股有限责任公司 | Method and apparatus for text compression and decompression |
US10140033B2 (en) * | 2015-06-15 | 2018-11-27 | Xitore, Inc. | Apparatus, system, and method for searching compressed data |
JP6737117B2 (en) * | 2016-10-07 | 2020-08-05 | 富士通株式会社 | Encoded data search program, encoded data search method, and encoded data search device |
JP6931442B2 (en) * | 2017-05-16 | 2021-09-08 | 富士通株式会社 | Coding program, index generator, search program, coding device, index generator, search device, coding method, index generation method and search method |
US10528556B1 (en) * | 2017-12-31 | 2020-01-07 | Allscripts Software, Llc | Database methodology for searching encrypted data records |
CA3126089C (en) * | 2019-03-01 | 2023-06-20 | Cyborg Inc. | System and method for statistics-based pattern searching of compressed data and encrypted data |
US11636100B2 (en) * | 2020-11-27 | 2023-04-25 | Verizon Patent And Licensing Inc. | Systems and methods for compression-based search engine |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3651483A (en) * | 1969-01-03 | 1972-03-21 | Ibm | Method and means for searching a compressed index |
US3593309A (en) * | 1969-01-03 | 1971-07-13 | Ibm | Method and means for generating compressed keys |
US3643226A (en) * | 1969-06-26 | 1972-02-15 | Ibm | Multilevel compressed index search method and means |
JPS58184646A (en) * | 1982-04-22 | 1983-10-28 | Kokusai Denshin Denwa Co Ltd <Kdd> | Message communication system |
US4617663A (en) * | 1983-04-13 | 1986-10-14 | At&T Information Systems Inc. | Interface testing of software systems |
EP0160672A4 (en) * | 1983-10-19 | 1986-05-12 | Text Sciences Corp | Method and apparatus for data compression. |
US4701851A (en) * | 1984-10-24 | 1987-10-20 | International Business Machines Corporation | Compound word spelling verification |
US4650927A (en) * | 1984-11-29 | 1987-03-17 | International Business Machines Corporation | Processor-assisted communication system using tone-generating telephones |
US4843389A (en) * | 1986-12-04 | 1989-06-27 | International Business Machines Corp. | Text compression and expansion method and apparatus |
CA1341310C (en) * | 1988-07-15 | 2001-10-23 | Robert Filepp | Interactive computer network and method of operation |
DE4031421C2 (en) * | 1989-10-05 | 1995-08-24 | Ricoh Kk | Pattern matching system for a speech recognition device |
US5274805A (en) * | 1990-01-19 | 1993-12-28 | Amalgamated Software Of North America, Inc. | Method of sorting and compressing data |
US5276868A (en) * | 1990-05-23 | 1994-01-04 | Digital Equipment Corp. | Method and apparatus for pointer compression in structured databases |
US5333313A (en) * | 1990-10-22 | 1994-07-26 | Franklin Electronic Publishers, Incorporated | Method and apparatus for compressing a dictionary database by partitioning a master dictionary database into a plurality of functional parts and applying an optimum compression technique to each part |
US5253341A (en) * | 1991-03-04 | 1993-10-12 | Rozmanith Anthony I | Remote query communication system |
US5163094A (en) * | 1991-03-20 | 1992-11-10 | Francine J. Prokoski | Method for identifying individuals from analysis of elemental shapes derived from biosensor data |
US5414838A (en) * | 1991-06-11 | 1995-05-09 | Logical Information Machine | System for extracting historical market information with condition and attributed windows |
JPH0546675A (en) * | 1991-08-12 | 1993-02-26 | Mitsubishi Electric Corp | Information compression and retrieval system |
FR2681966A1 (en) * | 1991-09-27 | 1993-04-02 | Euro Cp Sarl | Process for compressing/decompressing (expanding) textual data in a home automation network |
US5337233A (en) * | 1992-04-13 | 1994-08-09 | Sun Microsystems, Inc. | Method and apparatus for mapping multiple-byte characters to unique strings of ASCII characters for use in text retrieval |
JPH05324730A (en) * | 1992-05-27 | 1993-12-07 | Hitachi Ltd | Document information retrieving device |
US5325091A (en) * | 1992-08-13 | 1994-06-28 | Xerox Corporation | Text-compression technique using frequency-ordered array of word-number mappers |
1994
- 1994-06-07 CA: CA002125337A (CA2125337A1), not active, Abandoned
- 1994-06-28 DE: DE69421966T (DE69421966T2), not active, Expired - Lifetime
- 1994-06-28 JP: JP14653694A (JP3234104B2), not active, Expired - Lifetime
- 1994-06-28 EP: EP94110014A (EP0633537B1), not active, Expired - Lifetime

1996
- 1996-09-26 US: US08/721,558 (US5737733A), not active, Expired - Lifetime
Non-Patent Citations (1)
Title |
---|
DONALD E. KNUTH: "The Art of Computer Programming - Volume 3/Sorting and Searching", ADDISON WESLEY * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011095345A1 (en) | 2010-02-04 | 2011-08-11 | Bienert Joerg | Method and system for compressing data records and for processing compressed data records |
EP2690565A1 (en) | 2010-02-04 | 2014-01-29 | Parstream GmbH | Method and system for compressing data records and for processing compressed data records |
Also Published As
Publication number | Publication date |
---|---|
JP3234104B2 (en) | 2001-12-04 |
US5737733A (en) | 1998-04-07 |
EP0633537A3 (en) | 1995-08-23 |
DE69421966T2 (en) | 2000-04-27 |
CA2125337A1 (en) | 1994-12-31 |
JPH0756955A (en) | 1995-03-03 |
EP0633537A2 (en) | 1995-01-11 |
DE69421966D1 (en) | 2000-01-13 |
Similar Documents
Publication | Title |
---|---|
EP0633537B1 (en) | Method and system for searching compressed data |
JP3848983B2 (en) | Data transmission method, data equalization method and apparatus |
US5955976A (en) | Data compression for use with a communications channel |
US5467087A (en) | High speed lossless data compression system |
EP0510634B1 (en) | Data base retrieval system |
US5374916A (en) | Automatic electronic data type identification process |
EP0559824B1 (en) | Binary data communication system |
US5663721A (en) | Method and apparatus using code values and length fields for compressing computer data |
CA1290061C (en) | Text compression and expansion method and apparatus |
EP2706466A1 (en) | Extraction method, information processing method, extraction program, information processing program, extraction device, and information processing device |
Severance | A practitioner's guide to data base compression tutorial |
US4626824A (en) | Apparatus and algorithm for compressing and decompressing data |
JP2581661B2 (en) | Text information communication system |
US5815096A (en) | Method for compressing sequential data into compression symbols using double-indirect indexing into a dictionary data structure |
US6507877B1 (en) | Asynchronous concurrent dual-stream FIFO |
US5406281A (en) | Encoder/decoder and method for efficient string handling in data compression |
US6535150B1 (en) | Method and apparatus for implementing run-length compression |
US6292115B1 (en) | Data compression for use with a communications channel |
JP3599867B2 (en) | Method of processing text data representing textual information to guide a user in controlling various functions of a system |
US5915041A (en) | Method and apparatus for efficiently decoding variable length encoded data |
EP0721699B1 (en) | Method and apparatus for a unique and efficient use of a data structure for compressing data |
Ahmed et al. | Efficient taxa identification using a pangenome index |
US5406280A (en) | Data retrieval system using compression scheme especially for serial data stream |
Yokoo | An adaptive data compression method based on context sorting |
Ong | Text compression for transmission and storage |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A2 Designated state(s): DE FR GB |
PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
AK | Designated contracting states | Kind code of ref document: A3 Designated state(s): DE FR GB |
17P | Request for examination filed | Effective date: 19960209 |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
17Q | First examination report despatched | Effective date: 19990111 |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): DE FR GB |
REF | Corresponds to: | Ref document number: 69421966 Country of ref document: DE Date of ref document: 20000113 |
ET | Fr: translation filed | |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR Payment date: 20000612 Year of fee payment: 7 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
REG | Reference to a national code | Ref country code: GB Ref legal event code: IF02 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20020228 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB Payment date: 20130529 Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE Payment date: 20130628 Year of fee payment: 20 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R071 Ref document number: 69421966 Country of ref document: DE |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R071 Ref document number: 69421966 Country of ref document: DE |
REG | Reference to a national code | Ref country code: GB Ref legal event code: PE20 Expiry date: 20140627 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION Effective date: 20140627 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION Effective date: 20140701 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R082 Ref document number: 69421966 Country of ref document: DE Representative's name: GRUENECKER, KINKELDEY, STOCKMAIR & SCHWANHAEUS, DE |
REG | Reference to a national code | Ref country code: GB Ref legal event code: 732E Free format text: REGISTERED BETWEEN 20150115 AND 20150121 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R082 Ref document number: 69421966 Country of ref document: DE Representative's name: GRUENECKER PATENT- UND RECHTSANWAELTE PARTG MB, DE Effective date: 20150126 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R081 Ref document number: 69421966 Country of ref document: DE Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, REDMOND, US Free format text: FORMER OWNER: MICROSOFT CORP., REDMOND, WASH., US Effective date: 20150126 |