CN100472526C - Method for storing, fetching and indexing data - Google Patents


Info

Publication number
CN100472526C
Authority
CN
China
Prior art keywords
data
compressed encoding
integer
byte
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2006100905678A
Other languages
Chinese (zh)
Other versions
CN101075237A (en
Inventor
谢海劝
邵荣防
王志平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CNB2006100905678A
Publication of CN101075237A
Application granted
Publication of CN100472526C

Abstract

A data storage method is disclosed. When the buffer has remaining space, the data to be stored undergoes byte multiplexing, differential coding, and integer compression in turn, according to preset compressed encoding information, and the encoded result is stored in the buffer. A data reading method and a data retrieval method that build on this storage method are also disclosed.

Description

Data storage method, data reading method, and data retrieval method
Technical field
The present invention relates to data processing technology, and in particular to a data storage method, a data reading method, and a data retrieval method.
Background
With the rapid development of Internet technology, the amount of information presented to users has grown explosively, and information databases accumulate more and more data and topics. Search engine technology emerged so that users can find the information they need in a large volume of data within a short time. As the core of services such as web search, news search, music search, image search, and map search, this technology has great practical and commercial value.
The index module is a key component of a search engine: it provides the data source and the access interface for keyword retrieval. Specifically, the index module segments the raw data into words, processes each resulting word with an indexing method, builds the index, and writes it to a file that holds the index data. At retrieval time, the file holding the index data is read to obtain the retrieval result.
The industry measures retrieval performance mainly by data volume and by response time to a user's retrieval request: the larger the data volume and the shorter the response time, the better the retrieval performance. At present, search engines process massive amounts of data, and the response time to a user's retrieval request is on the order of 0.1 second. The file input/output (IO) operations that read and write files during retrieval are time-consuming and are the bottleneck of retrieval performance. To improve retrieval performance, data that is accessed frequently or was accessed recently is usually placed in a buffer (cache). During retrieval the cache is accessed first; on a hit, no file IO is needed, which effectively shortens the response time and improves retrieval performance.
In current data retrieval methods, data is usually placed in the cache without any processing. Because the physical memory available to the cache cannot grow without limit, the amount of data that a cache of a given size can hold is limited. The hit rate of retrieval results is therefore bounded by the cache space, and retrieval performance is correspondingly low.
Summary of the invention
In view of this, the invention provides a data storage method that increases the amount of data that can be stored in a fixed storage space. In this method, compressed encoding information, which indicates the data types that correspond to each compression coding mode, is set in advance. When the buffer has remaining space, it is judged, according to the compressed encoding information, whether the data to be stored contains data of the type that corresponds to the byte multiplexing mode. If so, byte multiplexing, differential coding, and integer compression are applied to the data to be stored in that order, and the compressed encoding result is saved in the buffer; otherwise, differential coding and integer compression are applied in that order, and the compressed encoding result is saved in the buffer.
The compressed encoding of the data to be stored in the byte multiplexing mode is:
Selecting, from the data to be stored, multiple data items that are each shorter than 1 byte and can be merged into 1 byte, representing the selected items with 1 byte, and putting the byte multiplexing result into the data to be stored in place of the selected items.
The compressed encoding of the data to be stored in the differential coding mode is:
Selecting an ordered integer sequence from the data to be stored, converting it into a difference sequence, and putting the differential coding result into the data to be stored in place of the selected ordered integer sequence.
Preferably, between the byte multiplexing and the differential coding, the method further comprises:
Judging, according to the compressed encoding information, whether the data to be stored contains an ordered integer sequence that corresponds to the differential coding mode; if so, performing the differential coding; otherwise, performing the integer compression.
The compressed encoding of the data to be stored in the integer compression mode is:
Selecting integer data from the data to be stored, determining from the actual magnitude of each selected integer the data length it requires, representing the integer with that length, and putting the integer compression result into the data to be stored in place of the selected integer data.
Preferably, between the differential coding and the integer compression, the method further comprises:
Judging, according to the compressed encoding information, whether the data to be stored contains integer data that corresponds to the integer compression mode; if so, performing the integer compression; otherwise, saving the compressed encoding result in the buffer.
Corresponding to the above data storage method, the invention provides a data reading method that recovers the raw data from a cache holding compression-encoded data. In this method, the compressed encoding information, which indicates the data types that correspond to each compression coding mode, is obtained in advance. The data to be read is read from the buffer and, according to the compressed encoding information, decoded for the integer compression mode, the differential coding mode, and the byte multiplexing mode in that order to obtain the reading result.
The decoding of the read data for the integer compression mode is:
Selecting, according to the compressed encoding information, the data in the read data that was encoded in the integer compression mode, restoring it to the values it had before integer compression, and putting the decoded result into the read data in place of the selected data.
Preferably, before the decoding for the integer compression mode, the method further comprises:
Judging, according to the compressed encoding information, whether the read data contains data encoded in the integer compression mode; if so, performing the decoding for the integer compression mode; otherwise, performing the decoding for the differential coding mode.
The decoding of the read data for the differential coding mode is:
Selecting, according to the compressed encoding information, the data in the read data that was encoded in the differential coding mode, converting it back into an ordered integer sequence, and putting the decoded result into the read data in place of the selected data.
Preferably, between the decoding for the integer compression mode and the decoding for the differential coding mode, the method further comprises:
Judging, according to the compressed encoding information, whether the read data contains data encoded in the differential coding mode; if so, performing the decoding for the differential coding mode; otherwise, performing the decoding for the byte multiplexing mode.
The decoding of the read data for the byte multiplexing mode is:
Selecting, according to the compressed encoding information, the data in the read data that was encoded in the byte multiplexing mode, representing each of the multiple items packed into 1 byte with its own byte, and putting the decoded result into the read data in place of the selected data.
Preferably, between the decoding for the differential coding mode and the decoding for the byte multiplexing mode, the method further comprises:
Judging, according to the compressed encoding information, whether the read data contains data encoded in the byte multiplexing mode; if so, performing the decoding for the byte multiplexing mode; otherwise, proceeding directly to obtaining the reading result.
The invention also provides a data retrieval method that improves the hit rate of retrieval results. In this method, compressed encoding information, which indicates the data types that correspond to each compression coding mode, is set in advance, and the method comprises the following steps:
A. When a retrieval request is received from a user, judge whether the data corresponding to the request is in the buffer; if so, go to step B; otherwise, go to step C.
B. Read the data from the buffer, decode it according to the compressed encoding information for the integer compression, differential coding, and byte multiplexing modes in that order, return the decoded result to the user as the retrieval result, and end the retrieval flow.
C. Read the index data corresponding to the request from the index file as the retrieval result. If the index data meets the storage policy, apply byte multiplexing, differential coding, and integer compression to it in that order according to the compressed encoding information and put the result into the buffer. Then return the retrieval result to the user.
The compressed encoding of the index data in the byte multiplexing mode is:
Selecting, from the index data, multiple data items that are each shorter than 1 byte and can be merged into 1 byte, representing the selected items with 1 byte, and putting the byte multiplexing result into the index data in place of the selected items;
The decoding for the byte multiplexing mode is:
Selecting, according to the compressed encoding information, the data in the read index data that was encoded in the byte multiplexing mode, representing each of the multiple items packed into 1 byte with its own byte, and putting the decoded result into the read index data in place of the selected data.
The compressed encoding of the index data in the differential coding mode is:
Selecting an ordered integer sequence from the index data, converting it into a difference sequence, and putting the differential coding result into the index data in place of the selected ordered integer sequence;
The decoding for the differential coding mode is:
Selecting, according to the compressed encoding information, the data in the read index data that was encoded in the differential coding mode, converting it back into an ordered integer sequence, and putting the decoded result into the read index data in place of the selected data.
The compressed encoding of the index data in the integer compression mode is:
Selecting integer data from the index data, determining from the actual magnitude of each selected integer the data length it requires, representing the integer with that length, and putting the integer compression result into the index data in place of the selected integer data;
The decoding for the integer compression mode is:
Selecting, according to the compressed encoding information, the data in the read index data that was encoded in the integer compression mode, restoring it to the values it had before integer compression, and putting the decoded result into the read index data in place of the selected data.
Preferably, a replacement condition that indicates when index data in the buffer may be replaced is set in advance, and before the compressed encoding in step C the method further comprises:
C01. Judge whether the buffer has remaining space; if so, continue with the compressed encoding; otherwise, go to step C02.
C02. According to the preset replacement condition, judge whether the index data read from the file may replace existing index data in the buffer; if so, determine and delete the index data to be replaced, and then perform the compressed encoding; otherwise, proceed directly to returning the retrieval result to the user.
As the above scheme shows, in the data storage method of the invention the data to be stored is put into the cache after compressed encoding by byte multiplexing, differential coding, and integer compression. This reduces the space the data occupies and increases the amount of data that a cache of fixed size can hold. In the data reading method of the invention, data taken from the cache is decoded according to the compressed encoding information known in advance and restored to the raw data, which guarantees that the data is read correctly.
In the data retrieval method of the invention, index data is put into the cache after compressed encoding, so each item of index data occupies less space and a cache of fixed size holds more data. At retrieval time more index data can be found in the cache, which effectively improves the hit rate. Because more requests are served from the cache, index data is read from the index file far less often, which reduces the time spent on file IO and effectively improves retrieval efficiency on massive data. In addition, when index data is read from the index file and meets the search engine's storage policy and replacement condition, it is stored in the cache, so the cache is updated promptly according to users' search demand and retrieval stays efficient.
Description of drawings
The above and other features and advantages of the invention will become clearer to those of ordinary skill in the art from the following detailed description of exemplary embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the data storage method in an embodiment of the invention;
Fig. 2 is a flowchart of the data reading method in an embodiment of the invention;
Fig. 3 is an exemplary flowchart of the data retrieval method of the invention;
Fig. 4 is a flowchart of the data retrieval method in an embodiment of the invention.
Embodiment
To make the purpose and technical scheme of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
The invention provides a data storage method whose basic idea is: when the buffer has remaining space, compress and encode the data before storing it in memory.
The compression coding modes are integer compression, differential coding, and byte multiplexing, and they are executed in this order: byte multiplexing first, then differential coding, and integer compression last. In practice, the type of a piece of data determines which of these modes can be applied to it. Once all the data types contained in the data to be stored are known, the compression coding modes to execute and the data positions that each mode applies to can be determined. Each compression coding mode together with its corresponding data types is called compressed encoding information.
Fig. 1 shows the flowchart of the data storage method in an embodiment of the invention. As shown in Fig. 1, the data storage method in this embodiment comprises:
Steps 101 to 102: when the cache has remaining space, judge whether the current data to be stored contains data that can be byte multiplexed. If so, apply byte multiplexing to that data, take the byte-multiplexed data together with the data not yet encoded as the current data to be stored, and go to step 103; otherwise, go directly to step 103.
In this embodiment, a space of fixed size is allocated in memory in advance as the cache that holds the data to be stored. Because the cache space is limited, data can be put into the cache only when it has remaining space.
Byte multiplexing means representing with 1 byte multiple data items that are each shorter than 1 byte and can be merged into 1 byte. Therefore, if the data to be stored includes multiple items that are each shorter than 1 byte and can be merged into 1 byte, it is judged that data suitable for byte multiplexing exists; otherwise, it is judged that no such data exists.
For example, consider four types of data: font size, letter case, emphasis (bold or not), and position of occurrence. The font size ranges from 1 to 32 and needs 5 bits; the letter case is either upper or lower and needs 1 bit; the emphasis is either bold or not bold and needs 1 bit; the position of occurrence is either title or body text and needs 1 bit. Because 1 byte contains 8 bits, these four items can be represented with a single byte instead of one byte each.
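The four-field example above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the bit layout (font size in the top 5 bits, then case, emphasis, and position) and the function name `pack_attributes` are assumptions, as is storing `font_size - 1` so that the range 1 to 32 fits in 5 bits.

```python
def pack_attributes(font_size: int, is_upper: bool, is_bold: bool, in_title: bool) -> int:
    """Pack four word attributes into one byte (hypothetical bit layout)."""
    if not 1 <= font_size <= 32:
        raise ValueError("font_size must be in 1..32")
    # Assumption: store font_size - 1 so that 1..32 maps onto 5 bits (0..31).
    return (
        ((font_size - 1) << 3)   # bits 7..3: font size
        | (int(is_upper) << 2)   # bit 2: upper case or not
        | (int(is_bold) << 1)    # bit 1: bold or not
        | int(in_title)          # bit 0: title (1) or body text (0)
    )


packed = pack_attributes(font_size=12, is_upper=True, is_bold=False, in_title=True)
print(f"{packed:08b}")  # prints 01011101
```

Stored separately, the four attributes would take at least four bytes; packed this way they take one.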
After it is determined that the data to be stored contains data suitable for byte multiplexing, that data is selected and byte multiplexed, and the byte multiplexing result is put into the data to be stored in place of the selected data, so that subsequent steps encode the data to be stored as it stands at that point.
Steps 103 to 104: judge whether the current data to be stored contains ordered integer data. If so, apply differential coding to the ordered integer data, take the differentially coded data together with the data not yet encoded as the current data to be stored, and go to step 105; otherwise, go directly to step 105.
In this embodiment, ordered integer data means data whose integer values form an increasing or decreasing ordered sequence. For example, suppose the document is: "The World Cup will be held in Germany in 2006, and we look forward to the arrival of the World Cup", and the data to be stored is the positions at which the word "World Cup" occurs in the document, that is, the ordinal of each occurrence among the words of the document. This data then contains the two values 3 and 10, so data of this type is ordered integer data.
Differential coding converts the ordered integer sequence into a difference sequence. Specifically, the first element of the ordered sequence stays unchanged, and every later element is replaced by the difference between its original value and the original value of the preceding element. For the ordered integer sequence 3, 10, the difference sequence after conversion is 3, 7. After differential coding, the result is put into the data to be stored in place of the data that was coded, so that subsequent steps encode the data to be stored as it stands at that point.
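The conversion described above can be sketched in a few lines of Python. The function name `delta_encode` is hypothetical; the logic follows the text directly (first element kept, each later element replaced by its difference from the preceding original value).

```python
def delta_encode(values):
    """Convert an ordered integer sequence into its difference sequence."""
    if not values:
        return []
    encoded = [values[0]]                   # first element stays unchanged
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)          # difference from the preceding original value
    return encoded


print(delta_encode([3, 10]))  # prints [3, 7]
```

The differences are smaller than the original values, which is what lets the later integer compression step use fewer bytes.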
Steps 105 to 106: judge whether the current data to be stored contains integer data. If so, apply integer compression to the integer data, take the encoded data together with the data not encoded as the data to be stored, and go to step 107; otherwise, go directly to step 107.
The traditional approach represents integers with a fixed data length, for example 4 bytes per integer. In practice, however, integer values are often small and need little space, so a fixed length wastes space. In this embodiment, the data length needed to represent an integer is determined from its actual magnitude. For example, the integer 1 needs only 1 byte, and the integer 1000 needs 2 bytes.
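The text does not fix a concrete byte format for integer compression. As one possible realization, the sketch below assumes a base-128 variable-length encoding (7 payload bits per byte, high bit set when more bytes follow); the function name `encode_varint` and the format itself are assumptions, chosen because they reproduce the sizes given in the example.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using as few bytes as its size requires.

    Assumed format: base-128 varint, low 7 bits first, high bit = "more bytes follow".
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)   # more bytes follow
        else:
            out.append(low)          # last byte
            return bytes(out)


print(len(encode_varint(1)))     # prints 1
print(len(encode_varint(1000)))  # prints 2
```

Under this assumed format, values below 128 take 1 byte and values below 16384 take 2 bytes, compared with 4 bytes each in the fixed-length representation.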
Step 107: store the data to be stored in the cache.
This completes the data storage flow in this embodiment.
The above method is applicable to many scenarios, such as index data storage. In the data storage method of this embodiment, the data to be stored is put into the cache after compressed encoding by byte multiplexing, differential coding, and integer compression. This reduces the space the data occupies and increases the amount of data that a cache of fixed size can hold.
Correspondingly, to obtain the raw data from the cache, this embodiment also provides a data reading method. In practice, the compressed encoding information records which data types were encoded in each mode. During reading, the compression-encoded data can therefore be identified from the compressed encoding information and then decoded to obtain the raw data.
Fig. 2 shows an exemplary flowchart of the data reading method in this embodiment. Before the method runs, the compressed encoding information is read in advance so that decoding can proceed during the subsequent read. As shown in Fig. 2, the data reading method in this embodiment comprises:
Steps 201 to 202: read the data to be read from the cache and judge, according to the compressed encoding information, whether the read data contains integer-compressed data. If so, decode that data for the integer compression mode, put the decoded data into the read data in place of the data before decoding, take the result as the current data, and go to step 203; otherwise, go directly to step 203.
Here the integer-compressed data is first selected and then decoded. Decoding for the integer compression mode restores the integer data read from the cache to the values it had before integer compression.
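How the original value is restored depends on the byte format used when compressing, which the text leaves open. Assuming a base-128 variable-length format (7 payload bits per byte, high bit set when more bytes follow), decoding could look like the sketch below; `decode_varint` is a hypothetical name.

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Decode one integer from buf starting at pos.

    Assumed format: base-128 varint, low 7 bits first, high bit = "more bytes follow".
    Returns (value, position of the next unread byte).
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear: this was the last byte
            return result, pos
        shift += 7


value, next_pos = decode_varint(b"\xe8\x07")
print(value, next_pos)  # prints 1000 2
```

Returning the next position lets a caller decode several integers stored back to back in the same buffer.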
Steps 203 to 204: judge, according to the compressed encoding information, whether the current data contains differentially coded data. If so, decode that data for the differential coding mode, put the decoded data into the read data in place of the data before decoding, take the result as the current data, and go to step 205; otherwise, go directly to step 205.
Here, once differentially coded data is found, it is selected and then decoded. Decoding for the differential coding mode keeps the first element of the differentially coded integer sequence unchanged and, starting from the second element, takes the sum of each element's current value and the original value of the preceding element as the decoded value of that element. For example, if the differentially coded integer sequence is 3, 7, 6, the first element 3 stays unchanged; adding the current value of the second element to the value of the first element gives the original second element, 10; adding the current value of the third element to the original value of the second element gives the original third element, 16.
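The decoding rule above is a running sum, so in Python it can be sketched with `itertools.accumulate` from the standard library. The function name `delta_decode` is hypothetical.

```python
from itertools import accumulate


def delta_decode(deltas):
    """Restore the ordered integer sequence from its difference sequence."""
    # accumulate yields running sums: d0, d0 + d1, d0 + d1 + d2, ...
    return list(accumulate(deltas))


print(delta_decode([3, 7, 6]))  # prints [3, 10, 16]
```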
Steps 205 to 206: judge, according to the compressed encoding information, whether the current data contains byte-multiplexed data. If so, decode that data for the byte multiplexing mode and go to step 207; otherwise, go directly to step 207.
Here, once byte-multiplexed data is found, it is selected, and each of the multiple items packed into 1 byte is represented with its own byte.
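As an illustration of this unpacking, the sketch below splits one byte back into four attributes (font size, letter case, emphasis, position of occurrence). The bit layout and the name `unpack_attributes` are assumptions: font size minus 1 in the top 5 bits, followed by one bit each for upper case, bold, and title.

```python
def unpack_attributes(packed: int):
    """Split one packed byte into (font_size, is_upper, is_bold, in_title).

    Assumed layout: bits 7..3 hold font_size - 1, bit 2 case, bit 1 bold, bit 0 title.
    """
    if not 0 <= packed <= 0xFF:
        raise ValueError("packed value must fit in one byte")
    font_size = (packed >> 3) + 1          # undo the assumed offset of 1
    is_upper = bool((packed >> 2) & 1)
    is_bold = bool((packed >> 1) & 1)
    in_title = bool(packed & 1)
    return font_size, is_upper, is_bold, in_title


print(unpack_attributes(0b01011101))  # prints (12, True, False, True)
```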
Step 207: obtain the raw data.
If the data read from the cache in step 201 contains no compression-encoded data, the raw data in this step is the data as read. If part of the read data is compression-encoded, the raw data is the combination of the decoded data and the data that needed no decoding. If all of the read data is compression-encoded, the raw data is the decoded data.
This completes the data reading flow in this embodiment.
The data storage method and the data reading method shown in Fig. 1 and Fig. 2 can be applied to data retrieval.
Fig. 3 shows an exemplary flowchart of the data retrieval method of the invention. Referring to Fig. 3, the method comprises:
Step 301: when a retrieval request is received from a user, judge whether the data corresponding to the request is in the cache; if so, go to step 302; otherwise, go to step 303.
Step 302: take the data out of the cache, decode it according to the compressed encoding information for the integer compression, differential coding, and byte multiplexing modes, take the decoded result as the retrieval result, and end the retrieval flow.
Step 303: read the index data corresponding to the request from the index file as the retrieval result; if the index data meets the storage policy, apply byte multiplexing, differential coding, and integer compression to it and put the result into the cache.
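Steps 301 to 303 can be sketched as a small cache wrapper. Everything named here is hypothetical: `CachedIndex`, the in-memory dict standing in for the on-disk index file, and the storage policy, which is simplified to "store while the entry count is below a capacity". Only word positions are cached, encoded as differences in an assumed base-128 variable-length format, so byte multiplexing is omitted from this sketch.

```python
from itertools import accumulate


def encode(positions):
    """Differential coding followed by an assumed base-128 varint per difference."""
    out, prev = bytearray(), 0
    for p in positions:
        delta, prev = p - prev, p          # first delta equals the first position
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode(blob):
    """Inverse of encode: varint decoding, then running sums."""
    deltas, value, shift = [], 0, 0
    for byte in blob:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            deltas.append(value)
            value, shift = 0, 0
    return list(accumulate(deltas))


class CachedIndex:
    def __init__(self, index_file, capacity):
        self.index_file = index_file   # stand-in for the index file: term -> sorted positions
        self.capacity = capacity       # simplified storage policy: max cached entries
        self.cache = {}                # term -> compressed bytes

    def lookup(self, term):
        blob = self.cache.get(term)
        if blob is not None:                     # step 302: cache hit, decode
            return decode(blob)
        positions = self.index_file[term]        # step 303: read from the index file
        if len(self.cache) < self.capacity:      # store only while space remains
            self.cache[term] = encode(positions)
        return positions


index = CachedIndex({"world cup": [3, 10]}, capacity=1)
first = index.lookup("world cup")    # miss: served from the index file, then cached
second = index.lookup("world cup")   # hit: decoded from the cache
print(first, second, index.cache["world cup"])  # prints [3, 10] [3, 10] b'\x03\x07'
```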
Data retrieval method is suitable for multiple indexed mode among the present invention, is for example just arranging index, inverted index etc.
Be example with the inverted index below, the data retrieval method among the present invention is described.
Fig. 4 shows the process flow diagram of data retrieval method in the present embodiment.The position that index data is deposited can be the index file in cache or the disk.What wherein preserve among the cache is to meet the index data of depositing strategy that sets in advance in the search engine, for example: nearest accessed index data, the index data that rate of people logging in is higher etc.In addition, also read the compressed encoding information of the retrieve data correspondence among the cache in the present embodiment in advance, so that after sense data from cache, decode.As shown in Figure 4, the data retrieval method in the present embodiment comprises:
In step 401~402, search engine receives the retrieval request that comes from the user, judges whether the retrieve data of this retrieval request correspondence is stored among the cache, if then execution in step 403; Otherwise, execution in step 404.
In step 403, from cache, read the index data of retrieval request correspondence, and the index data of being read decoded according to compressed encoding information, and with decoded result as result for retrieval, then execution in step 410.
The concrete operations of in this step the index data that comes from cache being decoded are identical with the operation in step 201 shown in Figure 2~207.
In steps 404 to 405, the index data corresponding to the retrieval request is read from the index file and used as the retrieval result, and the preset storage policy is used to judge whether this index data should be stored in the cache. If so, step 406 is executed; otherwise, step 410 is executed directly.
In steps 406 to 407, it is judged whether the cache has remaining space. If so, the current index data is compressed by byte multiplexing, differential coding, and integer compression, stored in the cache, and step 410 is executed; otherwise, step 408 is executed.
Here, the compression encoding of the data is the same as the operations in steps 101 to 107 of the data storage flow shown in Fig. 1.
For an inverted index, the basic idea is to record which documents each word appears in. The structure of the index data is <T, F_t, <D, F_{D,t}, <P>*>*>*, where T is the word identifier, F_t is the document frequency of the word, D is the document identifier, F_{D,t} is the frequency of the word in the document, P is a position of the word in the document, and * indicates that the corresponding item may take multiple values. Additional information about the word, such as font size, letter case, whether it is emphasized, and whether it appears in the title or the body text, can also be recorded as index data. Every type of data in this index data is an integer and can therefore undergo integer compression; the positions P of a word in a document form an ordered sequence and can therefore undergo differential coding; and the additional information can undergo byte multiplexing. After the compression encoding in this step, the space required by the index data is significantly reduced.
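A minimal Python sketch of the three encodings applied to inverted-index data is given below. It rests on assumptions that the patent text does not specify: integer compression is shown as a variable-byte code (7 payload bits per byte, high bit as a continuation flag), and the additional information is assumed to be a 4-bit font size plus three 1-bit flags. All function names are hypothetical.

```python
def pack_flags(font_size, is_capital, is_emphasized, in_title):
    """Byte multiplexing: merge several sub-byte fields into one byte.
    The layout (4-bit font size, then three 1-bit flags) is an assumption."""
    if not 0 <= font_size < 16:
        raise ValueError("font_size must fit in 4 bits")
    return ((font_size << 3) | (int(is_capital) << 2)
            | (int(is_emphasized) << 1) | int(in_title))


def delta_encode(sorted_values):
    """Differential coding: keep the first value, then successive gaps."""
    gaps, prev = [], 0
    for value in sorted_values:
        gaps.append(value - prev)
        prev = value
    return gaps


def varint_encode(n):
    """Integer compression (assumed variable-byte scheme): use only as many
    bytes as the magnitude of n requires."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)   # low 7 bits, continuation flag set
        n >>= 7
    out.append(n)                        # final byte, continuation flag clear
    return bytes(out)


def encode_positions(positions):
    """Differential coding followed by integer compression."""
    return b"".join(varint_encode(gap) for gap in delta_encode(positions))
```

With these assumptions, the position list `[3, 10, 12, 300]` becomes the gaps `[3, 7, 2, 288]` and then the 5 bytes `b"\x03\x07\x02\xa0\x02"`, compared with 16 bytes if each position were stored as a fixed 32-bit integer.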
In step 408, the preset replacement condition is used to judge whether the current index data should replace existing index data in the cache. If so, the existing index data to be replaced is determined and deleted, the current index data is compressed by byte multiplexing, differential coding, and integer compression and stored in the cache, and step 410 is executed; otherwise, step 410 is executed directly.
To ensure that the cache holds the index data most useful for retrieval, this embodiment also presets a replacement condition. When the cache has no remaining space, this condition is used to decide whether new index data should replace part of the index data in the cache, so that the cache content is updated according to users' retrieval needs. The replacement condition may be, for example, a higher access count or a more recent access time.
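One possible reading of such a replacement condition is sketched below in Python. The patent names access count and access time only as examples, so the exact comparison used here (access count first, recency as a tie-breaker) is an assumption, as are the names `CacheEntry`, `pick_victim`, and `should_replace`.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    payload: bytes        # compressed index data
    access_count: int     # how often this index data has been requested
    last_access: float    # timestamp of the most recent request


def pick_victim(cache):
    """Return the key of the entry least worth keeping:
    fewest accesses first, then the oldest last access."""
    return min(cache, key=lambda k: (cache[k].access_count,
                                     cache[k].last_access))


def should_replace(candidate, victim):
    """Assumed replacement condition: the candidate is accessed more often
    than the victim, or equally often but more recently."""
    return ((candidate.access_count, candidate.last_access)
            > (victim.access_count, victim.last_access))
```

In this sketch, new index data enters a full cache only if `should_replace` holds against the entry chosen by `pick_victim`; otherwise the cache is left unchanged and the result is simply returned.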
In step 410, the retrieval result is returned to the user.
This completes the data retrieval flow in this embodiment.
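The overall control flow of steps 401 to 410 can be summarized in a short Python sketch. Everything here is illustrative: the cache is modeled as a dictionary, remaining space is approximated by an entry count (`capacity`), and the callbacks `read_index_file`, `encode`, `decode`, `meets_storage_policy`, and `try_evict` are hypothetical stand-ins for the operations the description refers to.

```python
def retrieve(term, cache, capacity, read_index_file, encode, decode,
             meets_storage_policy, try_evict):
    """Return the index data for `term`, following steps 401 to 410."""
    if term in cache:                       # steps 401-402: cache hit?
        return decode(cache[term])          # step 403: decode cached data
    data = read_index_file(term)            # step 404: fall back to the file
    if meets_storage_policy(term):          # step 405: worth caching?
        # Step 406: is there room? Step 408: if not, try to evict an entry.
        if len(cache) < capacity or try_evict(cache, term):
            cache[term] = encode(data)      # step 407: compress and store
    return data                             # step 410: return the result
```

`try_evict` is expected to delete an existing entry and return `True` when the replacement condition is met, and to return `False` without changing the cache otherwise.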
As the above retrieval flow shows, because index data is placed in the cache after compression encoding, each piece of index data occupies less space, and a cache of fixed size holds more data. During retrieval, more index data can be found in the cache, which effectively improves the hit rate. Moreover, because more lookups are served from the cache, the probability of reading index data from the index file is greatly reduced, as is the time spent on file I/O, which effectively improves retrieval efficiency for massive data. In addition, when index data is read from the index file in this embodiment, it is stored in the cache if it meets the search engine's storage policy and replacement condition, so the cache is updated promptly according to users' search needs, keeping data retrieval efficient.
The above is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (18)

1. A data storage method, characterized in that compression encoding information indicating the data types corresponding to each compression encoding mode is set in advance, the method comprising:
When the buffer has remaining space, judging, according to the compression encoding information, whether the data to be stored contains a data type corresponding to the byte multiplexing mode; if so, performing on the data to be stored, in sequence, the compression encoding corresponding to the byte multiplexing mode, the differential coding mode, and the integer compression mode, and saving the compression encoding result in the buffer; otherwise, performing on the data to be stored, in sequence, the compression encoding corresponding to the differential coding mode and the integer compression mode, and saving the compression encoding result in the buffer.
2. The method of claim 1, characterized in that performing the compression encoding of the byte multiplexing mode on the data to be stored comprises:
Selecting, from the data to be stored, multiple data items that are each shorter than 1 byte and can be merged into 1 byte, representing the selected data items with 1 byte, and putting the compression encoding result of the byte multiplexing mode into the data to be stored in place of the selected data items.
3. The method of claim 1, characterized in that performing the compression encoding of the differential coding mode on the data to be stored comprises:
Selecting an ordered integer sequence from the data to be stored, converting the ordered integer sequence into a difference sequence, and putting the compression encoding result of the differential coding mode into the data to be stored in place of the selected ordered integer sequence.
4. The method of claim 1 or 3, characterized in that, between the compression encoding of the byte multiplexing mode and the compression encoding of the differential coding mode, the method further comprises:
Judging, according to the compression encoding information, whether the data to be stored contains an ordered integer sequence corresponding to the differential coding mode; if so, performing the compression encoding corresponding to the differential coding mode; otherwise, performing the compression encoding of the integer compression mode.
5. The method of claim 1, characterized in that performing the compression encoding of the integer compression mode on the data to be stored comprises:
Selecting integer data from the data to be stored, determining the data length required by the integer data according to its actual magnitude, representing the integer data with the determined data length, and putting the compression encoding result of the integer compression mode into the data to be stored in place of the selected integer data.
6. The method of claim 1 or 5, characterized in that, between the compression encoding of the differential coding mode and the compression encoding of the integer compression mode, the method further comprises:
Judging, according to the compression encoding information, whether the data to be stored contains integer data corresponding to the integer compression mode; if so, performing the compression encoding corresponding to the integer compression mode; otherwise, performing the operation of saving the compression encoding result in the buffer.
7. A data reading method, characterized in that compression encoding information indicating the data types corresponding to each compression encoding mode is obtained in advance, the method comprising:
Reading the data to be read from the buffer and, according to the compression encoding information, performing on the read data, in sequence, the decoding corresponding to the integer compression mode, the differential coding mode, and the byte multiplexing mode, to obtain the read result.
8. The method of claim 7, characterized in that performing the decoding corresponding to the integer compression mode on the read data comprises:
Selecting, according to the compression encoding information, the data in the read data that underwent the compression encoding of the integer compression mode, restoring it to its value before integer compression, and putting the decoded result into the read data in place of the selected data.
9. The method of claim 7 or 8, characterized in that, before the decoding corresponding to the integer compression mode, the method further comprises:
Judging, according to the compression encoding information, whether the read data contains data that underwent the compression encoding of the integer compression mode; if so, performing the decoding corresponding to the integer compression mode; otherwise, performing the decoding corresponding to the differential coding mode.
10. The method of claim 7, characterized in that performing the decoding corresponding to the differential coding mode on the read data comprises:
Selecting, according to the compression encoding information, the data in the read data that underwent the compression encoding of the differential coding mode, converting it into an ordered integer sequence, and putting the decoded result into the read data in place of the selected data.
11. The method of claim 7 or 10, characterized in that, between the decoding corresponding to the integer compression mode and the decoding corresponding to the differential coding mode, the method further comprises:
Judging, according to the compression encoding information, whether the read data contains data that underwent the compression encoding of the differential coding mode; if so, performing the decoding corresponding to the differential coding mode; otherwise, performing the decoding corresponding to the byte multiplexing mode.
12. The method of claim 7, characterized in that performing the decoding corresponding to the byte multiplexing mode on the read data comprises:
Selecting, according to the compression encoding information, the data in the read data that underwent the compression encoding of the byte multiplexing mode, representing each of the multiple data items compressed into 1 byte with 1 byte of its own, and putting the decoded result into the read data in place of the selected data.
13. The method of claim 7 or 12, characterized in that, between the decoding corresponding to the differential coding mode and the decoding corresponding to the byte multiplexing mode, the method further comprises:
Judging, according to the compression encoding information, whether the read data contains data that underwent the compression encoding of the byte multiplexing mode; if so, performing the decoding corresponding to the byte multiplexing mode; otherwise, performing the operation of obtaining the read result.
14. A data retrieval method, characterized in that compression encoding information indicating the data types corresponding to each compression encoding mode is set in advance, the method comprising:
A. Upon receiving a retrieval request from a user, judging whether the data corresponding to the retrieval request is in the buffer; if so, executing step B; otherwise, executing step C;
B. Reading the data from the buffer, performing, in sequence and according to the compression encoding information, the decoding corresponding to the integer compression mode, the differential coding mode, and the byte multiplexing mode, returning the decoded result to the user as the retrieval result, and ending the data retrieval flow;
C. Reading the index data corresponding to the retrieval request from the index file as the retrieval result; when the index data meets the storage policy, performing on the index data, in sequence and according to the compression encoding information, the compression encoding of the byte multiplexing mode, the differential coding mode, and the integer compression mode, and putting the result into the buffer; and then returning the retrieval result to the user.
15. The method of claim 14, characterized in that performing the compression encoding of the byte multiplexing mode on the index data comprises:
Selecting, from the index data, multiple data items that are each shorter than 1 byte and can be merged into 1 byte, representing the selected data items with 1 byte, and putting the compression encoding result of the byte multiplexing mode into the index data in place of the selected data items;
and performing the decoding corresponding to the byte multiplexing mode comprises:
Selecting, according to the compression encoding information, the data in the read index data that underwent the compression encoding of the byte multiplexing mode, representing each of the multiple data items compressed into 1 byte with 1 byte of its own, and putting the decoded result into the read index data in place of the selected data.
16. The method of claim 14, characterized in that performing the compression encoding of the differential coding mode on the index data comprises:
Selecting an ordered integer sequence from the index data, converting the ordered integer sequence into a difference sequence, and putting the compression encoding result of the differential coding mode into the index data in place of the selected ordered integer sequence;
and performing the decoding corresponding to the differential coding mode comprises:
Selecting, according to the compression encoding information, the data in the read index data that underwent the compression encoding of the differential coding mode, converting it into an ordered integer sequence, and putting the decoded result into the read index data in place of the selected data.
17. The method of claim 14, characterized in that performing the compression encoding of the integer compression mode on the index data comprises:
Selecting integer data from the index data, determining the data length required by the integer data according to its actual magnitude, representing the integer data with the determined data length, and putting the compression encoding result of the integer compression mode into the index data in place of the selected integer data;
and performing the decoding corresponding to the integer compression mode comprises:
Selecting, according to the compression encoding information, the data in the read index data that underwent the compression encoding of the integer compression mode, restoring it to its value before integer compression, and putting the decoded result into the read index data in place of the selected data.
18. The method of any one of claims 14 to 17, characterized in that a replacement condition indicating that index data in the buffer is allowed to be replaced is set in advance, and before the compression encoding is performed in step C, the method further comprises:
C01. Judging whether the buffer has remaining space; if so, continuing to perform the compression encoding; otherwise, executing step C02;
C02. Judging, according to the preset replacement condition, whether to use the index data read from the index file to replace existing index data in the buffer; if so, determining and deleting the existing index data to be replaced, and then performing the compression encoding; otherwise, performing the operation of returning the retrieval result to the user.
CNB2006100905678A 2006-06-28 2006-06-28 Method for storing, fetching and indexing data Active CN100472526C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100905678A CN100472526C (en) 2006-06-28 2006-06-28 Method for storing, fetching and indexing data

Publications (2)

Publication Number Publication Date
CN101075237A CN101075237A (en) 2007-11-21
CN100472526C true CN100472526C (en) 2009-03-25

Family

ID=38976290

Country Status (1)

Country Link
CN (1) CN100472526C (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20190808

Address after: 35/F, Tencent Building, Keji Zhongyi Road, High-tech Zone, Nanshan District, Shenzhen, Guangdong 518057

Co-patentee after: Tencent Cloud Computing (Beijing) Co., Ltd.

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: Room 403, East Building 2, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong 518044

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.