CN109977113A - An HBase index design method based on a Bloom filter for medical imaging data - Google Patents
- Publication number
- CN109977113A
- Authority
- CN
- China
- Prior art keywords
- hbase
- data
- bloom filter
- region
- medical imaging
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/20—ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
Landscapes
- Health & Medical Sciences (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Engineering & Computer Science (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a Bloom-filter-based HBase multi-index design method for medical imaging data. A separate bit vector is allocated to each random (hash) function, which reduces the false-positive rate of the Bloom filter; the filter serves as the first step in deciding whether the data to be retrieved is in the set. An improved HBase secondary index is then proposed whose main optimization goal is to reduce the number of network I/O operations: a purpose-designed row key guarantees that the data table and the index table are placed in the same Region, and a sampling hashing method is devised to solve the Region write-hotspot problem. This exploits HBase's load-balancing characteristics and, to a certain extent, speeds up retrieval.
Description
Technical field
The invention belongs to the field of computer software, and in particular relates to a Bloom-filter-based multi-index design method for medical imaging data.
Background art
With the continuous development of medical informatization, data volumes have increased sharply, and the relational-database storage scheme used in PACS (Picture Archiving and Communication System) can no longer meet daily storage and retrieval demands. Storing and retrieving massive medical imaging data with big-data distributed platforms such as Hadoop and newer NoSQL databases has become one of the effective ways to solve this problem.
HBase, a columnar database, belongs to the Hadoop ecosystem and is highly compatible with it: it can read from and write to HDFS directly. HBase's scalability also means that users do not need to define the table structure in advance; columns can be extended dynamically as needed, which removes the relational-database requirement of pre-establishing a table schema. However, HBase's indexing is still lacking. It supports only a primary-key (row key) index, so queries on non-primary-key columns require a full table scan, which is inefficient. Several researchers have proposed non-primary-key index designs for HBase, for example:
(1) Secondary index construction. This method follows the idea of an inverted index: the main-table column to be indexed becomes the row key of an index table, and the main table's row key is stored as the value in the index table. A query first looks up the index table to obtain the main-table row keys for the column value, then fetches the corresponding rows from the main table. The method is simple, but it needs two queries and sacrifices some performance.
(2) Linearized indexing. K-dimensional data is mapped into a one-dimensional space so that HBase's one-dimensional index can be used. This works well for spatial data but poorly for text data; medical imaging files consist mainly of numeric and text data, so this method is also unsuitable.
(3) Two-layer indexing. A global index is paired with local indexes to reduce the number of data nodes queried: the high-level index narrows the query range before the low-level index is consulted. However, every write must maintain both index layers, which is very costly, and a concrete two-layer index needs different data structures for each layer, which makes the implementation more complicated.
An index is itself a data structure. To locate data quickly, lookup can be divided into two processes: first, deciding whether the data being retrieved is in the set; second, if it is, locating the exact position of the data.
Consider the first step: deciding whether an element belongs to a set of size n. The most common approach is to compare the element with each element of the set in turn, as in a sequential list, but this has O(n) time complexity and is inefficient. A hash algorithm instead stores elements in an array with a large index range: each element's key is passed through a preset hash function, and the result is used as the array index at which the element is stored. Hashing locates an element quickly and accurately in O(1) time, but collisions occur when different keys yield the same function value, which has given rise to many collision-resolution methods. Hashing also wastes memory, because with very large data volumes the storage array must be very large as well.
The Bloom filter algorithm is also based on hashing in principle, but it is far more space-efficient than a hash table; its core idea is to randomize the mapping functions used in hashing. First a large bit string is created. Each key in the set is passed through several hash functions, each hash value is taken modulo the length of the bit string, and, as in a hash table, the corresponding positions are set to 1; these positions are called the key's characteristic values. In short, each key is mapped to several positions in the bit string. To look up a key, it is passed through the same hash functions and mapped to the corresponding positions: if all of those positions are 1, the lookup succeeds; if at least one is 0, the lookup fails. The algorithm has a defect, however: an element that does not belong to the set may be misjudged as belonging to it, i.e. a "false positive". This happens when every position that a non-member element hashes to has already been set to 1 by members of the set. The present invention adopts optimizations to reduce this misjudgment rate.
Now consider the second step: once the queried element is known to be in the set, how is it located? This patent uses an improved secondary index. The traditional secondary index is inefficient mainly because it needs two queries: the result of the first query must be returned to the client, which then issues a second query, generating extra I/O operations. Network I/O is much slower than in-memory retrieval, so reducing this I/O time substantially improves the query speed of the secondary index method.
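The classic (single bit string) Bloom filter described above can be sketched as follows. This is an illustrative sketch, not the patent's code: deriving the k hash functions by salting SHA-256 with the function index is an assumption, and the class and method names are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Classic Bloom filter: one shared bit string of m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # packed bit string, initially all zeros

    def _positions(self, key: str) -> Iterator[int]:
        # Hypothetical hash family: SHA-256 of "i:key", then modulo the bit-string length.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos >> 3] |= 1 << (pos & 7)  # set the characteristic bit to 1

    def might_contain(self, key: str) -> bool:
        # All bits 1 -> "probably present" (may be a false positive); any 0 -> definitely absent.
        return all(((self.bits[pos >> 3] >> (pos & 7)) & 1) == 1 for pos in self._positions(key))


bf = BloomFilter(m=8192, k=4)
bf.add("study-0001")
print(bf.might_contain("study-0001"))  # True: inserted keys are never reported absent
```

A Bloom filter never produces false negatives, which is what makes it safe as a first-layer filter: a "no" answer lets the query skip the index entirely.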
In summary, the present invention uses an improved Bloom filter algorithm to decide whether the queried element is in the set, which gives fast feedback with a very low error rate, and then uses a redesigned secondary index scheme to reduce the inefficiency of the traditional secondary index. The invention therefore proposes a new index scheme that combines the two. Using a bit-vector method, the single global bit string of the Bloom filter is replaced by one bit string per random (hash) function, which further reduces the error rate. Meanwhile, by controlling HBase's data sharding (Region splitting), the index table and the data table are physically located in the same Region, so both queries can be executed within that Region, saving two time-consuming I/O operations.
Summary of the invention
The contents of the present invention:
1. A method of building an HBase index with the Bloom filter algorithm is proposed, used to determine whether the queried element is in the index table, with the error rate reduced through optimization. A traditional Bloom filter maps the values obtained by hashing the queried element with multiple hash functions into a single bit string; the optimization of this invention maps these values into different bit strings, which together form a vector group, lowering the error rate at the cost of a moderate increase in memory.
2. An improved HBase secondary index design is proposed. Using the coprocessor built into HBase, the data table and the index table are physically co-located in the same Region, reducing the two I/O round trips required by the traditional HBase secondary index scheme to one and greatly improving retrieval speed.
3. A sampling hashing method is proposed to solve the HBase Region write-hotspot problem, in which logically adjacent data is always written to the same or adjacent Regions. The method first estimates the total number of HBase Regions, then, by sampling the row keys, hashes write requests so that they are distributed across different Regions, eliminating the write hotspot.
The present invention is an integrated index design method. Given the many advantages of the Bloom filter algorithm, the optimized Bloom filter is used as the first step of the index, while the improved secondary index scheme further increases HBase index retrieval speed.
To achieve the above object, the present invention adopts the following technical solution:
Step 1. Optimize the traditional Bloom filter with the bit-vector method. A traditional Bloom filter maps the values of the queried element, hashed by multiple hash functions, into the same bit string; the optimization of this invention maps them into different bit strings that form a vector group, lowering the error rate at the cost of moderately more memory. The filter is applied in the HBase index as a first-layer filter: if the queried element is in the index table, go to Step 2; otherwise, jump to Step 4.
Step 2. Disable HBase's automatic Region splitting and apply the improved secondary index method of the invention: estimate the number of Regions and the Region split points, then hash the row keys of the data table so that the row-key range is divided evenly. Writes are then no longer concentrated on a single hotspot, HBase's ability to distribute requests across servers is better exploited, and, most importantly, the data table and the index table are kept together, reducing the time spent on I/O.
Step 3. Build the index module with HBase's built-in Coprocessor. When Step 1 finds the queried element in the index table, the coprocessor carries out both queries locally on the server, reducing query time.
Step 4. Return the query result.
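The control flow of Steps 1 to 4 can be illustrated with a small in-memory model. This is a hypothetical sketch, not HBase code: `CoLocatedRegion` stands in for a single Region that holds both main-table rows and index entries, and a plain Python set stands in for the Bloom filter (so this sketch has no false positives). It shows only the order of operations and the fact that the intermediate row keys never leave the "server".

```python
from typing import Dict, List, Set, Tuple


class CoLocatedRegion:
    """Hypothetical stand-in for one Region holding main-table rows and index entries."""

    def __init__(self) -> None:
        self.main_rows: Dict[str, Dict[str, str]] = {}
        self.index: Dict[Tuple[str, str], List[str]] = {}  # (index name, value) -> main row keys
        self.filter: Set[Tuple[str, str]] = set()           # stand-in for the Bloom filter

    def put(self, row_key: str, column: str, value: str) -> None:
        self.main_rows.setdefault(row_key, {})[column] = value
        self.index.setdefault((column, value), []).append(row_key)
        self.filter.add((column, value))

    def query(self, column: str, value: str) -> List[Tuple[str, Dict[str, str]]]:
        # Step 1: first-layer filter; a miss jumps straight to Step 4.
        if (column, value) not in self.filter:
            return []
        # Steps 2-3: index lookup and main-table lookup both run "server-side",
        # so the intermediate row keys are never sent back to the client.
        row_keys = self.index.get((column, value), [])
        # Step 4: return the query result.
        return [(k, self.main_rows[k]) for k in row_keys]


region = CoLocatedRegion()
region.put("p001", "Modality", "CT")
region.put("p002", "Modality", "MR")
region.put("p003", "Modality", "CT")
print([k for k, _ in region.query("Modality", "CT")])  # ['p001', 'p003']
```

In a real deployment the two lookups would run inside a Coprocessor on the RegionServer; the sketch only mirrors the sequence of steps.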
Description of drawings
Fig. 1: Flowchart of the optimized Bloom filter algorithm
Fig. 2: Flowchart of the optimized secondary index scheme
Fig. 3: Flowchart of the pre-splitting algorithm
Fig. 4: Overall query flowchart
Specific embodiment
The present invention builds on a hybrid scheme that combines a Bloom filter with a secondary index, improves the algorithms involved, and applies them to the index, with the aim of faster retrieval through HBase indexes.
The traditional Bloom filter suffers from "false positives": an element that does not belong to the set may be misjudged as belonging to it. We analyze the probability of this misjudgment. Assume the Bloom filter has a bit string of m bits and each element uses k hash functions (information fingerprints); some of the m bits are 1 and some are 0. Consider first the probability that a given bit is 0. When an element is inserted, its first hash function sets one bit of the filter to 1, so any particular bit is set to 1 with probability 1/m and remains 0 with probability 1 - 1/m. For a specific position, the probability that none of the element's k hash functions sets it to 1 is given by formula 1:

$$\left(1-\frac{1}{m}\right)^{k} \tag{1}$$

If a second element is inserted, the probability that the position is still not set to 1 is given by formula 2:

$$\left(1-\frac{1}{m}\right)^{2k} \tag{2}$$

After n elements have been inserted, the probability that the position is still 0 is given by formula 3:

$$\left(1-\frac{1}{m}\right)^{kn} \tag{3}$$

Conversely, the probability that the position has been set to 1 after n insertions is given by formula 4:

$$1-\left(1-\frac{1}{m}\right)^{kn} \tag{4}$$

Now suppose all n elements have been placed in the Bloom filter's bit string and a new element that is not in the set is queried. Because the hash functions behave randomly, the probability that its first hash function hits a bit whose value is 1 is exactly the probability above. For the element to be misidentified as a member, the bits selected by all k hash functions must be 1, so the false-positive probability p is given by formula 5:

$$p=\left(1-\left(1-\frac{1}{m}\right)^{kn}\right)^{k} \tag{5}$$

Rewriting the exponent gives:

$$p=\left(1-\left(\left(1-\frac{1}{m}\right)^{m}\right)^{kn/m}\right)^{k} \tag{6}$$

When m is large, $(1-1/m)^{m}\approx e^{-1}$, so this can be approximated as:

$$p\approx\left(1-e^{-kn/m}\right)^{k} \tag{7}$$
Assuming 16 bits per element (m/n = 16) and k = 8, the false-positive probability is approximately 0.057%, i.e. roughly 5 in 10,000.
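The worked example can be checked numerically. The sketch below evaluates the exact expression p = (1 - (1 - 1/m)^(kn))^k and its large-m approximation (1 - e^(-kn/m))^k; the element count n is an arbitrary assumption, since only the ratio m/n = 16 matters.

```python
import math


def fp_exact(m: int, n: int, k: int) -> float:
    """p = (1 - (1 - 1/m)^(k*n))^k for m bits, n inserted elements, k hash functions."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def fp_approx(m: int, n: int, k: int) -> float:
    """Large-m approximation: p ~= (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 1_000_000   # hypothetical number of inserted elements
m = 16 * n      # 16 bits per element, as in the example above
k = 8
print(f"exact  = {fp_exact(m, n, k):.7f}")
print(f"approx = {fp_approx(m, n, k):.7f}")
```

Both values come out at about 5.7 × 10^-4; at this size the exact and approximate forms agree to many decimal places.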
We now discuss the improvement to the algorithm. After a key is mapped by d hash functions into a bit string V of length N, the probability that a given bit is 1 is d/N. Each function h_i (i = 1, …, d) is independent and random. Let S = {X1, X2, X3, …, Xy} be a set of y elements. When all members of S are mapped by these hash functions into an array of length N, the probability P that a given bit of the array is 1 is given by formula 8:

$$P=1-\left(1-\frac{d}{N}\right)^{y} \tag{8}$$

An element K outside the set is mistaken for a member of the data set represented by the hash array if every position it maps to under all d hash functions is 1, i.e. V(h_i(K)) = 1 for i = 1, …, d. The error rate P_err is therefore given by formula 9:

$$P_{err}=P^{d} \tag{9}$$
From this error rate, the average decision time T of the Bloom filter can be derived, as given by formula 10.
In general, the range N of the bit vector is much larger than the size y of the data set S, because if y > N the Bloom filter's error rate becomes very large. Mapping the data Xi through the hash functions into the bit vector inevitably produces some collisions, which can only be reduced through the choice of hash function. These collisions arise from the inherent nature of hash representation, and we call them internal collisions. In contrast, external collisions are caused by multiple hash functions mapping into the same bit vector. The root cause of both is that there are not enough mapping addresses. Can some address space be added without sacrificing query performance? Based on this idea, each hash function h_i uses its own independent bit vector for address mapping, forming a vector group V. Given a data set A with element x, the vector group V is expressed as formula 11:

$$V(i, h_i(x)) = 1,\quad \forall x \in A,\ i = 1, \dots, d \tag{11}$$

where V(i, j) denotes the j-th bit of the i-th vector. Assuming all hash functions are uniformly random, each function maps into its own bit string and is subject only to internal collisions, so the error rate p_new is given by formula 12:

$$p_{new}=\left(1-\left(1-\frac{1}{N}\right)^{y}\right)^{d} \tag{12}$$

The formula for the average decision time is unchanged after the improvement, but because the error rate p_new is smaller, the overall average decision time also decreases.
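The vector-group variant (one independent N-bit vector per hash function) can be sketched as below, together with the two error-rate expressions so that the trade-off can be compared numerically: d functions sharing one N-bit string give P_err = (1 - (1 - d/N)^y)^d, and d private N-bit vectors give p_new = (1 - (1 - 1/N)^y)^d. The SHA-256-based hash family and all names are illustrative assumptions, not the patent's implementation.

```python
import hashlib
from typing import List


def p_single(n_bits: int, y: int, d: int) -> float:
    """d hash functions share one N-bit string: P = 1 - (1 - d/N)^y, P_err = P^d."""
    return (1.0 - (1.0 - d / n_bits) ** y) ** d


def p_vector_group(n_bits: int, y: int, d: int) -> float:
    """Each hash function has its own N-bit vector: p_new = (1 - (1 - 1/N)^y)^d."""
    return (1.0 - (1.0 - 1.0 / n_bits) ** y) ** d


class VectorGroupBloomFilter:
    """d independent N-bit vectors; hash function h_i only reads and writes vector i."""

    def __init__(self, n_bits: int, d: int) -> None:
        self.n_bits = n_bits
        self.d = d
        self.vectors: List[bytearray] = [bytearray((n_bits + 7) // 8) for _ in range(d)]

    def _h(self, i: int, key: str) -> int:
        # Hypothetical h_i: SHA-256 of "i:key", reduced modulo N.
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key: str) -> None:
        for i in range(self.d):
            j = self._h(i, key)
            self.vectors[i][j >> 3] |= 1 << (j & 7)  # V(i, h_i(x)) = 1

    def might_contain(self, key: str) -> bool:
        # Present only if V(i, h_i(key)) == 1 for every i; one zero bit means definitely absent.
        for i in range(self.d):
            j = self._h(i, key)
            if ((self.vectors[i][j >> 3] >> (j & 7)) & 1) == 0:
                return False
        return True


print(p_single(1024, 100, 4), p_vector_group(1024, 100, 4))
```

Note that the vector group uses d × N bits rather than N, so its lower error rate is bought with extra memory, in line with the "moderate increase in memory" described above; comparing the two at equal total memory is a fair way to examine that trade-off.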
Fig. 1 shows the structure of the algorithm.
Next, consider the improved secondary index scheme. In HBase a single table is usually very large, so its data may be stored across one or more Regions, served by one or more RegionServers. Each Region has a start row key and an end row key that define its row-key range; a read or write whose row key falls within that range hits the Region and reads or writes the data there. The problem is that a Region is split once it grows to a certain size, a behavior that follows from HBase's LSM-tree storage structure. As a result, index-table and data-table entries that were originally in the same Region may be split into different Regions. A client query then involves four I/O operations: the first queries the index table with the query value, the second returns the main-table row key, the third queries the main table with that row key, and the fourth returns the final result. Although excellent I/O frameworks such as Netty have greatly improved I/O speed, reducing the number of such I/O operations brings a predictable performance gain. The implementation is as follows:
The Coprocessor framework provided by HBase can run a program directly on the server, so that the program executes where the data resides. The remaining issue is to guarantee that the main table and the index table reside on the same RegionServer. As explained above, the Region in which a row is stored is determined by its row-key range, so it suffices that the row keys of the main table and the index table match. Because row-key lookup follows the leftmost-prefix principle, only the leading part of the index-table row key needs to correspond to the main table's row key. In addition, row keys in the data table must be unique. Combining these requirements, the index-table row key is designed as: Region start key + index name + index value + main-table row key. The leading part places the index row in the same Region as the main-table row, and the trailing part guarantees the uniqueness of the index-table row key.
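The row-key layout can be sketched as follows. This is a hypothetical illustration: the `\x00` separator, the helper names, and the example split points are assumptions (a low separator byte keeps the composite key below the next Region's start key in ordinary cases), and real HBase locates Regions through its meta table rather than a local `bisect`.

```python
import bisect
from typing import List

SEP = b"\x00"  # hypothetical separator between row-key components


def region_for(row_key: bytes, split_points: List[bytes]) -> int:
    """Index of the Region whose [start, end) range contains row_key.
    Region 0 starts at b'' and Region i (i >= 1) starts at split_points[i - 1]."""
    return bisect.bisect_right(split_points, row_key)


def region_start_of(row_key: bytes, split_points: List[bytes]) -> bytes:
    i = region_for(row_key, split_points)
    return b"" if i == 0 else split_points[i - 1]


def index_row_key(region_start: bytes, index_name: str, index_value: str,
                  main_row_key: bytes) -> bytes:
    """Region start key + index name + index value + main-table row key."""
    return SEP.join([region_start, index_name.encode("utf-8"),
                     index_value.encode("utf-8"), main_row_key])


splits = [b"4", b"8", b"c"]                       # hypothetical pre-split points
main_key = b"5a3f"                                # hypothetical (hashed) main-table row key
idx_key = index_row_key(region_start_of(main_key, splits), "Modality", "CT", main_key)
print(region_for(main_key, splits), region_for(idx_key, splits))  # 1 1
```

Because the index row starts with the Region's own start key, it sorts into the same Region as the main-table row, and a coprocessor in that Region could scan the prefix start key + index name + index value to find matching main-table row keys.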
HBase has three basic Region splitting methods. The first is pre-splitting: the number of Regions of the table and the row-key range of each Region are configured when the table is created, before any data is written. The second is automatic splitting, HBase's default: a table starts with a single Region, which grows as data is written; when it reaches a certain size it is split into two Regions of equal size, new data is written to the newly created Regions, and splitting continues in the same way. The third is forced splitting: specific commands are issued through the HBase shell to force a particular split.
If data is written sequentially in increasing row-key order under the default automatic splitting, a Region write hotspot may arise. After a Region splits in two, its row-key range is divided in two as well. Because data is written in increasing row-key order, all subsequent writes go to the Region with the larger start key, while the Region with the smaller start key rarely receives writes again. The load-balancing capability of the distributed database is then poorly exploited, and HBase's strong write performance suffers.
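The hotspot effect can be shown with a toy simulation. The split points and the zero-padded sequential keys below are hypothetical; the point is only that, with range partitioning, keys that keep increasing all fall into the last Region.

```python
import bisect
from collections import Counter
from typing import List


def region_for(row_key: str, split_points: List[str]) -> int:
    """Index of the Region whose [start, end) row-key range contains row_key."""
    return bisect.bisect_right(split_points, row_key)


# Hypothetical Regions left behind by earlier automatic splits of sequential keys.
split_points = ["00002500", "00005000", "00007500"]

# New writes continue the increasing sequence (e.g. auto-incremented record numbers).
new_keys = [f"{i:08d}" for i in range(10_000, 11_000)]
load = Counter(region_for(key, split_points) for key in new_keys)
print(load)  # Counter({3: 1000}): every new write hits the last Region
```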
This scheme uses the first method, pre-splitting, but additionally solves the write-hotspot problem with a sampling hashing method, making good use of load balancing. The method can be implemented through the programming interface provided by HBase, but the number of Regions of the data table and the split point of each Region must be known before the table is created, i.e. the row-key range of each partition must be fixed. To solve the Region write-hotspot problem, this scheme designs a sampling hashing method, described in detail below with reference to Fig. 3:
Step 1: Estimate the number of Regions M.
The number of Regions has a great influence on HBase's overall read/write efficiency: too many Regions consume too much memory, while too few cannot exploit concurrency well. It is therefore necessary to use reasonable values that the industry has established through extensive experiments and to estimate the number of Regions for the data table with the following formula:

$$M=\frac{RSXmx \times \texttt{hbase.regionserver.global.memstore.size}}{\texttt{hbase.hregion.memstore.flush.size} \times cf}$$

where RSXmx is the memory size of one RegionServer, hbase.regionserver.global.memstore.size and hbase.hregion.memstore.flush.size take the optimal values recommended by the HBase project (available in the official HBase documentation), and cf is the number of column families of the data table. This yields the number of Regions M.
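The Region-count estimate M = RSXmx × hbase.regionserver.global.memstore.size / (hbase.hregion.memstore.flush.size × cf) is simple arithmetic. In the sketch below, 0.4 and 128 MB are the commonly cited defaults of those two properties in recent HBase releases; treat them as assumptions and check the values for your HBase version. The function name is hypothetical.

```python
def estimate_region_count(rs_heap_bytes: int, global_memstore_fraction: float,
                          memstore_flush_size_bytes: int, column_families: int) -> int:
    """M = RSXmx * hbase.regionserver.global.memstore.size
           / (hbase.hregion.memstore.flush.size * cf), rounded down, at least 1."""
    if column_families < 1:
        raise ValueError("a table needs at least one column family")
    m = rs_heap_bytes * global_memstore_fraction // (memstore_flush_size_bytes * column_families)
    return max(1, int(m))


GIB = 1024 ** 3
MIB = 1024 ** 2
print(estimate_region_count(16 * GIB, 0.4, 128 * MIB, 1))  # 51
```

Each additional column family divides the estimate, because every family in a Region keeps its own memstore.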
Step 2: Hash the row keys into unordered strings.
Because the data must remain retrievable, a reversible encryption algorithm is preferred; this scheme uses the AES encryption algorithm to transform each row key into a random-looking string.
Step 3: Randomly sample a certain number of row keys and put them into a set sorted in ascending order.
Step 4: Divide the whole sorted set evenly according to the estimated number of partitions M to find the split points.
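Steps 2 to 4 can be sketched as follows. The description specifies AES for Step 2; AES is not in the Python standard library, so this sketch substitutes an MD5 hex digest as a stand-in scrambling function. That substitution is an assumption made only to keep the example self-contained, and unlike AES it is not reversible. The sample size and seed are also illustrative.

```python
import hashlib
import random
from typing import List


def scramble(row_key: str) -> str:
    # Stand-in for the AES step: MD5 hex digest (stdlib, irreversible; NOT what the patent specifies).
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()


def split_points(row_keys: List[str], m: int, sample_size: int, seed: int = 0) -> List[str]:
    """Return m - 1 split points that divide the scrambled key space into m ranges."""
    scrambled = [scramble(k) for k in row_keys]                               # Step 2
    rng = random.Random(seed)
    sample = sorted(rng.sample(scrambled, min(sample_size, len(scrambled))))  # Step 3
    step = len(sample) / m
    return [sample[int(i * step)] for i in range(1, m)]                       # Step 4


keys = [f"study-{i:06d}" for i in range(10_000)]  # hypothetical sequential row keys
points = split_points(keys, m=4, sample_size=1_000)
print(points)
```

The resulting split points could then be supplied when the table is created; because scrambled keys are spread roughly uniformly, each pre-split Region receives a similar share of sequential writes.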
Finally, with reference to Fig. 4, the overall query procedure is summarized:
Step 1: Issue the query request.
Step 2: The Coprocessor parses the query.
Step 3: Search the index table.
Step 4: Filter with the Bloom filter. If the element is present, go to Step 5; otherwise, go to Step 6.
Step 5: Query the main table.
Step 6: Return the final result.
Claims (3)
1. A Bloom-filter-based HBase multi-index design method for medical imaging data, characterized by comprising the following steps:
Step 1. The query request is first sent to the query server, where the HBase Coprocessor parses it;
Step 2. The query is first filtered through the Bloom filter;
Step 3. If the query passes the filter, the index table is searched; the index-table row key is designed as: Region start key + index name + index value + main-table row key;
Step 4. The Regions are pre-split using the sampling hashing method.
2. The Bloom-filter-based HBase multi-index design method for medical imaging data according to claim 1, characterized in that Step 4 is specifically as follows:
Step 1: Estimate the number of Regions N:

$$N=\frac{RSXmx \times \texttt{hbase.regionserver.global.memstore.size}}{\texttt{hbase.hregion.memstore.flush.size} \times cf}$$

where RSXmx is the memory size of one RegionServer; hbase.regionserver.global.memstore.size and hbase.hregion.memstore.flush.size take the values recommended by the system, obtained from the official HBase documentation; and cf is the number of column families of the data table; this yields the number of Regions N;
Step 2: Use an irreversible cryptographic algorithm to hash the row keys into random strings;
Step 3: Randomly sample a certain number of row keys and put them into a set sorted in ascending order;
Step 4: Divide the whole set evenly according to the estimated number of partitions N to find the split points.
3. The Bloom-filter-based HBase multi-index design method for medical imaging data according to claim 1, characterized in that, during Bloom filter filtering, each hash function h_i(x) uses an independent bit vector for address mapping, forming a vector group V; given a data set A with element x, the vector group V is expressed as follows:

$$V(i, h_i(x)) = 1,\quad \forall x \in A,\ i = 1, \dots, d$$

where V(i, j) denotes the j-th bit of the i-th vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910070748.1A CN109977113A (en) | 2019-01-25 | 2019-01-25 | A kind of HBase Index Design method based on Bloom filter for medical imaging data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109977113A (en) | 2019-07-05 |
Family
ID=67076713
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910070748.1A Pending CN109977113A (en) | 2019-01-25 | 2019-01-25 | A kind of HBase Index Design method based on Bloom filter for medical imaging data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977113A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111198886A (en) * | 2019-12-31 | 2020-05-26 | 浙江华云信息科技有限公司 | Method for constructing Hbase secondary index table |
CN111198847A (en) * | 2019-12-30 | 2020-05-26 | 广东奡风科技股份有限公司 | Data parallel processing method, device and system suitable for large data set |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101901248A (en) * | 2010-04-07 | 2010-12-01 | 北京星网锐捷网络技术有限公司 | Method and device for creating and updating Bloom filter and searching elements |
CN101958883A (en) * | 2010-03-26 | 2011-01-26 | 湘潭大学 | Bloom Filter and open-source kernel-based method for defensing SYN Flood attack |
US20190005041A1 (en) * | 2016-02-12 | 2019-01-03 | International Business Machines Corporation | Locating data in a set with a single index using multiple property values |
CN109165222A (en) * | 2018-08-20 | 2019-01-08 | 福州大学 | A kind of HBase secondary index creation method and system based on coprocessor |
Non-Patent Citations (4)
Title |
---|
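2K10: "Hbase 学习(九) 华为二级索引(原理)" [HBase learning (9): Huawei secondary index (principles)], 《HTTPS://MY.OSCHINA.NET/U/923508/BLOG/413129》 * |
王晓明: "布隆过滤器及其改进算法在分布式环境下的模拟实现" [Simulated implementation of the Bloom filter and its improved algorithms in a distributed environment], 《中国优秀硕士论文全文数据库》 [China Master's Theses Full-text Database] * |
郭红: "基于协处理器的HBase二级索引方法" [A coprocessor-based HBase secondary index method], 《计算机工程与应用》 [Computer Engineering and Applications] * |
马成龙: "基于Hadoop的医学影像存储检索系统的研究与实现" [Research and implementation of a Hadoop-based medical image storage and retrieval system], 《中国优秀硕士论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] * |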
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190705 |