CN108509505A - Character string retrieval method and device based on partition double-array Trie - Google Patents

Character string retrieval method and device based on partition double-array Trie

Info

Publication number
CN108509505A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810179880.1A
Other languages
Chinese (zh)
Other versions
CN108509505B (en)
Inventor
陈文焰
贾连印
丁家满
李孟娟
游进国
章露露
吕晓伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201810179880.1A
Publication of CN108509505A
Application granted
Publication of CN108509505B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a character string retrieval method and device based on a partitioned double-array Trie, and belongs to the field of database technology. The invention comprises a data preprocessing step, which sorts the strings and counts the number of strings for each distinct first character; an index creation step, which divides the data into partitions according to an input partition count N, generates a partition mapping table, and creates an independent double-array Trie index structure for each partition; and a retrieval step, which takes a query string as input and searches the partitioned double-array Trie index structure. By building partitioned double arrays, the invention effectively reduces the number of conflicts that arise while a conventional double array is created, as well as the cost of resolving them, and thereby greatly improves the efficiency of both index creation and retrieval.

Description

Character string retrieval method and device based on partition double-array Trie
Technical field
The present invention relates to a character string retrieval method and device based on a partitioned double-array Trie, and belongs to the field of database technology.
Background
In recent years, string retrieval has been studied extensively in the database field. A Trie is an ordered tree structure that stores characters on its edges. It is widely used in lexical analyzers, bibliographic search, spell-check dictionaries, language model implementations, IP routing address lookup, and similarity search and joins over strings or sets. There are currently two common ways to store a Trie: 1) matrix storage and 2) linked storage. Both require the complete Trie to be built, which leads to a large storage overhead, especially when the data set is sparse. To reduce the space overhead of the Trie, Aoe, Yata et al. proposed the double-array Trie (DAT) data structure. It uses two arrays, BASE and CHECK, to store for each string the prefix that distinguishes it from all other strings, and a separate character array, TAIL, to store the remaining suffix. Retrieval involves only array accesses and additions, so it is efficient.
Nevertheless, the double-array Trie still suffers from relatively high space overhead and high index creation cost, and a number of optimizations of the DAT have been studied. For example, nodes with more branches can be processed first during index creation. This strategy improves space utilization, but comparing branch counts introduces additional overhead and therefore lowers the efficiency of index creation. CDA (Compression Double Array, a more space-compact double-array Trie) stores character information in the CHECK array to compress the storage space, but it needs additional overhead to guarantee the uniqueness of BASE values, which also lowers index creation efficiency. Building on CDA, other researchers proposed a single-array Trie structure that removes the BASE array and stores character information in the CHECK array only, but this method is mainly applicable to fixed-length strings such as postal codes.
The above optimizations of the DAT mostly focus on compressing the storage space as far as possible, and this often causes a sharp drop in creation efficiency.
An analysis of how a double-array Trie index is created shows that the process is accompanied by conflicts (when a string is inserted into a double-array Trie, a conflict occurs if different parent positions compete for the same child position), and that the number of conflicts grows as the number of strings grows.
Summary of the invention
The technical problem to be solved by the invention is to provide a character string retrieval method and device based on a partitioned double-array Trie. The aim is to reduce the number of conflicts during DAT index creation and the cost of resolving them, so as to improve the efficiency of building the DAT index structure; to effectively solve the problem that DAT index creation efficiency drops sharply as the data volume grows; and at the same time to improve the retrieval efficiency of the DAT.
The technical solution of the invention is a character string retrieval method based on a partitioned double-array Trie, comprising the following steps:
Data preprocessing step: sort the string data set and count the number of strings for each distinct first character;
Index creation step: divide the data set into partitions according to the input partition count N, then generate a partition mapping table (hereinafter PMT; a PMT is a mapping between prefixes and partitions) and create an independent double-array Trie (hereinafter DAT) index structure for each partition;
Retrieval step: take the query string as input and search the partitioned DAT index structure.
The data preprocessing step consists of two steps:
Step 110: sort the string data set in ascending lexicographic order;
Step 111: count the number of strings for each distinct first character.
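Steps 110 and 111 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess` is hypothetical, every string is assumed to be non-empty and to end with the "#" marker, and Python's default code-point ordering is used as the lexicographic order.

```python
from collections import Counter


def preprocess(strings):
    """Step 110: sort ascending; Step 111: count strings per first character.

    Hypothetical helper; assumes every string is non-empty.
    """
    ordered = sorted(strings)                            # ascending lexicographic order
    first_char_counts = Counter(s[0] for s in ordered)   # first character -> count
    return ordered, first_char_counts


ordered, counts = preprocess(["bachelor#", "jar#", "badge#", "baby#", "jack#"])
```

The counts are what the later partition division uses to balance the partitions without splitting a first-character group.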
The index creation step is carried out as follows:
Step 210: divide the data set into partitions;
Step 220: generate the PMT;
Step 230: create the partitioned DAT index structures.
Step 210 is carried out as follows:
Step 211: for a given partition count N, where N < m and m is the number of distinct first characters, determine N-1 dividing lines that split the data set evenly into N partitions;
Step 212: adjust the dividing lines according to the common-prefix property. If a group of data sharing a common prefix (for example, all strings whose first character is "b") is split in two by a dividing line, move that dividing line to the nearest edge of the group, so that strings with the same first character always fall in the same partition.
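One possible reading of steps 211 and 212 is sketched below. The function name `partition_first_chars` and its details are assumptions, not taken from the patent: ideal dividing lines are placed at k/N of the sorted data, each line is snapped to the nearest edge of the first-character group it falls into (ties go to the left edge), and lines that coincide after snapping are merged, so fewer than N partitions may result. The check uses N <= m rather than N < m so that a small two-character example can be divided into two partitions.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_first_chars(first_char_counts, n_partitions):
    """Return a list of partitions, each a list of first characters.

    Hypothetical sketch of steps 211-212; first_char_counts maps
    first character -> number of strings (all counts positive).
    """
    chars = sorted(first_char_counts)
    if not 0 < n_partitions <= len(chars):
        raise ValueError("need 1 <= N <= number of distinct first characters")
    # edges[j] = number of strings before the group of chars[j]
    edges = [0] + list(accumulate(first_char_counts[c] for c in chars))
    total = edges[-1]
    cuts = set()
    for k in range(1, n_partitions):
        ideal = k * total / n_partitions        # step 211: balanced dividing line
        i = bisect_right(edges, ideal)          # edges[i-1] <= ideal < edges[i]
        left, right = edges[i - 1], edges[i]
        # step 212: move the line to the nearest edge of the group it splits
        cuts.add(i - 1 if ideal - left <= right - ideal else i)
    cuts.discard(0)
    cuts.discard(len(chars))
    bounds = [0] + sorted(cuts) + [len(chars)]
    return [chars[a:b] for a, b in zip(bounds, bounds[1:])]


partitions = partition_first_chars({"b": 3, "j": 2}, 2)
```

Snapping only ever moves a line to a group boundary, which is what guarantees that strings with the same first character stay together.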
Step 220 is carried out as follows:
Build the PMT from the actual division of the data set. Each entry in the mapping table has the form <first character of the string, partition number>; that is, the first character of a string is mapped to its partition.
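As a minimal sketch, the PMT can be held in an ordinary dictionary. The name `build_pmt` and the 1-based partition numbering are assumptions chosen for illustration.

```python
def build_pmt(partitions):
    """Map each first character to its partition number (1-based).

    Hypothetical helper; `partitions` is a list of lists of first characters.
    """
    return {
        ch: number
        for number, chars in enumerate(partitions, start=1)
        for ch in chars
    }


pmt = build_pmt([["b"], ["j"]])
```

A dictionary lookup costs constant time on average, so routing a string to its partition adds little to insertion or retrieval.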
Step 230 is carried out as follows:
Step 231: for a string to be inserted into the partitioned DAT, look up its first character in the PMT to obtain the partition it is to be inserted into;
Step 232: insert the string into that partition according to the DAT construction formulas. Inserting a character c causes a transition from state s to state t, where:
BASE[s] + CODE[c] = t    (1)
CHECK[t] = s    (2)
Here CODE[c] denotes the numeric code of character c. For English characters, the characters "#", "a", "b", "c", ..., "z" are encoded as 1, 2, 3, 4, ..., 27 respectively.
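Formulas (1) and (2) translate directly into code. The sketch below is illustrative only: the helper names `code` and `transition` are assumptions, arrays are plain Python lists with index 0 unused, and only "#" and lowercase English letters are encoded.

```python
def code(c):
    """Numeric code of a character: '#' -> 1, 'a' -> 2, ..., 'z' -> 27."""
    if c == "#":
        return 1
    if len(c) == 1 and "a" <= c <= "z":
        return ord(c) - ord("a") + 2
    raise ValueError(f"unsupported character: {c!r}")


def transition(base, check, s, c):
    """Return the target state t if formulas (1) and (2) hold, else None."""
    t = base[s] + code(c)                    # formula (1)
    if 0 < t < len(check) and check[t] == s:  # formula (2)
        return t
    return None


# Tiny example: state 1 has base 1, and state 4 is recorded as a child of state 1.
base = [0, 1, 0, 0, 0]
check = [0, 0, 0, 0, 1]
t_b = transition(base, check, 1, "b")
t_a = transition(base, check, 1, "a")
```

The CHECK comparison is what rejects a slot that is empty or that belongs to a different parent state.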
The retrieval step consists of two steps:
Step 310: given a string to be retrieved, look up its first character in the PMT to obtain the corresponding partition;
Step 320: run the DAT search algorithm in that partition and return the retrieval result.
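The two-step retrieval can be sketched as a routing function. To keep the example short, each partition's DAT is replaced here by a plain Python set and the per-partition search by a membership test; that substitution is an assumption made for illustration and is not the DAT search itself. The names `retrieve`, `partition_indexes` and `search_in_partition` are hypothetical.

```python
def retrieve(key, pmt, partition_indexes, search_in_partition):
    """Step 310: route by first character; Step 320: search that partition only."""
    partition_no = pmt.get(key[0]) if key else None
    if partition_no is None:
        return False  # no partition holds strings with this first character
    return search_in_partition(partition_indexes[partition_no], key)


pmt = {"b": 1, "j": 2}
# Stand-in indexes: sets instead of real double-array Tries.
indexes = {1: {"baby#", "bachelor#", "badge#"}, 2: {"jack#", "jar#"}}
found = retrieve("jar#", pmt, indexes, lambda index, key: key in index)
```

Because the search function is passed in, a real per-partition DAT search can be substituted without changing the routing logic.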
A character string retrieval device based on a partitioned double-array Trie, comprising:
Data preprocessing module: sorts the string data set and counts the number of strings for each distinct first character;
Index creation module: divides the data set into partitions according to the input partition count N, then generates the partition mapping table and creates an independent double-array Trie index structure for each partition;
Retrieval module: takes the query string as input and searches the partitioned DAT index structure.
The beneficial effect of the invention is that conflicts during creation of the double-array Trie are greatly reduced, which improves the efficiency of both index creation and string queries.
Description of the drawings
Fig. 1 is a functional block diagram of indexing and retrieval based on the partitioned double-array Trie of the invention
Fig. 2 is the partition mapping table of the invention for "bachelor#", "badge#", "baby#", "jack#", "jar#"
Fig. 3 shows the initialization of the double-array Trie of the invention
Fig. 4 is a schematic diagram of the reduced trie and double array after "baby#" is inserted
Fig. 5 is a schematic diagram of the reduced trie and double array after "bachelor#" is inserted
Fig. 6 is a schematic diagram of the reduced trie and double array after "badge#" is inserted
Fig. 7 is a schematic diagram of the reduced tries and double arrays created after partitioning
Fig. 8 compares the index creation time of DAT and DO-DAT
Fig. 9 compares the amount of data moved by DAT and DO-DAT
Fig. 10 shows the influence of the number of partitions on index creation time
Fig. 11 shows the influence of the number of partitions on the number of conflicts
Fig. 12 shows the influence of the number of partitions on the BASE value probe length
Fig. 13 compares the creation time of different index structures
Fig. 14 compares the retrieval time of different index structures
Fig. 15 compares the storage space of different index structures
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Embodiment 1: a character string retrieval method based on a partitioned double-array Trie, comprising:
Data preprocessing step: sort the data set in ascending lexicographic order. Take the data set K = {"bachelor#", "jar#", "badge#", "baby#", "jack#"}. To distinguish strings such as "the" and "then", where one string is a prefix of the other, "#" is appended to every string as an end marker. The sorted set is Ko = {"baby#", "bachelor#", "badge#", "jack#", "jar#"}.
Partition division: partitioning by first character divides the data into two partitions, K1 = {"baby#", "bachelor#", "badge#"} and K2 = {"jack#", "jar#"}.
The partition mapping table generated at this point is shown in Fig. 2.
An ordinary Trie structure has to store the entire string set K, which requires a large storage overhead. To reduce the space overhead, a DAT only needs to store, for each string, the prefix that distinguishes it from all other strings (for example, the prefix "bac" of "bachelor#" distinguishes it from every other string); this part is called the reduced trie. A DAT consists of two one-dimensional integer arrays of equal length, BASE and CHECK, and a character array TAIL that stores the suffixes. The BASE array stores the base value of a state transition, and the CHECK array stores a check value used to verify that a state transition is correct. In a DAT, BASE and CHECK hold the indexed prefix of each string, and TAIL holds the suffix that is not indexed. When a character c is input, the transition from state s to state t must satisfy the following two relations:
BASE[s] + CODE[c] = t    (1)
CHECK[t] = s    (2)
Here CODE[c] denotes the numeric code of character c. For English characters, the characters "#", "a", "b", "c", ..., "z" are encoded as 1, 2, 3, 4, ..., 27 respectively.
For an array index i, BASE[i] and CHECK[i] both being 0 indicates that the position is empty. A negative BASE value marks a terminal state, and its absolute value gives the position of the string's suffix in the TAIL array.
Creation of the partitioned DAT index structures: looking up the strings to be inserted in the PMT shown in Fig. 2 shows that K1 = {"baby#", "bachelor#", "badge#"} is to be inserted into partition 1 and K2 = {"jack#", "jar#"} into partition 2.
The creation of the double-array Trie for set K1 is described in detail below.
The initialization of the double-array Trie is shown in Fig. 3, where the value of POS is the position in the TAIL array at which the next characters will be inserted.
Inserting the string "baby#" into partition 1 proceeds as follows:
Step A1: index creation starts from position 1 of the BASE array. The code of "b" is 3, so:
BASE[1] + CODE["b"] = BASE[1] + 3 = 1 + 3 = 4, and CHECK[4] = 0 ≠ 1
Step A2: a CHECK value of 0 indicates that the rest of the string is to be inserted into the TAIL array. At this point the inserted "b" already identifies "baby#" uniquely, so the remaining part "aby#" is written into the TAIL array character by character starting at POS = 1.
Step A3: set
BASE[4] ← -POS = -1
which indicates that the position in the TAIL array from which the remaining string is read is the absolute value of BASE[4].
Update
POS = 1 + length("aby#") = 1 + 4 = 5
Then update
CHECK[4] = 1
which indicates that node 4 is reached by a jump from node 1, i.e. node 4 is a child of node 1. The reduced trie and double array built at this point are shown in Fig. 4.
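Steps A1 to A3 can be replayed in code as a sanity check on the arithmetic. This is a sketch under assumptions: the function name `insert_first_key` is hypothetical, the arrays are fixed-size Python lists with index 0 unused, the TAIL array is modelled as a dictionary from 1-based position to character, and BASE[1] = 1 and POS = 1 are taken as the initial state implied by step A1. It handles only the conflict-free case of the first key.

```python
CODE = {"#": 1, **{chr(ord("a") + i): i + 2 for i in range(26)}}


def insert_first_key(key, size=16):
    """Insert the first key into an empty double array (steps A1-A3 only)."""
    base = [0] * size
    check = [0] * size
    tail = {}          # 1-based position -> character
    pos = 1            # next free TAIL position
    base[1] = 1        # root state, as implied by BASE[1] + 3 = 1 + 3
    s = 1

    t = base[s] + CODE[key[0]]           # A1: BASE[1] + CODE["b"] = 4
    if check[t] != 0:
        raise RuntimeError("slot occupied; conflict handling is not sketched here")

    rest = key[1:]                       # A2: remaining characters go to TAIL
    for offset, ch in enumerate(rest):
        tail[pos + offset] = ch

    base[t] = -pos                       # A3: negative BASE points into TAIL
    pos += len(rest)
    check[t] = s                         # node t is a child of node s
    return base, check, tail, pos


base, check, tail, pos = insert_first_key("baby#")
```

After the call, state 4 is a terminal state whose suffix starts at TAIL position 1, matching the values derived in the steps above.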
Inserting the string "bachelor#" into partition 1:
Step B1: index creation starts from position 1 of the BASE array. The code of "b" is 3, so:
BASE[1] + CODE["b"] = BASE[1] + 3 = 1 + 3 = 4, and CHECK[4] = 1
The non-zero CHECK value shows that an edge from node 1 to node 4 already exists.
Step B2: more characters must be indexed in the double array to distinguish the two strings, so node 4 becomes the base for further state transitions. However, BASE[4] = -1 at this point, which marks a terminal state. Save the current value of BASE[4] in a temporary variable TEMP:
TEMP ← BASE[4] = -1
Step B3: find a new base value for BASE[4] such that character "a" is placed in an empty position. The code of "a" is 2. Call the function X_CHECK(LIST), which returns the smallest integer q such that q > 0 and CHECK[q + CODE[c]] = 0 for every character c in LIST, where LIST holds the characters that need to be indexed in the double array and q is tried in increasing order starting from 1:
CHECK[q + CODE["a"]] = CHECK[1 + 2] = CHECK[3] = BASE[3] = 0
An empty position is found and the returned value of q is 1, so
BASE[4] = 1
Node 3 thereby becomes a child of node 4, i.e. CHECK[3] = 4, the value relied on in steps C1 and D3.
Step B4: index the characters "b" and "c" in the double array to distinguish "baby#" from "bachelor#". Call X_CHECK(LIST) to find empty positions for "b" and "c", that is, a suitable base value for BASE[3] for the state transitions:
CHECK[q + CODE["b"]] = CHECK[1 + 3] = CHECK[4] ≠ 0, so q = 1 is unavailable
CHECK[q + CODE["b"]] = CHECK[2 + 3] = CHECK[5] = 0, so q = 2 is available for "b"
CHECK[q + CODE["c"]] = CHECK[2 + 4] = CHECK[6] = 0, so q = 2 is available for "c", and therefore
BASE[3] = 2
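The X_CHECK(LIST) search used in steps B3 and B4 can be sketched as a simple linear probe. The signature below is an assumption: the CHECK array and the character code table are passed in explicitly, and the array is grown with zeros when a probe runs past its end, a detail the text does not specify.

```python
CODE = {"#": 1, **{chr(ord("a") + i): i + 2 for i in range(26)}}


def x_check(check, chars, code=CODE):
    """Return the smallest q > 0 with CHECK[q + CODE[c]] == 0 for every c in chars."""
    q = 1
    while True:
        targets = [q + code[c] for c in chars]
        needed = max(targets) + 1
        if needed > len(check):                  # grow the array with empty slots
            check.extend([0] * (needed - len(check)))
        if all(check[t] == 0 for t in targets):
            return q
        q += 1


check = [0] * 8
check[4] = 1                         # after "baby#": node 4 is a child of node 1
q_for_a = x_check(check, ["a"])      # step B3: CHECK[1 + 2] = CHECK[3] = 0
check[3] = 4                         # node 3 becomes a child of node 4
q_for_bc = x_check(check, ["b", "c"])  # step B4: q = 1 hits CHECK[4], q = 2 is free
```

Each failed probe here corresponds to one unit of the BASE value probe length that the partitioned index is intended to shorten.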
Step B5: index "b" in the double array:
BASE[3] + CODE["b"] = 2 + 3 = 5
Set
CHECK[5] = 3
BASE[5] ← TEMP = -1
Index "c" in the double array:
BASE[3] + CODE["c"] = 2 + 4 = 6
Set
BASE[6] ← -POS = -5
CHECK[6] = 3
Step B6: then update
POS = 5 + length("helor#") = 5 + 6 = 11
The reduced trie and double array built at this point are shown in Fig. 5.
Inserting the string "badge#" into partition 1:
Step C1: index creation starts from position 1 of the BASE array. The code of "b" is 3, so:
BASE[1] + CODE["b"] = 1 + 3 = 4 and CHECK[4] = 1
BASE[4] + CODE["a"] = 1 + 2 = 3 and CHECK[3] = 4
BASE[3] + CODE["d"] = 2 + 5 = 7 and CHECK[7] = 0 ≠ 3
Step C2: a CHECK value of 0 indicates that the rest of the string is to be inserted into the TAIL array, so "ge#" is written into the TAIL array character by character starting at POS = 11.
Step C3: set
BASE[7] ← -POS = -11
CHECK[7] = 3
Step C4: then update
POS = 11 + length("ge#") = 11 + 3 = 14
The reduced trie and double array built at this point are shown in Fig. 6.
The DAT for set K2 is created in the same way as the DAT for set K1 above. The reduced tries and double arrays created for sets K1 and K2 are shown in Fig. 7.
The retrieval process on the partitioned DAT is described next, taking the retrieval of "bachelor#" as an example.
Take the specified prefix, which in this embodiment is the first character "b", and look it up in the PMT shown in Fig. 2; this shows that the retrieval should be carried out in partition 1.
Step D1: start retrieval from the root node, i.e. at position BASE[1] of the double array.
Step D2: retrieve the first character "b"
BASE[1] + CODE["b"] = 1 + 3 = 4 and CHECK[4] = 1
Step D3: retrieve the second character "a"
BASE[4] + CODE["a"] = 1 + 2 = 3 and CHECK[3] = 4
Step D4: retrieve the third character "c"
BASE[3] + CODE["c"] = 2 + 4 = 6 and CHECK[6] = 3
and BASE[6] = -5 < 0
A negative value has been reached, which means the lookup in the BASE and CHECK arrays is finished. It only remains to read the suffix of the string from the corresponding position in the TAIL array, i.e. to read the remaining part "helor#" of "bachelor#" at position -BASE[6] = 5 of the TAIL array.
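The lookup in steps D1 to D4 can be replayed against the array values derived in the worked example for partition 1. The sketch below makes two assumptions that are not stated in the text: the TAIL array is modelled as a dictionary from start position to suffix string, and the suffix stored at position 1 after "baby#" was split is taken to be "y#". The function name `dat_search` is hypothetical.

```python
CODE = {"#": 1, **{chr(ord("a") + i): i + 2 for i in range(26)}}

# Partition 1 after inserting "baby#", "bachelor#", "badge#" (index 0 unused).
BASE = [0, 1, 0, 2, 1, -1, -5, -11]
CHECK = [0, 0, 0, 4, 1, 3, 3, 3]
# Start position -> suffix; the entry at position 1 is an assumption (see lead-in).
TAIL = {1: "y#", 5: "helor#", 11: "ge#"}


def dat_search(key):
    """Return True if key (ending in '#') is stored in the partition."""
    s = 1                                        # D1: start at the root
    for i, ch in enumerate(key):
        c = CODE.get(ch)
        if c is None:
            return False
        t = BASE[s] + c                          # formula (1)
        if not (0 < t < len(CHECK)) or CHECK[t] != s:   # formula (2)
            return False
        s = t
        if BASE[s] < 0:                          # terminal state: compare the suffix
            return TAIL.get(-BASE[s]) == key[i + 1:]
    return False                                 # key ended before a terminal state


hit = dat_search("bachelor#")
```

For "bachelor#" the function visits states 4, 3 and 6 and then compares "helor#" with the suffix at TAIL position 5, the same path as steps D2 to D4.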
The character string retrieval device based on a partitioned double-array Trie comprises:
Data preprocessing module: sorts the string data set and counts the number of strings for each distinct first character;
Index creation module: divides the data set into partitions according to the input partition count N, then generates the partition mapping table and creates an independent double-array Trie (hereinafter DAT) index structure for each partition;
Retrieval module: takes the query string as input and searches the partitioned DAT index structure.
To demonstrate the effectiveness of the invention, this embodiment uses index creation time to measure the effect of sorting on DAT index creation, and uses index creation time and retrieval time to measure the effect of the number of partitions, as described below.
Experimental data set: 183,361 distinct strings were extracted from the titles in the DBLP data set to build the index. The minimum string length is 1, the maximum is 49, and the average is 8.6. The same data set is used as the query workload to examine retrieval efficiency.
Experimental results: the index creation time for unsorted and sorted input is shown in Fig. 8 (DO-DAT denotes the DAT built after ascending lexicographic sorting). The figure shows that sorting reduces the DAT index creation time by about 12.4% compared with the unsorted DAT. The main reason is that sorting reduces the amount of data that has to be moved, as shown in Fig. 9.
The index creation time with partitioning is shown in Fig. 10. The figure shows that the time needed to create the index drops substantially as the number of partitions increases. Without partitioning, index creation takes about 105 s; with 20 partitions it takes only 6.7 s, an improvement of more than 15 times. The main reason is that the partitioned DAT reduces both the number of conflicts and the cost of resolving them, i.e. the probe length when searching for BASE values, as shown in Fig. 11 and Fig. 12 respectively. With 10 partitions the results are already good, and the decline in index creation time levels off.
For convenience, the partitioned double-array Trie index structure is referred to as DO-PDAT. To verify its efficiency, DO-PDAT is compared with DAT, DO-DAT and DO-CDA (the CDA built after ascending lexicographic sorting) in terms of index creation time, query time and storage space. Indexes are built on the first 50,000, 100,000 and 150,000 strings of the DBLP data set and on the full string set, with the partition parameter of DO-PDAT set to 10.
The index creation times are shown in Fig. 13. DO-CDA needs additional overhead to guarantee the uniqueness of BASE values, so its index creation is less efficient. DO-PDAT takes full advantage of both lexicographic ordering and partitioning, and therefore has the highest efficiency.
For query time, the strings used to build the index are also used as queries to measure query efficiency. The results are shown in Fig. 14. CDA does not change the DAT search algorithm, so its query efficiency is almost the same as that of DAT, while the query efficiency of DO-PDAT is far higher than that of the other index structures.
For storage space, the results are shown in Fig. 15. CDA optimizes the DAT from the standpoint of space compression and therefore compresses better, whereas DO-PDAT and DO-DAT show no significant improvement over DAT in space utilization.
Specific embodiments of the invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments. Within the knowledge of a person skilled in the art, various changes can be made without departing from the concept of the invention.

Claims (8)

1. A character string retrieval method based on a partitioned double-array Trie, characterized by comprising the following steps:
Data preprocessing step: sort the string data set and count the number of strings for each distinct first character;
Index creation step: divide the data set into partitions according to the input partition count N, then generate a partition mapping table, abbreviated PMT, and create an independent double-array Trie index structure, abbreviated DAT index structure, for each partition;
Retrieval step: take the query string as input and search the partitioned DAT index structure.
2. The character string retrieval method based on a partitioned double-array Trie according to claim 1, characterized in that the data preprocessing step consists of two steps:
Step 110: sort the string data set in ascending lexicographic order;
Step 111: count the number of strings for each distinct first character.
3. The character string retrieval method based on a partitioned double-array Trie according to claim 1, characterized in that the index creation step is carried out as follows:
Step 210: divide the data set into partitions;
Step 220: generate the PMT;
Step 230: create the partitioned DAT index structures.
4. The character string retrieval method based on a partitioned double-array Trie according to claim 3, characterized in that step 210 is carried out as follows:
Step 211: for a given partition count N, where N < m and m is the number of distinct first characters, determine N-1 dividing lines that split the data set evenly into N partitions;
Step 212: adjust the dividing lines according to the common-prefix property; if a group of data sharing a common prefix is split in two by a dividing line, move that dividing line to the nearest edge of the group, so that strings with the same first character fall in the same partition.
5. The character string retrieval method based on a partitioned double-array Trie according to claim 3, characterized in that step 220 is carried out as follows:
Build the PMT from the actual division of the data set, each entry in the mapping table having the form <first character of the string, partition number>, so that the first character of a string is mapped to its partition.
6. The character string retrieval method based on a partitioned double-array Trie according to claim 3, characterized in that step 230 is carried out as follows:
Step 231: for a string to be inserted into the partitioned DAT, look up its first character in the PMT to obtain the partition it is to be inserted into;
Step 232: insert the string into that partition according to the DAT construction formulas, where inserting a character c causes a transition from state s to state t according to:
BASE[s] + CODE[c] = t    (1)
CHECK[t] = s    (2)
where CODE[c] denotes the numeric code of character c, and for English characters the characters "#", "a", "b", "c", ..., "z" are encoded as 1, 2, 3, 4, ..., 27 respectively.
7. The character string retrieval method based on a partitioned double-array Trie according to claim 1, characterized in that the retrieval step consists of two steps:
Step 310: given a string to be retrieved, look up its first character in the PMT to obtain the corresponding partition;
Step 320: run the DAT search algorithm in that partition and return the retrieval result.
8. A character string retrieval device based on a partitioned double-array Trie, characterized by comprising:
Data preprocessing module: sorts the string data set and counts the number of strings for each distinct first character;
Index creation module: divides the data set into partitions according to the input partition count N, then generates the partition mapping table and creates an independent double-array Trie index structure for each partition;
Retrieval module: takes the query string as input and searches the partitioned DAT index structure.
CN201810179880.1A 2018-03-05 2018-03-05 Character string retrieval method and device based on partition double-array Trie Active CN108509505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810179880.1A CN108509505B (en) 2018-03-05 2018-03-05 Character string retrieval method and device based on partition double-array Trie


Publications (2)

Publication Number Publication Date
CN108509505A (en) 2018-09-07
CN108509505B (en) 2022-04-12

Family

ID=63376936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810179880.1A Active CN108509505B (en) 2018-03-05 2018-03-05 Character string retrieval method and device based on partition double-array Trie

Country Status (1)

Country Link
CN (1) CN108509505B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446198A (en) * 2018-10-16 2019-03-08 中国刑事警察学院 Trie tree node compression method and device based on double array
CN110457531A (en) * 2019-07-23 2019-11-15 昆明理工大学 OpenMP-based parallel character string query method
CN111026978A (en) * 2019-10-14 2020-04-17 平安科技(深圳)有限公司 Position query method and device, computer equipment and storage medium
CN111339381A (en) * 2020-03-06 2020-06-26 昆明理工大学 Method and device for batch query of character strings of dictionary sequence partition double arrays
CN111680489A (en) * 2020-06-10 2020-09-18 腾讯科技(深圳)有限公司 Target text matching method and device, storage medium and electronic equipment
CN116610769A (en) * 2023-07-19 2023-08-18 北京惠每云科技有限公司 Medical data space allocation method and device based on double-array TRIE tree

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1239793A (en) * 1998-06-19 1999-12-29 富士通株式会社 Apparatus and method for retrieving charater string based on classification of character
CN1461444A (en) * 1999-07-20 2003-12-10 英门迪亚公司 System and method for organizing data
CN1776688A (en) * 2005-12-15 2006-05-24 北京金山软件有限公司 Word data searching method
CN1786962A (en) * 2005-12-21 2006-06-14 中国科学院计算技术研究所 Method for managing and searching dictionary with perfect double-array TRIE tree
CN1858747A (en) * 2006-04-30 2006-11-08 北京金山软件有限公司 Data storage/searching method and system
CN103440311A (en) * 2013-08-27 2013-12-11 深圳市华傲数据技术有限公司 Method and system for identifying geographical name entities
CN104102661A (en) * 2013-04-09 2014-10-15 重庆新媒农信科技有限公司 Pinyin stream splitting method and system
CN105912627A (en) * 2016-04-07 2016-08-31 上海斐讯数据通信技术有限公司 Data search system and method
CN107239549A (en) * 2017-06-07 2017-10-10 传神语联网网络科技股份有限公司 Method, device and the terminal of database terminology retrieval
CN107273482A (en) * 2017-06-12 2017-10-20 北京市天元网络技术股份有限公司 Alarm data storage method and device based on HBase

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1239793A (en) * 1998-06-19 1999-12-29 富士通株式会社 Apparatus and method for retrieving character string based on classification of character
CN1461444A (en) * 1999-07-20 2003-12-10 英门迪亚公司 System and method for organizing data
CN1776688A (en) * 2005-12-15 2006-05-24 北京金山软件有限公司 Word data searching method
CN1786962A (en) * 2005-12-21 2006-06-14 中国科学院计算技术研究所 Method for managing and searching dictionary with perfect double-array TRIE tree
CN1858747A (en) * 2006-04-30 2006-11-08 北京金山软件有限公司 Data storage/searching method and system
CN104102661A (en) * 2013-04-09 2014-10-15 重庆新媒农信科技有限公司 Pinyin stream splitting method and system
CN103440311A (en) * 2013-08-27 2013-12-11 深圳市华傲数据技术有限公司 Method and system for identifying geographical name entities
CN105912627A (en) * 2016-04-07 2016-08-31 上海斐讯数据通信技术有限公司 Data search system and method
CN107239549A (en) * 2017-06-07 2017-10-10 传神语联网网络科技股份有限公司 Method, device and terminal for database terminology retrieval
CN107273482A (en) * 2017-06-12 2017-10-20 北京市天元网络技术股份有限公司 Alarm data storage method and device based on HBase

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446198A (en) * 2018-10-16 2019-03-08 中国刑事警察学院 Trie tree node compression method and device based on double array
CN110457531A (en) * 2019-07-23 2019-11-15 昆明理工大学 OpenMP-based parallel character string query method
CN110457531B (en) * 2019-07-23 2022-11-01 昆明理工大学 OpenMP-based parallel character string query method
CN111026978A (en) * 2019-10-14 2020-04-17 平安科技(深圳)有限公司 Position query method and device, computer equipment and storage medium
WO2021072874A1 (en) * 2019-10-14 2021-04-22 平安科技(深圳)有限公司 Dual array-based location query method and apparatus, computer device, and storage medium
CN111339381A (en) * 2020-03-06 2020-06-26 昆明理工大学 Method and device for batch query of character strings of dictionary sequence partition double arrays
CN111680489A (en) * 2020-06-10 2020-09-18 腾讯科技(深圳)有限公司 Target text matching method and device, storage medium and electronic equipment
CN111680489B (en) * 2020-06-10 2021-11-19 腾讯科技(深圳)有限公司 Target text matching method and device, storage medium and electronic equipment
CN116610769A (en) * 2023-07-19 2023-08-18 北京惠每云科技有限公司 Medical data space allocation method and device based on double-array TRIE tree
CN116610769B (en) * 2023-07-19 2023-10-10 北京惠每云科技有限公司 Medical data space allocation method and device based on double-array TRIE tree

Also Published As

Publication number Publication date
CN108509505B (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN108509505A (en) Character string retrieval method and device based on partitioned double-array Trie
CN1552032B (en) Database
JP3771271B2 (en) Apparatus and method for storing and retrieving ordered collections of keys in a compact zero complete tree
US7697518B1 (en) Integrated search engine devices and methods of updating same using node splitting and merging operations
CN1016835B (en) Method and apparatus for search
US8145665B2 (en) Bit string search apparatus, search method, and program
Mehlhorn Dynamic binary search
CN109325032B (en) Index data storage and retrieval method, device and storage medium
CN103365992B (en) Method for realizing dictionary search of Trie tree based on one-dimensional linear space
CN110196784A (en) Database and solid state disk (SSD) controller
CN101751406A (en) Method and device for realizing column storage based relational database
CN105335481B (en) Suffix index construction method and device for large-scale character string text
CN102148746A (en) Message classification method and system
CN1613073A (en) Enhanced multiway radix tree
Conway et al. Optimal hashing in external memory
Navarro Document listing on repetitive collections with guaranteed performance
US20070094313A1 (en) Architecture and method for efficient bulk loading of a PATRICIA trie
CN111339381A (en) Method and device for batch query of character strings of dictionary sequence partition double arrays
CN1776688A (en) Word data searching method
KR20170065374A (en) Method for Hash collision detection that is based on the sorting unit of the bucket
KR100999408B1 (en) Method for searching an ??? using hash tree
JP3691018B2 (en) Longest match search circuit and method, program, and recording medium
Brisaboa et al. Improved structures to solve aggregated queries for trips over public transportation networks
CN110457531B (en) OpenMP-based parallel character string query method
WO2011073680A1 (en) Improvements relating to hash tables

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant