CN100476824C - Method and system for storing element and method and system for searching element - Google Patents

Method and system for storing element and method and system for searching element

Info

Publication number
CN100476824C
Authority
CN
China
Prior art keywords
index value
presets
unit
array
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200610144123
Other languages
Chinese (zh)
Other versions
CN1949221A (en)
Inventor
彭锦臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Original Assignee
Beijing Kingsoft Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Software Co Ltd filed Critical Beijing Kingsoft Software Co Ltd
Priority to CN 200610144123 priority Critical patent/CN100476824C/en
Publication of CN1949221A publication Critical patent/CN1949221A/en
Application granted granted Critical
Publication of CN100476824C publication Critical patent/CN100476824C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for storing elements, comprising the following steps: applying a hash algorithm to every element in a set to generate a corresponding index value, forming an index value array, and storing the array. The invention also provides a system for storing elements, comprising a computing unit, an acquiring unit, and a storage unit. The corresponding method for searching elements comprises the following steps: applying a hash algorithm to every element in a set to generate a corresponding index value, forming an index value array; storing the array; receiving the element to be searched for; applying the hash algorithm to generate its index value; and searching for that index value in the index value array. The system for searching elements that corresponds to this method comprises a computing unit, an acquiring unit, a storage unit, and a searching unit.

Description

Method and system for storing elements, and method and system for searching elements
Technical field
The present invention relates to a method and system for storing elements and a method and system for searching elements, and in particular to a method and system that use a hash algorithm to store and search elements.
Background
As technology has developed, picking out the content one needs from a large volume of information has become an important technique, and determining whether an element belongs to a particular set is one small part of that technique.
At present, the most common way to determine whether an element is in a set is sequential search. The idea of sequential search is to compare the element being sought with each element of the set in turn; if an identical element is found the search succeeds, otherwise it fails.
If the elements of the set can be ordered (English character strings, for example, can be arranged alphabetically), the set can first be sorted in ascending or descending order and then searched by binary search. The basic idea of binary search is as follows: first sort the elements of the set by key, then compare the element being sought with the element at the middle position of the set. If they are equal, the search succeeds; if not, the middle element is either greater or less than the element being sought, and in either case the search continues in only one half of the data.
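The two prior-art searches described above can be sketched as follows. This is a minimal illustration in Python, assuming an in-memory list of strings; the function names and the sample words are hypothetical and do not come from the patent.

```python
from bisect import bisect_left


def sequential_search(items, target):
    """Compare the target with each element in turn (linear time)."""
    for item in items:
        if item == target:
            return True
    return False


def binary_search(sorted_items, target):
    """Halve the search range at each step; the input must be sorted."""
    pos = bisect_left(sorted_items, target)
    return pos < len(sorted_items) and sorted_items[pos] == target


words = ["pear", "apple", "fig", "cherry"]
sorted_words = sorted(words)  # binary search needs the sorted copy
```

Both functions compare whole strings, so each comparison may inspect several characters, which is the cost the background section points to.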
However, neither of these two methods is efficient. If the set is unordered, the search time is proportional to the number of elements in the set. Even if the set is ordered, binary search does not improve much on sequential search, because when the elements of the set are character strings, the individual characters of the strings must be compared one by one during the search, which reduces search efficiency.
Moreover, because most elements in a set are character strings, additional information such as the string length must be stored along with each string, even though this information is not needed during a search. This unnecessary information takes up too many resources and wastes them.
In summary, although the prior art can determine whether an element is in a particular set, its efficiency is low and it occupies considerable resources.
Summary of the invention
The problem to be solved by the present invention is low search efficiency and heavy resource use. With the method of the invention, an index value is generated from each original element by a specific method, which both reduces wasted resources and increases search speed.
To solve the above technical problem, the object of the invention is achieved by the following method:
A hash algorithm is applied to every element in a set to generate index values in one-to-one correspondence with the elements, forming an index value array; the index value array is saved.
Wherein the index value is generated with the following specific hash algorithm:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value.
Wherein the saved index value array is sorted.
Wherein the hash key value is a two-digit number or 31.
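A minimal Python sketch of the index value formula, under stated assumptions: the "preset hash table" is modeled as an optional dict `code_table` that maps characters to codes, falling back to the Unicode code point (`ord`) when none is given. The names `index_value`, `k`, `n_bits` and `code_table` are illustrative and not taken from the patent.

```python
def index_value(element, k=31, n_bits=32, code_table=None):
    """Horner-style polynomial hash reduced modulo 2**n_bits.

    Reducing after every step gives the same result as reducing once
    at the end, and keeps intermediate values small.
    """
    modulus = 1 << n_bits
    value = 0
    for ch in element:
        # Assumption: without a table, the character code is ord(ch).
        code = code_table[ch] if code_table is not None else ord(ch)
        value = (value * k + code) % modulus
    return value


sample = index_value("ab")  # (97 * 31 + 98) mod 2**32
```

With K = 31 and N = 32 this coincides with the widely used "multiply by 31" string hash, interpreted as an unsigned 32-bit value.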
To solve the above problem, the invention also provides a method for searching elements, which is specifically as follows:
A hash algorithm is applied to every element in a set to generate index values in one-to-one correspondence with the elements, forming an index value array; the index value array is saved;
The element to be searched for is input, the hash algorithm is applied to generate the index value corresponding to that element, and that index value is searched for in the index value array.
Wherein the index value is generated with the following specific hash algorithm:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value.
Wherein the saved index value array is sorted.
Wherein the hash key value is a two-digit number or 31.
To implement the above method, the invention provides a system for storing elements, the system comprising: a computing unit, an acquiring unit, and a storage unit;
The acquiring unit is used to obtain elements;
The computing unit is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements; wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element; $N$ is the preset number of bits used to store the index value;
The storage unit is used to form the generated index values into an index value array and save it.
Wherein the system further comprises a sorting unit;
The sorting unit is used to sort the index value array that has been formed.
To implement the method for searching elements, the invention provides a system for searching elements, the system comprising: a computing unit, an acquiring unit, a storage unit, and a searching unit;
The acquiring unit is used to obtain elements;
The computing unit is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements; wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value;
The storage unit is used to form the generated index values into an index value array and save it;
The searching unit is used to search the index value array for a given index value generated by the computing unit.
Wherein the system further comprises a sorting unit;
The sorting unit is used to sort the index value array that has been formed.
In the prior art, the length of a string must be stored along with the element itself. Taking English words as an example, when the average word length in the set is 4 to 5 characters, storage takes 8 to 10 bytes, whereas the present invention uses the hash algorithm to store each element of the set as an index value, so only one integer, i.e. 4 bytes, needs to be stored. The invention can therefore reduce the storage needed for the same content to one half to one third of that of the prior art.
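The storage comparison can be checked with a small calculation. The sketch below is illustrative only: it assumes 2 bytes per character (UTF-16) and ignores any length field, and the word list is made up for the example.

```python
import struct

words = ["tree", "house", "lamp", "river"]  # hypothetical 4-5 letter words

# Assumption: each character takes 2 bytes (UTF-16), length prefix not counted.
string_bytes = sum(len(w.encode("utf-16-le")) for w in words)

# Each index value is one fixed-size 32-bit unsigned integer (4 bytes).
index_bytes = len(words) * struct.calcsize("<I")

ratio = index_bytes / string_bytes
```

Under these assumptions the index values take a little under half the space of the strings; counting a per-string length field would lower the ratio further.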
Moreover, when the value of K is well chosen, the hash algorithm avoids producing identical index values for different elements as far as possible, which improves the accuracy of searching for elements.
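Distinct elements can still map to the same index value, which is why the choice of K matters. The sketch below uses hypothetical names and assumes the character codes are Unicode code points; it shows one such collision for K = 31.

```python
def index_value(element, k=31, n_bits=32):
    """Polynomial hash modulo 2**n_bits; ord() stands in for the code table."""
    value = 0
    for ch in element:
        value = (value * k + ord(ch)) % (1 << n_bits)
    return value


# 65*31 + 97 and 66*31 + 66 are both 2112, so these two strings collide.
first, second = "Aa", "BB"
collision = index_value(first) == index_value(second)
```

A search that compares only index values would therefore report "BB" as present in a set that contains only "Aa"; an application that cannot tolerate such false positives would need an additional check on the original elements.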
Once the elements have been stored as an array of index values, a search only compares integers, whereas the prior art must compare every character of a string. If strings average 4 to 5 characters, applying the invention can make searches four to five times faster.
In summary, compared with the prior art, the invention reduces the resources occupied when elements are stored and improves efficiency when they are searched.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the invention;
Fig. 2 is a flowchart of an embodiment of the invention;
Fig. 3 is a flowchart of an embodiment of the invention;
Fig. 4 is a system diagram of an embodiment of the invention;
Fig. 5 is a system diagram of an embodiment of the invention;
Fig. 6 is a system diagram of an embodiment of the invention.
Embodiments
The problem to be solved by the present invention is low search efficiency and heavy resource use. With the method of the invention, an index value is generated from each original element by a specific method, which both reduces wasted resources and increases search speed.
To achieve the above effect, the implementation of the invention is described in detail below.
Suppose, for example, that all elements of a set are English character strings. At present, storing these strings also requires saving additional information besides the strings themselves, but this additional information is not actually needed, so it takes up too many resources. For this reason, the invention converts these strings by means of the hash algorithm that the invention provides.
Referring to Fig. 1, this figure shows the flow of storing character strings according to the invention.
Step 100: obtain the elements of the set and apply the specific hash algorithm to one English character string in the set to obtain the index value corresponding to that string.
The specific hash algorithm is as follows:
Let the element be the character string $N_0 N_1 N_2 \ldots N_{M-1}$;
$M$ is the number of characters in the string;
the hash key value is $K$;
Repeated experiments show that $K = 31$ works comparatively well.
$$I = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value; if the index value is to be a 32-bit integer, $N$ is taken as 32.
Step 101: save this index value into a temporary index value array.
Step 102: perform steps 100 and 101 in turn on every other English character string in the set.
Step 103: store the index value array.
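Steps 100 to 103 can be sketched as below. This is an illustration under assumptions, not the patented implementation: character codes are taken from `ord`, the stored form is assumed to be little-endian 32-bit unsigned integers, and the names `build_index_array` and `serialize` are hypothetical.

```python
import struct


def index_value(element, k=31, n_bits=32):
    """Step 100: polynomial hash of one string, modulo 2**n_bits."""
    value = 0
    for ch in element:
        value = (value * k + ord(ch)) % (1 << n_bits)
    return value


def build_index_array(elements, k=31, n_bits=32):
    temp = []                                          # step 101: temporary array
    for element in elements:                           # step 102: every element
        temp.append(index_value(element, k, n_bits))   # step 100
    return temp


def serialize(index_values):
    """Step 103: pack the array as little-endian 32-bit unsigned integers."""
    return struct.pack(f"<{len(index_values)}I", *index_values)


index_array = build_index_array(["hello", "Aa"])
stored = serialize(index_array)  # 4 bytes per element
```

The resulting bytes could be written to a file or kept in memory; the patent does not specify the storage medium or byte layout.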
A concrete example of computing the index value in this storage method is given below.
Take the character string "hello" as an example. Looking up its characters in the preset hash table gives their Unicode codes: h: 104; e: 101; l: 108; o: 111.
Case 1: take $K = 31$ and $N = 32$;
then
$$I = ((((h \cdot 31 + e) \cdot 31 + l) \cdot 31 + l) \cdot 31 + o) \bmod 2^{32}$$
$$= ((((104 \cdot 31 + 101) \cdot 31 + 108) \cdot 31 + 108) \cdot 31 + 111) \bmod 2^{32}$$
= 99162322 (decimal)
= 05E918D2 (hexadecimal; d2 18 e9 05 when the four bytes are written in little-endian order)
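The arithmetic of this example can be reproduced step by step. The short Python sketch below uses the character codes listed above; reading the hexadecimal digits "d218e905" as a little-endian byte dump is an interpretation, not something the patent states.

```python
value = 0
for code in (104, 101, 108, 108, 111):  # h, e, l, l, o
    value = (value * 31 + code) % 2**32

hex_form = format(value, "08x")                 # most significant digit first
little_endian = value.to_bytes(4, "little").hex()  # byte order as stored on x86

print(value, hex_form, little_endian)
# prints: 99162322 05e918d2 d218e905
```

The intermediate sums are 3325, 103183, 3198781 and 99162322, all below 2^32, so the modulo has no effect in this particular case.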
The three groups of experimental data listed below are based on one and the same set and show how the index values I computed with different K values are distributed.
The figures in each table are obtained from the index values I computed with the respective K value.
Table 1: distribution for K = 53
0 0 2
0 26 0
234 15 807
1713 3473
Table 2: distribution for K = 31
0 0 2
0 26 56
178 723 423
1246 3616
Table 3: distribution for K = 389
0 0 2
0 0 27
0 8 351
1053 4829
The experimental data in Tables 1 to 3 show that, with all other conditions the same, different values of K give different distributions of the index value I.
Table 4: statistics of the distribution for different K values
In Table 4, "digits" is the number of digits of the index value I, and "count" is the number of index values I that have each number of digits. Table 4 clearly shows that when K is 31, the counts of index values I across the different digit lengths are more evenly distributed than for the other K values.
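A distribution of this kind could be tabulated as sketched below. The sketch assumes that "digits" means the number of decimal digits of each index value and that character codes come from `ord`; the function names and the three sample strings are hypothetical, and the code does not reproduce the patent's experimental data.

```python
from collections import Counter


def index_value(element, k, n_bits=32):
    """Polynomial hash modulo 2**n_bits with hash key value k."""
    value = 0
    for ch in element:
        value = (value * k + ord(ch)) % (1 << n_bits)
    return value


def digit_distribution(elements, k, n_bits=32):
    """Count how many index values have each number of decimal digits."""
    return Counter(len(str(index_value(e, k, n_bits))) for e in elements)


sample = ["a", "ab", "hello"]
dist = digit_distribution(sample, k=31)  # index values 97, 3105, 99162322
```

Running `digit_distribution` over the same word list with several K values gives one histogram per K, which can then be compared for evenness.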
Although the above steps form an index value array made up of index values, the values in the array are not sorted, which affects efficiency when the elements of the set later need to be looked up. Because looking up a value in a sorted array is clearly more efficient than in an unsorted one, the index value array can be sorted before step 103 of the above flow.
Corresponding to the above method, the invention also provides a system for storing elements. Referring to Fig. 4, the system comprises: a computing unit (01), an acquiring unit (03), and a storage unit (02);
The acquiring unit (03) is used to obtain elements, which may take various forms, for example English character strings;
The computing unit (01) is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements;
The storage unit (02) is used to form the generated index values into an index value array and save it.
Referring to Fig. 2, the steps of this flow are as follows.
Step 200: obtain the elements of the set and apply the specific hash algorithm to one English character string in the set to obtain the index value corresponding to that string.
The specific hash algorithm is as follows:
Let the element be the character string $N_0 N_1 N_2 \ldots N_{M-1}$;
$M$ is the number of characters in the string;
the hash key value is $K$;
Repeated experiments show that $K = 31$ works comparatively well; the experimental data are essentially the same as above and are not repeated here.
$$I = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value; if the index value is to be a 32-bit integer, $N$ is taken as 32.
Step 201: save this index value into a temporary index value array.
Step 202: perform steps 200 and 201 in turn on every other English character string in the set.
Step 203: sort the index value array.
Step 204: store the index value array.
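The second storage flow differs from the first only in the sorting step. A minimal sketch, assuming `ord` supplies the character codes and using a hypothetical function name `build_sorted_index`:

```python
def index_value(element, k=31, n_bits=32):
    """Step 200: polynomial hash of one string, modulo 2**n_bits."""
    value = 0
    for ch in element:
        value = (value * k + ord(ch)) % (1 << n_bits)
    return value


def build_sorted_index(elements, k=31, n_bits=32):
    # Steps 200-202: hash every element into a temporary array.
    values = [index_value(e, k, n_bits) for e in elements]
    # Step 203: sort, so that later lookups can use binary search.
    values.sort()
    # Step 204: the caller persists the returned array.
    return values


sorted_index = build_sorted_index(["hello", "Aa", "a"])
```

Sorting is done once at build time, so its cost is paid before any lookups take place.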
Corresponding to the above second method, the invention provides a system for storing elements. Referring to Fig. 5, the system comprises: a computing unit (01), an acquiring unit (03), a storage unit (02), and a sorting unit (04);
The acquiring unit (03) is used to obtain elements, which may take various forms, for example English character strings;
The computing unit (01) is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements; wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value;
The storage unit (02) is used to form the generated index values into an index value array and save it;
The sorting unit (04) is used to sort the index value array that has been formed.
The two embodiments above describe the flow of storing elements. Once the index value array has been formed, looking up the elements in the array becomes faster.
The specific lookup steps are shown in Fig. 3:
Step 300: load the generated index value array into memory.
Step 301: apply the specific hash algorithm to the input element to obtain the index value corresponding to that element.
The specific hash algorithm here is the same as in the two embodiments above and is not repeated.
Step 302: search for this index value in the index value array; if it is found, the element is in the set.
The search in step 302 may use sequential search; when the index value array is sorted, binary search and similar methods may be used.
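Steps 300 to 302 can be sketched with a binary search over a sorted index value array. The sketch assumes the array is already in memory and sorted, takes character codes from `ord`, and uses the hypothetical name `contains`.

```python
from bisect import bisect_left


def index_value(element, k=31, n_bits=32):
    """Step 301: hash the input element with the same algorithm as at build time."""
    value = 0
    for ch in element:
        value = (value * k + ord(ch)) % (1 << n_bits)
    return value


def contains(sorted_index, element, k=31, n_bits=32):
    """Step 302: binary-search the sorted index value array for the hash."""
    target = index_value(element, k, n_bits)
    pos = bisect_left(sorted_index, target)
    return pos < len(sorted_index) and sorted_index[pos] == target


# Step 300: the index value array, already loaded and sorted.
sorted_index = sorted(index_value(w) for w in ["hello", "world"])
found = contains(sorted_index, "hello")
missing = contains(sorted_index, "help")
```

Each probe compares two integers rather than two strings. A positive answer means only that some stored element has the same index value as the query.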
Corresponding to this lookup method, the invention provides a system for searching elements. Referring to Fig. 6, the system comprises: a computing unit (01), an acquiring unit (03), a storage unit (02), a searching unit (05), and a sorting unit (04);
The acquiring unit (03) is used to obtain elements;
The computing unit (01) is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements; wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value;
The storage unit (02) is used to form the generated index values into an index value array and save it;
The searching unit (05) is used to search the index value array for a given index value generated by the computing unit;
The sorting unit (04) is used to sort the index value array that has been formed.
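One way to picture how the five units cooperate is a single small class whose methods play the role of each unit. This is a toy model under assumptions (in-memory storage, `ord` as the code table); the class and method names are hypothetical and the patent does not prescribe any such structure.

```python
from bisect import bisect_left


class ElementSearchSystem:
    """Toy model of the units in Fig. 6; names are illustrative only."""

    def __init__(self, k=31, n_bits=32):
        self.k = k
        self.modulus = 1 << n_bits
        self.index = []  # storage unit (02): the index value array

    def acquire(self, source):
        """Acquiring unit (03): obtain the elements."""
        return list(source)

    def compute(self, element):
        """Computing unit (01): element -> index value."""
        value = 0
        for ch in element:
            value = (value * self.k + ord(ch)) % self.modulus
        return value

    def store(self, source):
        """Storage unit (02) together with sorting unit (04)."""
        values = [self.compute(e) for e in self.acquire(source)]
        values.sort()
        self.index = values

    def search(self, element):
        """Searching unit (05): binary search for the element's index value."""
        target = self.compute(element)
        pos = bisect_left(self.index, target)
        return pos < len(self.index) and self.index[pos] == target


system = ElementSearchSystem()
system.store(["hello", "world"])
```

After `store`, a call such as `system.search("world")` hashes the query with the same K and N and probes the sorted array.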
The elements described above are not limited to English character strings; they may take various forms, such as Chinese, Japanese or Korean characters, as long as the specific hash algorithm can be applied to them.
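The same formula applies unchanged to non-English text once each character has a numeric code. The sketch below assumes the code of a character is its Unicode code point as returned by Python's `ord`; the sample strings are arbitrary greetings chosen for illustration.

```python
def index_value(element, k=31, n_bits=32):
    """Polynomial hash modulo 2**n_bits over Unicode code points."""
    value = 0
    for ch in element:
        value = (value * k + ord(ch)) % (1 << n_bits)
    return value


samples = ["hello", "你好", "こんにちは", "안녕하세요"]
index = sorted(index_value(s) for s in samples)

# For a two-character string the formula reduces to L0 * K + L1.
two_char = index_value("你好")
```

Every index value still fits in 32 bits regardless of the script, so the storage cost per element stays the same.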
The method and system for storing elements and the method and system for searching elements provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims (10)

1. A method for storing elements, characterized in that the method comprises:
applying a hash algorithm to every element in a set to generate index values in one-to-one correspondence with the elements, forming an index value array, and saving the index value array;
wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value.
2. The method for storing elements according to claim 1, characterized in that the saved index value array is sorted.
3. The method for storing elements according to claim 1 or 2, characterized in that the hash key value is a two-digit number or 31.
4. A method for searching elements, characterized in that the method comprises:
applying a hash algorithm to every element in a set to generate index values in one-to-one correspondence with the elements, forming an index value array, and saving the index value array;
inputting the element to be searched for, applying the hash algorithm to generate the index value corresponding to that element, and searching for that index value in the index value array;
wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value.
5. The method for searching elements according to claim 4, characterized in that the saved index value array is sorted.
6. The method for searching elements according to claim 4 or 5, characterized in that the hash key value is a two-digit number or 31.
7. A system for storing elements, characterized in that the system comprises: a computing unit, an acquiring unit, and a storage unit;
the acquiring unit is used to obtain elements;
the computing unit is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements; wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value;
the storage unit is used to form the generated index values into an index value array and save it.
8. The system for storing elements according to claim 7, characterized in that the system further comprises a sorting unit;
the sorting unit is used to sort the index value array that has been formed.
9. A system for searching elements, characterized in that the system comprises: a computing unit, an acquiring unit, a storage unit, and a searching unit;
the acquiring unit is used to obtain elements;
the computing unit is used to generate, from the obtained elements, index values in one-to-one correspondence with the elements; wherein the hash algorithm is:
$$\text{index value} = ((\cdots(((L_0 K + L_1) K + L_2) K + L_3) K + \cdots) K + L_{M-1}) \bmod 2^N$$
$K$ is a preset hash key value;
$L_0, \ldots, L_{M-1}$ are the code values looked up in a preset hash table, one for each character of the element;
$N$ is the preset number of bits used to store the index value;
the storage unit is used to form the generated index values into an index value array and save it;
the searching unit is used to search the index value array for a given index value generated by the computing unit.
10. The system for searching elements according to claim 9, characterized in that the system further comprises a sorting unit;
the sorting unit is used to sort the index value array that has been formed.
CN 200610144123 2006-11-27 2006-11-27 Method and system for storing element and method and system for searching element Active CN100476824C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610144123 CN100476824C (en) 2006-11-27 2006-11-27 Method and system for storing element and method and system for searching element

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610144123 CN100476824C (en) 2006-11-27 2006-11-27 Method and system for storing element and method and system for searching element

Publications (2)

Publication Number Publication Date
CN1949221A CN1949221A (en) 2007-04-18
CN100476824C true CN100476824C (en) 2009-04-08

Family

ID=38018736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610144123 Active CN100476824C (en) 2006-11-27 2006-11-27 Method and system for storing element and method and system for searching element

Country Status (1)

Country Link
CN (1) CN100476824C (en)

Also Published As

Publication number Publication date
CN1949221A (en) 2007-04-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING KINGSOFT OFFICE SOFTWARE CO., LTD.

Free format text: FORMER OWNER: BEIJING JINSHAN SOFTWARE CO., LTD.

Effective date: 20140312

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100083 HAIDIAN, BEIJING TO: 100085 HAIDIAN, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20140312

Address after: Kingsoft No. 33 building, 100085 Beijing city Haidian District Xiaoying Road

Patentee after: Beijing Kingsoft WPS Office Co., Ltd.

Address before: 100083, Beijing, Haidian District No. 238 North Fourth Ring Road, No. 20, Bai Yan building

Patentee before: Beijing Jinshan Software Co., Ltd.

C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: Kingsoft No. 33 building, 100085 Beijing city Haidian District Xiaoying Road

Patentee after: Beijing Kingsoft office software Limited by Share Ltd

Address before: Kingsoft No. 33 building, 100085 Beijing city Haidian District Xiaoying Road

Patentee before: Beijing Kingsoft WPS Office Co., Ltd.