CN102629274B - Index update method for ciphertext full-text searching system based on dynamic succeed tree index structure - Google Patents


Info

Publication number
CN102629274B
Authority
CN
China
Prior art keywords
leaf
ciphertext
index
participle
document
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210075876.3A
Other languages
Chinese (zh)
Other versions
CN102629274A (en)
Inventor
霍林
黄保华
胡和平
覃海生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Application filed by Guangxi University
Priority to CN201210075876.3A
Publication of CN102629274A
Application granted
Publication of CN102629274B
Current legal status: Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an index update method for a ciphertext full-text search system based on a dynamic succeed tree index structure. The method comprises add, delete and modify operations, and the update granularity is at the level of a document section. The add operation comprises: building leaf information with relative positions for newly added text; decrypting the leaf information sets affected by the added text; inserting the newly built leaf information into the original index, where during insertion only the relative position of the predecessor leaf of the added text is altered, so that it points to the leaf position of the first character of the added text, while the predecessor leaf's original relative position value is written into the relative position of the leaf of the last character of the added text; after each insertion of new position information, checking the length of the leaf information set and splitting the set if the length exceeds a preset value; and encrypting the resulting leaf information. With this method, the system can create and dynamically update its index securely and efficiently in the ciphertext state.

Description

Index update method for a ciphertext full-text search system based on a dynamic descendence tree index structure
This application is a divisional application of Chinese invention patent application No. 201010187384.4, filed on May 31, 2010 and entitled "Ciphertext full-text search system".
Technical field
The invention belongs to the fields of information retrieval and information security, and specifically relates to an index update method for a ciphertext full-text search system based on a dynamic descendence tree index structure.
Background art
With the rapid development of information technologies such as computing and communications, applications such as electronic media have multiplied, the informatization of traditional industries has spread quickly, and industrial and scientific data are now generated automatically or semi-automatically, so that large volumes of data of every kind accumulate. At the same time, rapid advances in storage technology make total data volume grow ever faster. Statistics show that since the 1980s the total amount of information worldwide has grown exponentially. Information is now produced far faster than people can fully digest it, while the amount of information needed to make effective decisions has also grown greatly, so users find it increasingly hard to locate satisfactory information in massive data. Against this background, without an effective search mechanism, an excess of information is as useless as having no information to consult.
Full-text information retrieval technology originated in the United States in the 1950s. In 1950 Calvin N. Mooers coined the term "information retrieval"; in 1958 Luhn proposed the basic theory and methods of statistical information retrieval; in 1960 Maron and Kuhns proposed the probabilistic model of information retrieval; in 1986 Gerard Salton established the vector space model of information retrieval; in 1968 Rocchio and Salton jointly proposed query expansion; and the DIALOG system released by Lockheed in 1972 was the world's first commercial online information query service. From the 1990s onward, the successful development of cheap mass storage devices, and above all the birth of Internet technology and the resulting explosive growth of online information, brought information retrieval into a new period of development. Representative theoretical results of this period include latent semantic indexing, Bayesian networks and neural networks.
Full-text search technology is relatively mature, and foreign full-text search software was put into use early. Although Chinese and Western-language full-text search follow the same principles, the characteristics of Chinese make Chinese full-text retrieval systems more complex than Western-language ones. Domestic research on full-text search began around 1987, and domestic products now hold more than 90% of the domestic market. A representative text retrieval system is TRS, developed by Beijing TRS Information Technology, which supports concept retrieval, multimedia data retrieval and retrieval of files in their original formats, supports processing of massive structured data, and provides a WWW database interface.
The index model is the core technology of information retrieval. Organizing the data to be processed efficiently is a prerequisite for retrieval, and the index storage structure determines the system's retrieval speed and storage space. The main index models today are signature files, inverted files, bitmaps, Pat trees, Pat arrays and IRST. The first three essentially treat a document as a set of index entries, so the index data must have a document-to-index-entry structure, which makes complex queries hard to implement. Pat trees and Pat arrays treat the indexed data as a collection of semi-infinite strings and support complex queries, but suffer from drawbacks such as large space overhead. IRST is a novel index model for processing semi-infinite strings such as Chinese text: it is efficient to create and fast to query, offers query functions as complete as the Pat tree's, and has a smaller index expansion ratio than the Pat tree, but it has shortcomings in storage structure and dynamic index update.
At present there is little research, in China or abroad, on ciphertext-based full-text search. Searches of major databases and search engines found only two related results in the field of Chinese ciphertext full-text search: the Chinese invention patent application "Ciphertext full-text search technology" (application No. 200410070113.5) by Li Xin of the Computer Network Research Center of the Chinese Academy of Sciences, and the Chinese invention patent application "Distributed ciphertext full-text retrieval system" (application No. 200910062129.4) from Huazhong University of Science and Technology. The former adapts conventional full-text search, retaining almost all of its techniques and only encrypting the index terms of the index file. The latter implements full-text retrieval under ciphertext conditions, ensures secure retrieval of sensitive data, and offers high security and execution efficiency; its index file is an inverted file, but it supports neither ciphertext substring queries nor queries on latent segmented words, and it cannot update the ciphertext index dynamically.
Summary of the invention
The object of the present invention is to provide methods for creating, searching and updating the index of a ciphertext full-text search system based on a dynamic descendence tree index structure.
The specific technical solution comprises the following:
One. The ciphertext full-text index creation method of the ciphertext full-text search system comprises the following steps:
(1) Convert the classified original documents submitted by the user into plain text, extract the subject, body text and other additional attributes from the original files, and form a document summary;
(2) Perform word segmentation on the subject, body text and additional attributes of the original files, and extract feature vectors;
(3) Encrypt the plain-text documents and the document summaries obtained in step (1) separately;
(4) Store the document ciphertexts obtained in step (3) in a distributed manner in the corresponding document ciphertext libraries, and store the document summary ciphertexts obtained in step (3) in the corresponding document summary ciphertext libraries;
(5) Encrypt the segmented words and the feature vectors obtained in step (2) separately;
(6) Store the feature vector ciphertexts obtained in step (5) in the feature vector ciphertext library;
(7) Distribute the ciphertext words obtained in step (5) to the index servers;
(8) Each index server obtains the corresponding word position ciphertexts from the ciphertext index according to the ciphertext words assigned to it in step (7);
(9) Decrypt the word position ciphertexts obtained in step (8);
(10) Return the word positions decrypted in step (9) to the respective index servers;
(11) Each index server creates the index from the word positions;
(12) Encrypt the index obtained in step (11);
(13) Store the ciphertext index obtained in step (12) in the corresponding ciphertext index library.
The ciphertext full-text search method corresponding to the above index creation method comprises the following steps:
(1) Segment the query term/string submitted by the user and perform query expansion;
(2) Encrypt the expanded word set obtained in step (1);
(3) Broadcast the expanded ciphertext word set obtained in step (2) to all index servers;
(4) Each index server performs retrieval;
(5) The system collects the document code sets returned by the index servers;
(6) The system reads the corresponding document feature vector ciphertexts according to the document code set obtained in step (5);
(7) The system decrypts the feature vector ciphertexts obtained in step (6);
(8) The document code set is sorted using the feature vectors obtained in step (7);
(9) The corresponding document summary ciphertexts are read according to the ordered document code set obtained in step (8);
(10) The document summary ciphertexts are decrypted;
(11) The decrypted document summaries are shown to the user;
(12) The system fetches the corresponding document ciphertext according to the user's selection;
(13) The document ciphertext is decrypted;
(14) The decrypted document is shown to the user.
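Step (8) sorts the hits by their decrypted feature vectors but does not fix a scoring rule. A minimal sketch, assuming each feature vector is a term-to-weight map and scoring documents by cosine similarity against the equally weighted expanded query terms; cosine similarity is a common choice, not one the patent specifies, and `cosine` and `rank` are hypothetical names.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(doc_codes: list[str], vectors: dict[str, dict[str, float]], query_terms: list[str]) -> list[str]:
    """Order hit documents by similarity of their decrypted feature vectors to the query."""
    query = {t: 1.0 for t in query_terms}  # expanded query terms, equally weighted
    return sorted(doc_codes, key=lambda d: cosine(vectors.get(d, {}), query), reverse=True)
```

Only the ordering matters here, so the query weights can be anything that preserves relative scores; a real system could weight expanded terms lower than the user's original terms.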
Two. Ciphertext full-text index creation method based on the dynamic descendence tree index structure
The dynamic descendence tree index structure here is a ciphertext dynamic descendence tree index structure. The ciphertext dynamic descendence tree index is a forest composed of subtrees. Each subtree consists of the ciphertext of its tree root, the ciphertexts of its leaves, and the ciphertext of each leaf information set, which is formed from pseudo document codes, leaf positions, leaf relative positions and leaf mutations;
The tree root denotes the segmented word located at the root;
A leaf is a successor of the tree root and denotes the segmented word located at the leaf;
The pseudo document code is one element of a pseudo document code group;
The leaf position denotes the position of the current leaf in the document;
The leaf relative position is a pointer to the segmented word that follows the current leaf;
A leaf mutation is a character string that replaces the original leaf;
The ciphertext dynamic descendence tree index is built as follows: the tree root and the leaves of each subtree are encrypted individually, and the pseudo document codes, leaf positions, leaf relative positions and leaf mutations are encrypted together as a block, yielding the ciphertext dynamic descendence tree index.
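To make the structure concrete, here is a minimal in-memory sketch of the plaintext forest before encryption. It assumes a word's tree root is its predecessor word (as step 2) b below implies), adds a hypothetical `START` sentinel so the first word also has a root, and uses real document codes instead of pseudo document codes. `LeafInfo` and `build_forest` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

START = "<s>"  # hypothetical sentinel used as the tree root of a document's first word


@dataclass
class LeafInfo:
    doc_code: int         # document code (a pseudo document code in the patent)
    pos: int              # leaf position: where this leaf word sits in the document
    rel: Optional[int]    # leaf relative position: position of the following word
    mutation: str = ""    # leaf mutation label ("" means the original leaf)


def build_forest(doc_code: int, words: list[str]):
    """Single pass: file every word as a leaf under its predecessor (the tree root)."""
    forest = defaultdict(lambda: defaultdict(list))  # forest[root][leaf] -> leaf info set
    prev = START
    for pos, word in enumerate(words):
        rel = pos + 1 if pos + 1 < len(words) else None
        forest[prev][word].append(LeafInfo(doc_code, pos, rel))
        prev = word
    return forest
```

For the word sequence `a b a c`, root `a` gets two leaves, `b` (position 1) and `c` (position 3), which is how one subtree collects every successor of a word.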
This ciphertext full-text index creation method based on the dynamic descendence tree index structure may use the ciphertext full-text index creation method described above, and is characterized in that:
1) The segmented words in step (5) above are encrypted as follows:
a. The plaintext words are grouped according to the word group encryption information table, giving each word's key generation parameters and encryption algorithm number, which are sent to the key manager;
b. The key manager computes the word group key from the key generation parameters and, at the same time, fetches the encryption algorithm from the encryption algorithm library by its number;
c. The word is encrypted with the resulting word group key and encryption algorithm.
2) The ciphertext index in step (11) above is created as follows:
a. For each ciphertext word, a pseudo document code is chosen at random according to the document code information table to replace the document code carried by the original ciphertext word;
b. The tree root is looked up in the ciphertext index library using the predecessor of the ciphertext word, and the leaf is looked up using the ciphertext word itself, yielding the corresponding leaf information set;
c. The leaf information set obtained in step b is decrypted, and the position information of the ciphertext word is inserted into it;
d. If the length of the leaf information set after insertion exceeds the limit, the leaf information set is split; reaching the terminator indicates that the full text has been processed;
e. The unencrypted leaf information sets in this index are encrypted;
The leaf information sets in steps d and e above are split and encrypted as follows:
a) If the length of a leaf information set is greater than the average leaf information set length, it is divided into several disjoint subsets, each with a length within a preset range;
b) A leaf mutation is generated at random for each subset except the first, so that each subset corresponds either to a leaf mutation or to the leaf itself;
c) The key generation parameters and encryption algorithm number are looked up in the word position encryption information table according to the ciphertext word, and sent to the key manager;
d) The key manager computes the word position key from the key generation parameters and, at the same time, fetches the encryption algorithm from the encryption algorithm library by its number;
e) The plaintext leaf information set is encrypted with the word position encryption algorithm and the word position key obtained in d).
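Step 1) above (per-group word encryption) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the patent's cipher: HMAC-SHA256 stands in for both the key manager's key derivation and the word encryption, the algorithm-number lookup is dropped, and `MASTER_KEY`, `group_key` and `encrypt_word` are hypothetical. HMAC is one-way, which is enough for matching equal ciphertext words but cannot be decrypted; a system that must recover words would need a deterministic reversible cipher instead.

```python
import hashlib
import hmac

MASTER_KEY = b"demo-master-key"  # hypothetical; a real key manager keeps this secret


def group_key(group_id: int, period: int) -> bytes:
    """Key manager stand-in: derive a per-group, per-period key from generation parameters."""
    params = f"group={group_id};period={period}".encode()
    return hmac.new(MASTER_KEY, params, hashlib.sha256).digest()


def encrypt_word(word: str, group_table: dict[str, int], period: int) -> str:
    """Look up the word's group, derive the group key, and produce its ciphertext word."""
    key = group_key(group_table[word], period)
    # Deterministic keyed transform: equal words always give equal ciphertext words.
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key depends on the period, the same word yields a different ciphertext after a group update, which is what lets periodic updates invalidate an attacker's earlier vocabulary analysis.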
As these steps show, the leaf information set encryption method manages leaf information sets in groups, and each group is encrypted with a different algorithm and key of a given cipher strength. The lengths of the leaf information sets of high-frequency words are equalized: random leaf mutations are generated for a high-frequency word, and its leaf information set is split into several subsets that are encrypted separately. Leaf mutations make the number of ciphertext words change dynamically, which prevents statistical attacks.
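The splitting rule in a) and b) can be sketched as follows, assuming a fixed maximum subset length stands in for the "preset range" and a leaf mutation is simply a random hex label. `split_leaf_set` is a hypothetical name, and how mutations are mapped back to their leaf at query time is not shown here.

```python
import secrets


def split_leaf_set(leaf: str, infos: list, max_len: int) -> dict[str, list]:
    """Split an over-long leaf information set into disjoint subsets of at most max_len.

    The first subset stays under the real leaf; every other subset is filed under
    a freshly generated random leaf mutation, so a high-frequency word no longer
    stands out by the size of its leaf information set.
    """
    if max_len < 1:
        raise ValueError("max_len must be positive")
    chunks = [infos[i:i + max_len] for i in range(0, len(infos), max_len)]
    if not chunks:
        return {leaf: []}
    out = {leaf: chunks[0]}
    for chunk in chunks[1:]:
        out[secrets.token_hex(8)] = chunk  # hypothetical mutation format: 16 hex chars
    return out
```

Each returned subset would then be encrypted together with its label, matching the binding of subsets and leaf mutations described above.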
Three. This ciphertext full-text search method based on the dynamic descendence tree index structure uses the ciphertext full-text search method described above and is characterized in that the retrieval in step (4) proceeds as follows, illustrated with the query "qs1, qs2, ..., qsi, ..., qsn", where n is the number of segmented words in the query term/string:
1) Determine the number of segmented words in the query term/string: if n = 1, go to 2); if n = 2, go to 3); otherwise go to 4);
2) Check whether the tree root table contains this word. If it does, the hit result set is the set of leaf positions in the leaf information sets of all leaves in this word's leaf table; retrieval ends;
3) Check whether the tree root table contains qs1 and whether the leaf table of qs1 contains qs2. If both exist, the hit result set is the set of leaf positions in the leaf information set of qs2; retrieval ends;
4) Check whether the tree root table contains qs1; if it does, check whether the leaf table of qs1 contains qs2;
5) If qs1 and qs2 both exist, obtain the set of relative positions in the leaf information set of qs2, denoted Urp_i (initially i = 2);
6) For j = 3 to n, with i = j - 1: check whether the leaf table of qs_i contains qs_j. If it does, obtain the set of leaf positions in the leaf information set of qs_j, denoted sp_j; compute the intersection of sp_j and Urp_i, which is the leaf position subset of qs_j, denoted Usp_j; then obtain the corresponding subset of relative positions in the leaf information set of qs_j, denoted Urp_j. The loop runs at most n - 2 times and yields the preliminary result Urp_j;
7) The hit result set is the set of leaf positions of all leaf information sets in Urp_j; retrieval ends.
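The chained intersection in steps 4) to 6) can be sketched over a plaintext forest with decryption omitted. Two assumptions differ slightly from the text: positions are paired with document codes so that hits from different documents do not mix, and the function returns the position of the phrase's last word for every n, including n = 1, rather than reproducing the hit sets of steps 2), 3) and 7) exactly. `START`, `build_forest` and `phrase_search` are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

START = "<s>"  # hypothetical sentinel root for each document's first word


def build_forest(docs: dict[int, list[str]]):
    """forest[root][leaf] -> leaf information set of (doc, pos, rel) tuples."""
    forest = defaultdict(lambda: defaultdict(list))
    for doc, words in docs.items():
        prev = START
        for pos, word in enumerate(words):
            rel: Optional[int] = pos + 1 if pos + 1 < len(words) else None
            forest[prev][word].append((doc, pos, rel))
            prev = word
    return forest


def phrase_search(forest, terms: list[str]) -> set[tuple[int, int]]:
    """Return (doc, pos) of the last term of every occurrence of the phrase."""
    if not terms:
        return set()
    if len(terms) == 1:
        # Every word is a leaf under some root (START included), so scan all roots.
        return {(d, p) for leaves in forest.values() for (d, p, _) in leaves.get(terms[0], [])}
    # Steps 4)-5): leaf qs2 under root qs1; Urp holds where the next word must sit.
    infos = forest.get(terms[0], {}).get(terms[1], [])
    hits = {(d, p) for (d, p, _) in infos}
    urp = {(d, r) for (d, _, r) in infos if r is not None}
    # Step 6): for each further term keep only leaves whose position is in Urp.
    for j in range(2, len(terms)):
        infos = forest.get(terms[j - 1], {}).get(terms[j], [])
        kept = [(d, p, r) for (d, p, r) in infos if (d, p) in urp]  # Usp_j
        hits = {(d, p) for (d, p, _) in kept}
        urp = {(d, r) for (d, _, r) in kept if r is not None}      # Urp_j
    return hits
```

For example, with document 1 = `a b c a b d` and document 2 = `b c x`, searching `a b c` returns `{(1, 2)}`: the second `a b` in document 1 is followed by `d`, and the `b c` in document 2 has no preceding `a`.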
Four. This ciphertext full-text index update method based on the dynamic descendence tree index structure is characterized in that it uses a ciphertext dynamic descendence tree index update method whose update granularity is the local level within a document; the method includes add, delete and modify operations:
1) The add operation comprises the following steps:
a. Build leaf information with relative positions for the newly added text;
b. Decrypt the leaf information sets in the original index for the leaves affected by the added text;
c. Insert the newly built leaf information into the original index. During insertion, only the relative position of the predecessor leaf of the added text is modified, so that it points to the leaf position of the first character of the added text, and the original relative position value of the predecessor leaf is written into the relative position of the leaf of the last character of the added text;
d. After each insertion of new position information, check the length of the leaf information set and split it if the length exceeds the set value;
e. Encrypt the leaf information sets obtained in step d;
2) The delete operation comprises the following steps:
a. If the deletion range spans several leaf information sets, first decrypt them and merge them into one leaf information set;
b. At the position where text is deleted, directly modify the relative position of the leaf preceding the deleted text;
c. Delete the position information that is to be removed;
d. Equalize the length of the leaf information set after deletion, then encrypt and store it;
3) The modify operation is implemented as a deletion followed by an addition of text.
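Because each leaf's relative position points to the position of the next word, the add and delete operations amount to splicing a linked list of positions. The sketch below models only one document's position chain (hypothetical class `PositionChain`); re-filing leaves under new tree roots, encryption and leaf-set equalization are not shown, and inserting before the very first word would need a sentinel predecessor.

```python
from typing import Optional


class PositionChain:
    """One document's leaf positions linked by leaf relative positions (hypothetical)."""

    def __init__(self, words: list[str]):
        self.word: dict[int, str] = dict(enumerate(words))
        self.rel: dict[int, Optional[int]] = {
            pos: (pos + 1 if pos + 1 < len(words) else None) for pos in range(len(words))
        }
        self.head: Optional[int] = 0 if words else None
        self.next_free = len(words)  # new text gets fresh positions; nothing is renumbered

    def insert_after(self, pred: int, new_words: list[str]) -> None:
        """Add operation: splice new text in after the leaf at position `pred`."""
        if not new_words:
            return
        new_pos = list(range(self.next_free, self.next_free + len(new_words)))
        self.next_free += len(new_words)
        for p, w in zip(new_pos, new_words):
            self.word[p] = w
        for a, b in zip(new_pos, new_pos[1:]):
            self.rel[a] = b                      # link the new words to each other
        self.rel[new_pos[-1]] = self.rel[pred]   # last new word inherits pred's old link
        self.rel[pred] = new_pos[0]              # only the predecessor's link changes

    def delete_after(self, pred: int, count: int) -> None:
        """Delete operation: drop `count` words that follow the leaf at position `pred`."""
        doomed, cur = [], self.rel[pred]
        for _ in range(count):                   # validate the whole span before mutating
            if cur is None:
                raise ValueError("deletion runs past the end of the document")
            doomed.append(cur)
            cur = self.rel[cur]
        for p in doomed:
            del self.word[p], self.rel[p]
        self.rel[pred] = cur                     # predecessor now skips the deleted span

    def text(self) -> list[str]:
        """Follow the relative positions from the head to recover the word order."""
        out, cur = [], self.head
        while cur is not None:
            out.append(self.word[cur])
            cur = self.rel[cur]
        return out
```

A modify operation is then `delete_after` followed by `insert_after` at the same predecessor, as in 3).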
Based on our ciphertext dynamic descendence tree index structure, word grouping technique and local-level ciphertext dynamic descendence tree index update method, the ciphertext full-text search system of the present invention achieves secure and efficient index creation, dynamic index update, and full-text search and substring queries in the ciphertext state. Compared with existing ciphertext full-text search systems, the present invention has the following advantages:
(1) High security: the word grouping technique secures the index terms. Encrypting the segmented words in the dynamic descendence tree index structure hides their real semantics, and periodically updating the word ciphertexts defeats an attacker's analysis of the ciphertext words in the index file vocabulary. Encrypting the leaf information sets hides word positions, and pseudo document code groups prevent an attacker from piecing together the content of a ciphertext document from the position information of its ciphertext words. Splitting leaf information sets and encrypting each resulting subset together with its leaf mutation both prevents ciphertext-length statistical attacks and further protects the words. During retrieval, the ciphertext index terms need not be decrypted; only the leaf information sets required by the retrieval are decrypted, and all others remain in ciphertext.
(2) High creation and retrieval efficiency: a single pass over the original text both segments it and builds the index tree. Words at tree roots form the tree root table, words at leaves form the leaf tables, and the positions of leaf words in the original document form the leaf information sets. Each entry of the tree root table corresponds to one leaf table, and each leaf table entry corresponds to one leaf information set. Entries of the tree root table and leaf tables are stored in memory in a HashTree in lexicographic order and decrypted on demand during retrieval, which speeds up lookup.
(3) Highly dynamic index update: the proposed dynamic index update method, whose update granularity is the local level within a document, adds, deletes and modifies nodes directly where the update is needed, without reserved space or auxiliary indexes, achieving real-time dynamic update of the index file.
(4) Ciphertext substring queries: the model records the positional relationships of the substrings of the string to be matched through leaf positions and leaf relative positions, so substring queries are performed without decrypting the index terms. This keeps the ciphertext index library secure and also reduces the cost of ciphertext substring queries.
Description of the drawings
Fig. 1 is an architecture diagram of an embodiment of the ciphertext full-text search system of the present invention.
Fig. 2 is a schematic diagram of the structural composition of an embodiment of the ciphertext full-text search system of the present invention.
Fig. 3 is a schematic diagram of how the document ciphertext library and the document summary ciphertext library of the ciphertext full-text search system of the present invention are created.
Fig. 4 is a schematic structural diagram of an embodiment of the ciphertext dynamic descendence tree index used by the ciphertext full-text search system based on the dynamic descendence tree index structure of the present invention.
Fig. 5 is a schematic diagram of the index creation process of an embodiment of the ciphertext full-text search system based on the dynamic descendence tree index structure of the present invention.
Fig. 6 is a schematic diagram of the ciphertext retrieval process of the ciphertext full-text search system of the present invention.
Embodiment
The ciphertext full-text search system of the present invention, the variant based on the dynamic descendence tree index structure, and their working principles are further described below with reference to the drawings and embodiments.
As shown in Fig. 1, the system of the present invention comprises: an original text processing module 100, a word segmentation module 200, an encryption module 300, a document ciphertext storage module 400, a ciphertext index module 500, a ciphertext retrieval module 600, a retrieval result processing module 700 and a system management module 800.
The system works as follows:
(1) After the user logs in securely through the system management module 800, the system determines whether the user has chosen to create index files or to search; for a search, it proceeds to step (15);
(2) The system converts the classified original documents submitted by the user into plain text, extracts the subject, body text and other additional attributes from the original files, and forms a document summary;
(3) The system performs word segmentation on the subject, body text and additional attributes, and extracts feature vectors;
(4) The system encrypts the plain-text documents and the document summaries obtained in step (2) separately;
(5) The system stores the document ciphertexts obtained in step (4) in a distributed manner in the corresponding document ciphertext libraries, and the document summary ciphertexts obtained in step (4) in the corresponding document summary ciphertext libraries;
(6) The system encrypts the segmented words and the feature vectors obtained in step (3) separately;
(7) The system stores the feature vector ciphertexts obtained in step (6) in the feature vector ciphertext library;
(8) The ciphertext words obtained in step (6) are distributed to the index servers;
(9) Each index server obtains the corresponding word position ciphertexts from the ciphertext index according to the ciphertext words assigned to it in step (8);
(10) The word position ciphertexts obtained in step (9) are decrypted;
(11) The word positions decrypted in step (10) are returned to the respective index servers;
(12) Each index server creates the index from the word positions;
(13) The index obtained in step (12) is encrypted;
(14) The ciphertext index obtained in step (13) is stored in the corresponding ciphertext index library;
(15) The user submits a query term/string;
(16) The system segments the query term/string submitted by the user and performs query expansion;
(17) The system encrypts the expanded word set obtained in step (16);
(18) The system broadcasts the expanded ciphertext word set obtained in step (17) to all index servers;
(19) Each index server performs retrieval;
(20) The system collects the document code sets returned by the index servers;
(21) The system reads the corresponding document feature vector ciphertexts according to the document code set obtained in step (20);
(22) The feature vector ciphertexts obtained in step (21) are decrypted;
(23) The system sorts the document code set using the feature vectors obtained in step (22);
(24) The system reads the corresponding document summary ciphertexts according to the ordered document code set obtained in step (23);
(25) The document summary ciphertexts are decrypted;
(26) The decrypted document summaries are shown to the user;
(27) The corresponding document ciphertext is fetched according to the user's selection;
(28) The document ciphertext is decrypted;
(29) The decrypted document is shown to the user.
The role of each module in the above steps is described in more detail below with reference to Fig. 2:
1. Original text processing module 100:
Text preprocessing covers two aspects: physically, digitizing paper documents; logically, normalizing and classifying electronic documents. As shown in Fig. 2, the original text processing module 100 includes a conversion unit 110, an extraction unit 120 and a summary unit 130. The conversion unit 110 digitizes paper documents, that is, it scans paper documents to obtain electronic original documents that can be processed, and converts all electronic documents to be processed into plain-text documents. The extraction unit 120 extracts the document information from these plain-text documents; the extracted information includes, but is not limited to, the subject (title, abstract, keywords), the body text and the additional attributes (author, author's affiliation, source, time). The summary unit 130 organizes the subject, abstract, author, time, source and so on into a document summary.
2. Word segmentation module 200:
The word segmentation module 200 segments the document subject, body text, additional attributes and so on supplied by the original text processing module and extracts feature vectors; it also segments the query terms/strings supplied by the ciphertext retrieval module and performs query expansion. The word segmentation unit 210 segments the incoming subject, body text, additional attributes, query terms/strings and so on; the feature vector unit 220 extracts document feature words from the segmentation results to form feature vectors; and the query expansion unit 230 performs query expansion on the segmented query terms/strings.
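The feature vector unit 220 is described only by its function. As one hedged possibility, assuming the feature words are the k most frequent non-stop words and their weights are relative frequencies (the stop list, `k` and the name `feature_vector` are invented for illustration):

```python
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a"}  # hypothetical stop list


def feature_vector(words: list[str], k: int = 5) -> dict[str, float]:
    """Keep the k most frequent non-stop words, weighted by relative frequency."""
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(k)
    total = sum(c for _, c in top)
    return {w: c / total for w, c in top} if total else {}
```

Weighting schemes such as TF-IDF would need corpus-wide statistics, which in this system live behind encryption, so a per-document scheme like this one is the simplest fit.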
3. Encryption module 300:
The encryption module 300 provides encryption and decryption functions and specifically includes a document encryption unit 310, a word encryption unit 320 and a word position encryption unit 330:
(1) Document encryption unit 310: encrypts the plain-text documents and document summaries sent by the original text processing module 100, and the feature vectors sent by the word segmentation module 200. The document encryption unit 310 contains a document encryption information table and a cryptographic processor; the document encryption information table provides the keys and encryption algorithms needed to encrypt plain-text documents, document summaries and feature vectors;
(2) Word encryption unit 320: encrypts the plaintext words sent by the word segmentation module 200. The word encryption unit 320 contains a word grouping submodule, a word group encryption information table and a cryptographic processor. The word group encryption information table provides the keys and encryption algorithms needed to encrypt words; encryption also uses the key manager and the encryption algorithm library. When a word is encrypted, its encryption key is computed from the parameters in the word group encryption information table, so identical words receive the same algorithm and key and therefore always encrypt to the same ciphertext. The word grouping submodule groups plaintext words; the word grouping technique comprises two operations, group creation and group update. Group creation randomly assigns the plaintext words from the word segmentation module 200 to groups; group update places each word in different groups in different periods, increasing the randomness of the grouping.
1) Term group creation
During term group creation, the term encryption unit 320 first receives the plaintext terms from the word segmentation module 200, groups them according to the term-group encryption information table, and sends the resulting grouping information to the key management unit. The key management unit selects a cryptographic algorithm from the cryptographic algorithm library according to the grouping information, computes the term-group key, encrypts the terms, and sends the resulting ciphertext terms to the ciphertext index module 500.
2) Term group update
The system triggers an update between the current term group and the adjacent next group. Because the key management unit derives term-group keys from the grouping information, such an update also triggers an update of the index-term ciphertexts. To keep the index-term ciphertexts from being updated too often, the period of updates between groups is set to an integer multiple of the term group update period.
During a term group update, the term grouping submodule randomly exchanges terms between the two adjacent groups to be updated, then obtains and saves the new grouping information. Using the term-group encryption information table, it looks up the cryptographic algorithm library to obtain the current encryption algorithm of the two adjacent groups and the new encryption algorithm for the next period. From the old and new grouping and the current period, it computes the current key of each of the two groups and their new keys for the next period. Every term in the two groups is then encrypted with the old and with the new key-algorithm pair, giving a set of (old, new) term ciphertext pairs, and each old term ciphertext is replaced by its new counterpart. The update is then complete.
(3) Term position encryption unit 330 contains a term position encryption information table and a cipher computation unit; the term position encryption information table supplies the key and cryptographic algorithm used to encrypt term position information.
During encryption, the document encryption unit 310, term encryption unit 320 and term position encryption unit 330 all also use the key management unit and the cryptographic algorithm library.
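The grouping and keying scheme of term encryption unit 320 can be illustrated with a minimal Python sketch. It rests on assumptions the patent does not state: group keys are derived with HMAC-SHA256 from a hypothetical master key, the group number and the period; a term "ciphertext" is an HMAC tag, i.e. a deterministic keyed token rather than a reversible cipher; and all names (`MASTER_KEY`, `create_groups`, `group_key`, `encrypt_term`, `update_adjacent_groups`) are hypothetical.

```python
import hashlib
import hmac
import random

MASTER_KEY = b"hypothetical-master-key"  # assumption: one secret held by the key management unit


def create_groups(terms, num_groups, rng=None):
    """Term group creation: randomly assign each distinct plaintext term to a group."""
    rng = rng or random.Random()
    return {t: rng.randrange(num_groups) for t in sorted(set(terms))}


def group_key(group_id, period):
    """Derive a per-group, per-period key (HMAC-SHA256 derivation is an assumption)."""
    msg = f"group:{group_id}:period:{period}".encode("utf-8")
    return hmac.new(MASTER_KEY, msg, hashlib.sha256).digest()


def encrypt_term(term, groups, period):
    """Deterministic keyed token: the same term in the same group and period always
    maps to the same ciphertext, which is what lets the index match terms."""
    key = group_key(groups[term], period)
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


def update_adjacent_groups(groups, g1, g2, period, rng=None):
    """Term group update: reshuffle the members of groups g1 and g2, then return the new
    grouping and a {term: (old ciphertext, new ciphertext)} map for replacing old entries."""
    rng = rng or random.Random()
    new_groups = dict(groups)
    for term, g in groups.items():
        if g in (g1, g2):
            new_groups[term] = rng.choice((g1, g2))
    pairs = {
        term: (encrypt_term(term, groups, period), encrypt_term(term, new_groups, period + 1))
        for term, g in groups.items()
        if g in (g1, g2)
    }
    return new_groups, pairs
```

Determinism is the property the index depends on: within one period a term always yields the same token, while a group update changes the token, which is why the update must also replace the index-term ciphertexts.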
4. Document ciphertext storage module 400
Document ciphertext storage module 400 provides distributed storage for document summary ciphertexts and document ciphertexts. It includes a document ciphertext distribution proxy module 410 and a distributed document ciphertext management module 420. The distribution proxy module 410 distributes document ciphertexts and document summary ciphertexts to the corresponding document ciphertext servers according to the document standard code. The distributed document ciphertext management module 420 manages document ciphertexts and document summary ciphertexts and may consist of several document ciphertext servers, document summary ciphertext libraries and document ciphertext libraries; each document ciphertext server handles access to the document summary library and document ciphertext library it owns.
The construction of the document ciphertext library and the document summary ciphertext library is described below with reference to Fig. 3:
(1) The original text processing module 100 preprocesses each document and generates its document code, document standard code, plaintext document and document summary;
(2) The plaintext document and document summary are encrypted as follows: encryption module 300 receives a plaintext document and its summary and obtains a randomly chosen document encryption algorithm from its cryptographic algorithm library; its key management unit generates a document encryption key and adds a new record to the document encryption information table. The plaintext document and summary are encrypted with this algorithm and key, and encryption module 300 passes the resulting document ciphertext and document summary ciphertext to the document ciphertext storage module 400;
(3) The document ciphertext distribution proxy module 410 in storage module 400 distributes the document ciphertext and document summary ciphertext to the corresponding document ciphertext server according to the document standard code, and that server stores them in its document ciphertext library and document summary ciphertext library.
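The three storage steps can be sketched as follows. This is a hedged illustration, not the patented implementation: Python's standard library has no block cipher, so HMAC-SHA256 in counter mode stands in for the randomly chosen document encryption algorithm (it gives confidentiality only, no integrity check); mapping the document standard code to a server by hashing is an assumption; and the key is stored directly in the hypothetical `info_table`, whereas the patent has the key management unit generate it.

```python
import hashlib
import hmac
import os


def _keystream(key, nonce, length):
    """HMAC-SHA256 in counter mode: a stdlib-only stand-in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


def route(standard_code, num_servers):
    """Map a document standard code to a server index (hash partitioning is an assumption)."""
    digest = hashlib.sha256(standard_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def store_document(doc_code, standard_code, text, summary, info_table, servers):
    key = os.urandom(32)                                          # fresh per-document key
    info_table[doc_code] = {"algorithm": "hmac-ctr", "key": key}  # new info-table record
    server = servers[route(standard_code, len(servers))]          # distribution proxy step
    server["documents"][doc_code] = encrypt(key, text.encode("utf-8"))
    server["summaries"][doc_code] = encrypt(key, summary.encode("utf-8"))
    return server
```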
5. Ciphertext index module 500
Ciphertext index module 500 includes a ciphertext term distribution proxy module 510 and a distributed ciphertext index management module 520. The distribution proxy module 510 distributes ciphertext terms to the index servers. The distributed ciphertext index management module 520 includes at least one index server and several ciphertext index libraries, each managed by an index server. Each index server builds an index from the ciphertext terms and term position information provided by the encryption module and stores the ciphertext index in the corresponding ciphertext index library. Each index server holds a document code information table, which records the pseudo-document code group of each document. A pseudo-document code group is a set formed by a one-to-many mapping from a document code and is generated by the system.
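A pseudo-document code group and the document code information table could look like the following sketch. The group size, the use of random hex strings from Python's `secrets` module, and the class and method names are all hypothetical; the patent only says that the group is a system-generated, one-to-many mapping of a document code.

```python
import secrets


class DocumentCodeTable:
    """Hypothetical document code information table (one document code -> many pseudo codes)."""

    def __init__(self, group_size=4):
        self.group_size = group_size
        self.pseudo_of = {}  # real document code -> its pseudo-document code group
        self.real_of = {}    # pseudo-document code -> real document code (server side only)

    def register(self, doc_code):
        """Allocate a fresh pseudo-document code group for a newly added document."""
        group = []
        while len(group) < self.group_size:
            code = secrets.token_hex(8)
            if code not in self.real_of:  # keep pseudo codes globally unique
                self.real_of[code] = doc_code
                group.append(code)
        self.pseudo_of[doc_code] = group
        return group

    def pick(self, doc_code):
        """Randomly choose one pseudo code when indexing this document."""
        return secrets.choice(self.pseudo_of[doc_code])

    def resolve(self, pseudo_code):
        return self.real_of[pseudo_code]
```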
(1) Ciphertext dynamic descendence tree index structure
The ciphertext full-text search system of the invention also applies the ciphertext dynamic descendence tree index structure first proposed by the inventors, which is described further below with reference to Fig. 4 and an example:
The ciphertext dynamic descendence tree index is a forest. Each subtree consists of the ciphertext of the tree root, the ciphertext of the leaves, and the ciphertext of the leaf information sets formed from pseudo-document codes, leaf positions, leaf relative positions and leaf mutations;
Tree root: the term located at the root of the tree;
Leaf: a successor of the tree root, i.e. the term located at a leaf;
Pseudo-document code: one element of a pseudo-document code group;
Leaf position: the position of the current leaf in the document;
Leaf relative position: a pointer to the term that follows the current leaf;
Leaf mutation: a character string that stands in for the original leaf;
The index structure of a subtree can be written as: tree root<leaf([pseudo-document code, {(leaf position, leaf relative position)}], leaf mutation)>, where the leaf ciphertexts under the same tree root are all distinct. [pseudo-document code, {(leaf position, leaf relative position)}] is one piece of leaf information, consisting of a pseudo-document code and a position list. The pseudo-document code disguises the real document number, so that an attacker who obtains the position information cannot piece together the ciphertext of the document. Each leaf corresponds to one leaf information set and one leaf mutation; the leaf mutation may be empty, but the leaf information set may not.
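The subtree notation above maps naturally onto nested dictionaries and small record types. The sketch below uses plaintext strings for roots, leaves and pseudo-document codes so that it stays readable; in the patent these are ciphertexts and the leaf information sets are encrypted. All type and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LeafInfo:
    """[pseudo-document code, {(leaf position, leaf relative position)}]"""
    pseudo_doc: str
    positions: List[Tuple[int, Optional[int]]] = field(default_factory=list)


@dataclass
class Leaf:
    info_set: List[LeafInfo] = field(default_factory=list)  # must be non-empty once used
    mutation: Optional[str] = None                          # leaf mutation, may be empty


# tree root table: root -> leaf table; leaf table: leaf -> Leaf
Index = Dict[str, Dict[str, Leaf]]


def add_occurrence(index: Index, root: str, leaf: str, pseudo_doc: str,
                   pos: int, rel_pos: Optional[int]) -> None:
    """Record that `leaf` follows `root` at `pos`; `rel_pos` points to the next term."""
    node = index.setdefault(root, {}).setdefault(leaf, Leaf())
    for entry in node.info_set:
        if entry.pseudo_doc == pseudo_doc:
            entry.positions.append((pos, rel_pos))
            return
    node.info_set.append(LeafInfo(pseudo_doc, [(pos, rel_pos)]))
```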
(2) Ciphertext dynamic descendence tree index creation
During index creation, as shown in Fig. 5, the word segmentation module 200 receives the subject, body text and additional information from the original text processing module 100, and its segmentation unit 210 segments them into terms. The segmentation unit 210 sends the terms to encryption module 300, whose term encryption unit 320 encrypts them and sends the ciphertext terms to the ciphertext index module 500. There, the ciphertext term distribution proxy module 510 extracts the document standard code associated with each ciphertext term and, according to that code, distributes the term to the corresponding index server in the distributed ciphertext index management module 520. The index server allocates a pseudo-document code group to each newly added document code and records it in the document code information table. It then looks up the currently available index library and sends the leaves and leaf information sets affected by the new document to encryption module 300; the term position encryption unit 330 decrypts the received leaf information sets and sends them back to the index server. The index server indexes the document under randomly chosen pseudo-document codes and sends the corresponding leaf information sets to the term position encryption unit 330, which encrypts them and returns them. Finally, the index server stores the received ciphertext index in the corresponding ciphertext index library.
In the ciphertext dynamic descendence tree index structure, all root nodes form a tree root table, and all leaf nodes under the same tree root form a leaf table. Each tree root entry corresponds to one leaf table, and each leaf entry corresponds to one leaf information set and one leaf mutation. The creation algorithm is shown in Algorithm 1.
Algorithm 1: Ciphertext dynamic descendence tree index creation
Input: ciphertext terms
Output: ciphertext index file
For (each ciphertext term)
{
    Insert the ciphertext term into the index;
    If (leaf information set length > leaf information set average length)
        Split the leaf information set;
}
Encrypt and store the leaf information sets;
End of algorithm.
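Algorithm 1 can be sketched in Python under several assumptions: each pair of adjacent terms (t[i-1], t[i]) becomes a (tree root, leaf) pair with leaf position i and leaf relative position i+1; leaf information set length is measured on a JSON serialization; a fixed `max_len` stands in for the average length; and the subsets produced by a split are chained through their leaf mutation fields, which is one possible reading of "each leaf corresponds to one leaf information set and one leaf mutation". The final encryption step is omitted, and the function names are hypothetical.

```python
import json
import secrets


def leaf_len(info):
    """Leaf information set length in bytes (JSON serialization is an assumption)."""
    return len(json.dumps(info).encode("utf-8"))


def split_leaf(table, key, max_len):
    """Move overflow entries of table[key] into a chain of randomly named leaf mutations."""
    node = table[key]
    while len(node["info"]) > 1 and leaf_len(node["info"]) > max_len:
        keep = []
        for entry in node["info"]:
            if keep and leaf_len(keep + [entry]) > max_len:
                break
            keep.append(entry)
        mutation = "m_" + secrets.token_hex(6)  # random leaf mutation string
        # the new mutation inherits the old chain link, so earlier mutations stay reachable
        table[mutation] = {"info": node["info"][len(keep):], "mutation": node["mutation"]}
        node["info"], node["mutation"] = keep, mutation
        node = table[mutation]


def build_index(documents, max_len):
    """documents: {pseudo-document code: [ciphertext terms in document order]}."""
    index = {}  # tree root -> leaf table {leaf or mutation: {"info": [...], "mutation": ...}}
    for pseudo_doc, terms in documents.items():
        for i in range(1, len(terms)):
            rel = i + 1 if i + 1 < len(terms) else None  # leaf relative position
            table = index.setdefault(terms[i - 1], {})
            node = table.setdefault(terms[i], {"info": [], "mutation": None})
            node["info"].append([pseudo_doc, i, rel])
            if leaf_len(node["info"]) > max_len:
                split_leaf(table, terms[i], max_len)
    return index  # "encrypt and store" is left out of this sketch
```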
(3) Leaf information set encryption
To protect the index, leaf information sets are encrypted, which hides the positions of the leaf terms in the original document. Because the leaf information sets of high-frequency words can be very long, an attacker could learn about high-frequency words simply by collecting statistics on ciphertext lengths. Leaf information set splitting is therefore used to balance leaf information set lengths and so resist ciphertext-length statistical attacks. The relevant definitions are:
Leaf information set length: the number of bytes a leaf information set contains;
Leaf information set average length: the mean number of bytes over all leaf information sets in the ciphertext dynamic descendence tree index;
Leaf information set splitting: in the ciphertext dynamic descendence tree index, any leaf information set longer than the average length is divided into several disjoint subsets, each with a length within a preset range; every subset except the first is given a randomly generated leaf mutation, so that each subset corresponds either to the original leaf or to a leaf mutation.
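The definitions above can be turned into a small computation. In this sketch, which is an assumption rather than the patented procedure, length is the byte size of a JSON serialization, the average is taken over every leaf information set in a simplified index (root -> {leaf: info set}), and an over-long set is cut into n = ceil(length / average) contiguous subsets of near-equal entry count; the first subset stays with the leaf and each of the other n - 1 receives a randomly generated mutation name. The preset length range is not enforced exactly, and all names are hypothetical.

```python
import json
import math
import secrets


def set_len(info_set):
    """Leaf information set length: bytes in an (assumed) JSON serialization."""
    return len(json.dumps(info_set).encode("utf-8"))


def average_len(index):
    """Mean length over every leaf information set in a root -> {leaf: info set} index."""
    sizes = [set_len(info) for table in index.values() for info in table.values()]
    return sum(sizes) / len(sizes)


def divide(info_set, avg):
    """Split an over-long set into n disjoint, contiguous subsets; every subset after the
    first is paired with a freshly generated leaf mutation name."""
    if set_len(info_set) <= avg:
        return [(None, info_set)]
    n = min(len(info_set), math.ceil(set_len(info_set) / avg))
    bounds = [k * len(info_set) // n for k in range(n + 1)]  # near-equal entry counts
    subsets = [info_set[bounds[k]:bounds[k + 1]] for k in range(n)]
    names = [None] + ["m_" + secrets.token_hex(6) for _ in range(n - 1)]
    return list(zip(names, subsets))
```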
The encryption of a leaf information set proceeds as follows:
1) For a plaintext leaf information set, the term position encryption unit 330 requests the corresponding term position key and term position encryption algorithm number from the key management unit in encryption module 300;
2) According to the ciphertext term, the key generation parameters and the term position encryption algorithm number are obtained from the term position encryption information table and passed to the key management unit;
3) The term position encryption unit 330 retrieves the term position encryption algorithm from the cryptographic algorithm library by its number and encrypts the plaintext leaf information set with the term position key.
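The three steps can be sketched as follows, assuming a hypothetical term position encryption information table (`POSITION_INFO`) keyed by ciphertext term, a hypothetical algorithm library (`ALGORITHMS`) keyed by algorithm number, and HMAC-SHA256 for key derivation. HMAC-SHA256 in counter mode stands in for a real cipher because Python's standard library has none; it offers confidentiality only, with no integrity check.

```python
import hashlib
import hmac
import json
import os

MASTER_KEY = b"hypothetical-master-key"

# Hypothetical term position encryption information table:
# ciphertext term -> (key generation parameter, algorithm number)
POSITION_INFO = {"9f2c41": (b"param-17", 1)}


def _keystream(key, nonce, length):
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _ctr_encrypt(key, data):
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def _ctr_decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


# Hypothetical cryptographic algorithm library: number -> (encrypt, decrypt)
ALGORITHMS = {1: (_ctr_encrypt, _ctr_decrypt)}


def position_key(param):
    """Key management unit: derive the term position key from the table parameter."""
    return hmac.new(MASTER_KEY, b"position:" + param, hashlib.sha256).digest()


def encrypt_leaf_set(cipher_term, info_set):
    param, alg_no = POSITION_INFO[cipher_term]  # step 2: table lookup by ciphertext term
    encrypt, _ = ALGORITHMS[alg_no]             # step 3: fetch algorithm by number
    return encrypt(position_key(param), json.dumps(info_set).encode("utf-8"))


def decrypt_leaf_set(cipher_term, blob):
    param, alg_no = POSITION_INFO[cipher_term]
    _, decrypt = ALGORITHMS[alg_no]
    return json.loads(decrypt(position_key(param), blob).decode("utf-8"))
```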
(4) Index ciphertext update comprises term ciphertext update and leaf information set ciphertext update:
Term ciphertext update: triggered by the system and carried out group by group; it ends when the ciphertext index library returns a successful-update response and the update time is reset. The detailed procedure is the term group update described under the term grouping method.
Leaf information set ciphertext update: also triggered by the system and carried out group by group; it ends when the ciphertext index library returns a successful-update response and the update time is reset. First, the key management unit obtains from the term position encryption information table the encryption information for the leaf information sets to be updated. From this information it computes the term-group key and the old and new term position keys, and looks up the cryptographic algorithm library for the term-group encryption algorithm and the old and new term position encryption algorithms. The cipher computation unit encrypts the term to obtain its ciphertext term, which is used to find the corresponding leaf information set ciphertext in the ciphertext index library. The term position encryption unit 330 decrypts this ciphertext with the old term position key and algorithm to obtain the plaintext leaf information set, then encrypts it with the new key and algorithm to obtain a new leaf information set ciphertext, which replaces the old one in the ciphertext index library.
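The core of the leaf information set ciphertext update, decrypting under the old term position key and re-encrypting under the new one, can be sketched as below. Assumptions: old and new keys differ only in the period fed to an HMAC-SHA256 key derivation; HMAC-SHA256 in counter mode stands in for the real position encryption algorithm; and, for brevity, the ciphertext term used as the lookup key is held fixed, whereas in the patent it is recomputed with the term-group key. Function names are hypothetical.

```python
import hashlib
import hmac
import os

MASTER_KEY = b"hypothetical-master-key"


def _keystream(key, nonce, length):
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, data):
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


def position_key(param, period):
    """Old and new term position keys differ only by period (an assumed derivation)."""
    msg = param + b"|" + str(period).encode("ascii")
    return hmac.new(MASTER_KEY, msg, hashlib.sha256).digest()


def update_leaf_ciphertexts(store, params, period):
    """Re-encrypt every leaf information set ciphertext from `period` to `period + 1` keys.

    store:  ciphertext term -> leaf information set ciphertext (ciphertext index library)
    params: ciphertext term -> key generation parameter (position encryption info table)
    """
    for cipher_term, blob in list(store.items()):
        old_key = position_key(params[cipher_term], period)
        new_key = position_key(params[cipher_term], period + 1)
        plain = decrypt(old_key, blob)                # plaintext leaf information set
        store[cipher_term] = encrypt(new_key, plain)  # replace the old ciphertext
    return period + 1
```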
(5) Ciphertext dynamic descendence tree index update at document-local granularity
The ciphertext dynamic descendence tree index is updated by a dynamic update algorithm whose granularity is a local part of a document, performed in the ciphertext state. This keeps the index highly dynamic and makes ciphertext index updates more efficient. Updates at this granularity comprise addition, deletion and modification operations.
Addition: the leaf relative position of the predecessor of the added text is modified so that it points to the leaf position of the first term of the added text, and the original relative position value of the predecessor leaf is written into the leaf relative position of the last term of the added text. After each insertion of new position information, the length of the term's leaf information set is compared with the average length. If it is smaller, the leaf information set needs no further processing and the addition is complete; if it is larger, the term's leaf information set is split, n-1 leaf mutations are randomly generated for the n resulting subsets, and the leaf mutations are inserted into the leaf table of the corresponding tree root.
Deletion: if the operation involves two or more leaf information sets, they are first decrypted and merged into a single leaf information set. On this set, the leaf relative position of the predecessor of the deleted text is modified and the position information to be deleted is removed; the resulting set then undergoes length equalization and is encrypted and stored.
Modification: implemented as text deletion followed by text addition, i.e. as a composition of the deletion and addition operations.
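The three operations can be sketched on a single document. In this hypothetical model, positions are never renumbered: `rel[p]` plays the role of the leaf relative position, new text receives fresh positions, and an addition or deletion rewrites only the predecessor's pointer (plus, for an addition, the pointer of the last added term). The (tree root, leaf) view is derived from the pointer chain, so the entry of the term that used to follow the predecessor is automatically filed under its new root; the patent text itself mentions only the pointer update. Leaf information set splitting, merging and encryption are omitted, and insertion before the first term is not handled.

```python
from collections import defaultdict


class DocIndex:
    """Hypothetical single-document model: position -> term and position -> relative position."""

    def __init__(self, terms):
        self.term = dict(enumerate(terms))
        self.rel = {i: (i + 1 if i + 1 < len(terms) else None) for i in range(len(terms))}
        self.next_free = len(terms)  # new text always gets fresh positions

    def text(self, start=0):
        """Follow the relative-position chain to reconstruct the term sequence."""
        out, p = [], start
        while p is not None:
            out.append(self.term[p])
            p = self.rel[p]
        return out

    def leaves(self):
        """(tree root, leaf) -> {(leaf position, leaf relative position)}, derived from the chain."""
        out = defaultdict(set)
        for p, q in self.rel.items():
            if q is not None:
                out[(self.term[p], self.term[q])].add((q, self.rel[q]))
        return out

    def insert_after(self, pred, new_terms):
        """Addition: splice new_terms in after position `pred`."""
        if not new_terms:
            return
        first = self.next_free
        for k, t in enumerate(new_terms):
            self.term[first + k] = t
            self.rel[first + k] = first + k + 1
        last = first + len(new_terms) - 1
        self.rel[last] = self.rel[pred]  # predecessor's old pointer moves to the last new term
        self.rel[pred] = first           # predecessor now points at the first new term
        self.next_free = last + 1

    def delete_after(self, pred, count):
        """Deletion: remove `count` terms following `pred` and repair pred's pointer."""
        victims, p = [], self.rel[pred]
        for _ in range(count):
            if p is None:
                raise ValueError("deletion runs past the end of the document")
            victims.append(p)
            p = self.rel[p]
        for v in victims:
            del self.term[v], self.rel[v]
        self.rel[pred] = p

    def modify_after(self, pred, count, new_terms):
        """Modification: deletion followed by addition."""
        self.delete_after(pred, count)
        self.insert_after(pred, new_terms)
```

Because existing positions never change, an edit touches only the pointers around the edited span, which is what keeps document-local updates cheap.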
The update process is shown in Algorithm 2.
Algorithm 2: Dynamic index update at document-local granularity in the ciphertext state
Input: the changed part of the document
Output: the index file after the change
Decrypt the leaf information set of each term in the index;
Save the context position information of the changed part in the old document to a temporary state table;
If (addition)
    Create leaf information with relative positions for the changed part and merge it into the original index;
Else if (deletion)
    Delete the changed part directly from the original index;
Else
    Modify the changed part directly in the original index;
Encrypt the leaf information set of each term in the index;
End of algorithm.
6. Ciphertext search module 600
Ciphertext search module 600 provides information retrieval at the appropriate level to legitimate users of the system. It comprises a query submission unit 610 and a retrieval server 620. The retrieval server 620 broadcasts the expanded ciphertext term set to all index servers, receives and processes the results they return, and sends the resulting unordered document code set to the retrieval result processing module 700.
The ciphertext search process is shown in Fig. 6. A legitimate user enters a query word or string, which the query submission unit 610 submits to the word segmentation module 200. There, the segmentation unit 210 and the query expansion unit 230 segment and expand it into an expanded term set, which the term encryption unit 320 in encryption module 300 encrypts into ciphertext terms. The retrieval server 620 broadcasts the ciphertext terms to every index server in the distributed ciphertext index management module 520, each index server performs the search, and the retrieval server 620 finally collects the document code sets returned by the index servers, removes duplicates, and submits the result to the retrieval result processing module 700.
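The broadcast and merge performed by the retrieval server 620 can be sketched with Python's `concurrent.futures`. Index servers are modelled as hypothetical callables that take the ciphertext term set and return a set of document codes; the set union performs the de-duplication, and the result is unordered, as the text requires.

```python
from concurrent.futures import ThreadPoolExecutor


def broadcast_search(index_servers, cipher_terms):
    """Send the expanded ciphertext term set to every index server in parallel and
    merge the returned document code sets; the set union removes duplicates."""
    if not index_servers:
        return set()
    with ThreadPoolExecutor(max_workers=len(index_servers)) as pool:
        results = list(pool.map(lambda server: server(cipher_terms), index_servers))
    return set().union(*results)
```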
Retrieval over the dynamic descendence tree index structure is divided, by the number of terms, into single-term, two-term and multi-term retrieval. Taking the query "qs1, qs2, ..., qsi, ..., qsn" as an example, where n is the number of terms in the query word or string, the steps are:
1) Determine the number of terms n: if n = 1, go to 2); if n = 2, go to 3); otherwise go to 4);
2) Check whether the tree root table contains the term; if so, the hit set is the set of leaf positions in the leaf information sets of all leaves in this term's leaf table. End of retrieval;
3) Check whether the tree root table contains qs1 and whether the leaf table of qs1 contains qs2; if both exist, the hit set is the set of leaf positions in the leaf information set of qs2. End of retrieval;
4) Check whether the tree root table contains qs1 and, if so, whether the leaf table of qs1 contains qs2;
5) If qs1 and qs2 both exist, take the set of relative positions in the leaf information set of qs2, denoted Urp2;
6) For 3 ≤ j ≤ n, check whether the leaf table of qs(j-1) contains qsj; if it does, take the set of leaf positions in the leaf information set of qsj, denoted Spj, and intersect it with Urp(j-1) to obtain the matching subset of qsj leaf positions, denoted Uspj; then take the set of relative positions of these qsj entries, denoted Urpj. The loop runs at most n-2 times and yields the preliminary result Urpj;
7) The hit set is the set of leaf positions of all leaf information sets in Urpj. End of retrieval.
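Steps 1) to 7) can be sketched as a phrase match over (document, leaf position, leaf relative position) entries. Assumptions: pseudo-document codes have already been resolved to real document codes, leaves split into leaf mutations have been merged back, and, because the wording of step 7) is ambiguous, the sketch returns the positions of the last matched term. `index_document` builds a toy index in which each adjacent term pair is a (tree root, leaf) pair; both function names are hypothetical.

```python
def index_document(index, doc, terms):
    """Store (doc, leaf position, leaf relative position) under index[root][leaf]."""
    for i in range(1, len(terms)):
        rel = i + 1 if i + 1 < len(terms) else None
        index.setdefault(terms[i - 1], {}).setdefault(terms[i], []).append((doc, i, rel))


def search(index, query):
    """Return the (document, leaf position) hits for a segmented query qs1..qsn."""
    n = len(query)
    if n == 0:
        return set()
    if n == 1:  # step 2): every leaf under the root qs1
        leaves = index.get(query[0], {})
        return {(d, pos) for entries in leaves.values() for d, pos, _ in entries}
    current = index.get(query[0], {}).get(query[1], [])  # steps 3) to 5)
    for j in range(2, n):  # step 6): chain qs3..qsn through relative positions
        urp = {(d, rel) for d, _, rel in current if rel is not None}
        candidates = index.get(query[j - 1], {}).get(query[j], [])
        current = [(d, pos, rel) for d, pos, rel in candidates if (d, pos) in urp]
        if not current:
            break
    return {(d, pos) for d, pos, _ in current}  # step 7)
```

Only position sets are compared, which is how the structure supports substring (phrase) queries without touching the term ciphertexts themselves.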
The structure above shows that this retrieval supports ciphertext substring queries and on-demand decryption during retrieval:
(1) Ciphertext substring queries are an important guarantee of recall in a ciphertext full-text search system. The system model uses leaf positions and leaf relative positions to record the positional relationships of the substring to be matched, so substring queries are carried out without decrypting the index-term ciphertexts. This keeps the index library secure and also saves the overhead of ciphertext substring queries, giving high search efficiency with no missed matches.
(2) Because the dynamic descendence tree index structure combines the vocabulary (word-based) method with the single-Chinese-character method, retrieval over it can decrypt on demand: only the leaf information set ciphertexts actually retrieved need to be decrypted, which improves both system security and retrieval efficiency.
7. Retrieval result processing module 700
Retrieval result processing module 700 sorts and displays the retrieved set. It comprises a retrieval result sorting unit 720 and a result display unit 730. The sorting unit 720 sorts the unordered document code set. The result display unit 730 displays document summaries and documents; a summary display includes the subject, abstract, author, time, source, etc. In Fig. 2, module 700 is also drawn as containing the feature vector ciphertext library 710, to make the system workflow easier to follow and keep the drawing tidy; however, as those skilled in the art will understand, the feature vector ciphertext library 710 is a database and is not restricted to being located inside module 700.
Module 700 receives the document code set of the retrieval results; the sorting unit 720 fetches the document feature vector ciphertexts, sends them to encryption module 300 for decryption, and then sorts the result set. Following the ordered result set, the corresponding document summary ciphertexts are fetched, decrypted and returned to the user; the document the user selects is then fetched as ciphertext, decrypted by encryption module 300 and returned to the user in full.
Retrieval result sorting and display procedure:
(1) Module 700 receives the unordered result set, reads the feature vector ciphertexts of the corresponding documents from the feature vector ciphertext library 710, and sends them together with the document codes to encryption module 300;
(2) The key management unit in encryption module 300 reads the document encryption information table, dynamically generates the document encryption key, and passes the key and the document encryption algorithm to the document encryption unit 310, which decrypts the feature vector ciphertexts to obtain the plaintext feature vectors;
(3) The sorting unit 720 computes the relevance between the query string and each plaintext feature vector, sorts the document code set in descending order of relevance, and sends the ordered document code set to the document ciphertext distribution proxy module 410;
(4) The distribution proxy module 410 sends each document code to the corresponding document ciphertext server according to the document standard code;
(5) Each document ciphertext server fetches the document summary ciphertext of each document from its document summary ciphertext library and sends it to encryption module 300 for decryption, using the method of step (2);
(6) The decrypted document summaries are shown to the user by the result display unit 730;
(7) When the user selects a document, the system fetches its ciphertext from the document ciphertext library, decrypts it and displays it to the user.
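Step (3) leaves the relevance measure open. The sketch below assumes cosine similarity between the query vector and each decrypted feature vector, and sorts document codes in descending order of that score; both the measure and the function names are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vectors):
    """Order document codes by descending relevance to the query vector."""
    return sorted(doc_vectors, key=lambda code: cosine(query_vec, doc_vectors[code]), reverse=True)
```

For example, with vectors {"D1": [1, 0], "D2": [1, 1], "D3": [0, 1]} and query [1, 0], the order is D1, D2, D3.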
8. System management module 800
System management module 800 manages user permissions and maintains departments, roles, users' basic information and the mappings among them. It comprises a user information table, a role permission table, a department information table, a user-department relation table and a user-role relation table. The role permission table records each role's basic information and permissions; the permissions can be represented as a 16-bit permission string.
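A 16-bit permission string can be checked with ordinary bit operations. The sketch assumes that bit 0 is the leftmost character of the string and that a user's effective permissions are the bitwise OR of the permissions of all their roles; neither convention is specified in the patent, and the function names are hypothetical.

```python
PERMISSION_BITS = 16


def parse_permissions(bits):
    """Turn a 16-character '0'/'1' string into an integer mask."""
    if len(bits) != PERMISSION_BITS or set(bits) - {"0", "1"}:
        raise ValueError("expected a 16-character string of 0s and 1s")
    return int(bits, 2)


def has_permission(mask, index):
    """True if permission `index` is granted (index 0 = leftmost character, an assumption)."""
    return bool((mask >> (PERMISSION_BITS - 1 - index)) & 1)


def effective_permissions(role_masks):
    """A user's permissions as the union (bitwise OR) of their roles' permissions."""
    mask = 0
    for role_mask in role_masks:
        mask |= role_mask
    return mask
```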

Claims (1)

1. An index update method for a ciphertext full-text search system based on a dynamic descendence tree index structure, characterized in that:
(1) The dynamic descendence tree index structure is a ciphertext dynamic descendence tree index structure. The ciphertext dynamic descendence tree index is a forest composed of subtrees; each subtree comprises the ciphertext of the tree root, the ciphertext of the leaves, and the ciphertext of the leaf information sets formed from pseudo-document codes, leaf positions, leaf relative positions and leaf mutations;
The tree root refers to the term located at the tree root;
A leaf is a successor of the tree root and refers to the term located at the leaf;
A pseudo-document code is one element of a pseudo-document code group;
The leaf position refers to the position of the current leaf in the document;
The leaf relative position refers to a pointer to the term that follows the current leaf;
A leaf mutation refers to a character string that stands in for the original leaf;
The ciphertext dynamic descendence tree index is built as follows: the tree root and the leaves of each subtree are encrypted separately, and the pseudo-document codes, leaf positions, leaf relative positions and leaf mutations are encrypted together as a block, which yields the ciphertext dynamic descendence tree index;
(2) The index update method is a ciphertext dynamic descendence tree index update method with document-local update granularity, comprising an addition operation, a deletion operation and a modification operation;
1) The addition operation comprises the following steps:
a. Create leaf information with relative positions for the newly added text;
b. Decrypt the leaf information sets in the original index of the leaves affected by the added text;
c. Insert the newly created leaf information into the original index. During insertion, only the leaf relative position of the predecessor of the added text is modified, so that it points to the leaf position of the first term of the added text, and the original relative position value of the predecessor leaf is written into the leaf relative position of the last term of the added text;
d. After each insertion of new position information, check the leaf information set length and, if it exceeds a preset value, split the leaf information set;
e. Encrypt the leaf information sets obtained in step d;
2) The deletion operation comprises the following steps:
a. If the positions to be deleted involve several leaf information sets, first decrypt them and merge them into one leaf information set;
b. At the position where text is deleted, directly modify the leaf relative position of the predecessor of the deleted text;
c. Delete the position information that must be removed;
d. Apply length equalization to the leaf information set after deletion, then encrypt and store it;
3) The modification operation is implemented as a text deletion followed by a text addition.
CN201210075876.3A 2010-05-31 2010-05-31 Index update method for ciphertext full-text searching system based on dynamic succeed tree index structure Expired - Fee Related CN102629274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210075876.3A CN102629274B (en) 2010-05-31 2010-05-31 Index update method for ciphertext full-text searching system based on dynamic succeed tree index structure

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN 201010187384 Division CN101859323B (en) 2010-05-31 2010-05-31 Ciphertext full-text search system

Publications (2)

Publication Number Publication Date
CN102629274A CN102629274A (en) 2012-08-08
CN102629274B true CN102629274B (en) 2014-01-22

Family

ID=46587534

Country Status (1)

Country Link
CN (1) CN102629274B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140122