CN102855292A - Safety overlay network constructing method of ciphertext full text search system and corresponding full text search method - Google Patents

Safety overlay network constructing method of ciphertext full text search system and corresponding full text search method

Info

Publication number
CN102855292A
Authority
CN
China
Prior art keywords
ciphertext
document
participle
index
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012102842842A
Other languages
Chinese (zh)
Other versions
CN102855292B (en
Inventor
霍林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Priority to CN201210284284.2A priority Critical patent/CN102855292B/en
Publication of CN102855292A publication Critical patent/CN102855292A/en
Application granted granted Critical
Publication of CN102855292B publication Critical patent/CN102855292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a method for constructing a secure overlay network for a ciphertext full-text search system and to a corresponding full-text search method. The invention is the first to propose the concept of a secure overlay network for peer-to-peer full-text search. It solves the problems of distributing, storing, and searching index files for massive ciphertext data without affecting recall or precision, and enables massive data to be stored on and retrieved from the peer nodes of a distributed P2P (peer-to-peer) network. It also introduces an index file replica replication mechanism based on pivot nodes, which ensures that searches arriving from different directions can obtain resources quickly through the replicas on pivot nodes without increasing network bandwidth usage. The construction method and the corresponding full-text search method are therefore better suited to peer-to-peer full-text search over large volumes of information, and further improve the efficiency of full-text search in a peer-to-peer network environment.

Description

Secure overlay network construction method for a ciphertext full-text search system and corresponding full-text search method
This application is a divisional application of the Chinese invention patent application entitled "Ciphertext full-text search system", filed on May 31, 2010, with application number 201010187384.4.
Technical field
The invention belongs to the fields of information retrieval and information security, and specifically relates to a ciphertext full-text search system, a method for constructing its secure overlay network, and a corresponding full-text search method.
Background
With the rapid development of information technologies such as computing and communications, applications such as electronic media have grown rapidly, traditional industries are being informatized at speed, and the automatic and semi-automatic generation of industrial and scientific data has led to the accumulation of large volumes of data. At the same time, rapid advances in storage technology have made the growth of the total amount of data ever steeper. According to statistics, the total amount of information in the world has grown exponentially since the 1980s. It is fair to say that information is now produced far faster than people can fully digest it. The amount of information needed to make effective decisions has also increased greatly, which makes it increasingly difficult for users to find satisfactory information among massive data. Against this background, without an effective retrieval mechanism, having too much information is no better than having no information to consult.
Full-text information retrieval technology originated in the United States in the 1950s. In 1950 Calvin N. Mooers coined the term "information retrieval"; in 1958 Luhn proposed the basic theory and methods of statistical information retrieval; in 1960 Maron and Kuhns proposed the probabilistic model of information retrieval; in 1986 Gerard Salton founded the vector space model of information retrieval; in 1968 Rocchio and Salton jointly proposed the method of query expansion; and the DIALOG system released by Lockheed in 1972 was the first commercial online information query service in the world. Since the 1990s, the successful development of inexpensive mass storage devices, and in particular the birth of Internet technology and the ensuing explosive growth of online information, have brought information retrieval technology into a new period of development. Representative theoretical results of this period include latent semantic indexing, Bayesian networks, and neural network techniques.
Full-text search technology is now fairly mature, and foreign full-text search software came into use earlier. Although the principles of full-text search are the same for Chinese and Western languages, the characteristics of the Chinese language make Chinese full-text retrieval systems more complex than their Western-language counterparts. Domestic research on full-text search began around 1987. A representative full-text retrieval system, currently holding a domestic market share of more than 90%, is TRS, developed by Yibao Beixin. It supports concept retrieval, multimedia data retrieval, and retrieval of files in their original format, supports the storage and processing of massive structured data, and provides a WWW database interface.
The index model is the core technology of information retrieval. Efficient organization of the data to be processed is a prerequisite for retrieval, and the index storage structure affects the retrieval speed and storage space of the system. The main index models at present are the signature file, inverted file, bitmap, PAT tree, PAT array, and IRST. The first three essentially treat a document as a set of index entries, so the index data must have a document/index-entry structure, which makes complex queries difficult to implement. The PAT tree and PAT array treat the index data as a superposition of a group of semi-infinite strings and can support complex queries, but suffer from drawbacks such as large space overhead. The IRST model is a novel index model for semi-infinite strings such as Chinese text. It is efficient to create and fast to query, offers the same full query functionality as the PAT tree, and has a smaller expansion ratio than the PAT tree, but it still has shortcomings in aspects such as its storage structure and dynamic index updating.
At present there is only a small amount of research, at home or abroad, on ciphertext-based full-text search. Among the results obtained from the major well-known databases and search engines, the only related work found in the field of Chinese ciphertext full-text search consists of the invention patent application "Ciphertext full-text search technology" (application number 200410070113.5), filed in China by Li Xin of the computer network research centre of the Chinese Academy of Sciences, and the invention patent application "Distributed ciphertext full-text retrieval system" (application number 200910062129.4), filed in China by Huazhong University of Science and Technology. The former is a modification of full-text search technology that retains most of its techniques and merely encrypts the index terms of the index file. The latter realizes full-text retrieval under ciphertext conditions and ensures secure retrieval of sensitive data, with high security and high execution efficiency; its index file is an inverted file, but it supports neither ciphertext substring queries nor queries on latent segmented words, and it cannot update the ciphertext dynamically.
The emerging peer-to-peer (P2P) network is regarded as a more efficient network with better load balancing and better fault tolerance. Through a P2P network, a large number of computers can be organized into a computing whole with high performance, high reliability, high scalability, and low cost. Because P2P storage systems face problems such as node heterogeneity, node selfishness, node security, and node timeliness, most research has concentrated on search and location mechanisms and has neglected replication mechanisms. When Chinese full-text search and Chinese ciphertext full-text search are performed in this network environment, introducing an index file replica replication mechanism is an effective way to improve retrieval efficiency, reduce network consumption, and balance load across distributed nodes.
Summary of the invention
One object of the present invention is to provide a ciphertext full-text search system with high data security, fast index creation, and efficient data retrieval.
Another object of the present invention is to provide, on the basis of the above system, a ciphertext full-text search system based on a dynamic successor tree index structure, together with its index creation and retrieval methods; this system supports dynamic updating of the index and can perform ciphertext substring queries.
A further object of the present invention is to provide a secure overlay network construction method for the above ciphertext full-text search system and a corresponding full-text search method.
The specific technical solution is as follows:
One, ciphertext full-text search system
The ciphertext full-text search system of the present invention comprises an original text processing module, a word segmentation module, an encryption module, a document ciphertext storage module, a ciphertext index module, a ciphertext retrieval module, a retrieval result processing module, and a system management module, characterized in that:
The original text processing module is used to preprocess the format of the original text of a document, including but not limited to digitizing paper documents and/or formatting electronic source documents; it extracts information such as the subject, body text, and additional attributes, and forms the document summary. Digitizing a paper document means converting it, for example by scanning, into an electronic source document that can be managed; formatting means uniformly converting the electronic documents to be processed into plain-text documents.
The word segmentation module is used to segment the document subject, body text, additional attributes, and so on provided by the original text processing module and to extract feature vectors, and to perform segmentation and query expansion on the search term/string provided by the ciphertext retrieval module.
The encryption module encrypts data including but not limited to the plain-text document and document summary sent by the original text processing module and the feature vectors and segmented words sent by the word segmentation module, and stores the feature vector ciphertext in the feature vector ciphertext library; encrypts and decrypts the segmented-word position information sent by the ciphertext index module; decrypts data including but not limited to the document ciphertext and document summary ciphertext sent by the document ciphertext storage module; decrypts the feature vector ciphertext sent via the retrieval result processing module; and provides the corresponding encrypted or decrypted data to the document ciphertext storage module, retrieval result processing module, ciphertext retrieval module, and ciphertext index module.
The document ciphertext storage module is used to distribute, store, and provide document ciphertext and document summary ciphertext. Distribution means deciding, according to region, document security level, and document category, which target document ciphertext server the ciphertext is stored on; each document ciphertext server receives and stores the document ciphertext and document summary ciphertext provided by the encryption module. This module also accepts ciphertext read requests from the retrieval result processing module and provides the encryption module with the ciphertext that needs to be decrypted.
The ciphertext index module is used to distribute ciphertext segmented words, create and store the ciphertext index, and provide the ciphertext index entries that need decryption and the retrieved document numbers. Distribution means deciding, according to region, document security level, and document category, which target index server the ciphertext index is stored on. Each index server receives the ciphertext segmented words and segmented-word position information provided by the encryption module, creates the index, and after encryption stores the ciphertext index in the corresponding ciphertext index library. According to the ciphertext segmented-word retrieval request from the ciphertext retrieval module, this module also retrieves from the ciphertext index library the ciphertext of the indexed segmented-word position information that needs decryption and sends it to the encryption module, and sends the document number set returned by the encryption module to the ciphertext retrieval module.
The ciphertext retrieval module provides information retrieval services at the appropriate level for legitimate users of the system. This module receives the search term/string entered by a legitimate user and, after review and filtering, submits it to the word segmentation module; receives the expanded ciphertext segmented-word set sent by the encryption module, forms a ciphertext segmented-word retrieval request, and sends it to the ciphertext index module for retrieval; and receives the document number set returned by the ciphertext index module and submits it to the retrieval result processing module.
The retrieval result processing module is used to receive and process the document number set provided by the ciphertext retrieval module and to return the sorted result set to the retrieving user. According to the document number set provided by the ciphertext retrieval module, it takes the corresponding feature vector ciphertext from the feature vector ciphertext library and, after decryption by the encryption module, sorts the document number set; sends the ordered document number set to the document ciphertext storage module; receives the corresponding document summaries decrypted by the encryption module and displays them to the user; and, according to the document summary plaintext selected by the user, extracts the corresponding document ciphertext, which is displayed to the user after decryption by the encryption module, in the same way as the document summary plaintext is extracted.
The system management module is used to manage user permissions and to maintain and update the basic information on departments, roles, and users and the mapping relationships among them.
Further, the original text processing module comprises a conversion unit, an extraction unit, and a summary unit. The conversion unit is used to digitize paper documents and to uniformly convert the electronic documents to be processed into plain-text documents. The extraction unit is used to extract document information from the plain-text documents provided by the conversion unit; the extracted information includes but is not limited to the subject, body text, and additional attributes. The summary unit organizes information including but not limited to the subject, abstract, author, time, and source provided by the extraction unit into the document summary;
Further, the word segmentation module comprises a segmentation unit, a feature vector unit, and a query expansion unit. The segmentation unit segments the content sent to it, including but not limited to the subject, body text, additional attributes, and search terms/strings. The feature vector unit extracts the document's feature words from the segmentation result to form a feature vector. The query expansion unit performs query expansion on the segmented search term/string;
Further, the encryption module comprises a file encryption unit, a segmented-word encryption unit, and a segmented-word position encryption unit. The file encryption unit encrypts the plain-text document and document summary sent by the original text processing module and the feature vector sent by the word segmentation module; it contains a file encryption information table and an encryption operator, the table being used to obtain the key and encryption algorithm needed to encrypt the plain-text document, document summary, and feature vector. The segmented-word encryption unit encrypts the segmented words sent by the word segmentation module; it contains a segmented-word grouping submodule, a segmented-word group encryption information table, and an encryption operator, the table being used to obtain the key and encryption algorithm needed to encrypt a segmented word. The segmented-word position encryption unit encrypts and decrypts the segmented-word position information sent by the distributed ciphertext index management module; it contains a segmented-word position encryption information table and an encryption operator, the table being used to obtain the key and encryption algorithm needed to encrypt the position information. During encryption, all three units also use the key manager and the encryption algorithm library;
Further, the document ciphertext storage module comprises a document ciphertext distribution proxy module and a distributed document ciphertext management module. The distributed document ciphertext management module manages and stores document ciphertext and document summary ciphertext and comprises document ciphertext servers, document summary ciphertext libraries, and document ciphertext libraries; each document ciphertext server is responsible for access to its own document summary ciphertext library and document ciphertext library. The document ciphertext distribution proxy module distributes document ciphertext and document summary ciphertext to the corresponding document ciphertext server;
Further, the ciphertext index module comprises a ciphertext segmented-word distribution proxy module and a distributed ciphertext index management module. The distributed ciphertext index management module comprises index servers and ciphertext index libraries, each library being managed by an index server. Each index server receives the ciphertext segmented words and segmented-word position information provided by the encryption module, creates the index, and stores the ciphertext index in the corresponding ciphertext index library. Each index server holds a document number information table, which records the pseudo document number group corresponding to each document; a pseudo document number group is the set produced by a one-to-many mapping of a document number and is generated by the system. The ciphertext segmented-word distribution proxy module distributes ciphertext segmented words to the index servers;
Further, the ciphertext retrieval module comprises a retrieval statement submission unit and a retrieval server. The retrieval statement submission unit receives the search term/string entered by a legitimate user and, after review and filtering, submits it to the word segmentation module. The retrieval server broadcasts the expanded ciphertext segmented-word set to every index server of the ciphertext index module, receives and processes the results returned by each index server, and sends the resulting unordered document number set to the retrieval result processing module;
Further, the retrieval result processing module comprises a retrieval result sorting unit and a result display unit. The retrieval result sorting unit sorts the unordered document number set; the result display unit covers the display of document summaries and the display of documents.
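The document number information table kept by an index server maps each document number to a group of pseudo document numbers. A minimal Python sketch of such a table follows. The class name, the group size, the use of 64-bit random values, and the reverse lookup are all assumptions made for illustration; the text only states that the group is a one-to-many mapping generated by the system.

```python
import secrets


class DocumentNumberTable:
    """Hypothetical table mapping one document number to several pseudo numbers."""

    def __init__(self, group_size=4):
        self.group_size = group_size   # assumed: fixed number of pseudo numbers per document
        self._groups = {}              # document number -> list of pseudo numbers
        self._reverse = {}             # pseudo number -> document number

    def register(self, doc_id):
        """Generate a fresh pseudo document number group for doc_id."""
        group = []
        while len(group) < self.group_size:
            pseudo = secrets.randbits(64)
            if pseudo not in self._reverse:      # keep pseudo numbers globally unique
                self._reverse[pseudo] = doc_id
                group.append(pseudo)
        self._groups[doc_id] = group
        return group

    def pick_pseudo(self, doc_id):
        """Choose one pseudo number at random, as done for each indexed word."""
        return secrets.choice(self._groups[doc_id])

    def resolve(self, pseudo):
        """Map a pseudo number found in the index back to the real document number."""
        return self._reverse[pseudo]


table = DocumentNumberTable(group_size=3)
group = table.register("doc-7")
pseudo = table.pick_pseudo("doc-7")
```

Because different index entries of the same document can carry different pseudo numbers, an observer of the index cannot trivially group entries by document; only a holder of the table can resolve them.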
Two, the ciphertext full-text index creation method corresponding to the above ciphertext full-text search system comprises the following steps:
(1) convert the classified original document submitted by the user into plain text, extract the subject, body text, and other additional attributes from the original file, and form the document summary;
(2) segment the subject, body text, and additional attributes of the original file, and extract the feature vector;
(3) encrypt the plain-text document and the document summary obtained in step (1) separately;
(4) store the document ciphertext obtained in step (3) in the corresponding distributed document ciphertext library, and store the document summary ciphertext obtained in step (3) in the corresponding distributed document summary ciphertext library;
(5) encrypt the segmented words and the feature vector obtained in step (2) separately;
(6) store the feature vector ciphertext obtained in step (5) in the feature vector ciphertext library;
(7) distribute the ciphertext segmented words obtained in step (5) to the index servers;
(8) each index server searches with the ciphertext segmented words from step (7) and obtains the corresponding segmented-word position ciphertext;
(9) decrypt the segmented-word position ciphertext obtained in step (8);
(10) return the segmented-word positions decrypted in step (9) to the corresponding index server;
(11) the index server creates the index from the segmented-word positions;
(12) encrypt the index obtained in step (11);
(13) store the ciphertext index obtained in step (12) in the corresponding ciphertext index library.
The ciphertext full-text search method corresponding to the above ciphertext full-text index creation method comprises the following steps:
(1) segment the search term/string submitted by the user and perform query expansion;
(2) encrypt the expanded segmented-word set obtained in step (1);
(3) broadcast the expanded ciphertext segmented-word set obtained in step (2) to every index server;
(4) each index server performs retrieval;
(5) the system collects the document number sets returned by the index servers;
(6) the system reads the corresponding document feature vector ciphertext according to the document number set obtained in step (5);
(7) the system decrypts the feature vector ciphertext obtained in step (6);
(8) sort the document number set using the feature vectors obtained in step (7);
(9) read the corresponding document summary ciphertext according to the ordered document number set obtained in step (8);
(10) decrypt the document summary ciphertext;
(11) display the decrypted document summaries to the user;
(12) the system obtains the corresponding document ciphertext according to the user's selection;
(13) decrypt the document ciphertext;
(14) display the decrypted document to the user.
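Steps (3) to (8) of the search method can be sketched as a broadcast, a merge, and a sort. The Python sketch below works on already-decrypted toy data. The text does not say how feature vectors are compared, so the cosine similarity used for ranking is an assumption, as are all function names and the representation of an index server as a callable.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} (assumed ranking measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def collect_document_numbers(index_servers, encrypted_terms):
    """Steps (3)-(5): send the query to every index server and merge the returned sets."""
    doc_ids = set()
    for server in index_servers:
        doc_ids |= set(server(encrypted_terms))
    return doc_ids


def rank_documents(doc_ids, feature_vectors, query_vector):
    """Steps (6)-(8): order document numbers by similarity; ties broken by document number."""
    return sorted(doc_ids, key=lambda d: (-cosine(feature_vectors[d], query_vector), d))


def make_server(postings):
    """Hypothetical index server: returns documents matching any of the given terms."""
    def server(terms):
        hits = set()
        for term in terms:
            hits |= postings.get(term, set())
        return hits
    return server


servers = [make_server({"c1": {1, 2}}), make_server({"c2": {3}})]
found = collect_document_numbers(servers, ["c1", "c2"])
vectors = {1: {"a": 1.0}, 2: {"a": 1.0, "b": 1.0}, 3: {"b": 1.0}}
ranking = rank_documents(found, vectors, {"a": 1.0})
```

In the system described above, the feature vectors would be read as ciphertext and decrypted by the encryption module before the sort; the sketch starts after that decryption.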
Three, ciphertext full-text search system based on the dynamic successor tree index structure
The ciphertext full-text search system based on the dynamic successor tree index structure is a further improvement on the technical solution of the above ciphertext full-text search system and includes a ciphertext dynamic successor tree index structure.
The ciphertext dynamic successor tree index is a forest composed of subtrees. The structure of each subtree comprises the ciphertext of the tree root, the ciphertext of the leaves, and the ciphertext of the leaf information set, which consists of the pseudo document number, leaf position, leaf relative position, and leaf mutation;
The tree root denotes the segmented word located at the root;
The leaf, i.e. the successor of the tree root, denotes the segmented word located at the leaf;
The pseudo document number is an element of the pseudo document number group;
The leaf position denotes the position of the current leaf in the document;
The leaf relative position is a pointer to the successor segmented word of the current leaf;
The leaf mutation is a character string that stands in for the original leaf;
The ciphertext dynamic successor tree index is built as follows: the tree root and the leaves in each subtree are encrypted separately, and the pseudo document number, leaf position, leaf relative position, and leaf mutation are encrypted together as a whole, which yields the ciphertext dynamic successor tree index.
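The layout described above, with roots and leaves protected individually and each leaf information set protected as one unit, can be sketched in Python as follows. This is an illustration only: the keyed hash used for roots and leaves and the SHA-256 based XOR stream used for the leaf information set are stand-ins chosen because they need only the standard library, they are not secure ciphers, and they are not the algorithms of the invention. All names and keys are hypothetical.

```python
import hashlib
import hmac
import json

WORD_KEY = b"demo-word-key"      # hypothetical key for roots and leaves
POS_KEY = b"demo-position-key"   # hypothetical key for leaf information sets


def keyed_tag(key, word):
    """Deterministic stand-in for encrypting a root or leaf, so equal words match in the index."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def toy_stream_xor(key, data):
    """Toy reversible transform (NOT secure): XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))


def seal_info_set(key, entries):
    """Serialize the whole leaf information set and transform it as one blob."""
    return toy_stream_xor(key, json.dumps(entries).encode("utf-8"))


def open_info_set(key, blob):
    """Inverse of seal_info_set."""
    return json.loads(toy_stream_xor(key, blob).decode("utf-8"))


def add_entry(forest, root, leaf, entry):
    """Insert one entry: locate the subtree and leaf by tag, open the set, append, reseal."""
    subtree = forest.setdefault(keyed_tag(WORD_KEY, root), {})
    leaf_tag = keyed_tag(WORD_KEY, leaf)
    entries = open_info_set(POS_KEY, subtree[leaf_tag]) if leaf_tag in subtree else []
    entries.append(entry)
    subtree[leaf_tag] = seal_info_set(POS_KEY, entries)


forest = {}
e1 = {"pseudo_doc": 17, "leaf_pos": 3, "next_pos": 4, "mutation": None}
e2 = {"pseudo_doc": 52, "leaf_pos": 9, "next_pos": 10, "mutation": None}
add_entry(forest, "密文", "索引", e1)
add_entry(forest, "密文", "索引", e2)
blob = forest[keyed_tag(WORD_KEY, "密文")][keyed_tag(WORD_KEY, "索引")]
restored = open_info_set(POS_KEY, blob)
```

The sketch shows why lookups can work on protected roots and leaves directly, while any change to a leaf information set requires opening and resealing the whole set.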
The ciphertext retrieval module uses the leaf position and the leaf relative position to record the positional relationship of the substrings to be matched, thereby realizing substring queries while the index terms remain encrypted, i.e. ciphertext substring queries. The ciphertext substring query is an important guarantee of the recall of the ciphertext full-text search system; this method not only ensures the security of the index library but also reduces the overhead of ciphertext substring queries.
Four, the ciphertext full-text index creation method corresponding to the above system based on the dynamic successor tree index structure adopts the aforesaid ciphertext full-text index creation method, characterized in that:
1) the segmented words in the above step (5) are encrypted as follows:
A, group the plaintext segmented word according to the segmented-word group encryption information table, obtain the key generation parameter and encryption algorithm number for the word, and send them to the key manager;
B, the key manager computes the segmented-word group key from the key generation parameter, and extracts the encryption algorithm from the encryption algorithm library according to the encryption algorithm number;
C, encrypt the segmented word with the resulting segmented-word group key and encryption algorithm.
2) the ciphertext index in the above step (11) is created as follows:
a, for each ciphertext segmented word, randomly choose a pseudo document number from the document number information table to replace the document number carried by the ciphertext segmented word;
b, in the ciphertext index library, look up the tree root using the predecessor of the ciphertext segmented word and the leaf using the ciphertext segmented word itself, and obtain the corresponding leaf information set;
c, decrypt the leaf information set obtained in step b and insert the position information of the ciphertext segmented word into it;
d, if the length of the leaf information set after insertion exceeds the limit, split the leaf information set; encountering the terminal symbol indicates that the full text has been processed;
e, encrypt the unencrypted leaf information sets in the index;
The leaf information sets in the above steps d and e are split and encrypted as follows:
a) if the length of a leaf information set is greater than the average leaf information set length, split it into several disjoint subsets, each with a length within the system default range;
b) randomly generate a leaf mutation for each subset except the first, so that every subset corresponds to either the leaf or a leaf mutation;
c) according to the ciphertext segmented word, obtain the key generation parameter and encryption algorithm number from the segmented-word position encryption information table and send them to the key manager;
d) the key manager computes the segmented-word position key from the key generation parameter, and extracts the encryption algorithm from the encryption algorithm library according to the encryption algorithm number;
e) encrypt the plaintext leaf information set with the segmented-word position encryption algorithm and segmented-word position key obtained in d).
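Both the word encryption steps A to C and the position encryption steps c) to e) follow the same pattern: look up a key generation parameter and an algorithm number in a table, let the key manager derive the key, and fetch the algorithm from the library. A minimal Python sketch of that lookup pattern follows. The grouping rule, the table contents, the master secret, and the use of HMAC as the "algorithm" are all assumptions; HMAC is a keyed hash used here as a deterministic stand-in, not a reversible cipher.

```python
import hashlib
import hmac

MASTER_SECRET = b"demo-master-secret"   # hypothetical secret held by the key manager

# Hypothetical group encryption information table:
# group number -> (key generation parameter, algorithm number)
GROUP_TABLE = {
    0: ("param-A", "alg1"),
    1: ("param-B", "alg2"),
    2: ("param-C", "alg1"),
}

# Hypothetical algorithm library: algorithm number -> callable(key, data)
ALGORITHM_LIBRARY = {
    "alg1": lambda key, data: hmac.new(key, data, hashlib.sha256).hexdigest(),
    "alg2": lambda key, data: hmac.new(key, data, hashlib.sha512).hexdigest(),
}


def group_of(word):
    """Assumed grouping rule: first byte of the word's SHA-256 digest modulo the group count."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return digest[0] % len(GROUP_TABLE)


def derive_group_key(key_param):
    """Key manager: derive the group key from the key generation parameter."""
    return hmac.new(MASTER_SECRET, key_param.encode("utf-8"), hashlib.sha256).digest()


def protect_word(word):
    """Group the word, derive its key, select its algorithm, and apply it."""
    key_param, alg_id = GROUP_TABLE[group_of(word)]
    key = derive_group_key(key_param)
    return ALGORITHM_LIBRARY[alg_id](key, word.encode("utf-8"))


tag_a = protect_word("密文")
tag_b = protect_word("索引")
```

Storing only parameters and algorithm numbers in the table means the keys themselves never need to sit next to the index; they can be recomputed by the key manager on demand.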
As the above steps show, this encryption method manages leaf information sets in groups, and each group is encrypted with a different encryption algorithm and key of a certain cryptographic strength. The leaf information set lengths of high-frequency words are equalized: leaf mutations are generated at random for a high-frequency word, and its leaf information set is split into several subsets that are encrypted separately. The leaf mutations make the number of ciphertext segmented words change dynamically, which guards against statistical attacks.
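The splitting of an oversized leaf information set can be sketched as follows in Python. The fixed chunk size, the function name, and the use of random hex strings as leaf mutations are assumptions made for illustration; the text only requires disjoint subsets within a preset length range and a randomly generated mutation for every subset except the first.

```python
import secrets


def split_leaf_info_set(leaf, entries, max_len):
    """Split entries into disjoint chunks of at most max_len items.

    Returns a list of (label, chunk) pairs. The first chunk keeps the original
    leaf as its label; every later chunk gets a random stand-in label.
    """
    if len(entries) <= max_len:
        return [(leaf, list(entries))]
    chunks = [entries[i:i + max_len] for i in range(0, len(entries), max_len)]
    result = [(leaf, chunks[0])]
    for chunk in chunks[1:]:
        result.append((secrets.token_hex(8), chunk))   # assumed mutation format
    return result


parts = split_leaf_info_set("network", list(range(10)), max_len=4)
labels = [label for label, _ in parts]
sizes = [len(chunk) for _, chunk in parts]
```

After the split, a frequent word no longer appears as a single very long entry list, so the length of an encrypted set reveals less about word frequency.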
Five, with above-mentioned corresponding ciphertext full-text search method based on dynamic descendence tree index structure, adopted aforesaid ciphertext full-text search method, it is characterized in that, the retrieval in the above-mentioned steps (4), with term " qs1; qs2 ..., qsi; ...; qsn " be the example explanation, n is the participle number of term/string, its step is as follows:
1) determine the number of participles in the term/string: if n=1, go to 2); if n=2, go to 3); otherwise go to 4);
2) determine whether the participle exists in the tree root table; if it does, the retrieval hit result set is the set of leaf positions in the leaf information sets of all leaves in the leaf table of this participle; retrieval ends;
3) determine whether qs1 exists in the tree root table and whether qs2 exists in the leaf table of qs1; if both qs1 and qs2 exist, the retrieval hit result set is the set of leaf positions in the leaf information set of qs2; retrieval ends;
4) determine whether qs1 exists in the tree root table; if it does, determine whether qs2 exists in the leaf table of qs1;
5) if both qs1 and qs2 exist, obtain the set of relative positions in the leaf information set of qs2, denoted Urpi;
6) loop over 3≤j≤n: determine whether qsj exists in the leaf table of qsi; if it does, obtain the set of leaf positions in the leaf information set of qsj, denoted spj; compute the intersection of spj and Urpi, which is the leaf position subset of qsj, denoted Uspj; obtain the corresponding subset of relative positions in the leaf information set of qsj, denoted Urpj; after at most n-2 iterations the preliminary result Urpj is obtained;
7) the retrieval hit result set is the set of leaf positions of all leaf information in Urpj; retrieval ends.
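The retrieval steps above can be illustrated with a minimal plaintext sketch. It is not the patented implementation: encryption, pseudo-document numbers and leaf mutations are omitted, qsi is assumed to be the participle immediately preceding qsj, and the leaf relative position pointer is replaced by the simpler assumption that a successor sits at "position + 1". The names build_index, leaf_positions and search are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map tree root -> leaf -> set of (doc_id, leaf_position).

    docs: dict of doc_id -> list of plaintext participles (assumption).
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, tokens in docs.items():
        for pos in range(1, len(tokens)):
            # tokens[pos] is a leaf (successor) of the root tokens[pos - 1]
            index[tokens[pos - 1]][tokens[pos]].add((doc_id, pos))
    return index

def leaf_positions(index, root, leaf):
    """Leaf positions recorded for `leaf` under `root` (empty if absent)."""
    return set(index.get(root, {}).get(leaf, ()))

def search(index, query):
    """Return (doc_id, position) hits of the last participle of `query`."""
    n = len(query)
    if n == 0:
        return set()
    if n == 1:
        # Step 2: every leaf position stored in the leaf table of this root.
        leaves = index.get(query[0], {})
        return {hit for hits in leaves.values() for hit in hits}
    # Steps 3 to 5: positions of qs2 as a successor of qs1.
    current = leaf_positions(index, query[0], query[1])
    # Step 6: at most n-2 further iterations, each an intersection.
    for j in range(2, n):
        candidates = leaf_positions(index, query[j - 1], query[j])
        expected = {(doc_id, pos + 1) for doc_id, pos in current}
        current = candidates & expected
        if not current:
            break
    # Step 7: the surviving leaf positions are the hit result set.
    return current
```

Each loop iteration only touches the leaf information set of one (root, leaf) pair, which mirrors the point that only the leaf information sets needed during retrieval have to be decrypted.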
Six, the ciphertext full-text index update method based on the dynamic descendence tree index structure, corresponding to the above, is characterized in that it adopts a ciphertext dynamic descendence tree index update method whose update granularity is the document local level; the method includes an add operation, a delete operation and a modify operation;
1) the add operation comprises the following steps:
a, establish leaf information with relative positions for the newly added text;
b, decrypt the leaf information sets of the leaves in the original index that are affected by the added text;
c, insert the newly established leaf information into the original index; during insertion, only the leaf relative position of the predecessor of the added text is modified so that it points to the leaf position of the first character of the added text, and the predecessor leaf's original relative position value is written into the leaf relative position of the last character of the added text;
d, after each insertion of new position information, check the leaf information set length; if it is greater than the set value, divide the leaf information set;
e, encrypt the leaf information sets obtained in step d;
2) the delete operation comprises the following steps:
a, if the deletion position involves several leaf information sets, first decrypt them and merge them into one leaf information set;
b, at the position where text is to be deleted, directly modify the leaf relative position of the predecessor of the deleted text;
c, delete the position information that is to be deleted;
d, apply length equalization to the leaf information set after deletion, then encrypt and store it;
3) the modify operation is implemented as a text deletion followed by a text addition.
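The pointer manipulation behind the add and delete operations can be sketched as follows. This is only an illustration under simplifying assumptions: the text is modelled as slots linked by a "next" pointer that stands in for the leaf relative position, and decryption, division and re-encryption of leaf information sets are left out. The class LinkedText and the HEAD sentinel are hypothetical names.

```python
HEAD = 0  # sentinel slot so that every real slot has a predecessor

class LinkedText:
    """Participles chained by successor pointers (stand-in for relative positions)."""

    def __init__(self, tokens=()):
        self.token = {HEAD: None}   # slot id -> participle
        self.next = {HEAD: None}    # slot id -> successor slot id
        self._next_slot = 1
        self.insert_after(HEAD, tokens)

    def _alloc(self, tok):
        slot = self._next_slot
        self._next_slot += 1
        self.token[slot] = tok
        self.next[slot] = None
        return slot

    def insert_after(self, pred, tokens):
        """Add text after slot `pred`; only `pred`'s pointer is rewritten."""
        slots = [self._alloc(t) for t in tokens]
        if not slots:
            return slots
        for a, b in zip(slots, slots[1:]):
            self.next[a] = b
        # The last added slot inherits the predecessor's original pointer ...
        self.next[slots[-1]] = self.next[pred]
        # ... and the predecessor now points at the first added slot.
        self.next[pred] = slots[0]
        return slots

    def delete_after(self, pred, count):
        """Delete up to `count` slots following `pred` by relinking `pred`."""
        cur = self.next[pred]
        for _ in range(count):
            if cur is None:
                break
            nxt = self.next.pop(cur)
            del self.token[cur]
            cur = nxt
        self.next[pred] = cur

    def tokens(self):
        out, cur = [], self.next[HEAD]
        while cur is not None:
            out.append(self.token[cur])
            cur = self.next[cur]
        return out
```

Because an insertion rewrites a single existing pointer, no reserved space or additional index is needed, and a modification is simply delete_after followed by insert_after at the same predecessor.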
Seven, as a further extension of the above ciphertext full-text search system technical scheme, the system can be applied to a peer-to-peer network: on the basis of the peer-to-peer network, we propose the concept of a safety overlay network and provide peer-to-peer full-text search on that basis; the ciphertext peer-to-peer full-text search system based on the safety overlay network includes a centralized retrieval server and peer nodes;
1) the safety overlay network is the network formed by organizing the peer nodes of the peer-to-peer network according to the dominance relation of their security levels; in this safety overlay network, a node with a high security level directly or indirectly dominates nodes with lower security levels, and data flow from a node with a high security level to the nodes whose security level it dominates. While still allowing interoperation between peer nodes, the safety overlay network can limit the range of nodes that an operation involves, reduce the propagation and handling of useless information, and improve overall efficiency;
The safety overlay network is constructed by the following steps:
a, node (p) broadcasts a message in the peer-to-peer network announcing that it is joining the safety overlay network;
b, each node whose security level dominates node (p) replies with u and adds node (p) to the set of nodes dominated by itself;
c, each node whose security level is dominated by node (p) replies with d and adds node (p) to the set of nodes that dominate itself;
d, node (p), according to the replies of the other nodes, adds each replying node to the set of nodes that dominate itself or to the set of nodes dominated by itself;
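The join procedure can be sketched in a few lines. The sketch assumes, purely for illustration, that security levels are integers and that a node dominates another when its level is strictly higher; the source does not define the dominance relation in this section, and the names Node, on_join_broadcast and join are hypothetical.

```python
class Node:
    def __init__(self, name, level):
        self.name = name
        self.level = level        # assumption: integer security level
        self.dominators = set()   # names of nodes that dominate this node
        self.dominated = set()    # names of nodes this node dominates

    def on_join_broadcast(self, newcomer):
        """Handle a join broadcast; reply 'u', 'd' or None (no relation)."""
        if self.level > newcomer.level:
            self.dominated.add(newcomer.name)    # step b
            return "u"
        if self.level < newcomer.level:
            self.dominators.add(newcomer.name)   # step c
            return "d"
        return None

def join(network, p):
    """Step a: broadcast; step d: p records each reply on its own side."""
    for other in network:
        reply = other.on_join_broadcast(p)
        if reply == "u":
            p.dominators.add(other.name)
        elif reply == "d":
            p.dominated.add(other.name)
    network.append(p)
```

After every node has joined, each node knows exactly which nodes it may forward operations to (its dominated set), which is what bounds the range of nodes an operation can reach.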
2) the full-text search based on the safety overlay network consists of two parts: centralized retrieval and coordinated retrieval based on the safety overlay network.
In centralized retrieval, the search function is provided by a single server or by a server cluster composed of multiple servers, and a node issues its retrieval directly to the retrieval server; the retrieval results are cached by the node and serve as the data with which the node provides retrieval service;
Coordinated retrieval based on the safety overlay network adds a coordinated retrieval function on top of the above centralized retrieval; it is initiated and carried out by peer nodes within the safety overlay network, and its retrieval objects are the contents cached at each node;
Taking node p_n issuing retrieval request q as an example, the steps of the full-text search based on the safety overlay network are as follows:
a, node p_n executes q itself;
b, node p_n sends q to all nodes whose security level is dominated by itself;
c, each node that receives q executes q itself;
d, each node that receives q forwards q to all nodes whose security level is dominated by itself;
e, node p_n collects the coordinated retrieval results and presents them to the user; if the user is satisfied, execution ends;
f, node p_n sends q to the centralized retrieval server, caches the retrieval results and presents them to the user.
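Steps a to f can be sketched as a traversal of the dominance relation followed by a fallback to the centralized server. The data layout is an assumption made for the example: `dominated` maps a node to the nodes it dominates, `caches` maps a node to its cached query results, and `central_server` and `satisfied` are hypothetical callables.

```python
def coordinated_search(start, q, dominated, caches):
    """Steps a-d: run q at `start` and at every node it directly or indirectly dominates."""
    results, visited, pending = set(), set(), [start]
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        # Each node answers q from its own cache.
        results |= caches.get(node, {}).get(q, set())
        # Forward q only to nodes whose security level this node dominates.
        pending.extend(dominated.get(node, ()))
    return results

def overlay_search(start, q, dominated, caches, central_server, satisfied):
    """Steps e-f: present coordinated results, else ask the central server."""
    results = coordinated_search(start, q, dominated, caches)
    if satisfied(results):
        return results
    central = set(central_server(q))
    # The initiating node caches the central result for later coordinated retrieval.
    caches.setdefault(start, {})[q] = central
    return central
```

The visited set guards against a node receiving q twice when it is dominated through more than one path.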
3) the full-text search method based on the safety overlay network also includes an index file copy replication mechanism based on hub nodes;
The index file copy replication mechanism based on hub nodes is a new copy replication mechanism for peer-to-peer networks that combines the advantages of traditional copy replication mechanisms; through copies held at hub nodes, it ensures that searches arriving from different directions can obtain the resource quickly without increasing network bandwidth usage, which makes it better suited to peer-to-peer full-text search when the amount of information is huge;
The index file copy replication mechanism based on hub nodes includes at least the following two parts:
The first part discovers and stores hub nodes;
The second part triggers copy replication to source nodes or hub nodes by setting different thresholds, with source-node caching serving as a supplement to hub-node copy replication;
The index file copy replication steps under the index file copy replication mechanism based on hub nodes are as follows:
a, after the node that holds the data receives a new query request, the copy replication mechanism at this node saves the routing information carried in the request into its routing table and finds hub nodes by comparison;
b, the replication mechanism counts the number of requests through each hub node and sets a threshold as a constraint;
c, the replication mechanism records the number of requests from each querying source node;
d, when a large number of requests for the same data arrive over different paths and the corresponding threshold is reached, the hub-node replication mechanism copies the data to the hub nodes on these paths;
e, on the basis of step d, if the requests of some source node for some data still reach another, higher threshold, the data are copied directly to that source node.
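The threshold logic of steps a to e can be sketched for a single data item as follows. How hub nodes are found "by comparison" is not spelled out in this section, so the sketch assumes that a hub node is an intermediate node shared by at least two distinct recorded paths; the class ReplicaManager and its parameters are hypothetical.

```python
from collections import Counter

class ReplicaManager:
    """Runs at the node that holds one data item (illustrative only)."""

    def __init__(self, hub_threshold, source_threshold):
        self.hub_threshold = hub_threshold        # step b threshold
        self.source_threshold = source_threshold  # step e (higher) threshold
        self.paths = set()                        # routing table: distinct paths
        self.via_requests = Counter()             # requests routed through a node
        self.source_requests = Counter()          # requests per source node
        self.replicas = set()                     # nodes already holding a copy

    def hubs(self):
        # Assumption: a hub is an intermediate node on >= 2 distinct paths.
        seen = Counter(n for p in self.paths for n in set(p[1:]))
        return {n for n, c in seen.items() if c >= 2}

    def on_request(self, path):
        """path = (source, intermediate, ...); returns nodes newly given a copy."""
        path = tuple(path)
        source, intermediates = path[0], path[1:]
        self.paths.add(path)                      # step a
        self.source_requests[source] += 1         # step c
        for node in set(intermediates):
            self.via_requests[node] += 1          # step b
        new = []
        for hub in sorted(self.hubs()):           # step d
            if self.via_requests[hub] >= self.hub_threshold and hub not in self.replicas:
                self.replicas.add(hub)
                new.append(hub)
        # Step e: a very active source gets its own copy.
        if self.source_requests[source] >= self.source_threshold and source not in self.replicas:
            self.replicas.add(source)
            new.append(source)
        return new
```

In a real deployment, requests would stop reaching the holder once a hub copy exists; the sketch ignores that and only shows how the two thresholds interact.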
The ciphertext full-text search system of the present invention, based on the ciphertext dynamic descendence tree index structure, the participle grouping method and the document-local-level ciphertext dynamic descendence tree index update method that we provide, achieves safe and efficient index creation, dynamic index updating, and full-text search and substring query in the ciphertext state; it organizes a safety overlay network within the P2P network and, on that basis, creates the index file copy replication mechanism based on peer-to-peer network hub nodes. Compared with existing ciphertext full-text search systems, the present invention has the following advantages:
(1) High security: the participle grouping method guarantees the security of index terms. The participles in the dynamic descendence tree index structure are encrypted, which hides their real semantics, and the participle ciphertexts are updated periodically so that an attacker's analysis of the ciphertext participles in the index vocabulary becomes invalid. The leaf information sets are encrypted, which hides the participle position information, and pseudo-document number groups are used, which prevents an attacker from piecing together the content of a ciphertext document from the position information of ciphertext participles. The leaf information sets are divided, and each resulting leaf information subset is bound to a leaf mutation and encrypted, which both prevents ciphertext length statistical attacks and further protects the participles. Retrieval does not require decrypting the ciphertext index terms; only the leaf information sets needed during retrieval are decrypted, and the others remain in the ciphertext state.
(2) High index creation and retrieval efficiency: in a single pass, the original text can be segmented and the index tree created. The participles located at tree roots form the tree root table, the participles located at leaves form the leaf tables, and the position information of leaf participles in the original document forms the leaf information sets. Each tree root table entry corresponds to one leaf table and each leaf table entry corresponds to one leaf information set; the entries of the tree root table and leaf tables are stored in memory in a HashTree in lexicographic order and are decrypted on demand during retrieval, which improves lookup speed.
(3) Highly dynamic index updating: the index dynamic update method proposed by the present invention, whose update granularity is the document local level, can add, delete and modify nodes directly at the place that needs updating; it needs neither reserved space nor an additional index, and achieves real-time dynamic updating of the index file.
(4) Ciphertext substring query: the system model uses leaf positions and leaf relative positions to record the positional relationship of the substrings of the string to be matched, and performs substring query without decrypting the index terms; this not only guarantees the security of the ciphertext index storehouse but also reduces the overhead of ciphertext substring query.
(5) Good scalability, solving the distributed storage and retrieval of massive data index files: the safety overlay network concept for peer-to-peer full-text search is proposed; without affecting recall ratio or precision ratio, it solves the distributed storage and retrieval of massive ciphertext data index files and realizes the storage and retrieval of massive data on the peer nodes of a distributed P2P network, while retaining ciphertext substring query and dynamic updating of the ciphertext index file.
(6) Reduced network consumption: the index file copy replication mechanism based on hub nodes is a new copy replication mechanism that combines the advantages of traditional copy replication mechanisms; through copies held at hub nodes, it ensures that searches arriving from different directions can obtain the resource quickly without increasing network bandwidth usage, is better suited to peer-to-peer full-text search when the amount of information is huge, and further improves the efficiency of full-text search in a peer-to-peer network environment.
Description of drawings
Fig. 1 is the system architecture diagram of an embodiment of the ciphertext full-text search system of the present invention.
Fig. 2 is a schematic diagram of the structural composition of an embodiment of the ciphertext full-text search system of the present invention.
Fig. 3 is a schematic diagram of the creation of the document ciphertext storehouse and the document summary ciphertext storehouse in the ciphertext full-text search system of the present invention.
Fig. 4 is a structural diagram of an embodiment of the ciphertext dynamic descendence tree index adopted by the ciphertext full-text search system based on the dynamic descendence tree index structure of the present invention.
Fig. 5 is a schematic diagram of the index creation process of an embodiment of the ciphertext full-text search system based on the dynamic descendence tree index structure of the present invention.
Fig. 6 is a schematic diagram of the ciphertext retrieval process of the ciphertext full-text search system of the present invention.
Fig. 7 is an example diagram of the safety overlay network of an embodiment of the ciphertext full-text search system of the present invention.
Fig. 8 is a schematic diagram of the principle of full-text search based on the safety overlay network in the ciphertext full-text search system of the present invention.
Fig. 9 shows the copy replication mechanism based on hub nodes in the ciphertext full-text search system of the present invention.
Embodiment
The ciphertext full-text search system of the present invention, the ciphertext full-text search system based on the dynamic descendence tree index structure, the safety overlay network construction method of the ciphertext full-text search system, the corresponding full-text search method and their working principles are further described below with reference to the drawings and embodiments.
As shown in Figure 1, the system of the present invention comprises: urtext processing module 100, word-dividing module 200, encrypting module 300, document ciphertext memory module 400, ciphertext index module 500, searching ciphertext module 600, retrieval result processing module 700 and system management module 800.
The working steps of the system are as follows:
(1) after the user logs in securely through system management module 800, the system determines whether the user has chosen to create an index file or to perform retrieval; for retrieval, go to step (15);
(2) the system converts the classified original document submitted by the user into plain text, extracts the theme, text and other additional attributes from the original file, and forms the document summary;
(3) the system performs word segmentation on the theme, text and additional attributes, and extracts the feature vector;
(4) the system encrypts the plain text document and the document summary obtained in step (2) separately;
(5) the system stores the document ciphertext obtained in step (4) in the corresponding document ciphertext storehouse in a distributed manner, and the document summary ciphertext obtained in step (4) in the corresponding document summary ciphertext storehouse;
(6) the system encrypts the participles and the feature vector obtained in step (3) separately;
(7) the system stores the feature vector ciphertext obtained in step (6) in the feature vector ciphertext storehouse;
(8) the ciphertext participles obtained in step (6) are distributed to the index servers;
(9) each index server obtains the corresponding participle position ciphertext according to the ciphertext participles received in step (8);
(10) the participle position ciphertext obtained in step (9) is decrypted;
(11) the participle positions decrypted in step (10) are passed back to the corresponding index server;
(12) the index server creates the index according to the participle positions;
(13) the index obtained in step (12) is encrypted;
(14) the ciphertext index obtained in step (13) is stored in the corresponding ciphertext index storehouse;
(15) the user submits a term/string;
(16) the system segments the term/string submitted by the user and performs query expansion;
(17) the system encrypts the expanded participle set obtained in step (16);
(18) the system broadcasts the expanded ciphertext participle set obtained in step (17) to the index servers;
(19) each index server performs the retrieval;
(20) the system collects the document code sets returned by the index servers;
(21) the system reads the corresponding document feature vector ciphertext according to the document code set obtained in step (20);
(22) the feature vector ciphertext obtained in step (21) is decrypted;
(23) the system uses the feature vectors obtained in step (22) to sort the document code set;
(24) the system reads the corresponding document summary ciphertext according to the ordered document code set obtained in step (23);
(25) the document summary ciphertext is decrypted;
(26) the decrypted document summaries are displayed to the user;
(27) the corresponding document ciphertext is obtained according to the user's selection;
(28) the document ciphertext is decrypted;
(29) the decrypted document is displayed to the user.
The role of each module in the above steps is described in more detail below with reference to Fig. 2:
1, the urtext processing module 100:
Text preprocessing mainly covers two aspects: physically, the digitization of physical documents; logically, the normalization and classification of electronic documents. As shown in Figure 2, urtext processing module 100 includes conversion unit 110, extraction unit 120 and summary unit 130. Conversion unit 110 digitizes paper documents, that is, it turns a paper document into a processable electronic original document by scanning or similar means, and it also converts the electronic documents to be processed uniformly into plain text documents; extraction unit 120 extracts document information from the plain text documents, including but not limited to the theme (title, abstract, keywords), the text, and additional attributes (author, author affiliation, source, time); summary unit 130 organizes the theme, abstract, author, time, source and so on into the document summary.
2, word-dividing module 200:
Word-dividing module 200 segments the document theme, text and additional attributes provided by the urtext processing module and extracts the feature vector; it also segments and expands the terms/strings provided by the searching ciphertext module. Participle unit 210 segments the theme, text, additional attributes, terms/strings and so on that are sent to it; feature vector unit 220 extracts the document feature words from the segmentation result to form the feature vector; query expansion unit 230 performs query expansion on the participles of the term/string.
3, encrypting module 300:
Encrypting module 300 provides the encryption and decryption functions, and specifically includes file encryption unit 310, participle ciphering unit 320 and participle position ciphering unit 330:
(1) file encryption unit 310 is responsible for encrypting the plain text documents and document summaries sent by urtext processing module 100 and the feature vectors sent by word-dividing module 200; it includes a file encryption information table and a cryptographic calculation device; the file encryption information table is used to obtain the key and cryptographic algorithm required to encrypt plain text documents, document summaries and feature vectors;
(2) participle ciphering unit 320 is responsible for encrypting the plaintext participles sent by word-dividing module 200. It includes a participle grouping submodule, a participle block encryption information table and a cryptographic calculation device. The participle block encryption information table is used to obtain the key and cryptographic algorithm required to encrypt participles; the key management unit and the cryptographic algorithm storehouse are also used during encryption. When participles are encrypted, the participle encryption key is calculated from the parameters in the participle block encryption information table, so that identical participles are given the same cryptographic algorithm and key, which guarantees that identical participles encrypt to identical ciphertext.
The participle grouping submodule is responsible for grouping the plaintext participles. The participle grouping method comprises two operations, participle grouping creation and participle grouping update: participle grouping creation randomly groups the plaintext participles from word-dividing module 200, while participle grouping update makes each participle belong to different participle groupings in different cycles, which strengthens the randomness of the grouping.
1) participle grouping creation
When a participle grouping is created, participle ciphering unit 320 first receives the plaintext participles from word-dividing module 200, groups them according to the participle block encryption information table, and sends the resulting participle grouping information to the key management unit; the key management unit retrieves the cryptographic algorithm from the cryptographic algorithm storehouse according to the participle grouping information and calculates the participle packet key; the participles are then encrypted, and the resulting ciphertext participles are sent to ciphertext index module 500.
2) participle grouping update
The system triggers an update between the current participle grouping and the next participle grouping adjacent to it. Because the key management unit uses the participle grouping information to generate the participle packet key, this also triggers an update of the index term ciphertexts. To prevent the index term ciphertexts from being updated too frequently, the period of the update between participle groups is set to an integer multiple of the participle grouping update period.
During a participle grouping update, the participle grouping submodule randomly exchanges participles between the two adjacent participle groupings to be updated, then obtains and saves the new participle grouping information; it looks up the cryptographic algorithm storehouse according to the participle block encryption information table to obtain, for the two adjacent participle groupings, the current participle block encryption algorithms and the new participle block encryption algorithms of the next cycle; it calculates the current participle packet keys of the two adjacent participle groupings and the new participle packet keys of the next cycle from the old and new participle groupings and the current participle grouping cycle; with the old and new key and cryptographic algorithm pairs, it encrypts each participle in the participle groupings to obtain a set of old and new participle ciphertext pairs; according to these pairs, each old participle ciphertext is replaced with the new one; the update is then complete.
(3) participle position ciphering unit 330 includes a participle position encryption information table and a cryptographic calculation device; the participle position encryption information table is used to obtain the key and cryptographic algorithm required to encrypt participle position information;
File encryption unit 310, participle ciphering unit 320 and participle position ciphering unit 330 also use the key management unit and the cryptographic algorithm storehouse during encryption;
4, document ciphertext memory module 400
Document ciphertext memory module 400 provides distributed storage of, and access to, document summary ciphertext and document ciphertext; it includes document ciphertext distribution proxy module 410 and distribution type file ciphertext administration module 420. Document ciphertext distribution proxy module 410 distributes document ciphertext and document summary ciphertext to the corresponding document ciphertext server according to the document standard code. Distribution type file ciphertext administration module 420 manages document ciphertext and document summary ciphertext, and can be composed of several document ciphertext servers, document summary ciphertext storehouses and document ciphertext storehouses. Each document ciphertext server is responsible for accessing the document summary ciphertext storehouse and document ciphertext storehouse that belong to it.
The creation of the document ciphertext storehouse and the document summary ciphertext storehouse is described below with reference to Fig. 3:
(1) urtext processing module 100 preprocesses the document and generates the document code, document standard code, plain text document and document summary;
(2) the plain text document and the document summary are encrypted as follows: encrypting module 300 receives a plain text document and its document summary and obtains a random file encryption algorithm from the cryptographic algorithm storehouse of encrypting module 300; the key management unit of encrypting module 300 generates the file encryption key and at the same time adds a record to the file encryption information table; the plain text document and document summary are encrypted with this algorithm and file encryption key to obtain the document ciphertext and document summary ciphertext, which encrypting module 300 passes to document ciphertext memory module 400;
(3) document ciphertext distribution proxy module 410 in document ciphertext memory module 400 distributes the document ciphertext and document summary ciphertext to the corresponding document ciphertext server according to the document standard code; the document ciphertext server stores them in the corresponding document ciphertext storehouse and document summary ciphertext storehouse.
5, the ciphertext index module 500:
Ciphertext index module 500 includes ciphertext participle distribution proxy module 510 and distributed cryptograph index management module 520. Ciphertext participle distribution proxy module 510 is responsible for distributing the ciphertext participles to the index servers; distributed cryptograph index management module 520 includes at least one index server and several ciphertext index storehouses, each of which is managed by an index server; each index server creates the index from the ciphertext participles and participle position information provided by the encrypting module and stores the ciphertext index in the corresponding ciphertext index storehouse; each index server holds a document code information table, which records the pseudo-document number group corresponding to each document; a pseudo-document number group is a set formed by a one-to-many mapping of the document code and is generated by the system.
(1) ciphertext dynamic descendence tree index structure
The ciphertext full-text search system of the present invention also uses the ciphertext dynamic descendence tree index structure first proposed by the inventor, which is introduced below with reference to Fig. 4 and an example:
The ciphertext dynamic descendence tree index is a forest; the structure of each of its subtrees comprises the ciphertext of the tree root, the ciphertext of the leaves, and the ciphertext of the leaf information sets composed of pseudo-document number, leaf position, leaf relative position and leaf mutation;
Tree root: the participle located at the root of a tree;
Leaf: the successor of the tree root, that is, the participle located at a leaf;
Pseudo-document number: one element of a pseudo-document number group;
Leaf position: the position of the current leaf in the document;
Leaf relative position: a pointer to the successor participle of the current leaf;
Leaf mutation: a character string that stands in for the original leaf;
The index structure of a subtree can be expressed as: tree root<leaf([pseudo-document number, {(leaf position, leaf relative position)}], leaf mutation)>, where the leaf ciphertexts under the same tree root differ from one another; [pseudo-document number, {(leaf position, leaf relative position)}] is one piece of leaf information, composed of a pseudo-document number and a position list; the pseudo-document number disguises the real number of the document, to prevent an attacker who has obtained the position information from piecing together the ciphertext of the document; each leaf corresponds to one leaf information set and one leaf mutation; the leaf mutation may be empty, but the leaf information set may not.
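The subtree expression above maps naturally onto nested records. The following sketch shows one possible in-memory layout in plaintext; the field and function names (LeafInfo, Leaf, add_occurrence) are illustrative assumptions, the leaf relative position is treated as an opaque integer, and all values that the system would store as ciphertext are left unencrypted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class LeafInfo:
    """[pseudo-document number, {(leaf position, leaf relative position)}]"""
    pseudo_doc_no: int
    positions: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class Leaf:
    """One leaf: its leaf information set plus an optional leaf mutation."""
    info_set: List[LeafInfo] = field(default_factory=list)
    mutation: Optional[str] = None   # may be empty

# The forest: tree root -> leaf table (leaf participle -> Leaf)
Forest = Dict[str, Dict[str, Leaf]]

def add_occurrence(forest: Forest, root: str, leaf: str,
                   pseudo_doc_no: int, leaf_pos: int, rel_pos: int) -> None:
    """Record that `leaf` follows `root` at `leaf_pos` in the given pseudo-document."""
    entry = forest.setdefault(root, {}).setdefault(leaf, Leaf())
    for info in entry.info_set:
        if info.pseudo_doc_no == pseudo_doc_no:
            info.positions.append((leaf_pos, rel_pos))
            return
    # First occurrence for this pseudo-document: start a new leaf information.
    entry.info_set.append(LeafInfo(pseudo_doc_no, [(leaf_pos, rel_pos)]))
```

Keying the leaf table by leaf participle reflects the requirement that leaves under the same tree root are distinct, and creating a Leaf only when its first occurrence is recorded keeps the leaf information set non-empty.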
(2) ciphertext dynamic descendence tree index creation
When the ciphertext dynamic descendence tree index is created, as shown in Figure 5, word-dividing module 200 receives the theme, text and additional information from urtext processing module 100, and participle unit 210 performs word segmentation on this information. Participle unit 210 sends the resulting participles to encrypting module 300, where participle ciphering unit 320 encrypts them. Participle ciphering unit 320 sends the ciphertext participles to ciphertext index module 500; ciphertext participle distribution proxy module 510 extracts the document standard code corresponding to each ciphertext participle and, according to that code, distributes the ciphertext participle to the corresponding index server in distributed cryptograph index management module 520. The index server allocates a pseudo-document number group to each newly added document code and records it in the document code information table; it then looks up the currently available index storehouse and sends the leaves and leaf information sets affected by the newly added document to encrypting module 300. Participle position ciphering unit 330 decrypts the received leaf information sets and sends them back to the corresponding index server. The index server randomly chooses a pseudo-document number and creates the index for the corresponding document, then sends the corresponding leaf information sets to participle position ciphering unit 330, which encrypts them and passes them back to the index server. Finally, the index server stores the received ciphertext index in the corresponding ciphertext index storehouse.
In the ciphertext dynamic descendence tree index structure, all root nodes form the tree root table, and all leaf nodes under the same tree root form a leaf table; each tree root table entry corresponds to one leaf table, and each leaf table entry corresponds to one leaf information set and one leaf mutation. The creation algorithm is given in Algorithm 1.
Algorithm 1: ciphertext dynamic descendence tree index creation algorithm
Input: ciphertext participle
Output: ciphertext index file
for (each ciphertext participle)
{
    Insert the ciphertext participle into the index;
    if (leaf information set length is greater than the average length)
        Divide the leaf information set;
}
Encrypt the leaf information sets and store them;
End of algorithm.
(3) Leaf information set encryption
To guarantee the security of the index, the leaf information set is encrypted so that the position information of each leaf participle in the original document is hidden. Because the leaf information set of a high-frequency word may be long, an attacker could easily infer information about high-frequency words by collecting statistics on ciphertext lengths; the method of leaf information set division is therefore adopted. This method balances the lengths of the leaf information sets and can prevent statistical attacks on ciphertext length. Several definitions follow:
Leaf information set length: the byte number that the leaf information set comprises;
Leaf information set average length: in the ciphertext dynamic descendence tree index, the mean value of the byte number that whole leaf information sets comprise;
Leaf information set division: in the ciphertext dynamic descendence tree index, if the length of any leaf information set is greater than the leaf information set average length, it is divided into several disjoint subsets, the length of each subset falling within a range preset by the system. A leaf mutation is randomly generated for every subset except the first, so that each subset corresponds to either a leaf mutation or the leaf itself.
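The division rule defined above can be illustrated with a minimal sketch. It makes several assumptions that are not in the source: length is counted in entries rather than bytes, a single upper bound `max_len` stands in for the system-preset range, and the function name, parameters and the use of random hexadecimal labels for leaf mutations are all hypothetical.

```python
import secrets

def divide_leaf_info_set(leaf, positions, max_len):
    """Split `positions` into disjoint chunks of at most `max_len` entries.

    The first chunk stays under the original leaf label; every later chunk
    is filed under a freshly generated random leaf-mutation label
    (hypothetical representation).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    chunks = [positions[i:i + max_len] for i in range(0, len(positions), max_len)]
    if not chunks:
        return {leaf: []}
    result = {leaf: chunks[0]}
    for chunk in chunks[1:]:
        label = secrets.token_hex(8)
        while label in result:          # avoid an (unlikely) label collision
            label = secrets.token_hex(8)
        result[label] = chunk
    return result
```

With this shape, a leaf information set of n subsets yields n-1 random labels, so every stored set has a bounded length regardless of how frequent the word is.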
The encryption process of the leaf information set is described further below:
1) For a plaintext leaf information set, participle position ciphering unit 330 requests the corresponding participle position key and participle position encryption algorithm number from the key management unit in encrypting module 300;
2) According to the ciphertext participle, the key generation parameters and the participle position encryption algorithm number are obtained from the participle position encryption information table and passed to the key management unit;
3) Participle position ciphering unit 330 obtains the corresponding participle position encryption algorithm from the encryption algorithm library according to the algorithm number, and encrypts the plaintext leaf information set with the participle position key.
(4) Index ciphertext update, comprising the participle ciphertext update and the leaf information set ciphertext update:
Participle ciphertext update: triggered by the system and carried out step by step per participle group; it ends when the update time is reset after the ciphertext index storehouse returns a successful-update response. For the detailed process, see the participle group update in the participle grouping method.
Leaf information set ciphertext update: triggered by the system and carried out step by step per participle group; it ends when the update time is reset after the ciphertext index storehouse returns a successful-update response. The key management unit first receives, from the participle position encryption information table, the encryption information corresponding to the leaf information set to be updated. From this encryption information it calculates the participle group key and the old and new participle position keys. It then searches the encryption algorithm library to obtain the participle group encryption algorithm and the old and new participle position encryption algorithms. The encryption operator encrypts the participle to obtain the ciphertext participle, and the corresponding leaf information set ciphertext is looked up in the ciphertext index storehouse by the ciphertext participle. Participle position ciphering unit 330 decrypts the leaf information set ciphertext with the old participle position key and algorithm to obtain the plaintext leaf information set, encrypts that plaintext with the new participle position key and algorithm to obtain the new leaf information set ciphertext, and replaces the old leaf information set ciphertext in the ciphertext index storehouse with the new one.
(5) Ciphertext dynamic descendence tree index update method with document-local update granularity
The index update of the ciphertext dynamic descendence tree is a dynamic index update algorithm that operates under the ciphertext state with an update granularity at the document-local level. It makes the ciphertext dynamic descendence tree index highly dynamic and improves the efficiency of ciphertext index updates. Dynamic index updates at the document-local level mainly comprise add, delete and modify operations.
The add operation modifies the leaf relative position of the predecessor of the added text so that it points to the leaf position of the first character of the added text, and writes the predecessor leaf's original relative position value into the leaf relative position of the last character of the added text. After each new piece of position information is inserted, the leaf information set length of that ciphertext participle is compared with the average length. If it is smaller than the average length, the leaf information set needs no further processing and the add operation ends. If it is greater, the leaf information set of that participle must be divided; according to the number n of subsets produced, n-1 leaf mutations are generated at random and inserted into the leaf table of the corresponding tree root.
When a delete operation must act on two or more leaf information sets, they are first decrypted and merged into a single leaf information set. On this single leaf information set, the leaf relative position of the predecessor of the deleted text is modified and the position information to be deleted is removed; the resulting leaf information set is then length-balanced, encrypted and stored.
Modify operation: implemented by deleting and then adding text; it can be regarded as the composition of a delete and an add.
The update process is shown in Algorithm 2.
Algorithm 2: dynamic index update algorithm with document-local update granularity under the ciphertext state
Input: changed part of the document
Output: index file after the change
Decrypt the leaf information set of each participle in the index;
Save the context position information of the changed part in the old document to a temporary state table;
if (add) create leaf information for the changed part using relative positions and merge it into the original index;
else
    if (delete) delete the changed part directly from the original index;
    else modify the changed part directly in the original index;
Encrypt the leaf information set of each participle in the index;
End of algorithm.
6. Searching ciphertext module 600
Searching ciphertext module 600 provides information retrieval at the appropriate level for legitimate users of the system, and comprises retrieve statement commit unit 610 and retrieval server 620. Retrieval server 620 broadcasts the expanded ciphertext participle set to every index server, receives and processes the results returned by each index server, and then sends the resulting unordered document code collection to result for retrieval processing module 700.
The ciphertext retrieval process is shown in Figure 6. During retrieval, a legitimate user enters a search term/string, which retrieve statement commit unit 610 submits to word-dividing module 200. Participle unit 210 and query expansion unit 230 in word-dividing module 200 segment and expand the term/string to obtain the expanded participle set. Participle ciphering unit 320 in encrypting module 300 then encrypts the expanded participle set to obtain the ciphertext participles. Retrieval server 620 broadcasts the ciphertext participles to every index server in distributed ciphertext index management module 520, and each index server performs the retrieval. Finally, retrieval server 620 collects the document code collections returned by the index servers, removes duplicates, and submits the result to result for retrieval processing module 700.
Retrieval on the dynamic descendence tree index structure is divided, according to the number of participles, into single-participle retrieval, two-participle retrieval and multi-participle retrieval. Taking the search term "qs1, qs2, ..., qsi, ..., qsn" as an example, where n is the number of participles in the search term/string, the steps are as follows:
1) Determine the number of participles n of the search term/string: if n=1, go to 2); if n=2, go to 3); otherwise go to 4);
2) Check whether the tree root table contains this participle; if it does, the retrieval hit result set is the set of leaf positions in the leaf information sets of all leaves in this participle's leaf table; retrieval ends;
3) Check whether the tree root table contains qs1, and whether the leaf table of qs1 contains qs2; if both qs1 and qs2 exist, the retrieval hit result set is the set of leaf positions in the leaf information set of qs2; retrieval ends;
4) Check whether the tree root table contains qs1; if it does, check whether the leaf table of qs1 contains qs2;
5) If both qs1 and qs2 exist, obtain the set of relative positions in the leaf information set of qs2, denoted Urpi;
6) Loop over 3≤j≤n: check whether the leaf table of qsi contains qsj; if it does, obtain the set of leaf positions in the leaf information set of qsj, denoted spj; compute the intersection of spj and Urpi, which is the leaf position subset of qsj, denoted Uspj; obtain the corresponding subset of relative positions in the leaf information set of qsj, denoted Urpj; after at most n-2 iterations the loop yields the preliminary result Urpj;
7) The retrieval hit result set is the set of leaf positions of all leaf information sets in Urpj; retrieval ends.
The above structure shows that this retrieval supports ciphertext substring queries and decryption on demand during retrieval:
(1) Ciphertext substring query is an important guarantee of the recall ratio of the ciphertext full-text search system. The present system model uses the leaf position and the leaf relative position to record the positional relationship of the substring to be matched, and performs substring queries without decrypting the index term ciphertext. This both guarantees the security of the index database and saves overhead during ciphertext substring queries, giving high query efficiency with no missed matches.
(2) Retrieval is based on the dynamic descendence tree index structure. Because this index structure combines the vocabulary method and the single-Chinese-character method, decryption on demand is possible: only the leaf information set ciphertexts that are actually retrieved need to be decrypted, which improves both system security and retrieval efficiency.
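The root-table and leaf-table lookup in retrieval steps 1) to 7) can be illustrated with a heavily simplified plaintext sketch. It is not the patented structure: nothing is encrypted, absolute token offsets with an adjacency check replace the leaf position and leaf relative position chaining, and the names `build_index`, `search` and the `END` marker are hypothetical.

```python
from collections import defaultdict

END = None  # hypothetical marker used as the successor of a document's last token

def build_index(docs):
    """Root table: token -> leaf table: successor token -> {(doc_id, pos of token)}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, tokens in docs.items():
        for pos, token in enumerate(tokens):
            successor = tokens[pos + 1] if pos + 1 < len(tokens) else END
            index[token][successor].add((doc_id, pos))
    return index

def search(index, query):
    """Return ids of documents that contain the tokens of `query` consecutively."""
    if not query or query[0] not in index:
        return set()
    if len(query) == 1:
        # Single token: every leaf under this root is a hit.
        return {doc for hits in index[query[0]].values() for doc, _ in hits}
    # Two tokens: the leaf table of the first token must contain the second.
    current = set(index[query[0]].get(query[1], set()))
    # More tokens: keep only start positions whose chain continues.
    for j in range(2, len(query)):
        step = index.get(query[j - 1], {}).get(query[j], set())
        current = {(doc, pos) for doc, pos in current if (doc, pos + j - 1) in step}
    return {doc for doc, _ in current}
```

For example, with `docs = {1: ["a", "b", "c"], 2: ["b", "c", "a"]}`, calling `search(build_index(docs), ["b", "c"])` returns both document ids, while `["a", "b", "c"]` matches only document 1. The loop runs at most n-2 times for an n-token query, which mirrors the bound stated in step 6).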
7. Result for retrieval processing module 700
Result for retrieval processing module 700 is responsible for sorting and displaying the retrieval result set, and comprises result for retrieval sequencing unit 720 and result display unit 730. Result for retrieval sequencing unit 720 sorts the unordered document code collection. Result display unit 730 provides document summary display and document display; the document summary display includes the theme, summary, author, time, source and so on. In Figure 2, result for retrieval processing module 700 is also drawn as containing feature vector ciphertext storehouse 710, to make the system workflow easier to analyze and to keep the drawing tidy. As those skilled in the art will appreciate, however, feature vector ciphertext storehouse 710 is a database and need not be located within result for retrieval processing module 700.
Result for retrieval processing module 700 receives the document code collection of the retrieval result set. Result for retrieval sequencing unit 720 extracts the document feature vector ciphertexts, sends them to encrypting module 300 for decryption, and then sorts the result set. According to the ordered result set, the corresponding document summary ciphertexts are extracted, decrypted and returned to the user. When the user selects a document, the document ciphertext is extracted and, after decryption by encrypting module 300, the entire document is returned to the user.
Retrieval result sorting and display procedure:
(1) Result for retrieval processing module 700 receives the unordered result set, reads the feature vector ciphertexts of the corresponding documents from feature vector ciphertext storehouse 710, and sends the feature vector ciphertexts and document codes to encrypting module 300;
(2) The key management unit in encrypting module 300 reads the file encryption information table, dynamically generates the file encryption key, and passes the key and the file encryption algorithm to file encryption unit 310. File encryption unit 310 decrypts the feature vector ciphertexts to obtain the plaintext feature vectors;
(3) Result for retrieval sequencing unit 720 calculates the degree of correlation between the query string and each plaintext feature vector, sorts the document code collection in descending order of correlation to obtain an ordered document code collection, and sends the ordered collection to document ciphertext distribution proxy module 410;
(4) Document ciphertext distribution proxy module 410 sends each document code to the corresponding document ciphertext server according to the document standard code;
(5) The document ciphertext server obtains the document summary ciphertext of each document from the document summary ciphertext storehouse and sends it to encrypting module 300 for decryption, using the decryption method described in step (2);
(6) The decrypted document summaries are shown to the user by result display unit 730;
(7) After the user selects a document, the system fetches the document ciphertext from the document ciphertext storehouse, decrypts it, and displays it to the user.
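Step (3) above sorts document codes by their degree of correlation with the query. The source does not say how that degree is computed; the sketch below assumes cosine similarity over sparse term-weight vectors purely for illustration, and the function names and the dictionary representation of feature vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, feature_vecs):
    """Return document codes sorted by descending similarity to the query.

    `feature_vecs` maps document code -> decrypted (plaintext) feature vector.
    """
    return sorted(feature_vecs,
                  key=lambda code: cosine(query_vec, feature_vecs[code]),
                  reverse=True)
```

In the described workflow this ranking would run only after the feature vector ciphertexts have been decrypted by the encrypting module, so the sorting unit never needs the document bodies themselves.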
8. System management module 800
System management module 800 manages user rights and maintains and updates the basic information of departments, roles and users and the mapping relations among them. The module comprises a user information table, a role-permission table, a department information table, a user-department relation table and a user-role relation table. The role-permission table records each role's basic information and permissions, which can be represented as a 16-bit permission bit string.
After a user logs in, the system management module reads the user's role, department and other related information, obtains the user's permission bit string, and thereby determines the current user's operating rights. During retrieval, the system searches only the ciphertext index storehouses of the regions and security classifications covered by the user's rights.
In user management, user accounts are classified by permission level, and accounts at different levels hold different rights; ciphertexts are classified by security classification, and ciphertexts at different levels place different requirements on access rights. To prevent low-privilege users from accessing the information of high-privilege users, the system's read-write rule is: read down, write up. The system can establish a three-way separation of powers, with a separate system administrator, system security officer and system auditor. The system administrator is responsible for the operation, maintenance and password settings of the operating system and database system. The system security officer is responsible for the security maintenance of the database and the key settings of the encrypted database. The system auditor is responsible for periodically checking the system logs and searching for illegal operations. This separation of powers constrains the behavior of highly privileged personnel to a large extent.
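The "read down, write up" rule stated above can be written as two comparisons. This is a minimal sketch that assumes security classifications are totally ordered integers (a larger number meaning more sensitive); the function names and the integer encoding are hypothetical and not taken from the source.

```python
def can_read(subject_level, object_level):
    """Read down: a subject may read objects at or below its own level."""
    return subject_level >= object_level

def can_write(subject_level, object_level):
    """Write up: a subject may write objects at or above its own level."""
    return subject_level <= object_level
```

Under this rule a level-1 user cannot read a level-3 ciphertext, and a level-3 user cannot write into a level-1 store, so information cannot leak from a higher classification to a lower one through either operation.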
(3) Secure overlay network and its construction
To meet the need for secure retrieval of classified information, the present invention proposes the concept of a secure overlay network in combination with computing technology, and builds a secure peer-to-peer full-text retrieval system on that basis. Peer-to-peer full-text retrieval uses the abundant computing resources and retrieval result caches in the peer-to-peer network to improve the efficiency of full-text retrieval. A secure overlay network (Secure Overlay Network, SON) is a network formed by organizing peer nodes according to the dominance relation between their security levels. It confines cooperative retrieval to the set of peer nodes that meet the security requirements, which guarantees the security of cooperative retrieval and improves its efficiency.
To realize a secure full-text retrieval system, access control must be applied to the retrieved content within the system. The present invention uniformly describes the security of a peer network node by its security level. The set of security levels is a partially ordered set <L, ≤>: for any l1, l2 ∈ L, l1 ≤ l2 means that l2 dominates l1. Organizing the peer nodes by the dominance relation between their security levels forms the secure overlay network. Figure 7 gives an example of a secure overlay network (nodes are drawn as circles). As it shows, a node with a higher security level directly or indirectly dominates nodes with lower security levels, and the lower a node's security level, the fewer nodes it can dominate.
In a real network, each node must record the neighbor nodes whose security level it dominates. To improve efficiency, a peer node may also record the neighbor nodes whose security level dominates its own, to support security checks or the tracing back of operations.
UTabp records the nodes whose security level dominates node p, and DTabp records the nodes whose security level is dominated by node p. Both tables reside on node p and are maintained by node p itself.
The steps by which node p joins the SON are as follows:
(1) Node p broadcasts a message in the P2P network announcing that it is joining the SON;
(2) Each node whose security level dominates p replies with u and adds p to its own DTab;
(3) Each node whose security level is dominated by p replies with d and adds p to its own UTab;
(4) According to the replies from the other nodes, node p adds the nodes that replied u to UTabp, the set of nodes that dominate it, and adds the nodes that replied d to DTabp, the set of nodes it dominates.
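The four join steps above can be sketched in a few lines. The sketch assumes, hypothetically, that a security level is a pair of an integer rank and a set of categories, which gives a partial order in which some levels are incomparable; the broadcast and the u/d replies are collapsed into a loop over the current members. The names `Node`, `dominates` and `join_son` are illustrative only, and two nodes with identical levels are treated as dominating each other, a case the source does not address.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    rank: int
    categories: frozenset = frozenset()
    utab: set = field(default_factory=set)  # names of nodes that dominate this node
    dtab: set = field(default_factory=set)  # names of nodes this node dominates

def dominates(a, b):
    """True if the level of `a` dominates the level of `b` (level(b) <= level(a))."""
    return a.rank >= b.rank and a.categories >= b.categories

def join_son(p, members):
    """Add node `p` to the overlay `members` (name -> Node)."""
    for q in members.values():      # step (1): the broadcast reaches every member
        if dominates(q, p):         # step (2): q replies u and records p in its DTab
            q.dtab.add(p.name)
            p.utab.add(q.name)      # step (4): p records q in UTab_p
        if dominates(p, q):         # step (3): q replies d and records p in its UTab
            q.utab.add(p.name)
            p.dtab.add(q.name)      # step (4): p records q in DTab_p
    members[p.name] = p
```

Nodes whose levels are incomparable (for example, equal rank but different categories) exchange no table entries, so neither can later be reached from the other through DTab links.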
(4) Peer-to-peer full-text retrieval based on the secure overlay network
Full-text retrieval is a computation-intensive process that requires large amounts of computing resources, while the P2P network is rich in computing, storage and information resources. Given this network topology and application, when the index files amount to massive data, information such as the index files is stored in the P2P network. Peer-to-peer full-text retrieval (P2P Full text Search) based on the secure overlay network adds a cooperative retrieval function (P2P Search, PSearch) on top of centralized retrieval (Centralized Search, CSearch). It makes full use of the resources in the peer-to-peer network and, through the secure overlay network, restricts the nodes participating in a retrieval to a small range that satisfies the security requirements. While meeting the security requirements of classified information retrieval, it reduces system load and network traffic and improves retrieval efficiency.
Centralized retrieval is initiated by a node directly to the retrieval server; the node caches the retrieval results, which then serve as the data with which the node provides retrieval service. Cooperative retrieval is initiated and carried out by peer nodes within the secure overlay network, and the objects that can be retrieved are the contents cached by each node.
The steps of full-text retrieval based on the secure overlay network, when node p executes retrieval request q, are as follows:
1. Node p executes q itself;
2. Node p sends q to all nodes in DTabp;
3. Each node that receives q executes q itself;
4. Each node that receives q forwards q to all nodes in its own DTab;
5. Node p collects the cooperative retrieval results and presents them to the user; if the user is satisfied, execution ends;
6. Node p sends q to the centralized retrieval server, caches the retrieval results, and presents them to the user.
A schematic diagram of peer-to-peer full-text retrieval over the secure overlay network is shown in Figure 8, in which the large circle represents the centralized retrieval server and the small circles represent peer nodes. Retrieval initiated by node p2: p2 sends the request to p3, which forwards it to p4 and p5; if the retrieval results do not meet the requirements, p2 requests retrieval directly from the centralized retrieval server. Retrieval initiated by node p1: p1 sends the request to p2, p2 sends it to p3, and p3 then sends it to p4 and p5; if the retrieval results meet the requirements, no retrieval is initiated to the centralized retrieval server.
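The six retrieval steps and the Figure 8 walk-through can be sketched as a traversal over DTab links with a fallback to the central server. Everything concrete here is an assumption for illustration: DTab tables are a dictionary of node name to dominated node names, each node's cache maps a query string to a set of document codes, and `central_search` and `satisfied` are hypothetical callbacks standing in for the centralized retrieval server and the user's judgment.

```python
def psearch(start, query, dtab, cache):
    """Steps 1-4: collect cached hits from `start` and all nodes reachable via DTab."""
    seen, stack, hits = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:            # guard against visiting a node twice
            continue
        seen.add(node)
        hits |= cache.get(node, {}).get(query, set())
        stack.extend(dtab.get(node, ()))
    return hits

def full_text_search(start, query, dtab, cache, central_search, satisfied):
    """Steps 5-6: return peer results if acceptable, else ask the central server."""
    hits = psearch(start, query, dtab, cache)
    if satisfied(hits):
        return hits
    central = set(central_search(query))
    cache.setdefault(start, {})[query] = central  # cache for later cooperative retrieval
    return central
```

With the Figure 8 topology, `dtab = {"p1": {"p2"}, "p2": {"p3"}, "p3": {"p4", "p5"}}`, a request started at p2 reaches p3, p4 and p5 but never p1, because requests only travel towards dominated nodes.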
(5) Index file replica replication mechanism based on hub nodes
Because the P2P network is rich in computing, storage and other resources, when the index files amount to massive data, information such as the index files of the full-text retrieval system is stored in the P2P network. Nodes in the P2P network query resources by flooding, and nodes can join or leave freely. Given this network topology and application, the index file replica replication mechanism based on hub nodes (Junction Replication Mechanism, JRM) is used to realize distributed storage of the index files in the P2P network and to reduce unnecessary data traffic in the network. Through the replicas held at hub nodes, this mechanism ensures that searches arriving from different directions can obtain the resource quickly without increasing network bandwidth usage. It is therefore better suited to peer-to-peer full-text retrieval when the amount of information is huge, and it further improves the efficiency of full-text retrieval in a peer-to-peer network environment.
The algorithm of the hub-node-based index file replica replication mechanism has two parts: the first part discovers and records hub nodes, and the second part triggers different replication mechanisms by means of different thresholds, with source node caching serving as a supplement to hub node replication. Two thresholds can be set: threshold α triggers the hub node replication mechanism, and threshold β triggers the source node caching mechanism. Because source node caching is only a supplement to hub node replication, α<β is generally set so that the JRM mechanism takes priority. As part of the complete process, the JRM mechanism uses a least recently used (Least Recently Used, LRU) policy to delete replicas from nodes, thereby reducing the storage pressure on nodes.
The index file replica replication steps under the hub-node-based index file replica replication mechanism are as follows:
a. After the node that holds the data receives a new query request, the replication mechanism on that node saves the routing information carried in the request into a routing table and finds hub nodes by comparison;
b. The replication mechanism counts the number of requests through each hub node and sets a threshold to constrain it;
c. The replication mechanism records the number of requests from each query source node;
d. When many requests for the same data arrive over different paths and the corresponding threshold is reached, the hub node replication mechanism copies the data to the hub nodes on those paths;
e. On the basis of step d, if the requests from some source node for some data still reach the other, higher threshold, the data is copied directly to that source node.
The hub-node-based replica replication mechanism is shown in Figure 9, where A6 is the node that holds the source file and A4 is a hub node that can be used to store replicas. The process is as follows: the queries sent by A1 and A3 meet at A4 and are passed on to A6, so the mechanism identifies A4 as a hub node, and the segment from A4 to A6 is the route shared by the queries. Once the number of requests to A6 for a given resource exceeds the threshold, the replication mechanism is started and a replica of the data is copied directly to A4. This directly reduces the network load on the A4-to-A6 segment. If the requests from A1 to A4 then exceed a higher threshold, the system starts the source node caching mechanism and copies the index replica from A4 to A1, which reduces the network load between A1 and A4.
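The threshold logic of steps a to e can be sketched for a single data item as seen from the node that holds it. The sketch is an interpretation under stated assumptions: a hub node is taken to be any intermediate node that lies on the paths of at least two different source nodes, a threshold counts as reached when the counter is greater than or equal to it, and LRU eviction is not modelled. The class name `JrmHolder` and its fields are hypothetical.

```python
from collections import Counter, defaultdict

class JrmHolder:
    """Request bookkeeping on the node that holds one data item."""

    def __init__(self, alpha, beta):
        if not alpha < beta:
            raise ValueError("expected alpha < beta so hub replication fires first")
        self.alpha, self.beta = alpha, beta
        self.sources_via = defaultdict(set)  # intermediate node -> sources routed through it
        self.hub_hits = Counter()            # step b: requests seen through each hub node
        self.source_hits = Counter()         # step c: requests seen from each source node
        self.replicas = set()                # nodes that already hold a replica

    def on_request(self, path):
        """`path` = [source, hop, ...] up to (not including) the holder.

        Returns the nodes that newly receive a replica because of this request.
        """
        source, hops = path[0], path[1:]
        for node in hops:                    # step a: record routing information
            self.sources_via[node].add(source)
        self.source_hits[source] += 1
        created = []
        for node in hops:
            if len(self.sources_via[node]) >= 2:      # shared by different paths
                self.hub_hits[node] += 1
                if self.hub_hits[node] >= self.alpha and node not in self.replicas:
                    self.replicas.add(node)           # step d: replicate to the hub
                    created.append(node)
        if self.source_hits[source] >= self.beta and source not in self.replicas:
            self.replicas.add(source)                 # step e: cache at the source
            created.append(source)
        return created
```

Using the Figure 9 layout with `JrmHolder(alpha=2, beta=4)` on A6, requests arriving as `["A1", "A4"]` and `["A3", "A4"]` mark A4 as a hub; A4 receives a replica once two requests have been counted through it as a hub, and A1 receives one only after its own fourth request.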

Claims (4)

1. A secure overlay network construction method for a ciphertext full-text search system, characterized in that:
(1) described ciphertext full-text search system includes urtext processing module, word-dividing mode, encrypting module, document ciphertext memory module, ciphertext index module, searching ciphertext module, result for retrieval processing module, system management module;
Described urtext processing module (100) is used for the urtext of document is formatd pre-service, and comprise electronic paper document and/or format electronics original document, and extract its theme, text and adeditive attribute information, and, form the document summary;
Described word-dividing module (200) is used for performing word segmentation on, and extracting feature vectors from, the document theme, text and additional attributes provided by described urtext processing module, and for performing word segmentation and query expansion on the search term/string provided by described searching ciphertext module;
Described encrypting module (300) encrypts the plain text documents and document summaries sent by described urtext processing module and the feature vectors and participles sent by described word-dividing module, and deposits the feature vector ciphertexts in the feature vector ciphertext storehouse; performs encryption/decryption on the participle position information sent by the ciphertext index module; decrypts the document ciphertexts and document summary ciphertexts sent by described document ciphertext memory module; decrypts the feature vector ciphertexts sent by described result for retrieval processing module; and provides the corresponding encrypted/decrypted data to described document ciphertext memory module, result for retrieval processing module, searching ciphertext module and ciphertext index module;
Described document ciphertext memory module (400), be used for distributed store, document ciphertext and document summary ciphertext be provided: described distribution be according to region, document security level and document classification decide corresponding ciphertext deposit to destination document ciphertext server, each document ciphertext server receives, and document ciphertext and the document summary ciphertext that provides from described encrypting module also is provided; This module is also accepted the ciphertext read requests of described result for retrieval processing module, and the ciphertext that needs deciphering is provided for encrypting module;
Described ciphertext index module (500), be used for distribution ciphertext participle and create, store ciphertext index, provide the ciphertext index that needs deciphering, and the document code that retrieves: described distribution be according to region, document security level and document classification decide corresponding ciphertext index deposit to the target index server; Each index server the line index of going forward side by side of the ciphertext participle that provides from described encrypting module and participle positional information is provided creates; The storage ciphertext index is to corresponding ciphertext index storehouse after encrypting; This module is also according to the ciphertext index in classification request of described searching ciphertext module, from the ciphertext index storehouse, retrieve the index participle positional information ciphertext that needs deciphering and be sent to encrypting module, and, will send to the searching ciphertext module from the document code collection that encrypting module returns;
Described searching ciphertext module (600) provides the information retrieval service of appropriate level for the validated user of system; The term of this module reception validated user input/string is submitted to described word-dividing mode after examination is filtered; The expansion ciphertext minute word set that the reception encrypting module sends also forms the request of ciphertext index in classification, then is sent to described ciphertext index module and retrieves; Receive the document code collection that the ciphertext index module is returned, and submit to described result for retrieval processing module;
Described result for retrieval processing module (700) is used for receiving and processing the document code collection provided by described searching ciphertext module and returning the sorted result set to the retrieving user; according to the document code collection provided by described searching ciphertext module, it takes the corresponding feature vector ciphertexts from the feature vector ciphertext storehouse and, after decryption by the encrypting module, sorts the document code collection; it sends the ordered document code collection to the document ciphertext memory module; it receives the corresponding document summaries decrypted by described encrypting module and displays them to the user; and, according to the document summary selected by the user, it extracts the corresponding document ciphertext, which is displayed to the user after decryption by the encrypting module, the extraction being done in the same way as for the document summaries;
Described system management module (800) comprises the leading subscriber authority, and department, role, user's essential information and the mapping relations between them are carried out maintenance update;
(2) the ciphertext full-text search system is a ciphertext peer-to-peer full-text search system based on a safety overlay network, and includes a centralized retrieval server and peer nodes; the safety overlay network is the network formed by organizing the peer nodes of the peer-to-peer network according to the dominance relation among their security levels; in this safety overlay network, a node with a higher security level directly or indirectly dominates the nodes with lower security levels, and data flows from a node with a higher security level to the nodes whose security levels it dominates;
(3) the safety overlay network construction method of the ciphertext full-text search system specifically comprises the following steps:
1a. node (p) broadcasts a message in the peer-to-peer network requesting to join the safety overlay network;
1b. each node whose security level dominates that of node (p) replies with u and adds node (p) to the set of nodes that it dominates;
1c. each node whose security level is dominated by that of node (p) replies with d and adds node (p) to the set of nodes that dominate it;
1d. according to the reply from each other node, node (p) adds that node either to the set of nodes that dominate node (p) or to the set of nodes that node (p) dominates.
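The join procedure of steps 1a to 1d can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Level`, `Node`, and `join` are hypothetical, the security level is modeled as a rank plus a category set (a common lattice model that the claim does not prescribe), dominance is taken to be strict so that two nodes with equal levels do not dominate each other, and the broadcast is simulated by a loop over an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Level:
    """Hypothetical security level: a rank plus a set of categories."""
    rank: int
    categories: frozenset = frozenset()

    def dominates(self, other):
        # Assumption: strict dominance, so equal levels never dominate each other.
        return (self != other
                and self.rank >= other.rank
                and self.categories >= other.categories)


@dataclass
class Node:
    name: str
    level: Level
    dominators: set = field(default_factory=set)  # names of nodes that dominate this node
    dominated: set = field(default_factory=set)   # names of nodes this node dominates

    def on_join(self, newcomer):
        """Steps 1b and 1c: answer a join broadcast from `newcomer`."""
        if self.level.dominates(newcomer.level):
            self.dominated.add(newcomer.name)
            return "u"
        if newcomer.level.dominates(self.level):
            self.dominators.add(newcomer.name)
            return "d"
        return None  # incomparable levels: no reply


def join(newcomer, network):
    """Steps 1a and 1d: broadcast the join message and process the replies."""
    for peer in network:                 # 1a: simulated broadcast
        reply = peer.on_join(newcomer)
        if reply == "u":                 # 1d: the peer dominates the newcomer
            newcomer.dominators.add(peer.name)
        elif reply == "d":               # 1d: the newcomer dominates the peer
            newcomer.dominated.add(peer.name)
    network.append(newcomer)
```

Nodes with incomparable levels (for example, the same rank but disjoint categories) send no reply in this sketch, so no edge is created between them.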
2. The safety overlay network construction method of the ciphertext full-text search system according to claim 1, characterized in that:
The original text processing module (100) includes a conversion unit (110), an extraction unit (120), and a summary unit (130); the conversion unit (110) digitizes paper documents and converts the electronic documents to be processed into a uniform plain-text format; the extraction unit (120) extracts document information from the plain text documents provided by the conversion unit (110), the extracted information comprising the subject, the body text, and additional attributes; the summary unit (130) organizes the information provided by the extraction unit (120), comprising the subject, abstract, author, time, and source, into a document summary;
The word segmentation module (200) includes a word segmentation unit (210), a feature vector unit (220), and a query expansion unit (230); the word segmentation unit (210) segments the subject, body text, and additional attributes, as well as the search terms/strings sent to it; the feature vector unit (220) extracts the document's feature words from the segmentation result to form a feature vector; the query expansion unit (230) performs query expansion on the word segments of the search term/string;
The encryption module (300) includes a file encryption unit (310), a word-segment encryption unit (320), and a word-segment position encryption unit (330); the file encryption unit (310) is responsible for encrypting the plain text documents and document summaries sent by the original text preprocessing module (100) and the feature vectors sent by the word segmentation module (200); the file encryption unit (310) includes a file encryption information table and an encryption operator; the file encryption information table is used to obtain the key and encryption algorithm required to encrypt plain text documents, document summaries, and feature vectors; the word-segment encryption unit (320) is responsible for encrypting the word segments sent by the word segmentation module (200); the word-segment encryption unit (320) includes a word-segment grouping submodule, a word-segment group encryption information table, and an encryption operator; the word-segment group encryption information table is used to obtain the key and encryption algorithm required to encrypt word segments; the word-segment position encryption unit (330) is responsible for encrypting and decrypting the word-segment position information sent by the distributed ciphertext index management module (520); the word-segment position encryption unit (330) includes a word-segment position encryption information table and an encryption operator; the word-segment position encryption information table is used to obtain the key and encryption algorithm required to encrypt the word-segment position information; the file encryption unit (310), the word-segment encryption unit (320), and the word-segment position encryption unit (330) also use a key manager and an encryption algorithm library during encryption;
The document ciphertext storage module (400) includes a document ciphertext distribution agent module (410) and a distributed file ciphertext management module (420); the distributed file ciphertext management module (420) manages and stores document ciphertexts and document summary ciphertexts, and includes document ciphertext servers, document summary ciphertext libraries, and document ciphertext libraries, each document ciphertext server being responsible for accessing the document summary library and the document ciphertext library that belong to it; the document ciphertext distribution agent module (410) distributes document ciphertexts and document summary ciphertexts to the corresponding document ciphertext servers;
The ciphertext index module (500) includes a ciphertext word-segment distribution agent module (510) and a distributed ciphertext index distribution management module (520); the distributed ciphertext index distribution management module (520) includes index servers and ciphertext index libraries, each ciphertext index library being managed by an index server; each index server creates an index from the ciphertext word segments and word-segment position information provided by the encryption module and stores the ciphertext index in the corresponding ciphertext index library; each index server includes a document number information table; the document number information table records the pseudo-document number group corresponding to each document; a pseudo-document number group is the set formed by a one-to-many mapping of a document number; the ciphertext word-segment distribution agent module (510) is responsible for distributing ciphertext word segments to the index servers;
The ciphertext search module (600) includes a search statement submission unit (610) and a retrieval server (620); the retrieval server broadcasts the expanded ciphertext word-segment set to each index server of the ciphertext index module (500), receives and processes the results returned by each index server, and then sends the resulting unordered document number set to the search result processing module (700);
The search result processing module (700) includes a search result sorting unit (720) and a result display unit (730); the search result sorting unit (720) sorts the unordered document number set; the result display unit (730) provides document summary display and document display.
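The document number information table kept by each index server can be illustrated with a small Python sketch. The claim states only that a document number maps one-to-many onto a pseudo-document number group; everything else here is an assumption made for illustration: the class name `DocNumberTable`, its methods, the fixed group size, and the use of random tokens from Python's `secrets` module as pseudo-document numbers.

```python
import secrets


class DocNumberTable:
    """Hypothetical table mapping each document number to a group of pseudo numbers."""

    def __init__(self, group_size=3):
        self.group_size = group_size
        self.groups = {}  # document number -> list of pseudo-document numbers
        self.owner = {}   # pseudo-document number -> document number

    def register(self, doc_id):
        """Create the pseudo-document number group for doc_id on first use."""
        if doc_id not in self.groups:
            group = []
            while len(group) < self.group_size:
                pseudo = secrets.token_hex(8)   # assumed format: random 64-bit token
                if pseudo not in self.owner:    # keep pseudo numbers globally unique
                    self.owner[pseudo] = doc_id
                    group.append(pseudo)
            self.groups[doc_id] = group
        return self.groups[doc_id]

    def pick(self, doc_id):
        """Choose one pseudo number from the group, e.g. for a new index entry."""
        return secrets.choice(self.register(doc_id))

    def resolve(self, pseudo_ids):
        """Map pseudo numbers found in the index back to real document numbers."""
        return {self.owner[p] for p in pseudo_ids}
```

In this sketch an index entry would store a value returned by `pick`, and a search would call `resolve` on the pseudo numbers it finds, so several different index entries can refer to the same underlying document.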
3. A full-text search method corresponding to the safety overlay network construction method of the ciphertext full-text search system according to claim 1 or 2, characterized in that:
(1) the full-text search based on the safety overlay network includes two parts, namely centralized retrieval and peer-to-peer retrieval based on the safety overlay network:
In the centralized retrieval, the search function is provided by a single server or by a server cluster composed of multiple servers, and a node initiates the retrieval directly with the retrieval server; the node caches the retrieval results, which serve as the data with which the node provides retrieval service;
The peer-to-peer retrieval based on the safety overlay network is initiated and carried out by peer nodes within the safety overlay network, and its retrieval targets are the contents cached by each node;
(2) the full-text search based on the safety overlay network specifically comprises the following steps:
Hereinafter, q is the retrieval request and p_n is the node that executes the retrieval request q;
3a. node p_n executes q itself;
3b. node p_n sends q to all nodes whose security levels it dominates;
3c. each node that receives q executes q itself;
3d. each node that receives q forwards q to all nodes whose security levels it dominates;
3e. node p_n collects the peer-to-peer retrieval results and presents them to the user; if the user is satisfied, execution ends;
3f. node p_n sends q to the centralized retrieval server, caches the retrieval results, and presents them to the user.
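Steps 3a to 3f can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the names `Peer`, `flood`, and `search` are hypothetical, a node's cache and the central index are plain dictionaries from a search term to a set of document numbers (in the claimed system the terms would be ciphertext word segments), the user's satisfaction is modeled as a callback, and a `seen` set is added so that a node reachable over several dominance paths executes q only once.

```python
class Peer:
    """Hypothetical peer node holding cached results and the nodes it dominates."""

    def __init__(self, name, cache=None):
        self.name = name
        self.cache = dict(cache or {})  # search term -> set of document numbers
        self.dominated = []             # peers whose security level this node dominates

    def execute(self, q):
        """Steps 3a and 3c: run q against the local cache."""
        return set(self.cache.get(q, ()))

    def flood(self, q, seen):
        """Steps 3b and 3d: execute q locally, then forward it down the overlay."""
        if self.name in seen:           # assumption: skip nodes already visited
            return set()
        seen.add(self.name)
        hits = self.execute(q)
        for child in self.dominated:
            hits |= child.flood(q, seen)
        return hits


def search(origin, q, central_index, satisfied):
    """Steps 3e and 3f: try the overlay first, then fall back to the central server."""
    hits = origin.flood(q, set())
    if satisfied(hits):                 # 3e: the user accepts the peer results
        return hits
    central = set(central_index.get(q, ()))               # 3f: ask the central server
    origin.cache.setdefault(q, set()).update(central)     # cache for later peer searches
    return central
```

Because the fallback result is cached at the originating node, a later search for the same term by any node that dominates it can be answered inside the overlay without contacting the central server.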
4. The full-text search method according to claim 3, characterized in that the full-text search method based on the safety overlay network further includes an index file replica replication mechanism based on hub nodes;
The index file replica replication mechanism based on hub nodes includes at least the following two parts:
The first part discovers and stores hub nodes;
The second part triggers replication to source nodes or to hub nodes by setting different thresholds, with source-node caching serving as a supplement to hub-node replication;
The index file replication steps under the index file replica replication mechanism based on hub nodes are as follows:
4a. after a node that holds data receives a new query request, the replication mechanism in that node saves the routing information carried in the request in a routing table and finds hub nodes by comparing the recorded routes;
4b. the replication mechanism counts the number of requests for each hub node and sets a threshold to constrain it;
4c. the replication mechanism records the number of requests from each querying source node;
4d. when a large number of requests for the same data arrive over different paths and the corresponding threshold is reached, the hub-node replication mechanism copies the data to the hub nodes on those paths;
4e. on the basis of step 4d, if the requests from some source node for some data still reach another, higher threshold, the data is copied directly to that source node.
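The threshold-triggered replication of steps 4a to 4e can be sketched in Python as follows. The claim does not define how hub nodes are found "by comparison", so this sketch assumes that a hub node is a relay that appears on at least two distinct recorded paths for the same data; the class name `ReplicaManager`, its fields, and the representation of a path as a tuple running from the source node through the relays are likewise hypothetical.

```python
from collections import Counter, defaultdict


class ReplicaManager:
    """Hypothetical replication logic running on a node that holds index data."""

    def __init__(self, hub_threshold, source_threshold):
        # The claim requires the source-node threshold to be the higher one.
        assert source_threshold > hub_threshold
        self.hub_threshold = hub_threshold
        self.source_threshold = source_threshold
        self.paths = defaultdict(set)     # data id -> distinct request paths (routing table)
        self.hub_hits = Counter()         # (data id, relay) -> requests routed via relay
        self.source_hits = Counter()      # (data id, source) -> requests from source
        self.replicas = defaultdict(set)  # data id -> nodes that received a replica

    def on_request(self, data_id, path):
        """path = (source, relay_1, ..., relay_k), as carried in the query request."""
        source, relays = path[0], path[1:]
        self.paths[data_id].add(tuple(path))        # 4a: record the routing information
        self.source_hits[(data_id, source)] += 1    # 4c: count requests per source node
        for relay in relays:
            self.hub_hits[(data_id, relay)] += 1    # 4b: count requests per relay
        for relay in relays:
            # Assumed hub test: the relay lies on two or more distinct paths.
            shared = sum(relay in p[1:] for p in self.paths[data_id])
            if shared >= 2 and self.hub_hits[(data_id, relay)] >= self.hub_threshold:
                self.replicas[data_id].add(relay)   # 4d: replicate to the hub node
        if self.source_hits[(data_id, source)] >= self.source_threshold:
            self.replicas[data_id].add(source)      # 4e: replicate to the source node
        return self.replicas[data_id]
```

The sketch keeps counting at the data holder even after a hub replica exists; a real deployment would see fewer requests reach the holder once hub nodes start answering them.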
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210284284.2A CN102855292B (en) 2010-05-31 2010-05-31 Safety overlay network constructing method of ciphertext full text search system and corresponding full text search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210284284.2A CN102855292B (en) 2010-05-31 2010-05-31 Safety overlay network constructing method of ciphertext full text search system and corresponding full text search method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN 201010187384 Division CN101859323B (en) 2010-05-31 2010-05-31 Ciphertext full-text search system

Publications (2)

Publication Number Publication Date
CN102855292A true CN102855292A (en) 2013-01-02
CN102855292B CN102855292B (en) 2015-04-08

Family

ID=47401880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210284284.2A Active CN102855292B (en) 2010-05-31 2010-05-31 Safety overlay network constructing method of ciphertext full text search system and corresponding full text search method

Country Status (1)

Country Link
CN (1) CN102855292B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995900A (en) * 2014-06-10 2014-08-20 福建师范大学 Ciphertext cloud data inquiring method
CN109033283A (en) * 2018-07-12 2018-12-18 广州市闲愉凡生信息科技有限公司 A kind of distributed search methods of cloud computing platform
CN109871426A (en) * 2018-12-18 2019-06-11 国网浙江桐乡市供电有限公司 A kind of monitoring recognition methods of confidential data
CN111143870A (en) * 2019-12-30 2020-05-12 兴唐通信科技有限公司 Distributed encryption storage device, system and encryption and decryption method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1932816A (en) * 2006-09-30 2007-03-21 华中科技大学 Full text search system based on ciphertext
CN101520800A (en) * 2009-03-27 2009-09-02 华中科技大学 Cryptogram-based safe full-text indexing and retrieval system
US7716211B2 (en) * 2004-02-10 2010-05-11 Microsoft Corporation System and method for facilitating full text searching utilizing inverted keyword indices

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995900A (en) * 2014-06-10 2014-08-20 福建师范大学 Ciphertext cloud data inquiring method
CN109033283A (en) * 2018-07-12 2018-12-18 广州市闲愉凡生信息科技有限公司 A kind of distributed search methods of cloud computing platform
CN109871426A (en) * 2018-12-18 2019-06-11 国网浙江桐乡市供电有限公司 A kind of monitoring recognition methods of confidential data
CN109871426B (en) * 2018-12-18 2021-08-10 国网浙江桐乡市供电有限公司 Method for monitoring and identifying confidential data
CN111143870A (en) * 2019-12-30 2020-05-12 兴唐通信科技有限公司 Distributed encryption storage device, system and encryption and decryption method
CN111143870B (en) * 2019-12-30 2022-05-13 兴唐通信科技有限公司 Distributed encryption storage device, system and encryption and decryption method

Also Published As

Publication number Publication date
CN102855292B (en) 2015-04-08

Similar Documents

Publication Publication Date Title
CN101859323B (en) Ciphertext full-text search system
Fu et al. Secure data storage and searching for industrial IoT by integrating fog computing and cloud computing
CN108712366B (en) Searchable encryption method and system supporting word form and word meaning fuzzy retrieval in cloud environment
Chuah et al. Privacy-aware bedtree based solution for fuzzy multi-keyword search over encrypted data
CN101561815B (en) Distributed cryptograph full-text retrieval system
CN100424704C (en) Full text search system based on ciphertext
Song et al. Discovering high utility itemsets based on the artificial bee colony algorithm
CN109885650B (en) Outsourcing cloud environment privacy protection ciphertext sorting retrieval method
Elkana Ebinazer et al. ESKEA: enhanced symmetric key encryption algorithm based secure data storage in cloud networks with data deduplication
CN109213731B (en) Multi-keyword ciphertext retrieval method based on iterative encryption in cloud environment
CN102855292B (en) Safety overlay network constructing method of ciphertext full text search system and corresponding full text search method
Mittal et al. Privacy preserving synonym based fuzzy multi-keyword ranked search over encrypted cloud data
Krishna et al. Dynamic cluster based privacy-preserving multi-keyword search over encrypted cloud data
Hua et al. An enhanced wildcard-based fuzzy searching scheme in encrypted databases
Fu et al. Confusing-keyword based secure search over encrypted cloud data
CN113626836A (en) Symmetric searchable encryption method and system based on LSM
CN110324402B (en) Trusted cloud storage service platform based on trusted user front end and working method
CN102629274B (en) Index update method for ciphertext full-text searching system based on dynamic succeed tree index structure
Zhou et al. SAPMS: a semantic-aware privacy-preserving multi-keyword search scheme in cloud
Gampala et al. An efficient Multi-Keyword Synonym Ranked Query over Encrypted Cloud Data using BMS Tree
Wang et al. Block-Based Privacy-Preserving Healthcare Data Ranked Retrieval in Encrypted Cloud File Systems
Mbinkeu et al. Reducing disk storage with sqlite into bitcoin architecture
Cao et al. Analysis of One Dynamic Multi-Keyword Ranked Search Scheme over Encrypted Cloud Data.
CN110569327A (en) multi-keyword ciphertext retrieval method supporting dynamic updating
Gao et al. Mimir: a term-distributed retrieval system for secret documents

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant