Searchable data encryption and keyword search method, system, terminal, and device
Technical field
The invention belongs to the technical field of searchable encryption, and in particular relates to a searchable data encryption and keyword search method, system, terminal, and device.
Background art
With the development of Internet applications, more and more users search by entering a search keyword in a search page and triggering a search operation. Specifically, after the search page obtains the entered search keyword and the triggered search operation, it produces suggested terms associated with the entered keyword; when the user clicks one of the suggested terms, the search results related to that term are returned, and the user can click a desired search result to browse its detailed information. Because this search process depends on multiple input operations by the user before the desired information is obtained, it suffers from low search efficiency.
In previous research on searchable symmetric encryption (SSE), keywords are extracted from the data files by a keyword extraction algorithm. The content and number of the keywords are limited by the keyword extraction algorithm and are fixed. When the keyword set is modified, a new keyword set must be resubmitted or rebuilt, which increases the time needed to build the data set and makes operation more cumbersome. The existing keyword search process is roughly as follows:
Step 1, encryption: the user encrypts the plaintext files locally with a key and uploads them to the server.
Step 2, trapdoor generation: a user with retrieval capability uses the key to generate a trapdoor for the keyword to be queried; the trapdoor must not reveal any information about the keyword.
Step 3, retrieval: the server takes the keyword trapdoor as input, executes the search algorithm, and returns all ciphertext files that contain the keyword corresponding to the trapdoor; the server must learn nothing beyond whether a ciphertext file contains a particular keyword.
Step 4, decryption: the user decrypts the ciphertext files returned by the server with the key to obtain the query result.
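The trapdoor and retrieval steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of any cited work: HMAC-SHA256 stands in for the trapdoor function, the names `trapdoor`, `build_server_index`, and `server_search` are hypothetical, and file encryption and decryption (steps 1 and 4) are left out.

```python
import hashlib
import hmac


def trapdoor(key: bytes, keyword: str) -> bytes:
    """Step 2: a keyed hash of the keyword (HMAC-SHA256 is an assumed choice)."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()


def build_server_index(key: bytes, files: dict[str, set[str]]) -> dict[bytes, set[str]]:
    """Index uploaded with the ciphertexts: trapdoor -> names of files containing the keyword."""
    index: dict[bytes, set[str]] = {}
    for name, keywords in files.items():
        for word in keywords:
            index.setdefault(trapdoor(key, word), set()).add(name)
    return index


def server_search(index: dict[bytes, set[str]], token: bytes) -> set[str]:
    """Step 3: the server matches the trapdoor without ever seeing the keyword."""
    return index.get(token, set())
```

In this sketch the server only learns which files match a given trapdoor, which corresponds to the leakage allowed in step 3.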
In addition, David Cash et al. proposed a secure and efficient data processing scheme for databases of different sizes, especially large ones, which can efficiently and privately search server-held encrypted databases with tens of billions of record-keyword pairs. Their basic construction supports single-keyword search and provides asymptotically optimal server index size, fully parallel searching, and minimal leakage.
In the above methods, because the keywords are obtained by a keyword extraction algorithm, the content and number of the keywords are limited by that algorithm and are fixed. The set of searchable keywords is therefore limited, and modifying the keyword set requires resubmitting or rebuilding it, which increases the time needed to build the data set and makes operation more cumbersome.
Summary of the invention
The first object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a searchable data encryption method. The method separately encrypts the data files uploaded by a data owner and their summaries, and generates a dictionary for the keywords. A keyword can then be searched not only through the dictionary but also through the summaries, which makes keyword search more efficient.
The second object of the present invention is to provide a searchable data encryption system.
The third object of the present invention is to provide a terminal.
The fourth object of the present invention is to provide a keyword search method implemented on the basis of the above searchable data encryption method; the method can automatically update keywords during the search process, which greatly improves keyword search efficiency.
The fifth object of the present invention is to provide a computing device.
The first object of the present invention is achieved through the following technical solution: a searchable data encryption method, with the following steps:
Obtain the data files uploaded by the data owner;
Extract the keywords of each data file;
Perform summary extraction on each data file to obtain summary files;
According to the correspondence between the keywords and the data files, generate a dictionary γ after processing the data with an encryption algorithm, wherein the dictionary γ stores, for each keyword, the label corresponding to each data file and the index information corresponding to each data file, and wherein, for each keyword, the labels and the index information in the dictionary γ are paired one to one;
Encrypt each data file to obtain encrypted data files;
Encrypt each summary file to obtain encrypted summary files.
Preferably, the dictionary γ is generated from the correspondence between the keywords and the data files, after data processing by the encryption algorithm, as follows:
S11. Create an empty table L and choose a master key K for it;
S12. For each keyword, obtain every data file that contains the keyword, and derive a pair of subkeys K_1, K_2 for the keyword from the master key K:
K_1 ← F(K, 1 || ω);
K_2 ← F(K, 2 || ω);
where ω is the keyword;
For each keyword, number the data files that contain the keyword to obtain the file number of each data file, and sort the file numbers to obtain the serial number of each file number;
For each keyword, use key K_1 to generate a label for each file number of the keyword in turn, and use key K_2 to encrypt each file number of the keyword in turn; the encrypted result serves as the index information of the data file for that keyword, giving the label-index pairs (L_i, d_i):
L_i ← F(K_1, i);
d_i ← Enc(K_2, id_i);
i = 0, 1, ..., N-1;
where L_i is the label generated with key K_1 for the i-th file number id_i of keyword ω; d_i is the result of encrypting the i-th file number id_i of keyword ω with key K_2, and serves as the index of the data file whose file number is id_i for that keyword; and N is the total number of data files that contain keyword ω;
S13. For each keyword, each time a label-index pair (L_i, d_i) is obtained, insert it into table L in lexicographic order; add a timestamp time_i to each label-index pair (L_i, d_i) to obtain a table L of triples (L_i, d_i, time_i), and create the dictionary γ from table L; the initial value of time_i is the time at which encryption of the index information d_i of the pair (L_i, d_i) is completed.
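Steps S11 to S13 can be sketched as follows. This is only an illustration under assumptions: HMAC-SHA256 stands in for the function F, a toy XOR pad stands in for Enc (a real system would use a proper cipher), and the names `prf`, `xor_enc`, and `build_dictionary` are hypothetical.

```python
import hashlib
import hmac
import time


def prf(key: bytes, msg: bytes) -> bytes:
    """F: HMAC-SHA256 stands in for the keyed function (an assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_enc(key: bytes, file_id: int, i: int) -> bytes:
    """Toy Enc: XOR the 8-byte file number with a pad derived from the key and i.
    Illustrative only; applying it again with the same key and i decrypts."""
    pad = prf(key, b"pad" + i.to_bytes(8, "big"))[:8]
    return bytes(a ^ b for a, b in zip(file_id.to_bytes(8, "big"), pad))


def build_dictionary(master_key: bytes, postings: dict[str, list[int]]) -> dict:
    """postings maps each keyword to the file numbers of the files that contain it."""
    table = []  # table L of (label, index, timestamp) triples
    for word, file_ids in postings.items():
        k1 = prf(master_key, b"1" + word.encode())  # K_1 <- F(K, 1 || w)
        k2 = prf(master_key, b"2" + word.encode())  # K_2 <- F(K, 2 || w)
        for i, file_id in enumerate(sorted(file_ids)):
            label = prf(k1, i.to_bytes(8, "big"))   # L_i <- F(K_1, i)
            index = xor_enc(k2, file_id, i)         # d_i <- Enc(K_2, id_i)
            table.append((label, index, time.time()))
    table.sort(key=lambda entry: entry[0])          # keep table L ordered by label
    # The dictionary maps each label to its paired index and timestamp.
    return {label: (index, stamp) for label, index, stamp in table}
```

Keying the dictionary by label is a design choice of this sketch: it makes the later lookup of a label a single dictionary access.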
Preferably, each summary file is obtained as follows:
First, extract the summary from the data file with a document summary extraction algorithm; then, using the file number of the summary as an index, store the summary at the corresponding position and pad the remaining positions with filler characters to form one summary file;
Perform substring-searchable encryption on each summary file using the Burrows-Wheeler transform (BWT) and the FM-index technique, as follows:
For each distinct character in the summary file, create a linked list; in each character's linked list, every node stores a tuple <nptr, addr>, where nptr is a pointer to the next node of that linked list and addr is the position in the FM index of one occurrence of the character in the summary file; the addr values stored in different nodes of the list are the FM-index positions of the character's occurrences at different positions in the summary file;
For each distinct character in the summary file, encrypt the tuple stored in the first node (the list head) of the character's linked list, obtaining:
[formula missing in the source text]
where <nptr_1, addr_1> is the tuple stored in the first node (the list head) of each character's linked list; c_m is the m-th of the distinct characters in the summary file; Y is the total number of distinct characters in the summary file; K' is the auxiliary key; and F_K'(c_m) denotes encrypting character c_m with the auxiliary key K';
Encrypt each distinct character in the summary file, use the encrypted value of each distinct character as a linked-list index to obtain the linked-list index set, and map the linked-list index of each distinct character to the list head of that character's linked list, giving the mapping between the linked-list index of each distinct character and its list head;
Each distinct character in the summary file is encrypted as:
[formula missing in the source text]
where K is the master key; F_K(c_m) denotes encrypting character c_m with the master key K; and F_K'(c_m) denotes encrypting character c_m with the auxiliary key K'.
The second object of the present invention is achieved through the following technical solution: a searchable data encryption system, comprising:
A data file obtaining unit, for obtaining the data files uploaded by the data owner;
A keyword extraction unit, for extracting the keywords of each data file;
A summary extraction unit, for performing summary extraction on each data file to obtain summary files;
A dictionary generation unit, for generating a dictionary γ from the correspondence between the keywords and the data files after data processing by an encryption algorithm, wherein the dictionary γ stores, for each keyword, the label corresponding to each data file and the index information corresponding to each data file, and wherein, for each keyword, the labels and the index information are paired one to one;
A data file encryption unit, for encrypting each data file to obtain encrypted data files;
A summary file encryption unit, for performing substring-searchable encryption on each summary file to obtain encrypted summary files.
The third object of the present invention is achieved through the following technical solution: a terminal, comprising a processor and a memory for storing a program executable by the processor; when the processor executes the program stored in the memory, the searchable data encryption method of the first object of the present invention is implemented.
The fourth object of the present invention is achieved through the following technical solution: a keyword search method, with the following steps:
Step X1. First obtain the dictionary γ, the encrypted data files, and the encrypted summary files produced by the searchable data encryption method of the first object of the present invention;
On receiving each keyword that the user asks to search for, first determine by searching the dictionary γ whether any encrypted data file contains the keyword; if so, return the corresponding encrypted data files to the user as the query result for decryption;
If not, go to step X2;
Step X2. Perform a substring search for each keyword to be searched over the set of encrypted summary files;
If the substring search finds the keyword in a summary file, return the encrypted data file corresponding to that summary file to the user as the query result; if the user confirms that the result is correct, it is determined that the data file returned as the query result contains the keyword; compute the label of the keyword for that data file and the index information of that data file, and add them to the dictionary γ, thereby updating the dictionary γ;
If the substring search finds nothing in the summary files, return a search-failure result to the user.
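The control flow of steps X1 and X2 can be sketched as follows. The four callables are hypothetical placeholders for the dictionary lookup, the substring search over the encrypted summaries, the user's confirmation, and the dictionary update; the sketch shows only the order of the decisions, under the assumption that each search returns a list of file numbers.

```python
from typing import Callable


def two_stage_search(
    word: str,
    dictionary_lookup: Callable[[str], list[int]],
    substring_search: Callable[[str], list[int]],
    user_confirms: Callable[[str, int], bool],
    add_to_dictionary: Callable[[str, list[int]], None],
) -> list[int]:
    """Return the file numbers found for word, updating the dictionary on a summary hit."""
    hits = dictionary_lookup(word)          # step X1: try the dictionary first
    if hits:
        return hits
    candidates = substring_search(word)     # step X2: fall back to the summaries
    if not candidates:
        return []                           # search failure
    confirmed = [f for f in candidates if user_confirms(word, f)]
    if confirmed:
        add_to_dictionary(word, confirmed)  # the next search hits the dictionary
    return candidates
```

After a confirmed summary hit, a repeated search for the same keyword is answered by the dictionary lookup alone.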
Preferably, in step X1, determining by searching the dictionary γ whether any encrypted data file contains a keyword to be searched proceeds as follows:
Step X11. For each keyword the user wants to search for, derive a pair of subkeys K'_1, K'_2 for the keyword from the master key K issued by the user:
K'_1 ← F(K, 1 || ω');
K'_2 ← F(K, 2 || ω');
where ω' is the keyword the user wants to search for;
Step X12. For each keyword to be searched, traverse the file-number serial numbers of the data files and use subkey K'_1 to generate the label of the keyword for the data file of each file number:
L_{i'} ← F(K'_1, i'); i' = 0, 1, 2, ..., I;
where i' is the file-number serial number being traversed, I is the maximum file-number serial number traversed, and L_{i'} is the label of keyword ω' for the data file with file-number serial number i';
Step X13. For each keyword to be searched, look up in the dictionary γ whether the labels L_{i'} generated in step X12 exist;
If not, go to step X2;
If so, get from the dictionary γ the index information paired with the label, decrypt the index information with the keyword's subkey K'_2, and use the decrypted index information to fetch the corresponding encrypted data file, which is returned to the user as the query result for decryption; at the same time, update the timestamp of the label-index pair stored in the dictionary γ to the time at which decryption of the index information with the keyword's subkey K'_2 is completed;
The index information paired with the label is obtained from the dictionary γ as:
d_{i'} ← Get(γ, L_{i'});
where d_{i'} is the index information in the dictionary γ paired with label L_{i'};
The index information is decrypted with the keyword's subkey K'_2 as:
d_i ← Dec(K'_2, d_{i'});
where d_i is the index information d_{i'} after decryption with subkey K'_2 of keyword ω'; the decrypted index information d_i is the file number of a data file that contains keyword ω'.
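Steps X11 to X13 can be sketched as follows, again under assumptions: HMAC-SHA256 stands in for F, a toy XOR pad stands in for Enc and Dec, the dictionary is modelled as a Python dict from label to (index, timestamp), and the names `prf`, `xor_pad`, and `search_dictionary` are hypothetical.

```python
import hashlib
import hmac
import time


def prf(key: bytes, msg: bytes) -> bytes:
    """F: HMAC-SHA256 stands in for the keyed function (an assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_pad(key: bytes, data: bytes, i: int) -> bytes:
    """Toy Enc/Dec (its own inverse): XOR 8 bytes with a pad derived from the key and i."""
    pad = prf(key, b"pad" + i.to_bytes(8, "big"))[:8]
    return bytes(a ^ b for a, b in zip(data, pad))


def search_dictionary(master_key: bytes, word: str, gamma: dict, max_serial: int) -> list[int]:
    """Return the file numbers stored for word; an empty list means 'go to step X2'."""
    k1 = prf(master_key, b"1" + word.encode())   # K'_1 <- F(K, 1 || w')
    k2 = prf(master_key, b"2" + word.encode())   # K'_2 <- F(K, 2 || w')
    file_ids = []
    for i in range(max_serial + 1):              # i' = 0, 1, ..., I
        label = prf(k1, i.to_bytes(8, "big"))    # L_i' <- F(K'_1, i')
        entry = gamma.get(label)                 # d_i' <- Get(gamma, L_i')
        if entry is None:
            continue
        index, _old_stamp = entry
        file_ids.append(int.from_bytes(xor_pad(k2, index, i), "big"))  # Dec(K'_2, d_i')
        gamma[label] = (index, time.time())      # refresh the timestamp on use
    return file_ids
```

Refreshing the timestamp on every hit is what later lets a fixed-length dictionary keep frequently used label-index pairs.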
Preferably, in step X2, the substring search over the set of encrypted summary files uses the Burrows-Wheeler transform and the FM-index technique, as follows:
Step X21. For the keyword ω' to be searched, generate the keyword query token tk_{T,S}:
tk_{T,S} = F(K, ω'[1 ... M]) = F(K, ω'[1]), F(K, ω'[2]), ..., F(K, ω'[M]), F(K', ω'[M]);
where ω'[1], ω'[2], ..., ω'[M] are the characters of the keyword ω' to be searched, and M is the number of characters in ω'; K' is the auxiliary key, K' = F(K, 2), and K is the master key;
Step X22. For each character ω'[m] of the keyword ω' to be searched, m = 1, 2, 3, ..., M, first encrypt it to obtain its ciphertext [formula missing in the source text]; then find in the linked-list index set the linked-list index equal to that ciphertext, and obtain the linked list of each character ω'[m] through the mapping between linked-list indexes and list heads;
Step X23. For the last character ω'[M] of the keyword ω' to be searched, map each node in the linked list of character ω'[M] to an encrypted FM tuple:
[formula missing in the source text]
where [symbol missing] corresponds to the data of the F column of FM;
where [symbol missing] corresponds to the data of the L column of FM;
where E(pos_j) corresponds to the data in row j of the SA column of FM, pos_j being the position in the summary file of the character corresponding to that row, stored as ciphertext, and n is the total number of rows of FM;
where [symbol missing] is the character corresponding to the data in row j of the F column of FM, and [symbol missing] is the position number of that character; [symbol missing] is the character corresponding to the data in row j of the L column of FM, and [symbol missing] is the position number of that character;
For each encrypted FM tuple to which a node in the linked list of ω'[M] is mapped:
First XOR F_K(ω'[m]) with the F-column data [symbol missing] of FM to decrypt it, obtaining [symbol missing];
Then use [symbol missing] as the key to decrypt the element [symbol missing] in the first part of the L-column data of FM, obtaining [symbol missing]; XOR [symbol missing] with the element [symbol missing] of the other part of the L-column data of FM to obtain the XOR result; then go to step X24;
Step X24. For each XOR result obtained in the previous step, search the F column of FM for the row whose data equals the XOR result, get the FM tuple of that row, and find the linked list that has a node mapped to that FM tuple, thereby obtaining the character c_x corresponding to that linked list as the currently found character; here x is the number of times the F column of FM has been searched so far; go to step X25;
Step X25. Determine whether any of the characters c_x obtained in step X24 is identical to character ω'[M-x];
If so, determine whether the number x of searches of the F column of FM so far equals M-1; if it does, end the substring search: the search succeeds, and the corresponding summary file contains the keyword ω' to be searched; if it does not, go to step X26;
If not, end the substring search and return a substring search failure: the corresponding summary file does not contain keyword ω';
Step X26. For the character c_x obtained in step X24 that is identical to character ω'[M-x], take each FM tuple accessed in step X24 when obtaining character c_x, and perform the following on each FM tuple:
First XOR F_K(c_x) with the F-column data [symbol missing] of FM to decrypt it, obtaining [symbol missing];
Then use [symbol missing] as the key to decrypt the element [symbol missing] in the first part of the L-column data of FM, obtaining [symbol missing]; XOR [symbol missing] with the element [symbol missing] of the other part of the L-column data of FM to obtain the XOR result; then return to step X24.
Preferably, the dictionary γ is a fixed-length dictionary, and in step X2 the dictionary γ is updated as follows:
When the label of a new keyword for a data file and the index information of that data file need to be added to the dictionary γ, that is, when a label-index pair of a new keyword needs to be added to the dictionary γ, if the dictionary is already full of label-index pairs, the label-index pair of the new keyword replaces the label-index pair with the smallest timestamp in the dictionary γ; when the new keyword has several label-index pairs, they replace the same number of label-index pairs with the smallest timestamps in the dictionary γ.
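The replacement rule for a full fixed-length dictionary can be sketched as follows. The dictionary is modelled as a Python dict from label to (index, timestamp); the function name `add_pairs` and the parameters `capacity` and `now` are hypothetical, and the sketch assumes the number of new pairs does not exceed the capacity.

```python
def add_pairs(
    gamma: dict[bytes, tuple[bytes, float]],
    capacity: int,
    new_pairs: list[tuple[bytes, bytes]],
    now: float,
) -> None:
    """Insert label-index pairs, evicting the smallest-timestamp entries when full."""
    for label, index in new_pairs:
        if label not in gamma and len(gamma) >= capacity:
            # The entry with the smallest timestamp is the least recently used one.
            oldest = min(gamma, key=lambda lab: gamma[lab][1])
            del gamma[oldest]
        gamma[label] = (index, now)
```

Because the timestamp of a pair is refreshed each time it is used in a search, evicting the smallest timestamp removes the pair that has gone unused the longest.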
The fifth object of the present invention is achieved through the following technical solution: a computing device, comprising a processor and a memory for storing a program executable by the processor; when the processor executes the program stored in the memory, the keyword search method of the fourth object of the present invention is implemented.
Compared with the prior art, the present invention has the following advantages and effects:
(1) In the searchable data encryption method of the present invention, the data files uploaded by the data owner are first obtained; the keywords of each data file are extracted; summary extraction is performed on each data file to obtain summary files; the dictionary γ is generated from the correspondence between the keywords and the data files; each data file is encrypted to obtain encrypted data files; and substring-searchable encryption is performed on each summary file to obtain encrypted summary files. When the dictionary γ, the encrypted data files, and the encrypted summary files obtained by this method are uploaded for searching, a keyword can be searched not only through the dictionary but also through the summary files, which makes keyword search more efficient.
(2) In the keyword search method of the present invention, the dictionary γ, the encrypted data files, and the encrypted summary files produced by the above searchable data encryption method are first obtained. On receiving each keyword that the user asks to search for, the dictionary γ is searched first to determine whether any encrypted data file contains the keyword; if so, the corresponding encrypted data files are returned to the user as the query result for decryption; if not, a substring search for the keyword is then performed over the set of encrypted summary files. If the substring search finds the keyword in a summary file, the encrypted data file corresponding to that summary file is returned to the user as the query result, and the label of the keyword for that data file and the index information of that data file are computed and added to the dictionary γ. Thus, in the present invention, when a keyword cannot be found through the dictionary γ, the label of that keyword is not in the dictionary, that is, the keyword is not in the keyword set from which the dictionary was initially generated. In this case the present invention searches for the keyword in the summary files by substring search, and when the keyword is found in a summary file, the label of the keyword and the index information of the data file are added to the dictionary γ, so that the keyword can be found through the dictionary γ in the next search. By combining substring search with updating the keyword dictionary during the search process, the present invention makes the content of the keyword dictionary more accurate and flexible and frees it from the limits of the keyword extraction algorithm, which greatly improves keyword search efficiency.
(3) In the keyword search method of the present invention, the dictionary γ is a fixed-length dictionary. When the label of a new keyword for a data file and the index information of that data file need to be added to the dictionary γ, that is, when a label-index pair of a new keyword needs to be added to the dictionary γ, if the dictionary is already full of label-index pairs, the label-index pair of the new keyword replaces the label-index pair with the smallest timestamp in the dictionary γ; when the new keyword has several label-index pairs, they replace the same number of label-index pairs with the smallest timestamps in the dictionary γ. This updatable keyword dictionary uses a feedback mechanism similar to that of a fast table, which prevents incorrect query records from making the dictionary grow continually and affect memory; within the fixed dictionary size, rarely used keywords are overwritten and updated.
Brief description of the drawings
Fig. 1 is a flow chart of the searchable data encryption method of the present invention.
Fig. 2 is a flow chart of the keyword search method of the present invention.
Fig. 3 is a diagram of the mapping among the linked-list index, the linked lists, and FM in the present invention.
Fig. 4 is the overall framework diagram of the searchable data encryption and keyword search method of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment 1
This embodiment discloses a searchable data encryption method which, as shown in Fig. 1, has the following steps:
Step S1. Obtain the data files uploaded by the data owner;
Step S2. Extract the keywords of each data file, and perform summary extraction on each data file to obtain summary files. In this embodiment, each summary file is obtained as follows:
First, extract the summary from the data file with a document summary extraction algorithm; then, using the file number of the summary as an index, store the summary at the corresponding position and pad the remaining positions with filler characters to form one summary file.
Step S3. According to the correspondence between the keywords and the data files, generate a dictionary γ after processing the data with an encryption algorithm, wherein the dictionary γ stores, for each keyword, the label corresponding to each data file and the index information corresponding to each data file, and wherein, for each keyword, the labels and the index information in the dictionary γ are paired one to one. The dictionary γ is generated from the correspondence between the keywords and the data files, after data processing by the encryption algorithm, as follows:
S31. Create an empty table L and choose a master key K for it;
S32. For each keyword, obtain every data file that contains the keyword, and derive a pair of subkeys K_1, K_2 for the keyword from the master key K:
K_1 ← F(K, 1 || ω);
K_2 ← F(K, 2 || ω);
where ω is the keyword;
S33. First, for each keyword, number the data files that contain the keyword to obtain the file number of each data file, and sort the file numbers to obtain the serial number of each file number. Then, for each keyword, use key K_1 to generate a label for each file number of the keyword in turn, and use key K_2 to encrypt each file number of the keyword in turn; the encrypted result serves as the index information of the data file for that keyword, giving the label-index pairs (L_i, d_i):
L_i ← F(K_1, i);
d_i ← Enc(K_2, id_i);
i = 0, 1, ..., N-1;
where L_i is the label generated with key K_1 for the i-th file number id_i of keyword ω; d_i is the result of encrypting the i-th file number id_i of keyword ω with key K_2, and serves as the index of the data file whose file number is id_i for that keyword; and N is the total number of data files that contain keyword ω;
S34. For each keyword, each time a label-index pair (L_i, d_i) is obtained, insert it into table L in lexicographic order; add a timestamp time_i to each label-index pair (L_i, d_i) to obtain a table L of triples (L_i, d_i, time_i), and create the dictionary γ from table L; the initial value of time_i is the time at which encryption of the index information d_i of the pair (L_i, d_i) is completed;
Step S4. Encrypt each data file to obtain encrypted data files, and encrypt each summary file to obtain encrypted summary files.
In this step of the present embodiment, the Burrows-Wheeler transform and the FM-index technique are used to perform substring-searchable encryption on each summary file. The Burrows-Wheeler transform (BWT) algorithm transforms a data stream according to the entropy of its characters. In brief, a data stream S is transformed into an encoding W on which a compression algorithm achieves a high compression ratio. The transform proceeds roughly as follows. First, the algorithm appends the end marker $ to the input string S and then builds the matrix W by rotating the position of the marker $. The rotation produced in each iteration is appended to the matrix W as a new row. Finally, the rows of W are sorted in ascending lexicographic order.
The data obtained from the BWT is mapped into FM by the LF mapping technique. LF mapping takes the first column F and the last column L of the BWT matrix and rebuilds the original string S by iteration. Starting from the first element of columns F and L, L serves as the index into column F. Each element of column L is pushed onto a last-in, first-out stack. The value at the current position of column L is used as the index into column F in the next iteration. In the first iteration, the pointer points to the first position of both F and L, which locates the last position F[7] = s. The current L character (s) is pushed onto stack D. In the next iteration, the current L character is the next F-column index, and the character i is pushed onto the stack. When the L character is $, the procedure ends. The algorithm pops all elements off stack D to obtain the original string S.
In the FM-index technique, FM consists of three columns. The first is the F column of the LF mapping; the second is the L column, which corresponds to BWT(S); the last is the suffix array SA. For each row i, SA contains the position in the original string S of the substring that forms the i-th row of the matrix W obtained by the BWT.
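The BWT, the LF mapping, and a plaintext FM-style search can be sketched as follows. This is an unencrypted illustration only; the encrypted variant of this embodiment additionally wraps the columns in the linked lists and ciphertexts described in the text. The function names are hypothetical, and the sketch assumes the input contains only letters and digits so that the end marker $ sorts before every other character.

```python
def bwt_with_suffix_array(text: str) -> tuple[str, str, list[int]]:
    """Return the F column, the L column (the BWT), and the suffix array SA of text + '$'."""
    s = text + "$"                                   # '$' is the unique end marker
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # start positions of sorted suffixes
    f_col = "".join(s[i] for i in sa)                # first character of each sorted row
    l_col = "".join(s[i - 1] for i in sa)            # character preceding each suffix
    return f_col, l_col, sa


def invert_bwt(l_col: str) -> str:
    """Rebuild the original string from the L column using the LF mapping."""
    # A stable sort of the L rows by character gives, for each L row, its F row.
    order = sorted(range(len(l_col)), key=lambda i: l_col[i])
    lf = [0] * len(l_col)
    for f_row, l_row in enumerate(order):
        lf[l_row] = f_row
    row, stack = 0, []                               # row 0 is the row that starts with '$'
    while l_col[row] != "$":
        stack.append(l_col[row])                     # characters come out last to first
        row = lf[row]
    return "".join(reversed(stack))


def count_occurrences(l_col: str, pattern: str) -> int:
    """Backward search: count the occurrences of pattern using only the F/L structure."""
    first: dict[str, int] = {}
    for row, ch in enumerate(sorted(l_col)):         # the F column is the sorted L column
        first.setdefault(ch, row)                    # first F row of each character
    lo, hi = 0, len(l_col)                           # current row range [lo, hi)
    for ch in reversed(pattern):                     # match from the last character back
        if ch not in first:
            return 0
        lo = first[ch] + l_col[:lo].count(ch)
        hi = first[ch] + l_col[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

The backward search consumes the pattern from its last character to its first, which matches the order in which the keyword search of this document processes the characters of the keyword.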
In this embodiment, substring-searchable encryption proceeds as follows:
Step S41. For each distinct character in the summary file, create a linked list; in each character's linked list, every node stores a tuple <nptr, addr>, where nptr is a pointer to the next node of that linked list and addr is the position in the FM index of one occurrence of the character in the summary file; the addr values stored in different nodes of the list are the FM-index positions of the character's occurrences at different positions in the summary file. For example, if character T occurs at 10 positions in a summary file, then in the linked list created for character T, the addr values in the tuples stored in the 1st through 10th nodes are the FM-index positions of the character T at those 10 positions of the summary file.
For each distinct character in the summary file, encrypt the tuple stored in the first node (the list head) of the character's linked list, obtaining:
[formula missing in the source text]
where <nptr_1, addr_1> is the tuple stored in the first node (the list head) of each character's linked list; c_m is the m-th of the distinct characters in the summary file; Y is the total number of distinct characters in the summary file; K' is the auxiliary key; and F_K'(c_m) denotes encrypting character c_m with the auxiliary key K'.
Step S42. Encrypt each distinct character in the summary file, use the encrypted value of each distinct character as a linked-list index to obtain the linked-list index set, and map the linked-list index of each distinct character to the list head of that character's linked list, giving the mapping between the linked-list index of each distinct character and its list head. Each distinct character in the summary file is encrypted as:
[formula missing in the source text]
where K is the master key; F_K(c_m) denotes encrypting character c_m with the master key K; and F_K'(c_m) denotes encrypting character c_m with the auxiliary key K'.
Fig. 3 shows the mapping among the linked-list index set LLSET, the linked lists LL and the FM table obtained in the above steps of this embodiment. Each linked-list index maps to the head of one linked list, so the corresponding linked list can be reached from its index. Each node of a linked list maps to one FM tuple of the FM table, that is, to one row of the FM table; through this mapping between the linked lists and the FM table, the corresponding linked list can be found from an FM tuple. In Fig. 3, the symbol on each node denotes the tuple stored in the i-th node of the linked list corresponding to character c_m, i = 1, 2, 3, ....
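The per-character lists of Step S41 can be sketched in plaintext as follows. This is a minimal illustration only: it assumes a naive suffix-array construction, a `$` terminator, and that "position in the FM index" means the row of the F column holding the occurrence. The name `build_char_lists` is hypothetical, and the encryption of list heads (Step S41) and of linked-list indexes (Step S42) is omitted.

```python
from collections import defaultdict

def build_char_lists(summary):
    """Map each distinct character to the FM-table rows whose F column holds it."""
    text = summary + "$"  # assumed terminator, sorts before letters and digits
    # Naive suffix array: acceptable for a sketch, too slow for large files.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    lists = defaultdict(list)
    for row, start in enumerate(sa):
        ch = text[start]  # F-column character of this row
        if ch != "$":
            # The row plays the role of addr; list order plays the role of nptr.
            lists[ch].append(row)
    return dict(lists)

char_lists = build_char_lists("banana")
```

A Python list stands in for the chain of <nptr, addr> nodes here; a real implementation would store each node separately so that only the encrypted head is reachable from the linked-list index.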
This embodiment also discloses a searchable data encryption system, comprising:
a data file obtaining unit, for obtaining the data files uploaded by the data owner;
a keyword extraction unit, for extracting the keywords of each data file;
a summary extraction unit, for extracting a summary from each data file to obtain a summary file;
a dictionary generation unit, for generating dictionary γ from the correspondence between keywords and data files after processing the data with an encryption algorithm, where dictionary γ stores, for each keyword, the label of each corresponding data file and the index information of each corresponding data file, and where, for each keyword, the labels and the index information of its data files are paired one to one;
a data file encryption unit, for encrypting each data file to obtain the encrypted data files;
a summary file encryption unit, for applying substring-searchable encryption to each summary file to obtain the encrypted summary files.
This embodiment also discloses a terminal comprising a processor and a memory for storing a program executable by the processor; when the processor executes the program stored in the memory, the searchable data encryption method of this embodiment is carried out. In this embodiment, as shown in Fig. 4, the terminal may be a computer that serves as the client to which the data owner uploads data files. The computer then executes the searchable data encryption method of this embodiment to obtain dictionary γ, the encrypted data files and the encrypted summary files, and may upload them to the server so that an authorized user (a user holding the master key issued by the system) can find the corresponding data files by keyword.
Embodiment 2
This embodiment discloses a keyword search method which, as shown in Fig. 2, comprises the following steps:
Step X1: first obtain the dictionary γ, the encrypted data files and the encrypted summary files produced by the searchable data encryption method of Embodiment 1.
On receiving the keywords that the user wants to search for, first determine by searching dictionary γ whether any encrypted data file contains the keyword. If so, return the corresponding encrypted data files to the user as the query result for decryption; if not, go to Step X2.
In this embodiment, whether any encrypted data file contains a keyword to be searched is determined by searching dictionary γ as follows:
Step X11: for each keyword that the user wants to search for, generate a pair of sub-keys K′_1 and K′_2 for the keyword from the master key K sent by the user:
K′_1 ← F(K, 1 || ω′);
K′_2 ← F(K, 2 || ω′);
where ω′ is the keyword that the user wants to search for and the function F(·) in this embodiment denotes a hash function. The master key K and the keywords to be searched are sent by the user at the same time.
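The sub-key derivation of Step X11 can be sketched as below. The source only states that F is a hash function; instantiating it with HMAC-SHA256, encoding the prefixes 1 and 2 as the ASCII bytes `b"1"` and `b"2"`, and the function name `derive_subkeys` are all assumptions made for illustration.

```python
import hashlib
import hmac

def F(key: bytes, msg: bytes) -> bytes:
    """Keyed hash standing in for F (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def derive_subkeys(master_key: bytes, keyword: str):
    """Return (K'_1, K'_2) for one keyword."""
    w = keyword.encode("utf-8")
    k1 = F(master_key, b"1" + w)  # K'_1 <- F(K, 1 || w')
    k2 = F(master_key, b"2" + w)  # K'_2 <- F(K, 2 || w')
    return k1, k2

k1, k2 = derive_subkeys(b"demo-master-key", "cloud")
```

Because the derivation is deterministic, the same master key and keyword always yield the same pair, which is what lets the server recompute the labels stored at encryption time.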
Step X12: for each keyword to be searched, traverse the file numbers of the data files and use sub-key K′_1 to generate the keyword's label for the data file of each file number:
L_i′ ← F(K′_1, i′); i′ = 0, 1, 2, ..., I;
where i′ is the file number of the data file being traversed, I is the maximum file number traversed, I+1 is the preset total number of data files containing the keyword to be searched, and L_i′ is the label of keyword ω′ for the data file with file number i′.
Step X13: for each keyword to be searched, check whether dictionary γ contains the labels L_i′ generated in Step X12 for the data file of each file number.
If not, no data file containing the keyword can be found through dictionary γ, and the method goes to Step X2.
If so, obtain from dictionary γ the index information paired with the label, decrypt the index information with the keyword's sub-key K′_2, use the decrypted index information to fetch the corresponding encrypted data file, and return that file to the user as the query result for decryption. At the same time, update the timestamp of the label-index pair stored in dictionary γ to the time at which the decryption of the index information with sub-key K′_2 completed.
The index information paired with the label is obtained from dictionary γ as:
d_i′ ← Get(γ, L_i′);
where d_i′ is the index information in dictionary γ paired with label L_i′.
The index information is decrypted with the keyword's sub-key K′_2 as:
d_i ← Dec(K′_2, d_i′);
where d_i is d_i′ after decryption with the sub-key K′_2 of keyword ω′; the decrypted index information d_i is the file number of a data file containing keyword ω′.
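Steps X12 and X13 can be sketched together as a dictionary lookup. Everything beyond the formulas above is an assumption: F is instantiated with HMAC-SHA256, Dec is replaced by a toy XOR pad derived from the sub-key and the label, dictionary γ is modelled as a Python dict from label to an entry holding the encrypted index and a timestamp, and the names `toy_cipher` and `search_dictionary` are hypothetical.

```python
import hashlib
import hmac
import time

def F(key: bytes, msg: bytes) -> bytes:
    """Keyed hash standing in for F (HMAC-SHA256 is an assumption)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def toy_cipher(key: bytes, label: bytes, data: bytes) -> bytes:
    """Toy stand-in for Enc/Dec: XOR with a pad derived from (key, label).
    Handles at most 32 bytes; not a production cipher."""
    pad = F(key, label)
    return bytes(a ^ b for a, b in zip(data, pad))

def search_dictionary(gamma: dict, k1: bytes, k2: bytes, total: int) -> list:
    """Return the file numbers found in gamma for one keyword's sub-keys."""
    found = []
    for i in range(total):                             # i' = 0 .. I
        label = F(k1, str(i).encode("ascii"))          # L_i' <- F(K'_1, i')
        entry = gamma.get(label)                       # d_i' <- Get(gamma, L_i')
        if entry is None:
            continue                                   # no pair stored under this label
        plain = toy_cipher(k2, label, entry["index"])  # d_i <- Dec(K'_2, d_i')
        found.append(int.from_bytes(plain, "big"))
        entry["timestamp"] = time.time()               # refresh the pair's timestamp
    return found

# Demo: store two pairs for the keyword "cloud", then search for it.
K = b"demo-master-key"
k1, k2 = F(K, b"1" + b"cloud"), F(K, b"2" + b"cloud")
gamma = {}
for i, file_no in enumerate([7, 42]):
    label = F(k1, str(i).encode("ascii"))
    gamma[label] = {"index": toy_cipher(k2, label, file_no.to_bytes(4, "big")),
                    "timestamp": 0.0}
result = search_dictionary(gamma, k1, k2, total=2)
```

An empty result corresponds to the "if not" branch of Step X13, in which the method falls back to the substring search of Step X2.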
Step X2: perform a substring search over the encrypted summary file set for each keyword to be searched.
If the substring search finds the keyword in a summary file, the encrypted data file corresponding to that summary file is returned to the user as the query result. Once the user confirms that the result is correct, the returned data file is taken to contain the keyword; the keyword's label for that data file and the index information of that data file are computed and added to dictionary γ, which updates dictionary γ. In this embodiment dictionary γ has a fixed length. When the label-index pair of a new keyword has to be added and dictionary γ is already full, the new pair replaces the label-index pair with the smallest timestamp in dictionary γ; when several new pairs have to be added, they replace the same number of pairs with the smallest timestamps.
If the substring search does not find the keyword in any summary file, a search-failure result is returned to the user.
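The fixed-length update rule for dictionary γ described in Step X2 can be sketched as follows. The dict layout (label mapped to an entry with `index` and `timestamp` fields) and the name `add_pair` are assumptions for illustration; the source specifies only that the pair with the smallest timestamp is replaced when the dictionary is full.

```python
def add_pair(gamma: dict, capacity: int, label: bytes, index: bytes, now: float) -> None:
    """Insert a label-index pair; if gamma is full, replace the pair
    with the smallest timestamp."""
    if label not in gamma and len(gamma) >= capacity:
        oldest = min(gamma, key=lambda l: gamma[l]["timestamp"])
        del gamma[oldest]
    gamma[label] = {"index": index, "timestamp": now}

# Demo with capacity 2: the third insertion evicts the oldest pair.
gamma = {}
for t, label in enumerate([b"L0", b"L1", b"L2"]):
    add_pair(gamma, 2, label, b"idx", float(t))
```

Calling `add_pair` once per new pair also covers the case of several new pairs, since each call evicts the pair with the currently smallest timestamp.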
In Step X2 of this embodiment, the substring search over the encrypted summary file set uses the Burrows-Wheeler transform and FM-index techniques, as follows:
Step X21: for the keyword ω′ to be searched, generate the keyword query token tk_{T,S}:
tk_{T,S} = F(K, ω′[1 ... M]) = F(K, ω′[1]), F(K, ω′[2]), ..., F(K, ω′[M]), F(K′, ω′[M]);
where ω′[1], ω′[2], ..., ω′[M] are the characters of the keyword ω′ to be searched, M is the number of characters in ω′, K′ is the auxiliary key with K′ = F(K, 2), and K is the master key.
Step X22: each character ω′[m] of the keyword ω′ to be searched, m = 1, 2, 3, ..., M, is first encrypted. The linked-list index equal to the resulting ciphertext is then looked up in the linked-list index set, and the linked list of each character ω′[m] is obtained through the mapping between that character's linked-list index and list head.
Step X23: for the last character ω′[M] of the keyword ω′ to be searched, each node in the linked list of character ω′[M] is mapped to an encrypted FM tuple, in which:
one element corresponds to the data in the F column of the FM table;
one element corresponds to the data in the L column of the FM table;
E(pos_j) corresponds to the data in row j of the SA column of the FM table, where pos_j denotes the position in the summary file of the character corresponding to that row, held as ciphertext, and n is the total number of rows of the FM table.
The data in row j of the F column has a corresponding character and a position number of that character; likewise, the data in row j of the L column has a corresponding character and a position number of that character.
For each encrypted FM tuple to which a node in the linked list of ω′[M] is mapped:
first, the F-column data of the tuple is XORed with F_K(ω′[m]), which decrypts it;
then, with the key obtained from that decryption, the first part of the tuple's L-column data is decrypted, and the value so obtained is XORed with the second part of the L-column data, giving the XOR result; the method then goes to Step X24.
Step X24: for each XOR result obtained in the previous step, find the row of the FM table whose F-column data equals that XOR result and obtain the FM tuple of that row. Using the mapping between linked-list nodes and FM tuples of the FM table shown in Fig. 3, find the linked list that has a node mapped to this FM tuple, and take the character c_x corresponding to that linked list as the character currently found, where x is the number of lookups performed so far in the F column of the FM table. Go to Step X25.
Step X25: determine whether any of the characters c_x obtained in Step X24 is identical to character ω′[M-x].
If so, determine whether the number x of lookups performed so far in the F column of the FM table equals M-1. If it does, the substring search ends successfully: the corresponding summary file contains the keyword ω′ to be searched. If it does not, go to Step X26.
If not, the substring search ends and a substring-search failure is returned: the corresponding summary file does not contain keyword ω′.
Step X26: for each character c_x obtained in Step X24 that is identical to character ω′[M-x], take the FM tuples that were obtained in Step X24 when c_x was found, and perform the following operations on each of them:
first, the F-column data of the tuple is XORed with F_K(c_x), which decrypts it;
then, with the key obtained from that decryption, the first part of the tuple's L-column data is decrypted, and the value so obtained is XORed with the second part of the L-column data, giving the XOR result; the method then returns to Step X24.
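Steps X23 to X26 walk the keyword from its last character towards its first, which mirrors the standard FM-index backward search. The sketch below shows that standard plaintext algorithm only: the XOR masking, the encrypted tuples and the linked lists of the scheme are deliberately omitted, the suffix array is built naively, and the function names are hypothetical.

```python
def build_fm(text: str):
    """Return (suffix array, BWT string i.e. the L column, C table)."""
    text += "$"  # assumed terminator, sorts before letters and digits
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):  # C[ch] = number of characters smaller than ch
        c_table[ch] = total
        total += counts[ch]
    return sa, bwt, c_table

def backward_search(pattern: str, sa, bwt, c_table) -> list:
    """Return the sorted start positions of pattern, or [] if it is absent."""
    lo, hi = 0, len(bwt)  # current row range [lo, hi) of the FM table
    for ch in reversed(pattern):  # last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + bwt[:lo].count(ch)  # LF-mapping of the lower bound
        hi = c_table[ch] + bwt[:hi].count(ch)  # LF-mapping of the upper bound
        if lo >= hi:
            return []  # no row matches: the search fails
    return sorted(sa[j] for j in range(lo, hi))

sa, bwt, c_table = build_fm("banana")
hits = backward_search("ana", sa, bwt, c_table)
misses = backward_search("nab", sa, bwt, c_table)
```

The early return when the row range becomes empty plays the role of the failure branch of Step X25, and consuming all M characters plays the role of its success branch.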
This embodiment also discloses a keyword search system, comprising:
a data file obtaining module, for obtaining the dictionary γ, the encrypted data files and the encrypted summary files produced by the searchable data encryption method of Embodiment 1;
a keyword receiving module, for receiving the keywords that the user wants to search for;
a first keyword search module, for determining, for each keyword to be searched, whether any encrypted data file contains the keyword by searching dictionary γ;
a second keyword search module, for performing a substring search over the encrypted summary file set for each keyword to be searched that is not found in the data file set;
a query result returning unit, for returning the query results of the first keyword search module and the second keyword search module to the user;
a dictionary γ updating unit, for updating dictionary γ according to the query result of the second keyword search module; specifically, when the user confirms that the query result of the second keyword search module is correct, the data file returned as the query result is taken to contain the keyword, and the keyword's label for that data file and the index information of that data file are computed and added to dictionary γ.
This embodiment also discloses a computing device comprising a processor and a memory for storing a program executable by the processor; when the processor executes the program stored in the memory, the keyword search method of this embodiment is carried out.
In this embodiment, as shown in Fig. 4, the computing device comprises a client and a server. The client is a computer or another intelligent terminal and is user-facing: the user enters the keyword to be searched and the master key through the client. After receiving the keyword and master key entered by the user, the client executes Step X11 of the keyword search method of this embodiment to generate the keyword's pair of sub-keys and sends them to the server. After receiving the sub-keys, the server executes Steps X12 and X13 of the keyword search method to determine, by searching the dictionary, whether the keyword ω′ to be searched can be found in the encrypted data files. The code with which the server executes Steps X12 and X13 of the keyword search method of this embodiment is as follows:
For (i′ = 0; i′ != ⊥; i′++) {
    L_i′ ← F(K′_1, i′);
    d_i′ ← Get(γ, L_i′); // once L_i′ is computed, the label-index pair containing label L_i′ can be found in dictionary γ, giving the corresponding index information d_i′
    d_i ← Dec(K′_2, d_i′); // decrypting d_i′ gives the file number of a data file containing keyword ω′
    Refresh(time); // update the timestamp of the label-index pair for keyword ω′'s label in dictionary γ
}
When searching dictionary γ does not find a data file containing keyword ω′, the server executes Steps X21 to X26 of the keyword search method to search the summary file set for a summary file containing keyword ω′. If the search succeeds, the encrypted data file found is returned to the client and the label-index pair for keyword ω′ is added to dictionary γ, which updates dictionary γ.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the scope of protection of the present invention.