CN106610995B - Method, device and system for creating ciphertext index - Google Patents


Info

Publication number
CN106610995B
Authority
CN
China
Prior art keywords
index
ciphertext
search
character string
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510698146.2A
Other languages
Chinese (zh)
Other versions
CN106610995A (en
Inventor
欧锻灏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201510698146.2A priority Critical patent/CN106610995B/en
Publication of CN106610995A publication Critical patent/CN106610995A/en
Application granted granted Critical
Publication of CN106610995B publication Critical patent/CN106610995B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method, a device and a system for creating a ciphertext index, which relate to the field of computer information security and are used for improving the speed of ciphertext search. The method comprises the following steps: encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data; performing word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain target keywords; generating a Hash authentication code according to the target keyword and a Hash algorithm; encoding the Hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext; and sending the ciphertext and the index character string to a database server so that the database server stores the ciphertext and the index character string in the same data table, wherein the index character string and the ciphertext are stored correspondingly.

Description

Method, device and system for creating ciphertext index
Technical Field
The invention relates to the field of computer information security, in particular to a method, a device and a system for creating a ciphertext index.
Background
In a big data platform or a public cloud platform, personal sensitive data (a mobile phone number, a home address, an identity card number, a passport number, a bank account number and/or the like) needs to be encrypted before storage to prevent illegal access. Because the ciphertext obtained by encrypting the personal sensitive data is stored as unreadable, seemingly random data and cannot be searched directly, ciphertext search technologies based on keyword indexes have emerged.
In general, a keyword index-based ciphertext search technique includes an index creation stage and a search matching stage. A method for creating an index includes: performing word segmentation on Chinese sensitive data by adopting a Chinese word segmentation algorithm to obtain N keywords; calculating a pinyin-based edit distance for each of the N keywords to obtain N edit distances; taking each of the N edit distances together with a key as the input of an HMAC (Hash-based Message Authentication Code) algorithm to calculate N hash authentication codes; taking the N hash authentication codes as N indexes of the Chinese sensitive data; encrypting the Chinese sensitive data by adopting an encryption algorithm to obtain a ciphertext of the Chinese sensitive data; and storing the N indexes in the database server along with the ciphertext.
The index of the ciphertext generated by the method cannot be used for directly searching the ciphertext in the database server, so that the ciphertext searching speed is low.
Disclosure of Invention
The embodiment of the invention provides a method, a device and a system for creating a ciphertext index, which are used for improving the speed of ciphertext search.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, a method for creating a ciphertext index is provided, including:
encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data;
performing word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain target keywords;
generating a Hash authentication code according to the target keyword and a Hash algorithm;
encoding the Hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext;
and sending the ciphertext and the index character string to a database server so that the database server stores the ciphertext and the index character string in the same data table, wherein the index character string and the ciphertext are stored correspondingly.
With reference to the first aspect, in a first possible implementation manner, before the encoding the hash authentication code by using a preset encoding manner to obtain an index character string, the method further includes:
intercepting the first r bits of the hash authentication code to obtain a sub-hash authentication code, wherein 1 ≤ r ≤ R, r and R are integers, and R is the length of the hash authentication code;
the encoding of the hash authentication code by adopting a preset encoding mode to obtain an index character string comprises the following steps:
and coding the sub-hash authentication code by adopting a preset coding mode to obtain an index character string.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner, when the number of the index character strings is N, after the hash authentication code is encoded in a preset encoding manner to obtain an index character string, the method further includes:
randomly scrambling N index character strings, wherein N is more than or equal to 1 and is an integer;
and connecting the N index character strings after random scrambling, wherein adjacent index character strings in the N index character strings after the random scrambling are separated by printable characters in a non-preset coding mode.
With reference to the first aspect, the first possible implementation manner of the first aspect, or the second possible implementation manner, in a third possible implementation manner, the method further includes:
acquiring a search keyword;
generating a search character string from the search keyword by adopting the same method for generating the index character string from the target keyword, wherein the search character string is a printable character string;
and sending the search character string to the database server so that the database server can search the ciphertext according to the search character string and the stored index character string.
With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner, before the obtaining of the search keyword, the method further includes:
acquiring a search statement;
the acquiring of the search keyword comprises: and performing word segmentation on the search sentence by adopting the word segmentation algorithm to obtain a search keyword.
With reference to the third possible implementation manner or the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, the method further includes:
the database server receives M search character strings, and when M is larger than or equal to 2, the database server also acquires a search mode, wherein the search mode is an AND mode or an OR mode;
the database server matches the M search strings with the stored index strings;
if M is 1, obtaining a ciphertext corresponding to the index character string which is the same as the search character string;
if M is more than or equal to 2 and the search mode is the AND mode, acquiring the ciphertext corresponding to M index character strings which are the same as the M search character strings;
and if M is more than or equal to 2 and the searching mode is the OR mode, acquiring the ciphertext corresponding to the index character string which is the same as any searching character string in the M searching character strings.
In a second aspect, an apparatus for creating a ciphertext index is provided, including:
the encryption unit is used for encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data;
the word segmentation unit is used for segmenting the sensitive data by adopting a word segmentation algorithm to obtain a target keyword;
the first generation unit is used for generating a Hash authentication code according to the target keyword and a Hash algorithm;
the encoding unit is used for encoding the Hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext;
and the sending unit is used for sending the ciphertext and the index character string to a database server so that the database server can store the ciphertext and the index character string in the same data table, and the index character string and the ciphertext are stored correspondingly.
With reference to the second aspect, in a first possible implementation manner, the apparatus further includes:
the intercepting unit is used for intercepting the first r bits of the hash authentication code to obtain a sub-hash authentication code, wherein 1 ≤ r ≤ R, r and R are integers, and R is the length of the hash authentication code;
the encoding unit is specifically configured to encode the sub-hash authentication code in a preset encoding manner to obtain an index string.
With reference to the second aspect or the first possible implementation manner of the second aspect, in a second possible implementation manner, when the number of index strings is N, the apparatus further includes:
the scrambling unit is used for randomly scrambling N index character strings, wherein N is not less than 1 and is an integer;
and the serial connection unit is used for connecting the N index character strings after random scrambling, and adjacent index character strings in the N index character strings after serial connection are spaced by printable characters in a non-preset coding mode.
With reference to the second aspect, the first possible implementation manner of the second aspect, or the second possible implementation manner, in a third possible implementation manner, the apparatus further includes:
a first acquisition unit configured to acquire a search keyword;
a second generating unit, configured to generate a search string from the search keyword by using the same method as that used to generate the index string from the target keyword, where the search string is a printable string;
the sending unit is further configured to send the search string to the database server, so that the database server searches the ciphertext according to the search string and the stored index string.
With reference to the third possible implementation manner of the second aspect, in a fourth possible implementation manner, the apparatus further includes:
a second acquisition unit configured to acquire a search sentence;
the first obtaining unit is specifically configured to perform word segmentation on the search sentence by using the word segmentation algorithm to obtain a search keyword.
In a third aspect, a system for creating a ciphertext index is provided, including: any one of the apparatuses provided in the second aspect, and a database server.
According to the method, the device and the system provided by the embodiments of the invention, after a hash authentication code is generated from the target keyword, the hash authentication code is encoded in a preset encoding mode to obtain an index character string. When there are N index character strings, they are the N indexes of the ciphertext of the sensitive data, and the database server stores the ciphertext and the N index character strings correspondingly in the same data table. Because the index character strings are printable character strings, they can be queried directly in the database server through SQL. To search for a ciphertext containing a certain keyword, a search character string is generated from the search keyword by the same method used to generate the index character string from the target keyword; the search character string can then be matched directly against the N index character strings in the database server through SQL to determine whether to obtain the ciphertext. Compared with the prior art, the index of the ciphertext does not need to be loaded into memory, which saves memory space and can improve the speed of ciphertext search.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of a method for creating a ciphertext index according to an embodiment of the present invention;
fig. 2 is a schematic view of an application scenario of another method for creating a ciphertext index according to an embodiment of the present invention;
fig. 3 is a flowchart of a method for creating a ciphertext index according to an embodiment of the present invention;
fig. 4 is a flowchart of another method for creating a ciphertext index according to an embodiment of the present invention;
fig. 5 is a flowchart of a ciphertext search method according to an embodiment of the present invention;
fig. 6 is a flowchart of another method for creating a ciphertext index according to an embodiment of the present invention;
fig. 7 is a flowchart of a ciphertext search method according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an apparatus for creating a ciphertext index according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another apparatus for creating a ciphertext index according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of another apparatus for creating a ciphertext index according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "Plurality" herein means two or more.
The method provided by the embodiment of the invention can be applied at least to a big data platform or a public cloud platform. As shown in fig. 1 (the numbers shown represent the sequence of steps, and the same is true in fig. 2), in the big data platform the user considers the big data server to be trusted and therefore uploads sensitive data directly to the big data server through the user equipment. The big data server obtains, from the sensitive data, a ciphertext of the sensitive data and an index of the ciphertext (generated according to a target keyword) and uploads them to the database server, which stores the ciphertext and the index of the ciphertext correspondingly in the same data table. When a user (or an administrator authorized by the user) needs to acquire sensitive data, a search keyword is provided to the big data server through the user equipment. The big data server generates a search character string from the search keyword by the same method used to generate the index of the ciphertext from the target keyword, and sends the search character string to the database server. The database server obtains the ciphertext according to the search character string and the index of the ciphertext and sends the ciphertext to the big data server, which decrypts it to obtain the sensitive data and sends the sensitive data to the user equipment.
As shown in fig. 2, in the public cloud platform, a tenant (a user who rents a public cloud device) considers the provider of the public cloud service to be semi-trusted. Therefore, the tenant device (the public cloud device rented by the tenant) itself obtains the ciphertext of the sensitive data and the index of the ciphertext (generated according to a target keyword) from the sensitive data and uploads them to the public cloud server, which uploads them to the database server. When the user needs to acquire the sensitive data, the tenant device generates a search string from the search keyword by the same method used to generate the index of the ciphertext from the target keyword and transmits the search string to the public cloud server, which forwards it to the database server. The database server determines the ciphertext according to the search string and the index of the ciphertext and transmits the ciphertext to the tenant device through the public cloud server. After receiving the ciphertext, the tenant device decrypts it to obtain the sensitive data. It should be noted that the database server may be disposed inside a big data server or a public cloud server.
Example one
An embodiment of the present invention provides a method for creating a ciphertext index, as shown in fig. 3, including:
301. Encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data.
In the application scenario shown in fig. 1 and fig. 2, when the method provided in the embodiment of the present invention is applied to a big data platform, the execution subject in the embodiment of the present invention may be a big data server, and when the method provided in the embodiment of the present invention is applied to a public cloud platform, the execution subject in the embodiment of the present invention may be tenant equipment.
For example, the reversible encryption algorithm may be the AES (Advanced Encryption Standard) algorithm, the DES (Data Encryption Standard) algorithm, or another reversible encryption algorithm, which is not limited in this embodiment of the present invention. Preferably, a standard data encryption algorithm (e.g., the DES algorithm or the AES algorithm) is used, because it better ensures the security of the ciphertext than encrypting the sensitive data with a non-standard data encryption algorithm.
Optionally, before step 301, the method may further include: sensitive data is determined. The sensitive data may be a mobile phone number, a home address, an identification number, a passport number, and/or a bank account of the user.
When the execution subject of the embodiment of the invention is the big data server, the big data server can determine the sensitive data in the data it receives from the user equipment according to a specific protocol.
302. Performing word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain a target keyword.
The sensitive data can be Chinese, English, numeric, or the like, which is not limited in the embodiment of the invention, and different word segmentation algorithms can be adopted for different types of sensitive data. For example, when the sensitive data is English, the words in an English sentence are generally separated by punctuation marks or spaces, so the target keywords can be obtained by splitting the English sentence at the punctuation marks and spaces; when the sensitive data is a Chinese sentence, the Chinese sentence can be segmented according to the meaning of the words in it.
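The English case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `segment_english` and the choice to treat every non-alphanumeric character (including the underscore) as a separator are hypothetical and are not taken from the embodiment, which does not fix a particular segmentation rule.

```python
import re


def segment_english(text):
    """Split text at runs of punctuation and whitespace.

    Assumption: any character that is not a letter or digit separates words.
    Empty pieces (for example from leading punctuation) are dropped.
    """
    return [token for token in re.split(r"[\W_]+", text) if token]


# Each returned token would serve as one target keyword.
print(segment_english("12 Main Street, Springfield"))
```

A Chinese sentence would need a dictionary-based or statistical segmenter instead, which is outside the scope of this sketch.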
303. Generating a hash authentication code according to the target keyword and a hash algorithm.
Preferably, the hash algorithm may be a keyed hash algorithm, and exemplarily, the keyed hash algorithm may be an HMAC algorithm, and specifically, the keyed hash algorithm may be an HMAC-MD5 algorithm, an HMAC-SHA1 algorithm, an HMAC-SHA256 algorithm, or the like.
One target keyword corresponds to one Hash authentication code, and the Hash authentication code corresponding to the target keyword can be obtained after the target keyword and the secret key are used as the input of a Hash algorithm for calculation.
304. Encoding the hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext.
It should be noted that one or more target keywords can be obtained after word segmentation is performed on one piece of sensitive data, one target keyword generates one hash authentication code, and one hash authentication code generates one index character string, so that when N (N is greater than or equal to 1, N is an integer) target keywords are obtained after word segmentation is performed on one piece of sensitive data, N index character strings can be generated according to the N target keywords, and the N index character strings are N indexes of the ciphertext of the piece of sensitive data.
It should be noted that the basic ASCII (American Standard Code for Information Interchange) character set has 128 characters: 95 printable characters (common letters, digits, punctuation marks, the space, and the like) and 33 control characters. The preset encoding mode is an encoding mode that can encode 8-bit bytes into a printable character string; specifically, it may be Base64.
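Steps 303 and 304 can be sketched with the Python standard library. The sketch assumes HMAC-SHA256 as the keyed hash and Base64 as the preset encoding mode, both of which the text lists as options; the function name `index_string` and the demonstration key are hypothetical.

```python
import base64
import hashlib
import hmac

# Hypothetical demonstration key; a real deployment would manage keys securely.
DEMO_KEY = b"demo-key-for-illustration-only"


def index_string(keyword, key=DEMO_KEY):
    """Return a printable index string for one target keyword."""
    # Step 303: keyed hash of the keyword (raw bytes, generally not printable).
    mac = hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()
    # Step 304: Base64 turns the raw bytes into a printable ASCII string.
    return base64.b64encode(mac).decode("ascii")


s = index_string("05")
print(len(s), s.isprintable())
```

Because the same keyword and key always give the same string, a search keyword processed the same way can be compared with stored index strings by plain string equality.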
305. Sending the ciphertext and the index character string to a database server so that the database server stores the ciphertext and the index character string in the same data table, wherein the index character string and the ciphertext are stored correspondingly.
Specifically, because the index character strings stored in the data table in the database server are printable character strings, they can be queried directly in the database server through Structured Query Language (SQL).
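The storage and direct SQL lookup described in step 305 can be illustrated with an in-memory SQLite database. This is a sketch under assumptions: the embodiment does not name a database product, and the table name `sensitive_data`, its column names, and the sample ciphertext bytes and index strings below are all hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ciphertext and its index string live in the same row of the same table.
conn.execute(
    "CREATE TABLE sensitive_data ("
    " id INTEGER PRIMARY KEY,"
    " ciphertext BLOB NOT NULL,"
    " index_string TEXT NOT NULL)"
)
rows = [
    (b"\x8f\x02\xaa", "q83vEjRWeJA="),  # placeholder ciphertext and index
    (b"\x10\x7f\xc3", "3q2+7wAAAAE="),
]
conn.executemany(
    "INSERT INTO sensitive_data (ciphertext, index_string) VALUES (?, ?)", rows
)


def find_ciphertexts(connection, search_string):
    """Return ciphertexts whose index string equals the search string."""
    cursor = connection.execute(
        "SELECT ciphertext FROM sensitive_data WHERE index_string = ?",
        (search_string,),
    )
    return [row[0] for row in cursor.fetchall()]


print(find_ciphertexts(conn, "q83vEjRWeJA="))
```

The comparison happens inside the database engine, so the client never has to load the index column into its own memory.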
It should be noted that step 301 may be executed before or after any one of steps 302 to 304.
It should be noted that there may be a plurality of sensitive data, the ciphertext of each sensitive data has corresponding indexes, and the numbers of indexes corresponding to the ciphertexts of different sensitive data may be the same or different. Illustratively, Table 1 shows the correspondence between the ciphertexts of 2 different sensitive data and their indexes, where X1 and X2 denote the ciphertexts of 2 different sensitive data, B11 to B14 are the 4 indexes of X1, and B21 to B23 are the 3 indexes of X2.
TABLE 1
Ciphertext    Indexes
X1            B11, B12, B13, B14
X2            B21, B22, B23
Optionally, before step 304, the method further includes: intercepting the first r bits of the hash authentication code to obtain a sub-hash authentication code, wherein 1 ≤ r ≤ R, r and R are integers, and R is the length of the hash authentication code; in this case, step 304 includes: encoding the sub-hash authentication code by adopting a preset encoding mode to obtain an index character string.
A hash authentication code is generally long, and truncating it in this optional step reduces the amount of computation.
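The optional truncation to the first r bits (1 ≤ r ≤ R, with R the length of the hash authentication code in bits) might look as follows. The helper name `truncate_bits` is hypothetical, and zero-filling the unused low bits of the last byte when r is not a multiple of 8 is an assumption of this sketch; the text does not say how a partial byte is handled.

```python
import base64


def truncate_bits(mac, r):
    """Keep the first r bits of mac, returned as bytes.

    Assumption: when r is not a multiple of 8, the unused low bits of the
    final byte are set to zero.
    """
    total_bits = len(mac) * 8
    if not 1 <= r <= total_bits:
        raise ValueError("r must satisfy 1 <= r <= R")
    full_bytes, extra_bits = divmod(r, 8)
    if extra_bits == 0:
        return mac[:full_bytes]
    mask = (0xFF << (8 - extra_bits)) & 0xFF  # keeps the top extra_bits bits
    return mac[:full_bytes] + bytes([mac[full_bytes] & mask])


sub_mac = truncate_bits(b"\xff\xff\xff\xff", 12)
print(sub_mac, base64.b64encode(sub_mac).decode("ascii"))
```

A shorter sub-hash authentication code gives a shorter index string, at the cost of a higher chance that two different keywords map to the same index.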
Optionally, the method further includes: 11) acquiring a search keyword;
12) generating a search character string from the search keyword by adopting the same method for generating the index character string from the target keyword, wherein the search character string is a printable character string;
13) and sending the search character string to the database server so that the database server can search the ciphertext according to the search character string and the stored index character string.
Optionally, before step 11), the method further includes: acquiring a search statement; in this case, step 11) includes: and performing word segmentation on the search sentence by adopting the word segmentation algorithm to obtain a search keyword.
It should be noted that a ciphertext may be searched for directly with a search keyword, or with a sentence (or passage); in the latter case, a word segmentation algorithm is required to segment the sentence (or passage) into search keywords. There may be one or more search keywords, and each search keyword corresponds to one search string.
Optionally, the method further includes:
the database server receives M search character strings, and when M is larger than or equal to 2, the database server also acquires a search mode, wherein the search mode is an AND mode or an OR mode;
the database server matches the M search strings with the stored index strings;
if M is 1, obtaining a ciphertext corresponding to the index character string which is the same as the search character string;
if M is more than or equal to 2 and the search mode is the AND mode, acquiring the ciphertext corresponding to M index character strings which are the same as the M search character strings;
and if M is more than or equal to 2 and the searching mode is the OR mode, acquiring the ciphertext corresponding to the index character string which is the same as any searching character string in the M searching character strings.
The index character string is generated from a target keyword, and the target keyword is obtained by segmenting the sensitive data with a word segmentation algorithm. Suppose a ciphertext has N indexes. When there is one search keyword, a search character string is generated from it by the same method used to generate an index character string from a target keyword. If the search character string is the same as one of the N index character strings, the search keyword is a word in the sensitive data corresponding to the ciphertext, and the ciphertext is obtained; if the search character string differs from every one of the N index character strings, the search keyword is not a word in that sensitive data, and the ciphertext is not obtained. When there are a plurality of search keywords and the search mode is the OR mode, the user requires sensitive data containing any one of the search keywords; in this case the ciphertext is obtained as long as the search character string corresponding to any one of the search keywords is the same as one of the N index character strings, and otherwise it is not obtained. When there are a plurality of search keywords and the search mode is the AND mode, the user requires sensitive data containing all of the search keywords; in this case the ciphertext is obtained only if every search character string is the same as one of the N index character strings, and otherwise it is not obtained.
It should be noted that, when one search keyword is a word in a plurality of sensitive data, the server obtains the ciphertexts of the plurality of sensitive data. Illustratively, based on the example described in Table 1, suppose the plaintext of ciphertext X1 is "0501", whose 4 target keywords are {0, 05, 050, 0501}, and the plaintext of ciphertext X2 is "052", whose 3 target keywords are {0, 05, 052}. The index string corresponding to each target keyword is shown in Table 2. When the search keyword is "05", the corresponding search string is B1'; B1' is matched against the indexes of ciphertext X1 and ciphertext X2 respectively, and because B1' is the same as B12 and B22, ciphertexts X1 and X2 are obtained. When the search keyword is "052", the corresponding search string is B2'; B2' is matched against the indexes of ciphertext X1 and ciphertext X2 respectively, and because B2' is the same as B23, ciphertext X2 is obtained.
TABLE 2
Index character string    Target keyword
B11                       0
B12                       05
B13                       050
B14                       0501
B21                       0
B22                       05
B23                       052
Based on the example described in Table 2, when the search keywords are "05" and "052", their corresponding search strings are B1' and B2' respectively. B1' and B2' are matched against the indexes of ciphertext X1 and ciphertext X2 respectively: B1' is the same as B12 and B22, and B2' is the same as B23. When the search mode is the AND mode, ciphertext X2 is obtained; when the search mode is the OR mode, ciphertexts X1 and X2 are obtained.
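The AND and OR matching in this worked example can be reproduced with a short Python sketch. It assumes HMAC-SHA256 with Base64 for the index strings and keeps the stored indexes in an in-memory dictionary instead of a database table; the names `index_string`, `STORED` and `search`, and the demonstration key, are hypothetical.

```python
import base64
import hashlib
import hmac

DEMO_KEY = b"demo-key-for-illustration-only"  # hypothetical key


def index_string(keyword, key=DEMO_KEY):
    mac = hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


# Ciphertext label -> set of index strings, following Table 2.
STORED = {
    "X1": {index_string(k) for k in ("0", "05", "050", "0501")},
    "X2": {index_string(k) for k in ("0", "05", "052")},
}


def search(search_strings, mode="AND"):
    """Return the labels of ciphertexts matched by the search strings."""
    wanted = set(search_strings)
    hits = []
    for label, indexes in STORED.items():
        if mode == "AND":
            matched = wanted <= indexes        # every search string is present
        elif mode == "OR":
            matched = bool(wanted & indexes)   # at least one is present
        else:
            raise ValueError("mode must be 'AND' or 'OR'")
        if matched:
            hits.append(label)
    return hits


queries = [index_string("05"), index_string("052")]
print(search(queries, "AND"))  # ['X2']
print(search(queries, "OR"))   # ['X1', 'X2']
```

With a single search string the two modes coincide, which matches the M = 1 case in the text.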
Optionally, when there are N index character strings, after step 304 the method further includes: randomly scrambling the N index character strings, where N ≥ 1 and N is an integer; and concatenating the N randomly scrambled index character strings, where adjacent index character strings in the concatenated result are separated by a printable character that is not used by the preset encoding mode.
In this case, step 305 specifically includes: sending the ciphertext and the concatenated N index character strings to the database server. When the database server stores the concatenated N index character strings, they may occupy a single storage unit of one field in the data table, where the field is used to store the one or more indexes of a ciphertext.
In the embodiment of the present invention, N index strings corresponding to one ciphertext may be stored in one storage unit after being concatenated.
It should be noted that the N index character strings are stored after being concatenated, and they are generated from the N target keywords; if they were kept in their original order, that order could reveal the content of the ciphertext. To improve the security of the ciphertext, the N index character strings are therefore randomly scrambled before being concatenated. In addition, because the N index character strings are concatenated, they occupy only one storage unit of one field when stored, which saves resources of the database server.
Meanwhile, separating the N index character strings with a printable character that is not used by the preset encoding mode prevents matching errors. For example, suppose the 2 index character strings are AAAA and BBBB, a search character string is AABB, and the printable character not used by the preset encoding mode is "!". If the 2 index character strings are concatenated directly, the result is AAAABBBB; when the search character string AABB is matched against AAAABBBB, the match may wrongly succeed because AABB is the same as the middle part of AAAABBBB. If the 2 index character strings are separated by "!", the result is AAAA!BBBB; when AABB is matched against AAAA!BBBB, it is compared only with the individual index character strings delimited by "!", so the matching error is avoided.
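The AAAA/BBBB example can be reproduced with a few lines of Python. This is a minimal sketch; the function names `naive_contains` and `delimited_contains` are hypothetical and only serve to contrast the two ways of matching.

```python
DELIMITER = "!"  # a printable character outside the standard Base64 alphabet


def naive_contains(concatenated, search_string):
    """Substring test on directly concatenated index strings (error-prone)."""
    return search_string in concatenated


def delimited_contains(concatenated, search_string):
    """Compare only against whole index strings delimited by '!'."""
    return search_string in concatenated.split(DELIMITER)


print(naive_contains("AAAABBBB", "AABB"))       # True: a false match
print(delimited_contains("AAAA!BBBB", "AABB"))  # False: error avoided
print(delimited_contains("AAAA!BBBB", "BBBB"))  # True: genuine match
```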
In addition, a ciphertext may be searched for using several sentences (paragraphs), and the search mode between the sentences (paragraphs) may be the AND mode or the OR mode. In this case, if there are W sentences (paragraphs), the word segmentation algorithm can be used to segment each of the W sentences (paragraphs) to obtain the search keywords corresponding to each sentence (paragraph), and the search keywords corresponding to each sentence (paragraph) are converted into search character strings by the same method that converts target keywords into index character strings, where one search keyword corresponds to one search character string. Suppose the number of search character strings corresponding to the i-th sentence (paragraph) among the W sentences (paragraphs) is w_i (1 ≤ i ≤ W, i is an integer; w_i ≥ 1, w_i is an integer), and the total number of distinct search character strings corresponding to the W sentences (paragraphs) is w (w ≥ 1, w is an integer). The ciphertext search then proceeds as follows:
when the search mode between the W sentences (paragraphs) is the AND mode and the search mode between the search keywords of each sentence (paragraph) is the AND mode, the ciphertext is acquired when the w search character strings are respectively the same as w of the N index character strings;
when the search mode between the W sentences (paragraphs) is the OR mode and the search mode between the search keywords of each sentence (paragraph) is the OR mode, the ciphertext is acquired when any one of the w search character strings is the same as one of the N index character strings;
when the search mode between the W sentences (paragraphs) is the AND mode and the search mode between the search keywords of each sentence (paragraph) is the OR mode, the ciphertext is acquired when W search character strings, one from each of the W sentences (paragraphs), are respectively the same as W of the N index character strings;
when the search mode between the W sentences (paragraphs) is the OR mode and the search mode between the search keywords of each sentence (paragraph) is the AND mode, the ciphertext is acquired when the w_i search character strings of the i-th sentence (paragraph) are respectively the same as w_i of the N index character strings, where the i-th sentence (paragraph) may be any one of the W sentences (paragraphs).
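The four combinations above amount to an inner rule applied within each sentence and an outer rule applied across sentences. The following Python sketch illustrates this under the assumption that each sentence has already been converted into a non-empty list of search character strings; the function names and the toy strings "a", "b", "c", "x" are hypothetical.

```python
def sentence_hit(index_strings, search_strings, inner_mode):
    """Inner rule: AND / OR between the search strings of one sentence."""
    hits = [s in index_strings for s in search_strings]
    return all(hits) if inner_mode == "and" else any(hits)


def search_sentences(index_strings, sentences, outer_mode, inner_mode):
    """Outer rule: AND / OR between the per-sentence results."""
    results = [sentence_hit(index_strings, s, inner_mode) for s in sentences]
    return all(results) if outer_mode == "and" else any(results)


index = {"a", "b", "c"}               # N = 3 index strings of one ciphertext
sentences = [["a", "x"], ["b", "c"]]  # W = 2 sentences, w_1 = w_2 = 2

print(search_sentences(index, sentences, "and", "and"))  # False ("x" missing)
print(search_sentences(index, sentences, "or", "or"))    # True
print(search_sentences(index, sentences, "and", "or"))   # True
print(search_sentences(index, sentences, "or", "and"))   # True (2nd sentence)
```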
Optionally, step 303 includes: generating the hash authentication code according to a first result and the hash algorithm, where the first result is obtained by directly concatenating a target parameter and the target keyword, and the target parameter is the ciphertext or the initial vector used when encrypting the sensitive data. In this case, step 12) includes: generating the search character string from a second result by the same method that generates the index character string from the first result, where the second result is obtained by directly concatenating the target parameter and the search keyword.
It should be noted that, in application scenarios with very high security requirements, different users may upload sensitive data containing the same word. If the ciphertexts of all users' sensitive data are generated by the same method, and the indexes of those ciphertexts are also generated by the same method, a search by keyword may return other users' ciphertexts that contain the search keyword, which reduces the security of the ciphertext.
To address this problem, the ciphertexts generated for the sensitive data of different users (or tenants), and the indexes of those ciphertexts, can be made different, which improves the security of the ciphertext. Specifically, when the AES algorithm or the DES algorithm is used, a random initial vector is used when generating the ciphertext of each user's sensitive data, so the ciphertexts generated for different users differ. When the index of a ciphertext is generated from the first result and the hash algorithm, the target parameter contained in the first result differs between users, so the indexes of the ciphertexts of different users' sensitive data also differ.
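A minimal Python sketch of this idea follows. It assumes HMAC-SHA256 as the keyed hash and Base64 as the encoding, and it treats the initial vector as a per-user target parameter; the key, the two initial vectors (fixed here only so the output is reproducible) and the function name `index_string` are hypothetical.

```python
import base64
import hashlib
import hmac


def index_string(key, target_parameter, keyword):
    """Hash the first result (target parameter || keyword), then encode it."""
    first_result = target_parameter + keyword.encode("utf-8")
    mac = hmac.new(key, first_result, hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


key = b"demo-key"  # hypothetical key, for illustration only
# Fixed example values; in practice the initial vectors would be random.
iv_user_a = bytes.fromhex("00112233445566778899aabbccddeeff")
iv_user_b = bytes.fromhex("ffeeddccbbaa99887766554433221100")

a = index_string(key, iv_user_a, "05")
b = index_string(key, iv_user_b, "05")

print(a == index_string(key, iv_user_a, "05"))  # True: same user, same string
print(a == b)                                   # False: different users differ
```

With the same target parameter, a search keyword yields the same string as the matching target keyword, so the user's own search still succeeds, while the same word uploaded by another user produces a different index character string.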
In the prior art mentioned in the background, the hash authentication codes must be organized into a balanced 2^8-ary tree in the search and matching stage. The N hash authentication codes therefore have to be loaded into the memory of the server, a balanced 2^8-ary tree index structure corresponding to each hash authentication code has to be constructed in memory, and matching is performed against the balanced 2^8-ary tree generated from the search keyword. This requires extra memory space and greatly reduces the speed of ciphertext search.
According to the method provided by the embodiment of the present invention, after a hash authentication code is generated from the target keyword, the hash authentication code is encoded in the preset encoding mode to obtain an index character string. When there are N index character strings, they are the N indexes of the ciphertext of the sensitive data, and the database server stores the ciphertext and the N index character strings in correspondence in the same data table. Because the index character strings are printable character strings, they can be queried directly in the database server through SQL. To search for a ciphertext containing a certain keyword, the search keyword is converted into a search character string by the same method that converts the target keyword into the index character string, and the search character string is matched directly against the N index character strings in the database server through SQL to determine whether the ciphertext is acquired. Compared with the prior art, the indexes of the ciphertext do not need to be loaded into memory, which saves memory space and can improve the speed of ciphertext search.
Example two
It should be noted that, in a big data platform, the database server is generally deployed in the big data server. This embodiment takes this case as an example to describe the method for creating a ciphertext index provided in the first embodiment; for explanations of related content, reference may be made to the above embodiment. As shown in fig. 4, the method includes:
401. The user equipment sends data to the big data server.
Specifically, when a user to which the user device belongs needs to store data in the big data server, the data may be sent to the big data server through the user device.
402. The big data server receives the data sent by the user equipment and determines sensitive data in the data.
It should be noted that data sent by the user equipment to the big data server may include a plurality of sensitive data, and in the embodiment of the present invention, one sensitive data is taken as an example for description.
403. The big data server encrypts the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext X of the sensitive data.
Specifically, the reversible encryption algorithm may be AES, DES, or another reversible encryption algorithm, which is not limited in this embodiment of the present invention.
404. The big data server performs word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain N target keywords K1, K2, …, KN.
Specifically, different word segmentation algorithms may be used for different kinds of sensitive data. When the sensitive data is Chinese text, an intelligent word segmentation algorithm or a fine-grained word segmentation algorithm may be used; when the sensitive data is numeric, a prefix word segmentation algorithm or a suffix word segmentation algorithm may be used.
The following introduces the word segmentation principle of several word segmentation algorithms:
1. Intelligent word segmentation: the most significant words in the sentence are segmented out as the target keywords.
For example: the word segmentation result of "excellent engineer" is { excellent, engineer }.
2. Fine-grained word segmentation: all words in the sentence, from the most significant to the least significant, are segmented out as the target keywords.
For example: the word segmentation result of "excellent engineer" is { excellent, engineer }.
Specifically, the Chinese word segmentation tool IKAnalyzer can be used to implement the intelligent word segmentation algorithm and the fine-grained word segmentation algorithm.
3. Prefix word segmentation: from a sentence of length L (L ≥ 1, L is an integer), the first 1, 2, …, L consecutive characters are taken in turn, each as a target keyword.
For example: the result of word segmentation for "050119" is {0, 05, 050, 0501, 05011, 050119 }.
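Prefix word segmentation is simple enough to sketch directly. The function name `prefix_segment` below is hypothetical; the sketch only illustrates the rule stated above and reproduces the "050119" example.

```python
def prefix_segment(sentence):
    """Return the first 1, 2, ..., L characters of a sentence of length L."""
    return [sentence[:i] for i in range(1, len(sentence) + 1)]


print(prefix_segment("050119"))
# ['0', '05', '050', '0501', '05011', '050119']
```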
405. The big data server generates N hash authentication codes H1, H2, …, HN from the N target keywords K1, K2, …, KN by using a keyed hash algorithm.
For example, the keyed hash algorithm may be an HMAC algorithm, and specifically, may be an HMAC-MD5 algorithm, an HMAC-SHA1 algorithm, an HMAC-SHA256 algorithm, or the like.
406. The big data server truncates the first r bits of each of the N hash authentication codes H1, H2, …, HN to obtain N sub-hash authentication codes S1, S2, …, SN.
Here 1 ≤ r ≤ R, r and R are integers, and R is the length of the hash authentication code.
It should be noted that a hash authentication code may have 256 bits; to reduce the amount of computation, only the first r bits of the hash authentication code may be truncated and used.
407. The big data server encodes each of the N sub-hash authentication codes S1, S2, …, SN in the preset encoding mode to obtain N index character strings B1, B2, …, BN.
Specifically, the preset encoding mode may be Base64, and the character string obtained by encoding the sub-hash authentication code in Base64 is a printable character string.
408. The big data server randomly scrambles the N index character strings B1, B2, …, BN to obtain N out-of-order index character strings C1, C2, …, CN.
It should be noted that, because B1, B2, …, BN are generated from the N target keywords, keeping B1, B2, …, BN in their original order could reveal the content of the ciphertext; to improve the security of the ciphertext, the N index character strings B1, B2, …, BN are therefore randomly scrambled.
409. The big data server concatenates the N randomly scrambled index character strings C1, C2, …, CN.
Adjacent index character strings in the concatenated N index character strings are separated by a printable character that is not used by the preset encoding mode. Specifically, when the preset encoding mode is Base64, the character "!" is not one of the printable characters used by Base64, so the separator may be "!".
410. The big data server stores the concatenated N index character strings C1, C2, …, CN and the ciphertext X in the same data table in the database server.
The N index character strings are the N indexes of the ciphertext, and the indexes are stored in correspondence with the ciphertext. It should be noted that, in the prior art, after the N indexes of the ciphertext of the sensitive data are generated, each index occupies its own storage unit of one field in the data table, whereas in this embodiment the concatenated index character strings occupy only one storage unit, which saves resources of the database server. Meanwhile, separating the index character strings with a printable character that is not used by the preset encoding mode prevents matching errors.
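Steps 405 to 409 can be sketched in Python with the standard library. This is an illustrative sketch under stated assumptions, not the patented implementation: HMAC-SHA256 and Base64 are two of the options named above, r is restricted to whole bytes for simplicity (the text allows any 1 ≤ r ≤ R), and the key, the default r of 96 bits and all function names are hypothetical.

```python
import base64
import hashlib
import hmac
import random

DELIMITER = "!"  # not a character of the standard Base64 alphabet


def index_string(key, keyword, r_bits=96):
    """Steps 405-407: keyed hash, keep the first r bits, Base64-encode."""
    if r_bits % 8 != 0 or not 8 <= r_bits <= 256:
        raise ValueError("this sketch supports whole bytes only, 8..256 bits")
    mac = hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(mac[: r_bits // 8]).decode("ascii")


def build_index_cell(key, target_keywords, rng=None):
    """Steps 408-409: scramble the index strings, then join them with '!'."""
    rng = rng or random.SystemRandom()
    strings = [index_string(key, k) for k in target_keywords]
    rng.shuffle(strings)
    return DELIMITER.join(strings)


key = b"demo-key-not-for-production"  # hypothetical HMAC key
cell = build_index_cell(key, ["0", "05", "050", "0501"])

print(len(cell.split(DELIMITER)))                         # 4
print(index_string(key, "05") in cell.split(DELIMITER))   # True
print(index_string(key, "052") in cell.split(DELIMITER))  # False
```

The resulting `cell` is the single value that would be stored next to ciphertext X in step 410.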
When a user needs to acquire sensitive data, search keywords can be sent to the big data server through the user equipment, so that the big data server finds the ciphertext according to the search keywords, decrypts it to obtain the sensitive data, and sends the sensitive data to the user equipment. As shown in fig. 5, the specific process includes:
501. The user equipment sends M search keywords to the big data server, where M ≥ 1 and M is an integer; when M ≥ 2, the user equipment also sends a search mode to the big data server, where the search mode is the AND mode or the OR mode.
502. The big data server receives M search keywords, and when M is larger than or equal to 2, the big data server also receives a search mode.
503. The big data server generates M search character strings B1′, B2′, …, BM′ from the M search keywords by the same method that generates index character strings from target keywords.
504. The big data server matches the M search character strings B1′, B2′, …, BM′ against the N index character strings contained in C1!C2!…!CN.
Specifically, if M = 1, the ciphertext is acquired when the search character string is the same as one of the N index character strings;
if M ≥ 2 and the search mode is the AND mode, the ciphertext is acquired when the M search character strings are respectively the same as M of the N index character strings;
if M ≥ 2 and the search mode is the OR mode, the ciphertext is acquired when any one of the M search character strings is the same as one of the N index character strings.
When the big data server acquires the ciphertext, steps 505 to 507 are executed; when the big data server does not acquire the ciphertext, it sends a search failure message to the user equipment. Fig. 5 is drawn for the case in which the big data server acquires the ciphertext.
505. The big data server decrypts the acquired ciphertext by adopting a decryption algorithm corresponding to the reversible encryption algorithm to obtain the sensitive data.
506. The big data server sends sensitive data to the user equipment.
507. The user equipment receives the sensitive data sent by the big data server.
Specifically, for an example of acquiring the ciphertext, refer to the example described in Table 2.
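The matching of step 504 can be expressed as an SQL query over the concatenated index field. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in database; the table name `secrets`, the column names and the Base64-looking index values are hypothetical. SQLite's `instr` function is used instead of `LIKE` because `LIKE` is case-insensitive for ASCII by default in SQLite, whereas Base64 strings are case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE secrets (ciphertext TEXT, idx TEXT)")
conn.executemany(
    "INSERT INTO secrets VALUES (?, ?)",
    [("X1", "QUJD!REVG!R0hJ"), ("X2", "REVG!SktM")],  # made-up index cells
)


def find(search_strings, mode="and"):
    """Match whole '!'-delimited index strings with one SQL statement."""
    clause = "instr('!' || idx || '!', '!' || ? || '!') > 0"
    joiner = " AND " if mode == "and" else " OR "
    sql = (
        "SELECT ciphertext FROM secrets WHERE "
        + joiner.join([clause] * len(search_strings))
        + " ORDER BY ciphertext"
    )
    return [row[0] for row in conn.execute(sql, list(search_strings))]


print(find(["REVG"]))                 # ['X1', 'X2']
print(find(["REVG", "SktM"], "and"))  # ['X2']
print(find(["QUJD", "SktM"], "or"))   # ['X1', 'X2']
print(find(["UJD"]))                  # []: a partial string does not match
```

Wrapping both the stored field and the search character string in "!" ensures that only complete index character strings are compared, which mirrors the purpose of the separator described above.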
According to the method provided by the embodiment of the present invention, after a hash authentication code is generated from the target keyword, the hash authentication code is encoded in the preset encoding mode to obtain an index character string. When there are N index character strings, they are the N indexes of the ciphertext of the sensitive data, and the database server stores the ciphertext and the N index character strings in correspondence in the same data table. Because the index character strings are printable character strings, they can be queried directly in the database server through SQL. To search for a ciphertext containing a certain keyword, the search keyword is converted into a search character string by the same method that converts the target keyword into the index character string, and the search character string is matched directly against the N index character strings in the database server through SQL to determine whether the ciphertext is acquired. Compared with the prior art, the indexes of the ciphertext do not need to be loaded into memory, which saves memory space and can improve the speed of ciphertext search.
Example three
It should be noted that, in a public cloud platform, the database server is generally deployed in the public cloud server. This embodiment takes this case as an example to describe the method for creating a ciphertext index provided in the first embodiment; for explanations of related content, reference may be made to the foregoing embodiments. As shown in fig. 6, the method includes:
601. The tenant device determines the sensitive data.
In the embodiment of the present invention, one piece of sensitive data is taken as an example for description.
602. The tenant device encrypts the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext X of the sensitive data.
Specifically, the reversible encryption algorithm may be AES, DES, or another reversible encryption algorithm, which is not limited in this embodiment of the present invention.
603. The tenant device performs word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain N target keywords K1, K2, …, KN.
Specifically, different word segmentation algorithms may be used for different kinds of sensitive data. When the sensitive data is Chinese text, an intelligent word segmentation algorithm or a fine-grained word segmentation algorithm may be used; when the sensitive data is numeric, a prefix word segmentation algorithm or a suffix word segmentation algorithm may be used. For the principles of the specific word segmentation algorithms, refer to the description in example two.
604. The tenant device generates N hash authentication codes H1, H2, …, HN from the N target keywords K1, K2, …, KN by using a keyed hash algorithm.
For example, the keyed hash algorithm may be an HMAC algorithm, and specifically, may be an HMAC-MD5 algorithm, an HMAC-SHA1 algorithm, an HMAC-SHA256 algorithm, or the like.
605. The tenant device truncates the first r bits of each of the N hash authentication codes H1, H2, …, HN to obtain N sub-hash authentication codes S1, S2, …, SN.
Here 1 ≤ r ≤ R, r and R are integers, and R is the length of the hash authentication code.
It should be noted that a hash authentication code may have 256 bits; to reduce the amount of computation, only the first r bits of the hash authentication code may be truncated and used.
606. The tenant device encodes each of the N sub-hash authentication codes S1, S2, …, SN in the preset encoding mode to obtain N index character strings B1, B2, …, BN.
Specifically, the preset encoding mode may be Base64, and the character string obtained by encoding the sub-hash authentication code in Base64 is a printable character string.
607. The tenant device randomly scrambles the N index character strings B1, B2, …, BN to obtain N out-of-order index character strings C1, C2, …, CN.
It should be noted that, because B1, B2, …, BN are generated from the N target keywords, keeping B1, B2, …, BN in their original order could reveal the content of the ciphertext; to improve the security of the ciphertext, the N index character strings B1, B2, …, BN are therefore randomly scrambled.
608. The tenant device concatenates the N randomly scrambled index character strings C1, C2, …, CN.
Adjacent index character strings in the concatenated N index character strings are separated by a printable character that is not used by the preset encoding mode. Specifically, when the preset encoding mode is Base64, the character "!" is not one of the printable characters used by Base64, so the separator may be "!".
609. The tenant device sends the concatenated N index character strings C1, C2, …, CN and the ciphertext X to the public cloud server.
610. The public cloud server receives the concatenated N index character strings C1, C2, …, CN and the ciphertext X sent by the tenant device, and stores the concatenated N index character strings C1, C2, …, CN and the ciphertext X in the same data table in the database server.
The N index character strings are the N indexes of the ciphertext, and the indexes are stored in correspondence with the ciphertext. It should be noted that, in the prior art, after the N indexes of the ciphertext of the sensitive data are generated, each index occupies its own storage unit of one field in the data table, whereas in this embodiment the concatenated index character strings occupy only one storage unit, which saves resources of the database server. Meanwhile, separating the index character strings with a printable character that is not used by the preset encoding mode prevents matching errors.
When a user needs to obtain a ciphertext, as shown in fig. 7, the ciphertext can be obtained through the following processes:
701. The tenant device determines M search keywords and generates M search character strings B1′, B2′, …, BM′ from them by the same method that generates index character strings from target keywords, where M ≥ 1 and M is an integer; when M ≥ 2, the tenant device also determines a search mode, where the search mode is the AND mode or the OR mode.
702. The tenant device sends the M search character strings B1′, B2′, …, BM′ and the search mode to the public cloud server.
703. The public cloud server receives the M search character strings B1′, B2′, …, BM′ and the search mode sent by the tenant device.
704. The public cloud server matches the M search character strings B1′, B2′, …, BM′ against the N index character strings contained in C1!C2!…!CN:
if M = 1, the ciphertext is acquired when the search character string is the same as one of the N index character strings;
if M ≥ 2 and the search mode is the AND mode, the ciphertext is acquired when the M search character strings are respectively the same as M of the N index character strings;
if M ≥ 2 and the search mode is the OR mode, the ciphertext is acquired when any one of the M search character strings is the same as one of the N index character strings.
When the public cloud server acquires the ciphertext, steps 705 to 707 are executed; when the public cloud server does not acquire the ciphertext, it sends a search failure message to the tenant device. Fig. 7 is drawn for the case in which the public cloud server acquires the ciphertext.
705. The public cloud server sends the acquired ciphertext to the tenant device.
706. The tenant device receives the ciphertext sent by the public cloud server.
707. The tenant device decrypts the acquired ciphertext by adopting a decryption algorithm corresponding to the reversible encryption algorithm to obtain the sensitive data.
Specifically, for an example of acquiring the ciphertext, refer to the example described in Table 2.
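The division of work in this embodiment, where the tenant device generates all index and search character strings and the public cloud server only stores and compares them, can be sketched as follows. The class names, the HMAC-SHA256/Base64 choice and the key are assumptions for illustration; encryption is left out, so the ciphertexts are opaque placeholder bytes, and the random scrambling of step 607 is omitted for brevity.

```python
import base64
import hashlib
import hmac


class PublicCloudServer:
    """Stores opaque ciphertexts with their concatenated index strings."""

    def __init__(self):
        self.rows = []  # list of (ciphertext, concatenated index strings)

    def store(self, ciphertext, index_cell):
        self.rows.append((ciphertext, index_cell))

    def search(self, search_strings, mode="and"):
        combine = all if mode == "and" else any
        return [
            ciphertext
            for ciphertext, cell in self.rows
            if combine(s in cell.split("!") for s in search_strings)
        ]


class TenantDevice:
    """Holds the key; only printable strings ever leave the device."""

    def __init__(self, key):
        self._key = key

    def _to_string(self, keyword):
        mac = hmac.new(self._key, keyword.encode("utf-8"), hashlib.sha256)
        return base64.b64encode(mac.digest()).decode("ascii")

    def upload(self, server, ciphertext, target_keywords):
        cell = "!".join(self._to_string(k) for k in target_keywords)
        server.store(ciphertext, cell)

    def query(self, server, keywords, mode="and"):
        return server.search([self._to_string(k) for k in keywords], mode)


server = PublicCloudServer()
tenant = TenantDevice(b"tenant-demo-key")  # hypothetical key
tenant.upload(server, b"<ciphertext X1>", ["0", "05", "050", "0501"])
tenant.upload(server, b"<ciphertext X2>", ["0", "05", "052"])

print(tenant.query(server, ["05", "052"], "and"))  # [b'<ciphertext X2>']
print(tenant.query(server, ["05", "052"], "or"))   # both ciphertexts
```

In this arrangement the server never sees the keywords or the key, which is the practical difference from example two, where the big data server performs the segmentation and hashing itself.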
According to the method provided by the embodiment of the present invention, after a hash authentication code is generated from the target keyword, the hash authentication code is encoded in the preset encoding mode to obtain an index character string. When there are N index character strings, they are the N indexes of the ciphertext of the sensitive data, and the database server stores the ciphertext and the N index character strings in correspondence in the same data table. Because the index character strings are printable character strings, they can be queried directly in the database server through SQL. To search for a ciphertext containing a certain keyword, the search keyword is converted into a search character string by the same method that converts the target keyword into the index character string, and the search character string is matched directly against the N index character strings in the database server through SQL to determine whether the ciphertext is acquired. Compared with the prior art, the indexes of the ciphertext do not need to be loaded into memory, which saves memory space and can improve the speed of ciphertext search.
Example four
An embodiment of the present invention provides an apparatus 80 for creating a ciphertext index, configured to execute the method shown in fig. 3, where as shown in fig. 8, the apparatus 80 includes:
the encryption unit 801 is configured to encrypt the sensitive data by using a reversible encryption algorithm to obtain a ciphertext of the sensitive data;
a word segmentation unit 802, configured to perform word segmentation on the sensitive data by using a word segmentation algorithm to obtain a target keyword;
a first generating unit 803, configured to generate a hash authentication code according to the target keyword and a hash algorithm;
the encoding unit 804 is configured to encode the hash authentication code in a preset encoding manner to obtain an index character string, where the index character string is a printable character string and the index character string is an index of the ciphertext;
a sending unit 805, configured to send the ciphertext and the index string to a database server, so that the database server stores the ciphertext and the index string in the same data table, where the index string and the ciphertext are stored correspondingly.
Optionally, as shown in fig. 9, the apparatus 80 further includes:
an intercepting unit 806, configured to truncate the first r bits of the hash authentication code to obtain a sub-hash authentication code, where 1 ≤ r ≤ R, r and R are integers, and R is the length of the hash authentication code;
the encoding unit 804 is specifically configured to encode the sub-hash authentication code by using a preset encoding manner to obtain an index character string.
Optionally, as shown in fig. 9, when the number of the index strings is N, the apparatus 80 further includes:
a scrambling unit 807 for randomly scrambling N index character strings, where N is greater than or equal to 1 and is an integer;
a concatenation unit 808, configured to concatenate the N randomly scrambled index character strings, where adjacent index character strings in the concatenated N index character strings are separated by a printable character that is not used by the preset encoding mode.
Optionally, as shown in fig. 9, the apparatus 80 further includes:
a first acquisition unit 809 for acquiring a search keyword;
a second generating unit 810, configured to generate a search string from the search keyword by using the same method as that for generating the index string from the target keyword, where the search string is a printable string;
the sending unit 805 is further configured to send the search string to the database server, so that the database server searches the ciphertext according to the search string and the stored index string.
Optionally, as shown in fig. 9, the apparatus 80 further includes:
a second obtaining unit 811 for obtaining a search sentence;
the first obtaining unit 809 is specifically configured to perform word segmentation on the search statement by using the word segmentation algorithm to obtain a search keyword.
According to the device provided by the embodiment of the present invention, after a hash authentication code is generated from the target keyword, the hash authentication code is encoded in the preset encoding mode to obtain an index character string. When there are N index character strings, they are the N indexes of the ciphertext of the sensitive data, and the database server stores the ciphertext and the N index character strings in correspondence in the same data table. Because the index character strings are printable character strings, they can be queried directly in the database server through SQL. To search for a ciphertext containing a certain keyword, the search keyword is converted into a search character string by the same method that converts the target keyword into the index character string, and the search character string is matched directly against the N index character strings in the database server through SQL to determine whether the ciphertext is acquired. Compared with the prior art, the indexes of the ciphertext do not need to be loaded into memory, which saves memory space and can improve the speed of ciphertext search.
Example five
In terms of hardware implementation, each unit in the apparatus may be embedded in, or be independent of, a processor of the apparatus in hardware form, or may be stored in a memory of the apparatus in software form so that the processor can call and execute the operations corresponding to the above units. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like.
As shown in fig. 10, another apparatus 100 for creating a ciphertext index according to an embodiment of the present invention is configured to execute the method shown in fig. 3, where the apparatus 100 includes: memory 1001, processor 1002, transmitter 1003 and bus system 1004.
The memory 1001, the processor 1002 and the transmitter 1003 are coupled via a bus system 1004, wherein the memory 1001 may include a random access memory, and may further include a non-volatile memory, such as at least one disk memory. The bus system 1004 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus system 1004 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The memory 1001 stores a set of codes for controlling the processor 1002 to perform the following actions:
encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data;
performing word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain target keywords;
generating a Hash authentication code according to the target keyword and a Hash algorithm;
encoding the Hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext;
the transmitter 1003 is configured to transmit the ciphertext and the index character string to a database server, so that the database server stores the ciphertext and the index character string in the same data table, where the index character string and the ciphertext are stored correspondingly.
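The keyword-to-index steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: HMAC-SHA256 stands in for the Hash algorithm, Base64 stands in for the preset encoding mode, a regular-expression tokenizer stands in for the word segmentation algorithm, and the function names `segment`, `index_string` and `build_index` are hypothetical. The reversible encryption step is omitted because it does not affect how the index is derived.

```python
import base64
import hashlib
import hmac
import re


def segment(text: str) -> list[str]:
    """Stand-in word segmentation: lowercase word runs, de-duplicated in order."""
    return list(dict.fromkeys(re.findall(r"\w+", text.lower())))


def index_string(keyword: str, key: bytes) -> str:
    """Keyed hash of one keyword, encoded as a printable Base64 string."""
    # Assumption: HMAC-SHA256 plays the role of the Hash authentication code.
    mac = hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()
    # Assumption: Base64 plays the role of the preset encoding mode.
    return base64.b64encode(mac).decode("ascii")


def build_index(sensitive_data: str, key: bytes) -> list[str]:
    """One index string per distinct target keyword of the sensitive data."""
    return [index_string(word, key) for word in segment(sensitive_data)]
```

Because the derivation is deterministic for a given key, the same keyword always yields the same index character string, which is what later allows an exact match inside the database.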
Optionally, the processor 1002 is further configured to:
intercepting the first r bits of the Hash authentication code to obtain a sub-Hash authentication code, wherein 1 ≤ r ≤ R, r and R are integers, and R is the length of the Hash authentication code;
the processor 1002 is specifically configured to encode the sub-hash authentication code by using a preset encoding manner to obtain an index character string.
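The optional truncation step can be sketched as below. The sketch assumes that the length of the Hash authentication code is counted in bits, that a partial final byte is zero-padded, and that HMAC-SHA256 and Base64 again stand in for the Hash algorithm and the preset encoding mode; `truncate_bits` and `short_index_string` are hypothetical names.

```python
import base64
import hashlib
import hmac


def truncate_bits(mac: bytes, r: int) -> bytes:
    """Return the first r bits of mac, zero-padding the last partial byte."""
    total_bits = len(mac) * 8
    if not 1 <= r <= total_bits:
        raise ValueError(f"r must satisfy 1 <= r <= {total_bits}")
    full_bytes, extra_bits = divmod(r, 8)
    head = bytearray(mac[:full_bytes])
    if extra_bits:
        # Keep only the top extra_bits bits of the next byte.
        mask = (0xFF << (8 - extra_bits)) & 0xFF
        head.append(mac[full_bytes] & mask)
    return bytes(head)


def short_index_string(keyword: str, key: bytes, r: int) -> str:
    """Encode only the first r bits of the keyed hash of the keyword."""
    mac = hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(truncate_bits(mac, r)).decode("ascii")
```

A smaller r shortens every index character string and so reduces storage, at the cost of a higher chance that two different keywords map to the same string.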
Optionally, when the number of the index strings is N, the processor 1002 is further configured to:
randomly scrambling N index character strings, wherein N is more than or equal to 1 and is an integer;
and concatenating the N randomly scrambled index character strings, wherein adjacent index character strings in the concatenated result are separated by a printable character that does not belong to the preset encoding mode.
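The scrambling and concatenation step might look like the sketch below. It assumes Base64 as the preset encoding mode, so a comma is a valid separator because it lies outside the Base64 alphabet; the separator choice and the name `join_index_strings` are assumptions made for illustration.

```python
import secrets

# Assumption: any printable character outside the encoding alphabet would do.
SEPARATOR = ","


def join_index_strings(index_strings: list[str], separator: str = SEPARATOR) -> str:
    """Shuffle the index strings and join them into one field value."""
    for s in index_strings:
        if separator in s:
            raise ValueError("separator must not occur inside an index string")
    shuffled = list(index_strings)  # leave the caller's list untouched
    secrets.SystemRandom().shuffle(shuffled)
    return separator.join(shuffled)
```

Shuffling presumably keeps the stored order from revealing the word order of the plaintext, and a separator outside the encoding alphabet guarantees that a match on a complete index character string cannot straddle two neighbouring strings.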
Optionally, the processor 1002 is further configured to:
acquiring a search keyword;
generating a search character string from the search keyword by adopting the same method for generating the index character string from the target keyword, wherein the search character string is a printable character string;
the transmitter 1003 is further configured to transmit the search string to the database server, so that the database server searches the ciphertext according to the search string and the stored index string.
Optionally, the processor 1002 is further configured to:
acquiring a search statement;
the processor 1002 is specifically configured to perform word segmentation on the search statement by using the word segmentation algorithm to obtain a search keyword.
In the apparatus provided by this embodiment of the invention, a Hash authentication code is generated from the target keyword and is then encoded in a preset encoding mode to obtain an index character string. When there are N index character strings, they serve as N indexes of the ciphertext of the sensitive data, and the database server stores the ciphertext and the N index character strings correspondingly in the same data table. Because each index character string is a printable character string, it can be queried directly in the database server through SQL. To search for ciphertext containing a certain keyword, a search character string is generated from the search keyword by the same method used to generate the index character string from the target keyword; the search character string can then be matched directly against the N index character strings in the database server through SQL to determine whether the ciphertext is retrieved. Compared with the prior art, the index of the ciphertext does not need to be loaded into memory, which saves memory space and can improve the speed of ciphertext search.
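The SQL matching described above can be illustrated with an in-memory SQLite database. This is only a sketch under assumptions: the table and column names (`records`, `ciphertext`, `idx`), the demo key, the comma separator, and the use of HMAC-SHA256 with Base64 are all hypothetical, and the ciphertexts are placeholder bytes because the encryption step is out of scope here. For brevity the function `search` both derives the search character strings and runs the query, whereas in the embodiment the apparatus derives them and the database server performs the match. The AND and OR modes correspond to the search modes recited in the claims.

```python
import base64
import hashlib
import hmac
import sqlite3

KEY = b"demo-key"  # hypothetical secret key shared by indexing and searching
SEP = ","          # assumption: a printable character outside the Base64 alphabet


def derive(keyword: str) -> str:
    """Same derivation for index strings and search strings."""
    mac = hmac.new(KEY, keyword.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


def search(conn: sqlite3.Connection, keywords: list[str], mode: str = "AND") -> list[bytes]:
    """Return ciphertexts whose index field contains all (AND) or any (OR) keywords."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    if not keywords:
        return []
    # instr() is case-sensitive, unlike SQLite's default LIKE, which matters for Base64.
    clause = f" {mode} ".join("instr(idx, ?) > 0" for _ in keywords)
    rows = conn.execute(
        f"SELECT ciphertext FROM records WHERE {clause} ORDER BY id",
        [derive(k) for k in keywords],
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, ciphertext BLOB, idx TEXT)")
for ciphertext, words in [(b"ct-1", ["alice", "shenzhen"]), (b"ct-2", ["bob", "shenzhen"])]:
    conn.execute(
        "INSERT INTO records (ciphertext, idx) VALUES (?, ?)",
        (ciphertext, SEP.join(derive(w) for w in words)),
    )
```

Here `search(conn, ["alice", "shenzhen"], "AND")` selects only the first row, while `search(conn, ["alice", "bob"], "OR")` selects both. The match runs entirely inside the database, so no index has to be loaded into the memory of the client.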
The embodiment of the present invention further provides a system for creating a ciphertext index, where the system includes the apparatus 80 and the database server, or the system includes the apparatus 100 and the database server.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working processes of the above-described apparatuses and modules, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (12)

1. A method of creating a ciphertext index, comprising:
encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data;
performing word segmentation on the sensitive data by adopting a word segmentation algorithm to obtain target keywords;
generating a Hash authentication code according to the target keyword and a Hash algorithm;
encoding the Hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext;
sending the ciphertext and the index character string to a database server so that the database server stores the ciphertext and the index character string in the same data table, wherein the index character string and the ciphertext are stored correspondingly;
when the number of the index character strings is N, after the Hash authentication code is encoded by using the preset encoding mode to obtain the index character strings, the method further includes: randomly scrambling the N index character strings, wherein N is an integer greater than or equal to 1; and concatenating the N randomly scrambled index character strings, wherein adjacent index character strings in the concatenated result are separated by a printable character that does not belong to the preset encoding mode.
2. The method according to claim 1, wherein before the encoding the hash authentication code in a preset encoding manner to obtain the index string, the method further comprises:
intercepting the first r bits of the Hash authentication code to obtain a sub-Hash authentication code, wherein 1 ≤ r ≤ R, r and R are integers, and R is the length of the Hash authentication code;
the encoding of the hash authentication code by adopting a preset encoding mode to obtain an index character string comprises the following steps:
and coding the sub-hash authentication code by adopting a preset coding mode to obtain an index character string.
3. The method of claim 1, wherein
the N index character strings are stored in a single storage unit of one field in the data table.
4. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring a search keyword;
generating a search character string from the search keyword by adopting the same method for generating the index character string from the target keyword, wherein the search character string is a printable character string;
and sending the search character string to the database server so that the database server can search the ciphertext according to the search character string and the stored index character string.
5. The method of claim 4, wherein prior to said obtaining search keywords, the method further comprises:
acquiring a search statement;
the acquiring of the search keyword comprises: and performing word segmentation on the search sentence by adopting the word segmentation algorithm to obtain a search keyword.
6. The method of claim 4, further comprising:
the database server receives M search character strings, and when M is larger than or equal to 2, the database server also acquires a search mode, wherein the search mode is an AND mode or an OR mode;
the database server matches the M search strings with the stored index strings;
if M is 1, obtaining a ciphertext corresponding to the index character string which is the same as the search character string;
if M is greater than or equal to 2 and the search mode is the AND mode, obtaining the ciphertext corresponding to M index character strings which are the same as the M search character strings;
and if M is more than or equal to 2 and the searching mode is the OR mode, acquiring the ciphertext corresponding to the index character string which is the same as any searching character string in the M searching character strings.
7. The method of claim 5, further comprising:
the database server receives M search character strings, and when M is larger than or equal to 2, the database server also acquires a search mode, wherein the search mode is an AND mode or an OR mode;
the database server matches the M search strings with the stored index strings;
if M is 1, obtaining a ciphertext corresponding to the index character string which is the same as the search character string;
if M is greater than or equal to 2 and the search mode is the AND mode, obtaining the ciphertext corresponding to M index character strings which are the same as the M search character strings;
and if M is more than or equal to 2 and the searching mode is the OR mode, acquiring the ciphertext corresponding to the index character string which is the same as any searching character string in the M searching character strings.
8. An apparatus for creating a ciphertext index, comprising:
the encryption unit is used for encrypting the sensitive data by adopting a reversible encryption algorithm to obtain a ciphertext of the sensitive data;
the word segmentation unit is used for segmenting the sensitive data by adopting a word segmentation algorithm to obtain a target keyword;
the first generation unit is used for generating a Hash authentication code according to the target keyword and a Hash algorithm;
the encoding unit is used for encoding the Hash authentication code by adopting a preset encoding mode to obtain an index character string, wherein the index character string is a printable character string, and the index character string is an index of the ciphertext;
a sending unit, configured to send the ciphertext and the index string to a database server, so that the database server stores the ciphertext and the index string in the same data table, where the index string and the ciphertext are stored correspondingly;
when the number of the index character strings is N, the device further comprises:
the scrambling unit is used for randomly scrambling N index character strings, wherein N is not less than 1 and is an integer;
and the serial connection unit is used for concatenating the N randomly scrambled index character strings, wherein adjacent index character strings among the N concatenated index character strings are separated by a printable character that does not belong to the preset encoding mode.
9. The apparatus of claim 8, further comprising:
the intercepting unit is used for intercepting the first r bits of the Hash authentication code to obtain a sub-Hash authentication code, wherein 1 ≤ r ≤ R, r and R are integers, and R is the length of the Hash authentication code;
the encoding unit is specifically configured to encode the sub-hash authentication code in a preset encoding manner to obtain an index string.
10. The apparatus according to any one of claims 8-9, further comprising:
a first acquisition unit configured to acquire a search keyword;
a second generating unit, configured to generate a search string from the search keyword by using the same method as that used to generate the index string from the target keyword, where the search string is a printable string;
the sending unit is further configured to send the search string to the database server, so that the database server searches the ciphertext according to the search string and the stored index string.
11. The apparatus of claim 10, further comprising:
a second acquisition unit configured to acquire a search sentence;
the first obtaining unit is specifically configured to perform word segmentation on the search sentence by using the word segmentation algorithm to obtain a search keyword.
12. A system for creating a ciphertext index, comprising: the apparatus of any one of claims 8-11 and a database server.
CN201510698146.2A 2015-10-23 2015-10-23 Method, device and system for creating ciphertext index Active CN106610995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510698146.2A CN106610995B (en) 2015-10-23 2015-10-23 Method, device and system for creating ciphertext index

Publications (2)

Publication Number Publication Date
CN106610995A CN106610995A (en) 2017-05-03
CN106610995B true CN106610995B (en) 2020-07-07

Family

ID=58613085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510698146.2A Active CN106610995B (en) 2015-10-23 2015-10-23 Method, device and system for creating ciphertext index

Country Status (1)

Country Link
CN (1) CN106610995B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423341B (en) * 2017-05-08 2020-10-16 上海泥娃通信科技有限公司 Ciphertext full-text search system
CN107463848B (en) * 2017-07-18 2021-10-12 北京邮电大学 Application-oriented ciphertext search method, device, proxy server and system
SG10201706106QA (en) * 2017-07-26 2019-02-27 Huawei Int Pte Ltd Searchable Encryption with Hybrid Index
CN108768994B (en) * 2018-05-22 2021-07-27 北京小米移动软件有限公司 Data matching method and device and computer readable storage medium
CN108920967B (en) * 2018-06-28 2022-08-05 深信服科技股份有限公司 Data processing method, device, terminal and computer storage medium
CN110516460B (en) * 2019-08-29 2021-05-14 重庆市筑智建信息技术有限公司 Encryption security method and system for BIM data
CN110689349B (en) * 2019-10-08 2023-07-11 深圳前海微众银行股份有限公司 Transaction hash value storage and searching method and device in blockchain
CN110889017B (en) * 2019-10-15 2022-09-13 福建联迪商用设备有限公司 Retrieval method and terminal for information encrypted through base64
CN111193723B (en) * 2019-12-13 2022-10-14 上海数据交易中心有限公司 Data transmission, matching and storage method and device, storage medium and terminal
CN112711648A (en) * 2020-12-23 2021-04-27 航天信息股份有限公司 Database character string ciphertext storage method, electronic device and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1617584A (en) * 2004-12-06 2005-05-18 武汉大学 Dynamic random mess correction and enciphering-deenciphering method for video frequency information
CN101155128A (en) * 2006-09-29 2008-04-02 华为技术有限公司 Method and system for implementing mobile data business
EP2499562A1 (en) * 2009-11-09 2012-09-19 Arcsight, Inc. Enabling faster full-text searching using a structured data store

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI501580B (en) * 2009-08-07 2015-09-21 Dolby Int Ab Authentication of data streams
CN103064844A (en) * 2011-10-20 2013-04-24 北京中搜网络技术股份有限公司 Indexing equipment, indexing method, search device, search method and search system
US20130238646A1 (en) * 2012-03-06 2013-09-12 Evrichart, Inc. Partial-Match Searches of Encrypted Data Sets
CN110086830B (en) * 2012-08-15 2022-03-04 维萨国际服务协会 Searchable encrypted data
US9069986B2 (en) * 2013-06-18 2015-06-30 International Business Machines Corporation Providing access control for public and private document fields
CN103345526B (en) * 2013-07-22 2016-12-28 武汉大学 A kind of efficient secret protection cryptogram search method under cloud environment
IN2013CH05538A (en) * 2013-12-02 2015-06-12 Infosys Ltd
CN104394155B (en) * 2014-11-27 2017-12-12 暨南大学 It can verify that multi-user's cloud encryption keyword searching method of integrality and completeness
CN104992124A (en) * 2015-08-03 2015-10-21 电子科技大学 Document safety access method for cloud storage environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Achieving effective cloud search services: multi-keyword ranked search over encrypted cloud data supporting synonym query;Zhangjie Fu,等;《IEEE Transactions on Consumer Electronics》;20140402;第60卷(第1期);全文 *
可搜索加密技术研究综述;李经纬,等;《软件学报》;20150131;第26卷(第1期);全文 *

Also Published As

Publication number Publication date
CN106610995A (en) 2017-05-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant