Summary of the invention
A first aspect of the present invention provides a data processing method, comprising:
receiving a data backup request message sent by a client, wherein the data backup request message comprises a user identifier and fingerprint information of backup data;
querying, according to the fingerprint information of the backup data, the folder corresponding to the user identifier, and judging whether data identical to the backup data exists; if it is judged that no data identical to the backup data exists, querying, according to the fingerprint information of the backup data, the folders corresponding to other user identifiers, and judging whether data identical to the backup data exists;
if it is judged that data identical to the backup data exists in the folders corresponding to the other user identifiers, wherein the data identical to the backup data in those folders is second historical data, judging whether the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, wherein the random number is greater than or equal to a preset threshold; and
if it is judged that the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, sending a backup message to the client, receiving the backup data sent by the client, and generating reference data of the second historical data.
Another aspect of the present invention provides a data processing device, comprising:
a transceiver module, configured to receive a data backup request message sent by a client, wherein the data backup request message comprises a user identifier and fingerprint information of backup data;
a judging module, configured to: query, according to the fingerprint information of the backup data, the folder corresponding to the user identifier, and judge whether data identical to the backup data exists; if it is judged that no data identical to the backup data exists, query, according to the fingerprint information of the backup data, the folders corresponding to other user identifiers, and judge whether data identical to the backup data exists; and if it is judged that data identical to the backup data exists in the folders corresponding to the other user identifiers, wherein the data identical to the backup data in those folders is second historical data, judge whether the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, wherein the random number is greater than or equal to a preset threshold;
wherein the transceiver module is further configured to, if the judging module judges that the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, send a backup message to the client and receive the backup data; and
a reference data generation module, configured to generate reference data of the second historical data.
Another aspect of the present invention provides a data processing system, comprising a client and the data processing device described above.
In the embodiments of the present invention, the fingerprint information of the backup data is used to query whether identical data is already stored in the folder corresponding to the user identifier and in the folders of other users. If identical data is found in another user's folder, that data is treated as second historical data, and the quantity of reference data corresponding to it is compared with the random number corresponding to it. As a result, when another user backs up to the cloud backup data whose content is a guess at someone else's data, the client is still required to transmit the backup data to the cloud whenever the quantity of reference data corresponding to the second historical data is less than the corresponding random number, even though the cloud already holds data with the same content. Other users therefore cannot detect whether identical data has already been backed up in the database, which effectively prevents leakage of user data.
Detailed description of the embodiments
Fig. 1 is a flowchart of an embodiment of the data processing method of the present invention. As shown in Fig. 1, the method of this embodiment is executed by a data processing device located in cloud storage, and comprises:
Step 101: Receive a data backup request message sent by a client, the data backup request message comprising a user identifier and fingerprint information corresponding to backup data.
Cloud storage, also referred to as the cloud, is a concept extended and developed from cloud computing. It refers to a system that, through functions such as cluster applications, grid technology, or distributed file systems, uses application software to make a large number of storage devices of different types in a network work together and jointly provide data storage and service access functions to the outside. The fingerprint information may be a hash (HASH) value of the backup data; any other value that uniquely characterizes the data may also be used as the fingerprint information of the data.
In this embodiment, the client takes the backup data as a unit, calculates the fingerprint information of the backup data, and then sends the fingerprint information of the backup data together with the user identifier to the data processing device in a data backup request message.
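The client-side step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: SHA-256 is an assumed choice of hash (the text only requires a value that uniquely characterizes the data), and the function names and the dictionary layout of the request message are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex digest used as the fingerprint of one unit of backup data.

    SHA-256 is an assumption; any value that uniquely characterizes the
    data could serve as the fingerprint.
    """
    return hashlib.sha256(data).hexdigest()


def build_backup_request(user_id: str, data: bytes) -> dict:
    """Build a (hypothetical) data backup request message.

    The message carries only the user identifier and the fingerprint;
    the backup data itself is not sent at this stage.
    """
    return {"user_id": user_id, "fingerprint": fingerprint(data)}
```

Note that the request carries no payload: whether the data must actually be transmitted is decided by the data processing device after it has examined the fingerprint.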
Step 102: According to the fingerprint information of the backup data, query the folder corresponding to the user identifier and judge whether data identical to the backup data exists; if it is judged that no data identical to the backup data exists, query the folders corresponding to other user identifiers according to the fingerprint information of the backup data, and judge whether data identical to the backup data exists.
Step 103: If it is judged that data identical to the backup data exists in the folders corresponding to the other user identifiers, wherein the data identical to the backup data in those folders is second historical data, judge whether the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, wherein the random number is greater than or equal to a preset threshold.
Step 104: If it is judged that the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, send a backup message to the client, receive the backup data sent by the client, and generate reference data of the second historical data.
In this embodiment, the reference data is very small, and its content is a pointer to the historical data that has the same content. When reading the data, the client follows this pointer to locate the historical data with the same content and reads that historical data out.
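The idea of reference data as a small pointer can be illustrated with a short sketch. The `Reference` class, the in-memory `folders` layout (user identifier to a mapping from fingerprint to either content or a reference), and `read_backup` are all hypothetical names chosen for illustration; the text does not specify how reference data is represented.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Reference:
    """Small pointer to historical data with the same content (hypothetical)."""
    owner_id: str
    fingerprint: str


Entry = Union[bytes, Reference]
Folders = Dict[str, Dict[str, Entry]]


def read_backup(folders: Folders, user_id: str, fingerprint: str) -> bytes:
    """Read a user's backup, following reference data if necessary."""
    entry = folders[user_id][fingerprint]
    if isinstance(entry, Reference):
        # Follow the pointer to the historical data that holds the content.
        entry = folders[entry.owner_id][entry.fingerprint]
    if not isinstance(entry, bytes):
        raise ValueError("reference does not point at stored content")
    return entry
```

With this layout, a user whose folder holds only a reference reads back the same bytes as the user whose folder holds the content itself.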
In this embodiment, the random numbers corresponding to the historical data stored in the data processing device are all generated randomly; that is, the random numbers corresponding to any two pieces of historical data may be the same or different. Each random number is greater than or equal to the preset threshold. Once it is judged that data identical to the backup data exists in the folder of another user identifier, that data is set as the second historical data, and it is judged whether the quantity of reference data of the second historical data is less than the random number of the second historical data. The technical problem mainly addressed by the present invention is how to prevent the content of secret historical data from being stolen. In general, one copy of the secret historical data is stored, and the quantity of its reference data is generally also 1. When the HASH value of the backup data is identical to the HASH value of the secret data, the content of the backup data is identical to that of the secret historical data; however, as long as the quantity of reference data is less than the random number, the user is still prompted to upload the backup data, so the user cannot learn that the cloud already stores secret historical data identical to the backup data. If the user then backs up the same data to the cloud again, the user may infer from the data traffic during the save that identical data already exists in the database, that is, that the cloud performs source-side data deduplication. However, because the user has already stored the backup data once, and the random number is unknown to the user, the user still cannot determine whether the cloud had stored secret historical data identical to the backup data in the first place.
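The argument above can be checked with a small sketch of the decision a probing user observes. The function name and parameters are hypothetical; the sketch assumes, as the paragraph states, that the secret data has a reference count of 1 and that every random number is at least 2.

```python
def server_asks_for_upload(stored_elsewhere: bool,
                           ref_count: int,
                           random_number: int) -> bool:
    """Return True if the client is told to transmit the backup data.

    stored_elsewhere: identical data exists in another user's folder.
    ref_count: quantity of reference data of that (second) historical data.
    random_number: the random number assigned to that historical data.
    """
    if not stored_elsewhere:
        # Nothing to deduplicate against, so the data must be uploaded.
        return True
    # Identical data exists, but the upload is still requested while the
    # reference count is below the random number.
    return ref_count < random_number
```

For a reference count of 1 and any random number of 2 or more, the response when the secret data exists is the same as when it does not, so a first probe reveals nothing about whether the data is stored.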
In this embodiment, a data backup request message carrying a user identifier and the HASH value of the backup data is received from the client, and the fingerprint information of the backup data is used to query whether identical data is already stored in the folder corresponding to the user identifier and in the folders of other users. If identical data is found in another user's folder, that data is treated as second historical data, and the quantity of reference data corresponding to it is compared with the random number corresponding to it. As a result, when another user backs up to the cloud backup data whose content is a guess at someone else's data, the client is still required to transmit the backup data to the cloud whenever the quantity of reference data corresponding to the second historical data is less than the corresponding random number, which is greater than or equal to the preset threshold, even though the cloud already holds data with the same content. Other users therefore cannot detect whether identical data has already been backed up in the database, which effectively prevents leakage of user data.
Fig. 2 is a flowchart of another embodiment of the data processing method of the present invention. In this embodiment, the method is executed by a data processing device located in cloud storage, and the technical solution is described in detail taking a HASH value as an example of the fingerprint information. As shown in Fig. 2, the method comprises:
Step 201: Receive a data backup request message sent by a client, the data backup request message comprising a user identifier and the HASH value of backup data.
In this embodiment, the client takes the backup data as a unit, calculates the HASH value of the backup data, and then sends the HASH value of the backup data together with the user identifier to the data processing device in a data backup request message.
Step 202: According to the HASH value of the backup data, query the folder corresponding to the user identifier and judge whether data identical to the backup data exists. If not, perform Step 203; if so, perform Step 207.
Step 203: According to the HASH value of the backup data, query the folders corresponding to other user identifiers and judge whether data identical to the backup data exists, wherein data identical to the backup data that exists in the folders corresponding to other user identifiers is second historical data. If such data exists, perform Step 204; if not, perform Step 209.
Step 204: Judge whether the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data. If it is less, perform Step 205; if it is greater than or equal, perform Step 208. The random number is greater than or equal to a preset threshold.
In this embodiment, the random numbers corresponding to the historical data stored in the data processing device are all generated randomly; that is, the random numbers corresponding to any two pieces of historical data may be the same or different. The threshold of the random number may be set according to a statistical analysis of users' habits in backing up secret data. For example, users usually save secret information only once, so the quantity of reference data of such a user's secret information is usually less than 2, and the threshold of the random number can be set to 2. The random number is then greater than the quantity of reference data, so source-side data deduplication does not occur when other users save the same data to the cloud. If statistical analysis of users' habits shows that users usually back up one additional copy of their secret data, the quantity of reference data of such a user's secret information is usually less than 3, and the threshold of the random number can be set to 3.
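One way to read the threshold rule above is sketched below. Both helper names are hypothetical, and the "typical copies plus one" rule is only an interpretation of the two examples in the text (one copy gives a threshold of 2, two copies give a threshold of 3); the upper bound of the range is likewise an assumed parameter.

```python
import random


def threshold_from_habit(typical_copies: int) -> int:
    """Derive the threshold from how many times users typically back up
    the same secret data (an interpretation of the examples in the text)."""
    if typical_copies < 1:
        raise ValueError("typical_copies must be at least 1")
    return typical_copies + 1


def draw_random_number(threshold: int, upper: int, rng: random.Random) -> int:
    """Draw the random number for newly stored data from [threshold, upper]."""
    if upper < threshold:
        raise ValueError("upper bound must not be below the threshold")
    # random.Random.randint is inclusive at both ends.
    return rng.randint(threshold, upper)
```

Because the draw never falls below the threshold, the owner's own typical number of copies alone never pushes the reference count up to the random number.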
Step 205: Send a backup message to the client, receive the backup data sent by the client, and generate reference data of the second historical data.
Step 206: Add 1 to the quantity of reference data of the second historical data. The procedure ends.
Step 207: Generate reference data of the first historical data, add 1 to the quantity of reference data of the first historical data, and then send a backup success message to the client. The procedure ends.
Step 208: Generate reference data corresponding to the second historical data, send a backup success message to the client, and perform Step 206.
Step 209: Send a data backup request acknowledgment message to the client, receive and store the backup data sent by the client, and then generate a random number corresponding to the backup data.
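Steps 201 to 209 can be put together in a compact in-memory sketch. It is an illustration under stated assumptions, not the actual device: the class and method names, the two response strings, and the storage layout are hypothetical; the bytes uploaded in Step 205 are simply discarded after receipt, since the text does not say what is done with them; and a folder entry doubles as the reference data, so Step 207 only updates the count.

```python
import random
from dataclasses import dataclass


@dataclass
class Record:
    content: bytes
    random_number: int   # drawn once, when the data is first stored
    ref_count: int = 1   # quantity of reference data; 1 on first storage


class BackupServer:
    def __init__(self, threshold=2, upper=10, seed=None):
        self.threshold, self.upper = threshold, upper
        self.rng = random.Random(seed)
        self.records = {}  # (owner_id, fingerprint) -> Record
        self.folders = {}  # user_id -> {fingerprint: (owner_id, fingerprint)}

    def _find_in_other_folders(self, user_id, fp):
        for owner, folder in self.folders.items():
            if owner != user_id and fp in folder:
                return folder[fp]
        return None

    def handle_request(self, user_id, fp):
        """Steps 201-204, 207 and 208: decide whether data must be sent."""
        folder = self.folders.setdefault(user_id, {})
        if fp in folder:                           # Step 202 -> Step 207
            self.records[folder[fp]].ref_count += 1
            return "BACKUP_SUCCESS"
        key = self._find_in_other_folders(user_id, fp)   # Step 203
        if key is None:
            return "SEND_DATA"                     # Step 209 follows
        record = self.records[key]
        if record.ref_count < record.random_number:      # Step 204
            return "SEND_DATA"                     # Step 205 follows
        folder[fp] = key                           # Step 208: reference only
        record.ref_count += 1                      # Step 206
        return "BACKUP_SUCCESS"

    def handle_upload(self, user_id, fp, data):
        """Steps 205-206 or Step 209: receive the transmitted backup data."""
        folder = self.folders.setdefault(user_id, {})
        key = self._find_in_other_folders(user_id, fp)
        if key is None:                            # Step 209: store new data
            key = (user_id, fp)
            number = self.rng.randint(self.threshold, self.upper)
            self.records[key] = Record(data, number)
        else:                                      # Steps 205-206
            # Assumption: the received bytes are discarded; only a
            # reference to the second historical data is recorded.
            self.records[key].ref_count += 1
        folder[fp] = key
```

In this sketch a second user who probes with a matching fingerprint is told to send the data until the reference count reaches the hidden random number; only after that do later users get an immediate success response.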
In this embodiment, for example, when a user backs up a piece of backup data for the first time, the client first takes the backup data as a unit, calculates the HASH value of the backup data, and sends the HASH value of the backup data together with the user identifier to the data processing device in a data backup request message. After receiving the data backup request message, the data processing device queries the folder corresponding to the user identifier according to the HASH value of the backup data and judges whether data identical to the backup data exists. Because this is the first time the user backs up the backup data, no data identical to the backup data exists in that folder. The data processing device then queries, according to the HASH value of the backup data, whether data identical to the backup data exists in the folders corresponding to other user identifiers. If not, the data processing device sends a data backup request acknowledgment message to the client as in Step 209, receives and stores the backup data sent by the client, and then generates a random number corresponding to the backup data. The value range of the random number may be [2, N], where N is an integer; for example, N may be 10. In addition, when the data processing device stores the backup data, the quantity of reference data corresponding to the backup data is 1.
If data identical to the backup data exists in the folders corresponding to the other user identifiers, that data is second historical data, and it must be judged whether the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data. If it is less, the data processing device sends a backup message to the client, receives the backup data, then generates reference data of the second historical data, and adds 1 to the quantity of reference data corresponding to the second historical data.
When the user backs up the same backup data for the second time, the client first takes the backup data as a unit and calculates the HASH value of the backup data, which is identical to the HASH value calculated when the user backed up the data for the first time. The client sends the HASH value of the backup data together with the user identifier to the data processing device in a data backup request message. Because this is the second time the user backs up the backup data, the data processing device judges that data identical to the backup data exists in the folder corresponding to the user identifier. It sets the data in that folder that is identical to the backup data as first historical data, generates reference data of the first historical data, adds 1 to the quantity of reference data of the first historical data, and then sends a backup success message to the client.
In this embodiment, a data backup request message carrying a user identifier and the HASH value of the backup data is received from the client, and the HASH value of the backup data is used to query whether identical data is already stored in the folder corresponding to the user identifier and in the folders of other users. If identical data is found in another user's folder, that data is treated as second historical data, and the quantity of reference data corresponding to it is compared with the random number corresponding to it. As a result, when another user backs up to the cloud backup data whose content is a guess at someone else's data, the client is still required to transmit the backup data to the cloud whenever the quantity of reference data corresponding to the second historical data is less than the corresponding random number, which is greater than or equal to the preset threshold, even though the cloud already holds data with the same content. Other users therefore cannot detect whether identical data has already been backed up in the database, which effectively prevents leakage of user data. In addition, when the quantity of reference data corresponding to the second historical data is greater than or equal to the corresponding random number, only reference data corresponding to the second historical data is generated, which effectively improves the backup performance of the client.
Fig. 3 is a schematic structural diagram of an embodiment of the data processing device of the present invention. As shown in Fig. 3, the device of this embodiment comprises a transceiver module 11, a judging module 12, and a reference data generation module 13. The transceiver module 11 is configured to receive a data backup request message sent by a client, the data backup request message comprising a user identifier and fingerprint information of backup data. The judging module 12 is configured to query, according to the fingerprint information of the backup data, the folder corresponding to the user identifier and judge whether data identical to the backup data exists; if it is judged that no data identical to the backup data exists, to query the folders corresponding to other user identifiers according to the fingerprint information of the backup data and judge whether data identical to the backup data exists; and if it is judged that data identical to the backup data exists in the folders corresponding to the other user identifiers, wherein that data is second historical data, to judge whether the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, wherein the random number is greater than or equal to a preset threshold. The transceiver module 11 is further configured to, if the judging module 12 judges that the quantity of reference data corresponding to the second historical data is less than the random number corresponding to the second historical data, send a backup message to the client and receive the backup data. The reference data generation module 13 is configured to generate reference data of the second historical data.
The data processing device of this embodiment can execute the technical solution of the method embodiment shown in Fig. 1. The principle is similar and is not repeated here.
In this embodiment, a data backup request message carrying a user identifier and the fingerprint information of the backup data is received from the client, and the fingerprint information of the backup data is used to query whether identical data is already stored in the folder corresponding to the user identifier and in the folders of other users. If identical data is found in another user's folder, that data is treated as second historical data, and the quantity of reference data corresponding to it is compared with the random number corresponding to it. As a result, when another user backs up to the cloud backup data whose content is a guess at someone else's data, the client is still required to transmit the backup data to the cloud whenever the quantity of reference data corresponding to the second historical data is less than the corresponding random number, which is greater than or equal to the preset threshold, even though the cloud already holds data with the same content. Other users therefore cannot detect whether source-side data deduplication has taken place, which effectively prevents leakage of user data.
Fig. 4 is a schematic structural diagram of another embodiment of the data processing device of the present invention. As shown in Fig. 4, on the basis of the embodiment shown in Fig. 3, the judging module 12 is further configured to judge whether data identical to the backup data exists in the folder corresponding to the user identifier, wherein data identical to the backup data that exists in that folder is first historical data. If such data exists, the reference data generation module 13 is further configured to generate reference data of the first historical data, and the transceiver module 11 is further configured to send a backup success message to the client.
Further, the reference data generation module 13 is also configured to generate reference data corresponding to the second historical data if the judging module 12 judges that the quantity of reference data corresponding to the second historical data is greater than or equal to the random number corresponding to the second historical data; in that case the transceiver module 11 is also configured to send a backup success message to the client.
Further, the device also comprises a reference data quantity recording module 14, configured to add 1 to the quantity of reference data corresponding to the first historical data, or to add 1 to the quantity of reference data corresponding to the second historical data.
Further, the transceiver module 11 is also configured to, if the judging module 12 judges that no data identical to the backup data exists in the folders corresponding to the other user identifiers, send a backup message to the client and receive the backup data sent by the client. The device then also comprises a data storage module 15 and a random number generation module 16, wherein the data storage module 15 is configured to store the backup data, and the random number generation module 16 is configured to generate a random number corresponding to the backup data.
The data processing device of this embodiment can execute the technical solution of the method embodiment shown in Fig. 2. The principle is similar and is not repeated here.
In this embodiment, a data backup request message carrying a user identifier and the fingerprint information of the backup data is received from the client, and the fingerprint information of the backup data is used to query whether identical data is already stored in the folder corresponding to the user identifier and in the folders of other users. If identical data is found in another user's folder, that data is treated as second historical data, and the quantity of reference data corresponding to it is compared with the random number corresponding to it. As a result, when another user backs up to the cloud backup data whose content is a guess at someone else's data, the client is still required to transmit the backup data to the cloud whenever the quantity of reference data corresponding to the second historical data is less than the corresponding random number, which is greater than or equal to the preset threshold, even though the cloud already holds data with the same content. Other users therefore cannot detect whether source-side data deduplication has taken place, which effectively prevents leakage of user data. In addition, when the quantity of reference data corresponding to the second historical data is greater than or equal to the corresponding random number, only reference data corresponding to the second historical data is generated, which effectively improves the backup performance of the client.
Fig. 5 is a schematic structural diagram of an embodiment of the data processing system of the present invention. As shown in Fig. 5, the system comprises a client 21 and a data processing device 22. The data processing device 22 may be the device shown in Fig. 3 or Fig. 4 and can execute the technical solution of the method embodiment shown in Fig. 1 or Fig. 2. The principle is similar and is not repeated here.
Persons of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.