CN112929395A - Cloud data duplicate removal method and system - Google Patents

Publication number
CN112929395A
Authority
CN
China
Prior art keywords
identification information
blocks
deduplication
response
missed
Legal status
Granted
Application number
CN201911237434.2A
Other languages
Chinese (zh)
Other versions
CN112929395B (en)
Inventor
唐鑫
周琳娜
胡冰蔚
单伟杰
刘丹
刘小梅
Current Assignee
University of International Relations
Original Assignee
University of International Relations
Application filed by University of International Relations
Priority to CN201911237434.2A
Publication of CN112929395A
Application granted
Publication of CN112929395B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0407 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a cloud data duplicate removal method and system. The cloud data deduplication method comprises the following steps: acquiring detection identification information from a received file uploading request; determining the number of the missed blocks and the number of the additional blocks according to the detection identification information; judging whether the number of the missed blocks is equal to the number of the additional blocks; when the number of the missed blocks is equal to the number of the additional blocks, returning a first deduplication response, wherein the identification information in the first deduplication response comprises the identification information of the hit blocks and the identification information of the missed blocks; when the number of the missed blocks is not equal to the number of the additional blocks, returning a second deduplication response, wherein the identification information in the second deduplication response comprises the identification information of the missed blocks; and the number of the identification information in the first deduplication response is equal to that of the identification information in the second deduplication response. The method and the device can avoid privacy disclosure, improve the security of cloud data and reduce communication overhead.

Description

Cloud data duplicate removal method and system
Technical Field
The invention relates to the field of data deduplication, in particular to a cloud data deduplication method and system.
Background
In cloud storage, cross-user deduplication is widely used to reduce the overhead of storing and managing cloud data: extending the deduplication scope from a single user to multiple users further improves deduplication efficiency. However, under a side channel attack, an attacker who knows all the public information of a cloud target file can generate candidate values for the sensitive information by guessing, combine each candidate with the public information into a complete file, upload that file to the cloud deduplication system for detection, and judge from the deduplication response whether the guessed sensitive information is correct. If the sensitive information of a file A stored in the cloud has n possible values, the attacker can steal the existence privacy of the file with at most n detections.
Fig. 1 is a schematic diagram of a side channel attack model. As shown in Fig. 1, assume that all sensitive information of the cloud target file A is contained in one data block B and that the remaining data blocks are public information. To obtain the sensitive information, the attacker generates n files (A_1, A_2, ..., A_n) that differ only in the data block B_i (i = 1, 2, ..., n) containing the sensitive information; the public blocks are the same in all of them. The attacker uploads the generated files to the cloud deduplication system for detection. If the deduplication response indicates that an uploaded file A_k (k ∈ [1, n]) duplicates the cloud target file A, the attacker can conclude that the sensitive block B_k in A_k is identical to the data block B of the cloud target file A, that is, the content of B_k is the sensitive information in the cloud target file A, so the privacy of the cloud target file A is disclosed.
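The attack in Fig. 1 can be illustrated with a minimal Python sketch. It assumes a hypothetical naive deduplication server (`NaiveDedupServer`) that reports exactly which block tags it lacks, SHA-256 as the block tag, and made-up block contents; none of these names or values come from the patent.

```python
import hashlib


def tag(block: bytes) -> str:
    """Block tag; SHA-256 is an assumption chosen for illustration."""
    return hashlib.sha256(block).hexdigest()


class NaiveDedupServer:
    """Hypothetical server with no protection: it reveals which tags it lacks."""

    def __init__(self, stored_blocks):
        self.tags = {tag(b) for b in stored_blocks}

    def check(self, upload_tags):
        # Return the tags of the blocks the client still has to upload.
        return [t for t in upload_tags if t not in self.tags]


def side_channel_attack(server, public_blocks, candidates):
    """Try each candidate sensitive block B_i until the server asks for nothing."""
    for guess in candidates:
        probe = [tag(b) for b in public_blocks + [guess]]
        if not server.check(probe):  # empty response: file A_k duplicates A
            return guess
    return None


# Cloud target file A: two public blocks plus one sensitive block B.
public = [b"header", b"body"]
secret = b"salary=7000"
server = NaiveDedupServer(public + [secret])

# The attacker enumerates the n possible values of the sensitive block.
candidates = [f"salary={s}".encode() for s in range(5000, 10000, 1000)]
recovered = side_channel_attack(server, public, candidates)
```

In this sketch the attacker needs at most n = 5 probes, matching the bound of n detections stated in the background.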
FIG. 2 is a schematic diagram of an additional block attack model. As shown in Fig. 2, in an additional block attack, the attacker adds redundant blocks that do not exist in the cloud target file A to the file to be detected. Because the number of additional blocks is also chosen at random by the attacker, it is difficult for the cloud to judge whether the detected file actually exists. In an attack with X' additional blocks, the number of missed blocks that the cloud can detect is either X'+1 or X', corresponding to the detected file being absent or present, respectively. However, the cloud does not know the value of X', since it was chosen at random by the attacker. Under an additional block attack, the cloud therefore cannot determine the number of additional blocks in the detected file from the deduplication result, and so cannot confuse the attacker through response fuzzification. The additional block attack is thus a serious threat to the security of cloud data.
In the prior art, the random redundancy values of the deduplication responses for a hit block and for a missed block fall in different value ranges, which leaves a possibility of privacy disclosure and also incurs a large communication overhead.
Disclosure of Invention
The embodiment of the invention mainly aims to provide a cloud data duplicate removal method and system, so as to avoid privacy disclosure, improve the security of cloud data and reduce communication overhead.
In order to achieve the above object, an embodiment of the present invention provides a cloud data deduplication method, including:
acquiring detection identification information from a received file uploading request;
determining the number of the missed blocks and the number of the additional blocks according to the detection identification information;
judging whether the number of the missed blocks is equal to the number of the additional blocks;
when the number of the missed blocks is equal to the number of the additional blocks, returning a first deduplication response, wherein the identification information in the first deduplication response comprises the identification information of the hit blocks and the identification information of the missed blocks;
when the number of the missed blocks is not equal to the number of the additional blocks, returning a second deduplication response, wherein the identification information in the second deduplication response comprises the identification information of the missed blocks;
and the number of the identification information in the first deduplication response is equal to that of the identification information in the second deduplication response.
An embodiment of the present invention further provides a cloud data deduplication system, including:
an acquisition unit, configured to acquire the detection identification information from the received file uploading request;
a determining unit configured to determine the number of the missed blocks and the number of the additional blocks according to the detection identification information;
a judging unit for judging whether the number of the missed blocks is equal to the number of the additional blocks;
a first returning unit, configured to return a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks, where identification information in the first deduplication response includes identification information of the hit blocks and identification information of the missed blocks;
a second returning unit, configured to return a second deduplication response when the number of the missed blocks is not equal to the number of the additional blocks, where identification information in the second deduplication response includes identification information of the missed blocks;
and the number of the identification information in the first deduplication response is equal to that of the identification information in the second deduplication response.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the steps of the cloud data deduplication method are realized when the processor executes the computer program.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the cloud data deduplication method are realized.
The cloud data deduplication method and the cloud data deduplication system of the embodiment of the invention firstly acquire detection identification information from a received file uploading request, then determine the number of missed blocks and the number of additional blocks according to the detection identification information, and then judge whether the number of the missed blocks is equal to the number of the additional blocks; returning a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks; and when the number of the missed blocks is not equal to that of the additional blocks, returning a second duplicate removal response, wherein the number of the identification information in the first duplicate removal response is equal to that of the identification information in the second duplicate removal response, so that privacy disclosure can be avoided, the security of cloud data is improved, and the communication overhead is reduced.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a side channel attack model;
FIG. 2 is a schematic diagram of an additional block attack model;
FIG. 3 is a flowchart of a cloud data deduplication method in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a document to be detected and its detection identification information according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a document to be detected and its detection identification information according to another embodiment of the present invention;
FIG. 6 is a schematic diagram of a set of tags and a cloud tag in an embodiment of the present invention;
FIG. 7 is a block diagram of a cloud data deduplication system in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In view of the fact that the possibility of privacy disclosure exists in the prior art and a large communication overhead is caused, the embodiment of the invention provides a cloud data deduplication method to avoid privacy disclosure, improve the security of cloud data and reduce the communication overhead. The present invention will be described in detail below with reference to the accompanying drawings.
Fig. 3 is a flowchart of a cloud data deduplication method in an embodiment of the present invention. As shown in fig. 3, the cloud data deduplication method includes:
S101: Acquire the detection identification information from the received file uploading request.
Before S101 is executed, the method further includes receiving a file uploading request from a detection system. The detection system stores a file to be detected, which comprises a plurality of data blocks. The detection system generates the corresponding detection identification information for each data block and uploads it.
S102: Determine the number of missed blocks and the number of additional blocks according to the detection identification information.
The detection identification information may be a hash value of the data block.
S103: Determine whether the number of missed blocks is equal to the number of additional blocks.
S104: When the number of missed blocks is equal to the number of additional blocks, return a first deduplication response, wherein the identification information in the first deduplication response comprises the identification information of the hit blocks and the identification information of the missed blocks.
S105: When the number of missed blocks is not equal to the number of additional blocks, return a second deduplication response, wherein the identification information in the second deduplication response comprises the identification information of the missed blocks.
The number of pieces of identification information in the first deduplication response is equal to that in the second deduplication response.
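Since the detection identification information may be a hash value of each data block, the detection system's side of the upload request could be sketched as follows. The fixed block size, the use of SHA-256, and the function names `split_blocks` and `detection_tags` are assumptions made for illustration; the patent does not specify a chunking scheme or a hash function.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks, only to keep the example short


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Split a file to be detected into fixed-size data blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def detection_tags(data: bytes):
    """One piece of detection identification information (a hash) per data block."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


# A 10-byte file yields three blocks: b"AAAA", b"BBBB", b"CC".
tags = detection_tags(b"AAAABBBBCC")
```

Only the tags, not the block contents, would be sent in the file uploading request of S101.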
The cloud data deduplication method shown in Fig. 3 may be executed by a computer. As can be seen from the flow shown in Fig. 3, the cloud data deduplication method in the embodiment of the present invention first acquires the detection identification information from the received file uploading request, determines the number of missed blocks and the number of additional blocks according to the detection identification information, and then determines whether the number of missed blocks is equal to the number of additional blocks. When the two numbers are equal, a first deduplication response is returned; when they are not equal, a second deduplication response is returned. Because the number of pieces of identification information in the first deduplication response is equal to that in the second deduplication response, privacy disclosure can be avoided, the security of cloud data is improved, and the communication overhead is reduced.
In one embodiment, determining the number of missed blocks comprises:
determining the number of the hit blocks according to the detection identification information;
determining the number of the missed blocks according to the number of the detection identification information and the number of the hit blocks; in specific implementation, the number of the missed blocks is the difference between the number of the detection identification information and the number of the hit blocks.
Wherein, determining the number of the hit blocks comprises:
determining a label set corresponding to the detection identification information; acquiring a cloud label in a label set; and matching the cloud label with the detection identification information to determine the number of the hit blocks.
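The hit-block count described above could be sketched as follows. The patent does not say how the tag set corresponding to the detection identification information is located, so the rule used in `find_tag_set` (pick the stored tag set with the largest overlap) is purely an assumption, as are the function names and the sample tags.

```python
def find_tag_set(detection_tags, cloud_files):
    """Pick the stored tag set that best matches the upload.

    cloud_files maps a file name to its list of cloud tags. Choosing the
    largest overlap is an illustrative assumption, not the patent's rule.
    """
    probe = set(detection_tags)
    return max(cloud_files,
               key=lambda name: len(probe & set(cloud_files[name])),
               default=None)


def count_hit_blocks(detection_tags, cloud_tags):
    """Number of uploaded tags that match a cloud tag in the chosen tag set."""
    cloud = set(cloud_tags)
    return sum(1 for t in detection_tags if t in cloud)


cloud_files = {
    "F": ["t_C1", "t_C2", "t_Cs"],
    "G": ["t_D1", "t_D2"],
}
upload = ["t_C1", "t_C2", "t_guess"]

name = find_tag_set(upload, cloud_files)
hits = count_hit_blocks(upload, cloud_files[name])
missed = len(upload) - hits  # missed blocks = detection tags minus hit blocks
```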
When the number of missed blocks is equal to the number of additional blocks, every cloud tag of the cloud target file is matched by the detection identification information in the file uploading request: the sensitive block of the cloud target file is hit, and the cloud target file contains the file to be detected corresponding to the detection identification information. In this case, the identification information of a hit block must be added to the returned first deduplication response so that the number of pieces of identification information in the first deduplication response equals that in the second deduplication response. This fuzzifies the response and confuses the attacker.
In one embodiment, determining the number of additional blocks comprises:
determining the number of cloud tags;
and determining the number of the additional blocks according to the number of the detection identification information and the number of the cloud labels.
In specific implementation, the number of the additional blocks is the difference between the number of the detection identification information and the number of the cloud end tags.
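A small worked example of the two differences may help. The numbers are invented for illustration: the cloud target file has 3 public blocks and 1 sensitive block (4 cloud tags), and the upload carries 3 public blocks, 1 guessed sensitive block, and 2 additional blocks (6 pieces of detection identification information). The function name `block_counts` is hypothetical.

```python
def block_counts(num_detection_tags, num_cloud_tags, num_hit_blocks):
    """Return (missed, additional) using the two differences defined above."""
    missed = num_detection_tags - num_hit_blocks
    additional = num_detection_tags - num_cloud_tags
    return missed, additional


# Guess is correct: 3 public blocks + the sensitive block are hit.
correct_guess = block_counts(num_detection_tags=6, num_cloud_tags=4, num_hit_blocks=4)

# Guess is wrong: only the 3 public blocks are hit.
wrong_guess = block_counts(num_detection_tags=6, num_cloud_tags=4, num_hit_blocks=3)
```

With these inputs, the correct guess gives equal counts (2 missed, 2 additional), while the wrong guess gives one more missed block than additional blocks (3 versus 2).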
Assuming that all unpublished sensitive information in the cloud target file is contained in one data block, the data block is called a cloud sensitive block, and other published data blocks are public blocks; the file to be detected comprises a plurality of public blocks and a detection sensitive block generated by a detection system; the flow of one embodiment of the invention is as follows:
1. Acquiring the detection identification information from the received file uploading request.
Fig. 4 is a schematic diagram of a file to be detected and its detection identification information in an embodiment of the present invention. Fig. 5 is a schematic diagram of a file to be detected and its detection identification information in another embodiment of the present invention. As shown in Fig. 4 and Fig. 5, the file to be detected includes a plurality of data blocks (including Y' additional blocks), where the data block Cs and the data block Cs' are detection sensitive blocks generated by the detection system, the detection tag sets t_{F} and t_{F}' are the sets of detection identification information, and t_Cs and t_Cs' are the detection identification information corresponding to the detection sensitive blocks.
2. Determining a label set corresponding to the detection identification information; and acquiring cloud tags in the tag set, and matching the cloud tags with the detection identification information to determine the number of the hit blocks.
Fig. 6 is a schematic diagram of a tag set and cloud tags in an embodiment of the invention. As shown in Fig. 6, the cloud file includes a plurality of tag sets. The tag set t_{F} corresponding to the detection identification information is determined first, and then the cloud tags in that tag set, such as t_C1, t_C2, t_Cs and t_Cn, are acquired, where t_Cs is the cloud tag corresponding to the cloud sensitive block. When the detection identification information includes t_Cs, the number of hit blocks is the number of public blocks plus one (which, when there are no additional blocks, is the number of all data blocks in the file to be detected); when the detection identification information does not include t_Cs, the number of hit blocks is the number of public blocks (correspondingly, the number of all data blocks in the file to be detected minus one).
3. And determining the number of the missed blocks according to the number of the detection identification information and the number of the hit blocks.
The number of the missed blocks is the difference between the number of the detection identification information and the number of the hit blocks.
As shown in Fig. 6, the number of missed blocks when the detection identification information does not include t_Cs is one more than the number of missed blocks when it includes t_Cs. When the detection identification information does not include t_Cs, the number of missed blocks is the number of additional blocks plus one; when the detection identification information includes t_Cs, the number of missed blocks is the number of additional blocks.
4. Determining the number of cloud tags; and determining the number of the additional blocks according to the number of the detection identification information and the number of the cloud labels.
The number of the additional blocks is the difference between the number of the detection identification information and the number of the cloud end tags. Under normal conditions, no additional block exists in the file to be detected corresponding to the detection identification information uploaded by the ordinary user.
5. It is determined whether the number of missed blocks equals the number of additional blocks.
As shown in Fig. 6, when the detection identification information includes t_Cs, the number of missed blocks equals the number of additional blocks, and the first deduplication response is returned. When the detection identification information does not include t_Cs, the number of missed blocks equals the number of additional blocks plus one, and the second deduplication response is returned.
The first deduplication response contains one fewer piece of missed-block identification information than the second. The first deduplication response therefore comprises the identification information of the missed blocks plus the identification information of one hit block, while the second comprises only the identification information of the missed blocks, so that both responses contain the same number of pieces of identification information. This fuzzifies the response and confuses the attacker, who cannot judge from the deduplication response whether the file to be detected exists in the cloud target file. In addition, the number of pieces of identification information in the returned first and second deduplication responses is the minimum needed to confuse the attacker, which reduces the communication overhead.
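The response construction of steps 1 to 5 can be put together in a short Python sketch. Several details are assumptions rather than statements of the patent: the function name `dedup_response`, the choice of which hit tag is used as padding (the sensitive tag if the server knows it, otherwise the last hit tag), and sorting the response so that tag order does not reveal which entry is the padding.

```python
def dedup_response(detection_tags, cloud_tags, sensitive_tag=None):
    """Return the list of tags that the client is asked to upload."""
    cloud = set(cloud_tags)
    hit = [t for t in detection_tags if t in cloud]
    missed = [t for t in detection_tags if t not in cloud]
    additional = len(detection_tags) - len(cloud_tags)

    if len(missed) == additional and hit:
        # First deduplication response: missed tags plus one hit tag.
        pad = sensitive_tag if sensitive_tag in hit else hit[-1]
        return sorted(missed + [pad])
    # Second deduplication response: missed tags only.
    return sorted(missed)


cloud_tags = ["t_C1", "t_C2", "t_Cs"]   # public blocks plus the sensitive block
extra = ["t_X1", "t_X2"]                # two additional blocks added by the attacker

right_guess = ["t_C1", "t_C2", "t_Cs"] + extra
wrong_guess = ["t_C1", "t_C2", "t_Cs'"] + extra

first = dedup_response(right_guess, cloud_tags, sensitive_tag="t_Cs")
second = dedup_response(wrong_guess, cloud_tags, sensitive_tag="t_Cs")
```

In both cases the client is asked for three blocks: its two additional blocks and its candidate sensitive block. The two responses therefore have the same length and the same shape, whether or not the guess was correct.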
To sum up, the cloud data deduplication method of the embodiment of the present invention obtains detection identification information from a received file upload request, determines the number of missed blocks and the number of additional blocks according to the detection identification information, and then determines whether the number of missed blocks is equal to the number of additional blocks; returning a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks; and when the number of the missed blocks is not equal to that of the additional blocks, returning a second duplicate removal response, wherein the number of the identification information in the first duplicate removal response is equal to that of the identification information in the second duplicate removal response, so that privacy disclosure can be avoided, the security of cloud data is improved, and the communication overhead is reduced.
Based on the same inventive concept, the embodiment of the invention also provides a cloud data deduplication system, and as the problem solving principle of the system is similar to that of a cloud data deduplication method, the implementation of the system can refer to the implementation of the method, and repeated parts are not described again.
Fig. 7 is a block diagram of a cloud data deduplication system in the embodiment of the present invention. As shown in fig. 7, the cloud data deduplication system includes:
an acquisition unit, configured to acquire the detection identification information from the received file uploading request;
a determining unit configured to determine the number of the missed blocks and the number of the additional blocks according to the detection identification information;
a judging unit for judging whether the number of the missed blocks is equal to the number of the additional blocks;
a first returning unit, configured to return a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks, where identification information in the first deduplication response includes identification information of the hit blocks and identification information of the missed blocks;
a second returning unit, configured to return a second deduplication response when the number of the missed blocks is not equal to the number of the additional blocks, where identification information in the second deduplication response includes identification information of the missed blocks;
and the number of the identification information in the first deduplication response is equal to that of the identification information in the second deduplication response.
In one embodiment, the determining unit is specifically configured to:
determining the number of the hit blocks according to the detection identification information;
and determining the number of the missed blocks according to the number of the detection identification information and the number of the hit blocks.
In one embodiment, the determining unit is specifically configured to:
determining a label set corresponding to the detection identification information;
acquiring a cloud label in a label set;
and matching the cloud label with the detection identification information to determine the number of the hit blocks.
In one embodiment, the determining unit is specifically configured to:
determining the number of cloud tags;
and determining the number of the additional blocks according to the number of the detection identification information and the number of the cloud labels.
To sum up, the cloud data deduplication system of the embodiment of the present invention first obtains detection identification information from a received file upload request, determines the number of the missed blocks and the number of the additional blocks according to the detection identification information, and then determines whether the number of the missed blocks is equal to the number of the additional blocks; returning a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks; and when the number of the missed blocks is not equal to that of the additional blocks, returning a second duplicate removal response, wherein the number of the identification information in the first duplicate removal response is equal to that of the identification information in the second duplicate removal response, so that privacy disclosure can be avoided, the security of cloud data is improved, and the communication overhead is reduced.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, all or part of contents of the cloud data deduplication method may be implemented, for example, when the processor executes the computer program, the following contents may be implemented:
acquiring detection identification information from a received file uploading request;
determining the number of the missed blocks and the number of the additional blocks according to the detection identification information;
judging whether the number of the missed blocks is equal to the number of the additional blocks;
when the number of the missed blocks is equal to the number of the additional blocks, returning a first deduplication response, wherein the identification information in the first deduplication response comprises the identification information of the hit blocks and the identification information of the missed blocks;
when the number of the missed blocks is not equal to the number of the additional blocks, returning a second deduplication response, wherein the identification information in the second deduplication response comprises the identification information of the missed blocks;
and the number of the identification information in the first deduplication response is equal to that of the identification information in the second deduplication response.
To sum up, the computer device of the embodiment of the present invention first obtains the detection identification information from the received file upload request, determines the number of the missed blocks and the number of the additional blocks according to the detection identification information, and then determines whether the number of the missed blocks is equal to the number of the additional blocks; returning a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks; and when the number of the missed blocks is not equal to that of the additional blocks, returning a second duplicate removal response, wherein the number of the identification information in the first duplicate removal response is equal to that of the identification information in the second duplicate removal response, so that privacy disclosure can be avoided, the security of cloud data is improved, and the communication overhead is reduced.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program may implement all or part of the cloud data deduplication method; for example, when the processor executes the computer program, the following steps may be implemented:
acquiring detection identification information from a received file uploading request;
determining the number of the missed blocks and the number of the additional blocks according to the detection identification information;
judging whether the number of the missed blocks is equal to the number of the additional blocks;
when the number of the missed blocks is equal to the number of the additional blocks, returning a first deduplication response, wherein the identification information in the first deduplication response comprises the identification information of the hit blocks and the identification information of the missed blocks;
when the number of the missed blocks is not equal to the number of the additional blocks, returning a second deduplication response, wherein the identification information in the second deduplication response comprises the identification information of the missed blocks;
wherein the number of pieces of identification information in the first deduplication response is equal to the number of pieces of identification information in the second deduplication response.
In summary, the computer-readable storage medium of the embodiment of the present invention first obtains the detection identification information from the received file upload request and determines, according to the detection identification information, the number of missed blocks and the number of additional blocks. It then judges whether the number of missed blocks is equal to the number of additional blocks: when the two numbers are equal, a first deduplication response is returned; when they are not equal, a second deduplication response is returned. The number of pieces of identification information in the first deduplication response is equal to that in the second deduplication response, so that privacy disclosure can be avoided, the security of cloud data is improved, and the communication overhead is reduced.
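The comparison above depends on counts derived from the detection identification information. The following Python sketch shows one way those counts might be computed, following the wording of the dependent claims: hit blocks are found by matching cloud tags against the detection identification information, and missed blocks follow from the total and the hit count. The function name `count_blocks` is hypothetical. Taking the missed count as total minus hits is an assumption. The rule for the number of additional blocks is only said to depend on the number of detection identification information and the number of cloud tags, so it is passed in as a caller-supplied function rather than guessed.

```python
from typing import Callable, Iterable, Set, Tuple


def count_blocks(detection_ids: Iterable[str],
                 cloud_tags: Set[str],
                 additional_rule: Callable[[int, int], int]) -> Tuple[int, int, int]:
    """Return (hit, missed, additional) block counts.

    detection_ids   -- identifiers taken from the file upload request
    cloud_tags      -- tags already held in the matching tag set
    additional_rule -- hypothetical rule mapping (number of detection
                       identifiers, number of cloud tags) to the number
                       of additional blocks
    """
    ids = list(detection_ids)
    # Hit blocks: detection identifiers that match a cloud tag.
    hits = sum(1 for i in ids if i in cloud_tags)
    # Missed blocks: assumed to be the remaining detection identifiers.
    missed = len(ids) - hits
    additional = additional_rule(len(ids), len(cloud_tags))
    return hits, missed, additional


# The lambda below is purely illustrative and is not the rule used by the source.
counts = count_blocks(["a", "b", "c"], {"a", "x"}, lambda n, m: max(n - m, 0))
```

Keeping the additional-block rule as a parameter makes explicit that it is the one part of the computation this sketch does not take from the text.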
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, or elements, or devices described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
In one or more exemplary designs, the functions described above in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, such computer-readable media can include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store program code in the form of instructions or data structures and which can be read by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Additionally, any connection is properly termed a computer-readable medium, and is thus included if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wirelessly, e.g., by infrared, radio, or microwave. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included in the computer-readable medium.

Claims (10)

1. A cloud data deduplication method is characterized by comprising the following steps:
acquiring detection identification information from a received file uploading request;
determining the number of the missed blocks and the number of the additional blocks according to the detection identification information;
judging whether the number of the missed blocks is equal to the number of the additional blocks;
when the number of the missed blocks is equal to the number of the additional blocks, returning a first deduplication response, wherein identification information in the first deduplication response comprises identification information of the hit blocks and identification information of the missed blocks;
when the number of the missed blocks is not equal to the number of the additional blocks, returning a second deduplication response, wherein identification information in the second deduplication response comprises identification information of the missed blocks;
and the number of the identification information in the first deduplication response is equal to the number of the identification information in the second deduplication response.
2. The cloud data deduplication method of claim 1, wherein determining the number of the missed blocks comprises:
determining the number of hit blocks according to the detection identification information;
and determining the number of the missed blocks according to the number of the detection identification information and the number of the hit blocks.
3. The cloud data deduplication method of claim 2, wherein determining the number of hit blocks comprises:
determining a tag set corresponding to the detection identification information;
acquiring cloud tags in the tag set;
and matching the cloud tag with the detection identification information to determine the number of the hit blocks.
4. The cloud data deduplication method of claim 3, wherein determining the number of additional blocks comprises:
determining the number of the cloud tags;
and determining the number of the additional blocks according to the number of the detection identification information and the number of the cloud tags.
5. A cloud data deduplication system, comprising:
an acquisition unit, configured to acquire detection identification information from a received file uploading request;
a determining unit, configured to determine the number of the missed blocks and the number of the additional blocks according to the detection identification information;
a judging unit configured to judge whether the number of the missed blocks is equal to the number of the additional blocks;
a first returning unit, configured to return a first deduplication response when the number of the missed blocks is equal to the number of the additional blocks, where identification information in the first deduplication response includes identification information of the hit blocks and identification information of the missed blocks;
a second returning unit, configured to return a second deduplication response when the number of the missed blocks is not equal to the number of the additional blocks, where identification information in the second deduplication response includes identification information of the missed blocks;
and the number of the identification information in the first deduplication response is equal to the number of the identification information in the second deduplication response.
6. The cloud data deduplication system of claim 5, wherein the determining unit is specifically configured to:
determine the number of hit blocks according to the detection identification information;
and determine the number of the missed blocks according to the number of the detection identification information and the number of the hit blocks.
7. The cloud data deduplication system of claim 6, wherein the determining unit is specifically configured to:
determine a tag set corresponding to the detection identification information;
acquire cloud tags in the tag set;
and match the cloud tags with the detection identification information to determine the number of the hit blocks.
8. The cloud data deduplication system of claim 7, wherein the determining unit is specifically configured to:
determine the number of the cloud tags;
and determine the number of the additional blocks according to the number of the detection identification information and the number of the cloud tags.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the cloud data deduplication method of any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the cloud data deduplication method according to any one of claims 1 to 4.
CN201911237434.2A 2019-12-05 2019-12-05 Cloud data deduplication method and system Active CN112929395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911237434.2A CN112929395B (en) 2019-12-05 2019-12-05 Cloud data deduplication method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911237434.2A CN112929395B (en) 2019-12-05 2019-12-05 Cloud data deduplication method and system

Publications (2)

Publication Number Publication Date
CN112929395A (en) 2021-06-08
CN112929395B (en) 2022-06-28

Family

ID=76161144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911237434.2A Active CN112929395B (en) 2019-12-05 2019-12-05 Cloud data deduplication method and system

Country Status (1)

Country Link
CN (1) CN112929395B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069111A (en) * 2015-08-10 2015-11-18 广东工业大学 Similarity based data-block-grade data duplication removal method for cloud storage
US20160162218A1 (en) * 2014-12-03 2016-06-09 International Business Machines Corporation Distributed data deduplication in enterprise networks
US20170116217A1 (en) * 2015-03-24 2017-04-27 Intellectual Ventures Hong Kong Limited High bit rate covert channel in cloud storage systems
US20170208052A1 (en) * 2016-01-19 2017-07-20 Hope Bay Technologies, Inc Hybrid cloud file system and cloud based storage system having such file system therein

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
POORANIAN, ZAHRA et al.: "RARE: Defeating Side Channels based on Data-Deduplication in Cloud Storage", 2018 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS): CCSNA 2018: Cloud Computing Systems, Networks, and Applications *

Also Published As

Publication number Publication date
CN112929395B (en) 2022-06-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant