CN108400970B - Similar data message locking, encrypting and de-duplicating method in cloud environment and cloud storage system - Google Patents

Publication number
CN108400970B
Authority
CN
China
Prior art keywords
data
similar
key
user
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810055819.6A
Other languages
Chinese (zh)
Other versions
CN108400970A (en
Inventor
姜涛
袁浩然
陈晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201810055819.6A
Publication of CN108400970A
Application granted
Publication of CN108400970B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0435: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24553: Query execution of query operations
    • G06F16/24554: Unary operations; Data partitioning operations
    • G06F16/24556: Aggregation; Duplicate elimination
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/08: Network architectures or network communication protocols for network security for authentication of entities
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/065: Encryption by serially and continuously modifying data stream elements, e.g. stream cipher systems, RC4, SEAL or A5/3
    • H04L9/0656: Pseudorandom key sequence combined element-for-element with data sequence, e.g. one-time-pad [OTP] or Vernam's cipher
    • H04L9/0662: Pseudorandom key sequence combined element-for-element with data sequence, e.g. one-time-pad [OTP] or Vernam's cipher with particular pseudorandom sequence generator
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/06: Network architectures or network communication protocols for network security for supporting key management in a packet data network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention belongs to the technical field of cloud storage and discloses a message-locked encryption and deduplication method for similar data in a cloud environment. Its technical route combines a similarity-preserving hash function, a key extraction method based on error-correcting codes, and a secure symmetric encryption algorithm based on a pseudo-random generator. Compared with existing data deduplication methods, the scheme further improves deduplication efficiency, further improves the utilization of cloud server storage space, and further reduces the computation and storage overhead of users and the cloud server. The invention achieves secure and efficient deduplication of similar ciphertext data. It also adopts Hamming distance reduction and tag cutting optimization, which improve the efficiency of tag queries on the cloud server. Experimental results show that the invention is efficient in terms of storage and communication costs.

Figure 201810055819

Description

Message-locked encryption and deduplication method for similar data in a cloud environment, and cloud storage system

Technical Field

The invention belongs to the technical field of cloud storage, and in particular relates to a message-locked encryption and deduplication method for similar data in a cloud environment, and to a cloud storage system.

Background

Enormous volumes of data are generated and processed every day. International Data Corporation's study of the digital universe projects that the data on the Internet will reach 40,000 EB by 2020 and will keep doubling every two years. Cloud computing has changed the paradigm of data storage: by offering reliable, scalable, on-demand storage services at relatively low prices, it greatly simplifies data management for individuals and enterprises. The Cisco Global Cloud Index reports that 83% of all data center traffic comes from the cloud and that, by 2019, 80% of data center workloads will be processed in the cloud.

Existing studies indicate that 20% to 30% of the data in primary storage is redundant. In particular, applying whole-file deduplication in backup storage saves more than 50% of the storage space of a standard file system and more than 72% of that of a backup file system. Data deduplication therefore relieves storage pressure, reduces network traffic by removing redundant data, and improves system service quality. A range of online and offline storage systems already provide deduplication; for example, commercial data integration tools such as IBM's InfoSphere QualityStage and FirstLogic's SAP Data Services support duplicate detection. Many clustering, classification, link analysis and statistical techniques are used to detect duplicate records, and much software, such as Duplicate Cleaner, VisiPics and DupeGuru, is designed to detect and eliminate identical or similar duplicates.

However, users of a cloud storage system lose physical control of their data, which makes data security their foremost concern. To protect sensitive data, users usually encrypt it before outsourcing it. Encryption aims to provide semantic security for the plaintext, so that ciphertext is indistinguishable from random data. In a multi-user cloud storage system, deduplicating data while keeping it secure is therefore a key and very challenging problem.

Convergent encryption was proposed to solve this problem. It uses the hash value of a file as the convergent key, so identical data always yield the same key; encrypting and decrypting with the convergent key makes ciphertext deduplication possible. This kind of scheme has been formalized as message-locked encryption: because identical data yield identical keys, the cloud server can decide whether two ciphertexts come from the same plaintext. A series of later message-locked encryption schemes strengthened security or added new features, but all of them consider only identical data and cannot deduplicate similar data.

Many practical systems need to deduplicate or search for similar data items, such as records with errors, misspellings or inconsistent content, during data inspection, data cleaning and data aggregation. Several similar-data retrieval schemes and deduplication systems have been proposed, and many schemes and tools remove web pages, text documents, music, pictures, videos or local binary data with similar content. These schemes and tools, however, mainly deduplicate plaintext rather than ciphertext, and they are suited to personal use rather than to multi-user cloud scenarios. Existing solutions are therefore hard to apply directly to secure deduplication of similar data in the cloud. Some schemes do support privacy-preserving deduplication of similar images in the cloud, but they assume a user group whose members share the encryption key, whereas in a cloud environment a user can hardly know which other users hold the same data.

In general, deduplicating similar data in the cloud is challenging for two reasons: cloud users can hardly communicate with one another to negotiate a common encryption key, and the cloud server can hardly determine whether two ciphertexts were obtained by encrypting similar data.
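The convergent encryption idea described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: SHA-256 as the key-deriving hash, SHAKE-256 as a stand-in keystream generator, and the helper names `convergent_key`, `keystream`, `encrypt` and `decrypt` are all hypothetical choices, not taken from the patent.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    """Derive the key from the content itself: K = H(M). (SHA-256 is an assumption.)"""
    return hashlib.sha256(data).digest()

def keystream(key: bytes, length: int) -> bytes:
    """Expand the key into `length` pseudo-random bytes (SHAKE-256 as a stand-in PRG)."""
    return hashlib.shake_256(key).digest(length)

def encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Deterministic encryption: the same plaintext always gives the same key and ciphertext."""
    key = convergent_key(data)
    ciphertext = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return key, ciphertext

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream restores the plaintext."""
    return bytes(a ^ b for a, b in zip(ciphertext, keystream(key, len(ciphertext))))

# Two users holding the same file produce the same ciphertext without ever talking
# to each other, so the server can deduplicate by comparing ciphertexts (or their hashes).
k1, c1 = encrypt(b"quarterly report v1")
k2, c2 = encrypt(b"quarterly report v1")
k3, c3 = encrypt(b"quarterly report v2")  # a slightly different file
```

Here `c1` and `c2` are identical, while the key and ciphertext of the slightly different file are unrelated to them, which is exactly the limitation for similar data that the text points out.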

To sum up, the problems in the prior art are:

(1) The key in existing message-locked encryption schemes is obtained by hashing the plaintext, and a hash function produces a completely different value even if the plaintexts differ in a single bit. Ciphertexts produced by a traditional message-locked encryption scheme therefore no longer preserve similarity, and the cloud server cannot tell whether the plaintexts of two ciphertexts are similar. Existing message-locked encryption schemes are thus hard to apply directly to secure deduplication of similar data in the cloud.

(2) Some schemes let the users of a group negotiate a key and share it within the group. In a cloud environment, however, users can upload data anytime and anywhere, and the cloud server cannot know all owners of a piece of data before it is uploaded. Schemes in which group users jointly negotiate a key therefore cannot be used for similar-data deduplication either.
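Problem (1) can be observed directly. The sketch below, which assumes SHA-256 as the key-deriving hash (the text does not name a specific function), flips one input bit and counts how many bits of the derived key change.

```python
import hashlib

def bit_distance(a: bytes, b: bytes) -> int:
    """Hamming distance in bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

m1 = b"cloud storage deduplication"
m2 = bytes([m1[0] ^ 0x01]) + m1[1:]   # same message with a single bit flipped

input_diff = bit_distance(m1, m2)     # the plaintexts differ in exactly one bit
key_diff = bit_distance(hashlib.sha256(m1).digest(),
                        hashlib.sha256(m2).digest())
print(f"plaintext bits changed: {input_diff}, key bits changed: {key_diff} of 256")
```

For a cryptographic hash roughly half of the 256 key bits are expected to change, so the two keys, and hence the two ciphertexts, carry no trace of the plaintext similarity.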

Difficulty and significance of solving the above technical problems:

(1) A message-locked encryption and deduplication method for similar data must break through the limitation of existing message-locked encryption schemes, so that similar data remain similar after encryption.

(2) A message-locked encryption construction for similar data can be used to build an encrypted deduplication system for similar data, which lets the cloud server deduplicate the ciphertexts of similar data. This further improves the efficiency of ciphertext deduplication and saves a large amount of storage and management resources on the cloud server.

Summary of the Invention

In view of the problems in the prior art, the invention provides a message-locked encryption and deduplication method for similar data in a cloud environment, and a cloud storage system.

The invention is implemented as follows. A message-locked encryption and deduplication method for similar data in a cloud environment uses a similarity-preserving hash function (such as SimHash or PHash) so that similar data obtain similar tags, a key extraction method based on error-correcting codes so that similar plaintexts always yield the same encryption key, and a secure symmetric encryption algorithm based on a pseudo-random generator to perform the message-locked encryption and deduplication of similar data. A user who wants to upload data first applies the similarity-preserving hash algorithm to generate a deduplication tag of the plaintext and sends the tag to the cloud server, which determines whether similar data is already stored. If the cloud server holds no similar data, the user generates a similar-data key together with auxiliary information for recovering that key, and sends the ciphertext and the auxiliary information to the cloud server. If the cloud server does hold similar data, it returns the auxiliary information to the user; the user recovers the similar key, encrypts the data with it, and uses the resulting ciphertext to run a similar-data ownership verification with the server. If the verification succeeds, the cloud server grants the user access to the data. In addition, the invention improves tag query efficiency through Hamming distance reduction and tag cutting optimization.

Further, the message-locked encryption and deduplication method for similar data in a cloud environment comprises the following steps:

The client first uses the similarity-preserving hash algorithm to generate a deduplication tag of the plaintext and sends it to the cloud server, and the cloud server determines whether similar data is already stored on the cloud server;

If the cloud server holds no similar data, the user generates a similar-data key and auxiliary information for similar-key recovery, and sends the ciphertext and the auxiliary information to the cloud server;

If the cloud server holds similar data, it returns the auxiliary information for recovering the similar key to the user; the user encrypts the data with the recovered similar key and uses the resulting ciphertext to run a similar-data ownership verification with the server; if the verification succeeds, the cloud server grants the user access to the data.
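The first step relies on a similarity-preserving hash. As a rough illustration of how such a tag behaves, here is a minimal textbook-style SimHash over whitespace-separated tokens. It is a sketch under assumptions, not the patent's TAG algorithm: the 64-bit tag length, the BLAKE2b token hash, unweighted tokens and the function names are all illustrative, and the random input that the patent's tag algorithm takes is left out.

```python
import hashlib

BITS = 64  # tag length in bits (an assumption)

def token_hash(token: str) -> int:
    """Map a token to a 64-bit integer with an ordinary hash function."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def simhash(text: str) -> int:
    """Each token votes +1 or -1 on every bit position; the sign of the sum is the tag bit."""
    votes = [0] * BITS
    for token in text.lower().split():
        h = token_hash(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two tags differ."""
    return bin(a ^ b).count("1")

doc_a = "the cloud server stores the encrypted backup of the annual report"
doc_b = "the cloud server stores the encrypted backup of the yearly report"
distance = hamming(simhash(doc_a), simhash(doc_b))
print(f"tag distance between the two documents: {distance} of {BITS} bits")
```

Because every token only contributes votes, documents that share most tokens tend to produce tags that differ in few bits, which is what lets the server compare tags by Hamming distance instead of by equality.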

Further, the method uses an $[n, k, 2t+1]_F$ error-correcting code $C$. The idea of the Hamming-distance-based secure sketch is to use the code $C$ to correct errors in the data $w$. On input $w$, a codeword $c \in C$ is chosen uniformly at random, and $s = SS(w) = w - c$ is the shift needed to get from $c$ to $w$. To compute $Rec(w', s)$, first compute $c' = w' - s$, then decode $c'$ to obtain $c$, and finally recover $w = c + s$.
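Why the recovery works can be spelled out in a few lines. The derivation below uses only the symbols of the paragraph above and assumes that $\mathrm{dis}$ is the Hamming distance, that $\mathrm{dis}(w, w') \le t$, and that $\mathrm{Decode}$ denotes nearest-codeword decoding for $C$ (a name introduced here for illustration).

```latex
\begin{align*}
s &= \mathrm{SS}(w) = w - c, && c \in C \text{ chosen uniformly at random} \\
c' &= w' - s = c + (w' - w) \\
\mathrm{dis}(c', c) &= \mathrm{dis}(w', w) \le t \\
\mathrm{Decode}(c') &= c && \text{since } C \text{ has minimum distance } 2t + 1 \\
\mathrm{Rec}(w', s) &= c + s = w
\end{align*}
```

The shifted word $c'$ is the original codeword perturbed by exactly the difference between $w'$ and $w$, so a code that corrects $t$ errors removes that difference.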

Further, the client applies the similarity-preserving hash algorithm to generate the deduplication tag of the plaintext and the similar-data key. With a similarity-preserving hash, similar plaintexts map to similar tags and similar data keys of a fixed length, and similar data within a given Hamming distance always obtain the same random encryption key. The first user selects some auxiliary parameters and computes the random key $k_{w'}$ of the plaintext $w'$; the auxiliary parameters are stored on the cloud server. When a later user holds the tag $t_w$ of a similar plaintext $w$ ($w \approx w'$) and wants to deduplicate it, the cloud server sends the auxiliary parameters to that user, who runs the key regeneration algorithm to generate the key $k_w$. If the Hamming distance between file $w$ and file $w'$ is below a given threshold, $\mathrm{dis}(f_w, f_{w'}) < t$, the key regeneration algorithm outputs the same random key, $k_w = k_{w'}$.
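The key regeneration described above can be sketched with a toy fuzzy extractor. Everything concrete in it is an assumption made for illustration: a 5-fold repetition code stands in for the error-correcting code, SHA-256 stands in for the extraction function, and the names `rkg` and `rep` merely echo the roles of key generation and key regeneration.

```python
import hashlib
import secrets

R = 5  # repetition factor: the code has minimum distance 5, so any 2 bit errors are corrected

def encode(bits):
    """Repetition code: repeat every message bit R times."""
    return [b for b in bits for _ in range(R)]

def decode(bits):
    """Majority vote inside each block of R bits."""
    return [int(sum(bits[i:i + R]) > R // 2) for i in range(0, len(bits), R)]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def extract(w, x: bytes) -> bytes:
    """Stand-in extractor: hash the public seed x together with the digest bits w."""
    return hashlib.sha256(x + bytes(w)).digest()

def rkg(fk):
    """First uploader: derive (key, helper) from the similarity digest fk (list of 0/1 bits)."""
    codeword = encode([secrets.randbelow(2) for _ in range(len(fk) // R)])
    s = xor(fk, codeword)            # sketch s = w - c; over GF(2) subtraction is XOR
    x = secrets.token_bytes(16)      # public extractor seed
    return extract(fk, x), (x, s)

def rep(fk_similar, helper):
    """Later uploader: regenerate the key from a similar digest and the stored helper."""
    x, s = helper
    codeword = encode(decode(xor(fk_similar, s)))  # c' = w' - s, then decode to c
    return extract(xor(codeword, s), x)            # w = c + s, then extract the key

fk_first = [secrets.randbelow(2) for _ in range(40)]
key, helper = rkg(fk_first)

fk_close = list(fk_first)
for i in (0, 7):                     # two flipped bits: within the correction capability
    fk_close[i] ^= 1
fk_far = list(fk_first)
for i in (0, 1, 2):                  # three flips inside one block: majority vote fails
    fk_far[i] ^= 1
```

With these inputs `rep(fk_close, helper)` reproduces `key`, while `rep(fk_far, helper)` yields an unrelated value. A real deployment would use a stronger code and a proper randomness extractor.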

Further, the similar message-locked encryption scheme of the method consists of six polynomial-time algorithms (FKG, RKG, REP, ENC, DEC, TAG):

$\mathrm{FKG}(1^\lambda, r_2, w) \to fk_w$: a similar-key generation algorithm based on a similarity-preserving hash function, with which the user computes digest information of the data. It takes the security parameter $\lambda$, a random number $r_2 \in \{0,1\}^\lambda$ and the file $w$ as input, and outputs the similarity digest $fk_w$ of the file;

$\mathrm{RKG}(1^\lambda, r_3, fk_w) \to \{k_w, P_w\}$: a key generation algorithm with which the user computes the encryption key and the auxiliary parameters of the data. With $x$ a public parameter, RKG uses the sketch algorithm $SS(r_3, w) \to P_w$ of the secure sketch and the extraction algorithm $Ext(w, x) \to k_w$ of the fuzzy extractor to generate the auxiliary parameters $P = \{x, s\}$ and a random encryption key $k_w$. Here $r_3$ is a random parameter used to generate a random codeword $C(r_3) \to c$, where $C(\cdot)$ is a codeword generation algorithm; the codeword $c$ is used by the SS algorithm of the secure sketch;

$\mathrm{REP}(fk_{w'}, P_w) \to k_w$: a key regeneration algorithm run by the user. It takes the auxiliary parameters $P_w$ and the fuzzy digest $fk_{w'}$ of a file as input and outputs the private key $k_w$ if and only if $fk_{w'}$ is similar to $fk_w$; otherwise it outputs a random value;

$\mathrm{ENC}(k_w, w) \to c_w$: an encryption algorithm run by the user to encrypt data and obtain the corresponding ciphertext. It takes the file $w$ and a private key $k_w$ as input and returns the ciphertext $c_w = G(k_w) \oplus w$ (equation image BDA0001553757070000051 in the original), where $G(k_w) \to \{0,1\}^{|w|}$ is a pseudo-random generator that takes $k_w$ as input and outputs a pseudo-random encryption key $G(k_w)$ of length $|w|$;

$\mathrm{DEC}(k_w, c_w) \to w$: a decryption algorithm run by the user to recover the plaintext. It takes the ciphertext $c_w$ and a private key $k_w$ as input and returns the plaintext $w = G(k_w) \oplus c_w$ (equation image BDA0001553757070000052 in the original);

$\mathrm{TAG}(1^\lambda, r_1, w) \to t_w$: a tag generation algorithm, implemented with a similarity-preserving hash function and run by the user to compute a digest of the input data. It takes the security parameter $\lambda$, a random number $r_1$ and the data $w$ as input and returns the data tag $t_w$.
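The ENC and DEC algorithms above amount to XOR with a pseudo-random keystream. The sketch below shows that construction and the property that makes it useful here: two plaintexts encrypted under the same key differ in exactly as many ciphertext bits as plaintext bits. SHAKE-256 as the generator $G$ and the function names are assumptions made for illustration.

```python
import hashlib

def prg(key: bytes, length: int) -> bytes:
    """G(k): expand the key into `length` pseudo-random bytes (SHAKE-256 is an assumption)."""
    return hashlib.shake_256(key).digest(length)

def enc(key: bytes, w: bytes) -> bytes:
    """c_w = G(k_w) XOR w."""
    return bytes(a ^ b for a, b in zip(w, prg(key, len(w))))

def dec(key: bytes, c: bytes) -> bytes:
    """w = G(k_w) XOR c_w; XOR is its own inverse."""
    return enc(key, c)

def hamming(a: bytes, b: bytes) -> int:
    """Hamming distance in bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

key = hashlib.sha256(b"key regenerated from similar digests").digest()
w1 = b"similar data block A"
w2 = b"similar data block B"
c1, c2 = enc(key, w1), enc(key, w2)

# The keystream cancels out: c1 XOR c2 == w1 XOR w2, so distances are preserved.
plain_distance = hamming(w1, w2)
cipher_distance = hamming(c1, c2)
```

This distance preservation is what allows similar plaintexts to remain similar after encryption. It also means that the XOR of two ciphertexts under one key reveals the XOR of the plaintexts, a trade-off that any real design has to account for.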

Another object of the invention is to provide a cloud storage system that applies the message-locked encryption and deduplication method for similar data in a cloud environment.

In summary, the advantages and positive effects of the invention are as follows. It provides a secure and efficient scheme for deduplicating similar data, called the fuzzy message-locked encryption scheme (FuzzyMLE). Its technical route combines a similarity-preserving hash function, a key extraction method based on error-correcting codes, and a secure symmetric encryption algorithm based on a pseudo-random generator. In addition, tag query efficiency is improved through Hamming distance reduction and tag cutting optimization. Finally, the efficiency of the invention is analyzed, and its overhead is evaluated on public datasets by building a working system. The experimental results show that the invention is efficient in terms of storage and communication overhead.

The invention targets secure and efficient cross-user deduplication of similar data. If the cloud server already stores user A's data and user B's data is similar to it, the cloud server can deduplicate the ciphertexts without communicating with user A. A similar message-locked encryption scheme is formally defined and a similar message-locked encryption system is built. By improving and combining several techniques, the invention overcomes the challenge of secure and efficient deduplication of similar data in cloud storage systems. First, instead of the conventional cryptographic hash tag used in traditional message-locked encryption, a similarity-preserving hash function processes similar data and generates a similarity tag for each data item. Second, instead of an exact-match tag query, an improved Hamming tag query provides efficient lookup of similar data. At the same time, a key generation method based on error-correcting codes derives the same encryption key from user data whenever the data are similar. Moreover, a secure XOR encryption scheme based on a pseudo-random generator replaces a conventional symmetric encryption algorithm (e.g., AES) for the encryption operation. Finally, the Hamming distance reduction and tag cutting optimization methods further improve tag query efficiency.
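This part of the text does not spell out how the Hamming tag query or the tag cutting optimization work. One widely used way to answer "which stored tags lie within distance t of this tag?" without scanning every record is to cut each tag into t + 1 segments: by the pigeonhole principle, two tags that differ in at most t bits must agree exactly on at least one segment. The sketch below shows that generic technique; it is an assumption about what such an index could look like, not a description of the patented optimization, and `TagIndex` is a hypothetical name.

```python
from collections import defaultdict

TAG_BITS = 64  # tag length in bits (an assumption)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

class TagIndex:
    """Index tags by segment so that near-duplicate lookups touch only a few buckets."""

    def __init__(self, t: int):
        if TAG_BITS % (t + 1):
            raise ValueError("this sketch needs TAG_BITS divisible by t + 1")
        self.t = t
        self.parts = t + 1
        self.width = TAG_BITS // self.parts
        self.buckets = defaultdict(set)   # (segment position, segment value) -> tags

    def _segments(self, tag: int):
        mask = (1 << self.width) - 1
        return [(i, (tag >> (i * self.width)) & mask) for i in range(self.parts)]

    def add(self, tag: int) -> None:
        for seg in self._segments(tag):
            self.buckets[seg].add(tag)

    def query(self, tag: int):
        """Return stored tags within Hamming distance t of `tag`."""
        candidates = set()
        for seg in self._segments(tag):
            candidates |= self.buckets.get(seg, set())
        # Candidates share one segment; confirm the full distance before answering.
        return sorted(c for c in candidates if hamming(c, tag) <= self.t)

index = TagIndex(t=3)
stored = 0x0123456789ABCDEF
index.add(stored)
index.add(0xFFFFFFFFFFFFFFFF)

near = stored ^ 0b111                # 3 bits flipped, all in one segment
far = stored ^ 0x0001000100010001    # 4 bits flipped, one in every segment
```

Querying with `near` returns the stored tag, while `far` returns nothing because it lies at distance 4. Each lookup inspects t + 1 buckets rather than every stored tag.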

The cloud storage system consists of a remote cloud storage server (S) and a set of clients (Cs) that want to store sensitive data on S. To protect their data, the clients encrypt sensitive data before uploading it. To reduce the storage overhead and unnecessary communication between S and Cs, S and Cs perform secure deduplication on the uploaded ciphertexts. Unlike existing secure deduplication methods, which handle only identical data, the invention addresses the more challenging case of secure deduplication of similar data. For communication efficiency, when similar data from some client is already stored in S's database, only the first user has to upload the ciphertext to S. As in most existing exact secure deduplication methods, users never communicate with each other directly; each of them talks to S, which processes or forwards messages as needed.

Brief Description of the Drawings

FIG. 1 is a flowchart of the message-locked encryption and deduplication method for similar data in a cloud environment provided by an embodiment of the invention.

FIG. 2 is a schematic diagram of the secure sketch provided by an embodiment of the invention.

FIG. 3 is a schematic diagram of the fuzzy extractor provided by an embodiment of the invention.

FIG. 4 is a schematic diagram of the similar-data message-locked encryption provided by an embodiment of the invention.

FIG. 5 is a schematic diagram of the SimHash computation time provided by an embodiment of the invention.

FIG. 6 is a schematic diagram of the PHash computation time provided by an embodiment of the invention.

FIG. 7 is a schematic diagram of the time spent on deduplicating text data provided by an embodiment of the invention.

FIG. 8 is a schematic diagram of the time spent on deduplicating image data provided by an embodiment of the invention.

FIG. 9 is a schematic diagram of the test hardware environment provided by an embodiment of the invention.

Detailed Description

To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. The specific embodiments described here only explain the invention and do not limit it.

With the explosive growth of data, efficient storage has become the most important goal of cloud storage systems. Most cloud storage providers use data deduplication to reduce storage and management costs. In recent years many secure deduplication methods have been proposed to further protect the privacy of user data. At the same time, many practical applications show that eliminating similar (or erroneous) data can further reduce the storage overhead of cloud storage providers and improve the quality of stored data. Cloud storage environments, however, still lack a secure and efficient method for deduplicating similar data.

As shown in FIG. 1, the similar data message-locked encryption deduplication method in a cloud environment provided by an embodiment of the present invention includes the following steps:

S101: The client first uses a similarity-preserving hash algorithm to generate a deduplication tag of the plaintext and sends it to the cloud server; the cloud server determines whether similar data is already stored on the cloud server.

S102: If the cloud server does not hold similar data, the user generates a similar-data key and the auxiliary information used for similar-key recovery, and sends the encrypted ciphertext data and the auxiliary information to the cloud server.

S103: If the cloud server holds similar data, it returns the auxiliary information for recovering the similar key to the user. The user encrypts the data with the recovered similar key and uses the resulting ciphertext to run a similar-data ownership verification with the server. If the verification passes, the cloud server allows the user to access the data.

The application principle of the present invention is further described below with reference to the accompanying drawings.

1. Secure sketch

With the help of auxiliary information, a secure sketch can reconstruct the original data exactly from similar data. Let M be a metric space with distance function dis; FIG. 2 shows a schematic diagram of the secure sketch. It is defined as follows:

A secure sketch with parameters (M, m, m', t) consists of a pair of efficient randomized algorithms (SS, Rec): a sketch algorithm and a recovery algorithm.

Sketch algorithm SS: takes an element w ∈ M as input and outputs a string s ∈ {0,1}*.

Recovery algorithm Rec: takes an element w' ∈ M and a string s ∈ {0,1}* as input. When dis(w, w') ≤ t, Rec(w', SS(w)) = w; when dis(w, w') > t, the output of Rec is not guaranteed.

Hamming-distance-based secure sketch: to obtain a secure sketch for the Hamming distance over F^n, the present invention uses an [n, k, 2t+1]_F error-correcting code C. The idea is to use the code C to correct the errors in the data w. Given input w, a codeword c ∈ C is chosen uniformly at random and s = SS(w) = w − c is the shift needed to move c to w. To compute Rec(w', s), first compute c' = w' − s and then decode c' to obtain c. Since dis(w, w') ≤ t, we have dis(c, c') ≤ t, which is within the error-correcting capability of a code with minimum distance 2t+1. Finally w is obtained as w = c + s.
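The code-offset construction above can be sketched in a few lines. The following is only an illustrative sketch over F_2 (so that addition and subtraction are both XOR), and it assumes a 5-fold repetition code as a stand-in for the [n, k, 2t+1] code C, which gives t = 2; the function names `encode`, `decode`, `ss` and `rec` are hypothetical and do not come from the patent.

```python
import secrets

REP = 5  # assumption: 5-fold repetition code, minimum distance 5, so t = 2

def encode(msg_bits):
    """Repetition-code encoder, the stand-in for the code C."""
    return [b for b in msg_bits for _ in range(REP)]

def decode(word):
    """Majority-vote decoder: returns the nearest codeword of C."""
    msg = [int(sum(word[i:i + REP]) > REP // 2) for i in range(0, len(word), REP)]
    return encode(msg)

def xor(a, b):
    """Addition and subtraction over F_2 are both bitwise XOR."""
    return [x ^ y for x, y in zip(a, b)]

def ss(w):
    """SS(w): pick a random codeword c and publish s = w - c."""
    c = encode([secrets.randbelow(2) for _ in range(len(w) // REP)])
    return xor(w, c)

def rec(w_prime, s):
    """Rec(w', s): c' = w' - s, decode c' to c, return w = c + s."""
    c = decode(xor(w_prime, s))
    return xor(c, s)
```

With a 20-bit input, flipping any two bits of w and calling `rec` on the result together with `ss(w)` returns the original w; a real deployment would replace the repetition code with a more efficient code such as BCH.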

2. Fuzzy extractor

A fuzzy extractor allows two similar data items to obtain the same string K. A fuzzy extractor with parameters (M, m, l, t, ε) consists of a pair of efficient algorithms (KG, REP): a generation algorithm and a regeneration algorithm.

Generation algorithm KG(w) → {K, P}: takes w ∈ M as input and outputs an extracted string K ∈ {0,1}^l and a public auxiliary string P ∈ {0,1}*.

Regeneration algorithm REP(w', P) → {K}: takes w' ∈ M and a string P ∈ {0,1}* as input. If dis(w, w') ≤ t and KG(w) → {K, P}, then REP(w', P) = K. If the min-entropy of M satisfies the condition given in formula image BDA0001553757070000081 (not reproduced in this text), then (R, P, E) ≈_ε (U, P, E), and the fuzzy extractor is secure.
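As an illustration of how KG and REP fit together, the following sketch builds a fuzzy extractor from a code-offset sketch plus a seeded hash. It rests on assumptions that are not in the patent: a 5-fold repetition code (t = 2) stands in for the error-correcting code, SHA-256 stands in for the extractor Ext(w, x), and the helper names are hypothetical.

```python
import hashlib
import secrets

REP = 5  # assumption: 5-fold repetition code, corrects t = 2 bit flips

def _nearest_codeword(bits):
    """Majority vote per group of REP bits, re-expanded to a codeword."""
    out = []
    for i in range(0, len(bits), REP):
        bit = int(sum(bits[i:i + REP]) > REP // 2)
        out.extend([bit] * REP)
    return out

def _xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def _ext(w, x):
    """Ext(w, x): SHA-256 over seed and data, a stand-in extractor."""
    return hashlib.sha256(x + bytes(w)).digest()

def kg(w):
    """KG(w) -> (K, P) with public helper P = (x, s)."""
    c = _nearest_codeword([secrets.randbelow(2) for _ in w])  # random codeword
    s = _xor(w, c)                                            # sketch s = w - c
    x = secrets.token_bytes(16)                               # public seed
    return _ext(w, x), (x, s)

def rep(w_prime, p):
    """REP(w', P) -> K: recover w through the sketch, then re-extract."""
    x, s = p
    c = _nearest_codeword(_xor(w_prime, s))
    return _ext(_xor(c, s), x)
```

Two inputs within Hamming distance 2 of each other regenerate the same K, while an input far from w yields an unrelated value.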

Traditional hash functions (such as SHA-1 or SHA-256) and symmetric encryption algorithms (such as AES-128 or AES-256) cannot be applied directly to secure deduplication of similar data. The present invention therefore combines similarity-preserving hash algorithms (SimHash and PHash), a fuzzy key extraction algorithm based on error-correcting codes, and an XOR encryption scheme based on a one-time pad to achieve secure and efficient similar-data deduplication in the system of the present invention. The similar-data deduplication method supports encryption and decryption of client data and allows the cloud server to perform secure similar-data deduplication on users' ciphertext data.

The similar-data deduplication method is designed so that the client encrypts the data and the cloud server detects similar duplicates over the ciphertexts. In the present invention, the client first applies a similarity-preserving hash algorithm to generate the deduplication tag and the similar-data key of the plaintext. With similarity-preserving hashing, similar plaintexts are mapped to similar tags and similar-data keys of a fixed length (for example, 64 bits). These fixed-length tags also significantly reduce storage overhead. The present invention designs a random key generation algorithm based on a fuzzy extractor, so that similar data within a specific Hamming distance always yield the same random encryption key. At this stage, the first user chooses some auxiliary parameters and computes a random key (for example, k_w') for the plaintext w'. The auxiliary parameters are then stored on the cloud server. When a subsequent user holds the tag t_w of a similar plaintext w (w ≈ w') and wants to perform similar-data deduplication, the cloud server sends the auxiliary parameters to that user, who generates the key k_w by running the key regeneration algorithm. If the Hamming distance between file w and file w' is below a specific value, for example dis(f_w, f_w') < t, the key regeneration algorithm outputs the same random key k_w' = k_w.

Since the cloud server needs to perform similarity detection on users' sensitive data, similar data must be encrypted into similar ciphertexts. This runs counter to the conventional encryption methods used in message-locked encryption. To solve this problem, the present invention uses a simple XOR encryption algorithm based on a one-time-pad generator. As in a stream cipher, a pseudorandom generator G(·) expands the similar key into an encryption key of sufficient bit length. If two plaintexts w and w' are similar, their respective similar keys k_w and k_w' satisfy k_w' = k_w; otherwise k_w' ≠ k_w. The relation shown in formula image BDA0001553757070000101 (not reproduced in this text) then follows directly.
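The effect of XOR encryption under a shared key can be checked with a small sketch: when both plaintexts are masked with the same keystream G(k), the keystream cancels, so the Hamming distance between the ciphertexts equals that between the plaintexts. The sketch assumes SHAKE-256 as the pseudorandom generator G; the patent does not name a concrete generator, and the function names are hypothetical.

```python
import hashlib

def G(key: bytes, n: int) -> bytes:
    """Stand-in PRG (assumption): SHAKE-256 expands the key to n bytes."""
    return hashlib.shake_256(key).digest(n)

def enc(key: bytes, w: bytes) -> bytes:
    """c = w XOR G(k), with the keystream as long as the data."""
    return bytes(a ^ b for a, b in zip(w, G(key, len(w))))

dec = enc  # XOR with the same keystream inverts the encryption

def hamming(a: bytes, b: bytes) -> int:
    """Bitwise Hamming distance between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

For example, `hamming(enc(k, w1), enc(k, w2)) == hamming(w1, w2)` holds for any equal-length w1 and w2, which is what lets the server compare ciphertexts for similarity. The same property also means that ciphertexts under a shared key reveal the XOR of their plaintexts.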

The similar-data message-locked encryption scheme consists of six polynomial-time algorithms (FKG, RKG, REP, ENC, DEC, TAG):

FKG(1^λ, r_2, w) → fk_w: a similar-key generation algorithm based on a similarity-preserving hash function, used by the user to compute digest information of the data. It takes the security parameter λ, a random number r_2 ∈ {0,1}^λ and a file w as input, and outputs the similarity digest fk_w of the file. In practice it is implemented with SimHash or PHash.

RKG(1^λ, r_3, fk_w) → {k_w, P_w}: a key generation algorithm used by the user to compute the encryption key and the auxiliary parameters of the data. x is a public parameter. RKG uses the sketch algorithm SS{r_3, w} → P_w of the secure sketch and the extraction algorithm Ext(w, x) → {K_w} of the fuzzy extractor to generate the auxiliary parameter P = {x, s} and a random encryption key K_w. Here r_3 is a random parameter used to generate a random codeword C(r_3) → c, where C(·) is a code generation algorithm. The codeword c is used by the SS algorithm of the secure sketch.

REP(fk_w', P_w) → k_w: a key regeneration algorithm run by the user. Like the regeneration algorithm of the fuzzy extractor, it takes the auxiliary parameter P_w and the fuzzy digest fk_w' of the file as input and outputs the private key k_w if and only if fk_w' is similar to fk_w; otherwise it outputs a random value.

ENC(k_w, w) → c_w: an encryption algorithm run by the user to encrypt the data and obtain the corresponding ciphertext. It takes a file w and a private key k_w as input and returns the ciphertext c_w = w ⊕ G(k_w), where G(k_w) → {0,1}^|w| is a pseudorandom generator that takes k_w as input and outputs a pseudorandom encryption key G(k_w) of length |w|.

DEC(k_w, c_w) → w: a decryption algorithm run by the user to recover the plaintext of the input data. It takes a ciphertext c_w and a private key k_w as input and returns the plaintext w = c_w ⊕ G(k_w).

TAG(1^λ, r_1, w) → t_w: a tag generation algorithm implemented with a similarity-preserving hash function, which can generate the same digest for similar data. It is run by the user to compute the digest of the input data. It takes the security parameter λ, a random number r_1 and data w as input, and returns the data tag t_w.

Based on the definition of the similar-data message-locked encryption scheme, FIG. 4 gives a schematic diagram of the scheme. As in message-locked encryption, all algorithms may depend on the public parameters P_w, which are public to all parties, including the adversary.
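To make the role of the similarity-preserving hash in TAG and FKG more concrete, the following is a minimal 64-bit SimHash sketch. It is not the implementation of the present invention: whitespace tokenization, unit token weights and SHA-256 as the per-token hash are all assumptions, and the random inputs r_1 and r_2 are omitted.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """Bitwise majority vote over the per-token hashes."""
    counts = [0] * bits
    for token in text.split():
        digest = hashlib.sha256(token.encode()).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is 1 when most tokens had bit i set.
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a: int, b: int) -> int:
    """Hamming distance between two fingerprints."""
    return bin(a ^ b).count("1")
```

Because each output bit is a majority vote, a small edit changes few or no bits: for instance, `simhash("a a a b")` equals `simhash("a")`, since the three copies of one token outvote the single differing token in every bit position.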

The application principle of the present invention is further described below with reference to specific embodiments.

In the system of the present invention, it is assumed that the users C are the data owners and wish to outsource their data to the cloud server with similar-data deduplicated storage. After uploading the data, a user only needs to keep, for each data item (for example, data w), its identity link (for example, ID_w) and its encryption key (for example, k_w). The plaintext data w is obtained by downloading the ciphertext c_w from the cloud server and decrypting it. The cloud server S stores all the data information received from users and maintains a data set DB = {Tag, ID, Cipher}. In the system of the present invention, the data set DB provides three required files: the tag file, the identity link file and the ciphertext file.

The similar-data message-locked encryption scheme consists of three phases: the system setup phase, the upload phase and the download phase. Since the upload and download phases are two-party interactive protocols, the present invention formally defines an interactive protocol as Π: [P_1: in_1; P_2: in_2] → [P_1: out_1; P_2: out_2]. Here Π denotes an interactive protocol run by the two parties P_1 and P_2, and in_i and out_i denote the input and output of party P_i. The three phases of the system are constructed as follows:

The system setup phase is run by user C, where r_1 and r_2 are two public parameters and r_3 is a randomly chosen parameter used as input to the [n, k, 2t+1]_F error-correcting code. Without loss of generality, the present invention assumes that user A is the first owner of data w' and wishes to upload the data to the cloud storage server S. User A first runs the tag generation algorithm TAG(1^λ, r_1, w') → t_w' and the similar-key generation algorithm FKG(1^λ, r_2, w') → fk_w' to generate the tag t_w' and the similarity digest fk_w' of data w'. (In practice, both TAG and FKG are implemented with SimHash or PHash, so the user uses the values shown in formula images BDA0001553757070000111 and BDA0001553757070000112, not reproduced in this text, as the inputs of TAG and FKG respectively.) User A then runs the key generation algorithm RKG(1^λ, r_3, fk_w') → {k_w', P_w'} to obtain the similar encryption key k_w' and the auxiliary parameter P_w'.

The upload phase is an interactive protocol run between user C and cloud server S. User C first sends the tag t_w' to cloud server S, which uses it to detect similar duplicates in its stored database. In this phase, one of two cases occurs on the cloud server:

No duplicate data exists. If none of the tags t_w of the data already stored on cloud server S is similar to the tag t_w', the user needs to upload the data. The upload phase runs as follows: Upload: [C: t_w', w', r_3; S: DedupTb] → [C: k_w', c_w', P_w', Link_w'; S: t_w', R_w', c_w', Link_w'].

The user first runs the random key generation algorithm RKG(1^λ, r_3, fk_w) → {k_w, P_w} to generate a random encryption key and the auxiliary parameters. The user then encrypts the data to obtain the ciphertext ENC(k_w, w') → c_w' and sends {t_w', P_w', c_w'} to cloud server S. S stores {t_w', P_w', c_w'} and returns the link Link_w' to user C for downloading the ciphertext c_w'.

Duplicate data exists. If the cloud server has already stored data w whose tag t_w is similar to the tag t_w', the upload phase runs as follows: Upload: [C: t_w', w'; S: DedupTb, P_w'] → [C: k_w, c_w', Link_w; S: Link_w]. Based on the tag t_w', cloud server S returns the auxiliary information P_w = {x_w, s_w} to user C. On receiving P_w = {x_w, s_w}, the user first runs the key regeneration algorithm REP(fk_w', P_w) → k_w and then encrypts the data to obtain the ciphertext ENC(k_w, w') → c_w'. Cloud server S and user C then execute the similar-data proof-of-ownership protocol, which verifies whether the user's ciphertext c_w' is similar to the ciphertext c_w stored on the server. If the user passes the verification, the cloud server returns the link Link_w, with which the user can download the ciphertext c_w stored on cloud server S. Because similar data is already stored on the cloud server, the user does not need to upload c_w' again.
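The two upload cases can be sketched from the server's point of view as follows. This is only an in-memory illustration under stated assumptions: tags are integers compared by Hamming distance against a threshold t, the record index plays the role of the link, the class and function names are hypothetical, and the similar-data proof-of-ownership step is deliberately left out because its details are not given here.

```python
from dataclasses import dataclass, field

def hamming(a: int, b: int) -> int:
    """Hamming distance between two tags held as integers."""
    return bin(a ^ b).count("1")

@dataclass
class Server:
    t: int                                   # similarity threshold on tags
    db: list = field(default_factory=list)   # records (tag, P, cipher); index = Link

    def lookup(self, tag: int):
        """Return (Link, P) of the first record with a similar tag, else None."""
        for link, (stored_tag, p, _cipher) in enumerate(self.db):
            if hamming(tag, stored_tag) <= self.t:
                return link, p
        return None

    def store(self, tag: int, p, cipher: bytes) -> int:
        """Case 1: no similar tag, so keep {t, P, c} and return the new Link."""
        self.db.append((tag, p, cipher))
        return len(self.db) - 1

def upload(server: Server, tag: int, make_record):
    """make_record() -> (P, cipher) is only called when nothing similar exists."""
    hit = server.lookup(tag)
    if hit is None:
        p, cipher = make_record()
        return "stored", server.store(tag, p, cipher), p
    link, p = hit                            # Case 2: reuse P to regenerate k_w
    return "duplicate", link, p
```

In the duplicate case the client would go on to run REP with the returned P, encrypt, and prove ownership before the link is released; the sketch returns the link immediately only to keep the example short.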

The download phase is an interactive protocol initiated by user C to obtain the outsourced data from server S. The protocol is: Download: [C: Link_w, k_w; S: DedupTb, c_w] → [C: w; S: ⊥]. Intuitively, if user C wants to download the ciphertext of w from server S, the user first sends the data identity link Link_w to the server, and the server queries the database DB for the ciphertext c_w whose identity link is Link_w. The server then sends the ciphertext c_w to user C. After receiving c_w, user C runs the decryption algorithm DEC(k_w, c_w) → w to obtain the plaintext w. In this process, the user first runs the pseudorandom generator to obtain the decryption key G(k_w) and then computes the plaintext w = c_w ⊕ G(k_w).

To further speed up the server's search for a tag t_w similar to a received tag t_w', after which it returns the auxiliary information P_w = {x_w, s_w} to user C, we also design two optimizations that improve tag query efficiency: Hamming distance reduction and tag cutting. The idea of Hamming distance reduction is as follows. The cloud server stores a large amount of data and therefore a large number of tags, so finding a tag t_w similar to t_w' by traversing all tags would incur a huge computational cost. We therefore define the function 1bits(x), which counts the number of 1 bits in the data x. If the similarity threshold is t, two data items x and y can be similar only if −t ≤ 1bits(x) − 1bits(y) ≤ t. We compute 1bits(·) for every tag stored on the cloud server and sort the tags in descending order of this value. When searching for a tag t_w similar to t_w', the server only examines tags that satisfy −t ≤ 1bits(t_w') − 1bits(t_w) ≤ t. For the tags that pass this filter, we use tag cutting to decide more efficiently whether the Hamming distance of two data items is within the threshold. The principle is to divide the data into blocks of nearly equal size and accumulate the Hamming distance block by block from front to back. If the accumulated distance already exceeds t at some block, the two data items cannot be similar, so the remaining blocks need not be processed and the exact Hamming distance need not be computed.

In practice, for two data items x and y of length n, we split them into (x_1, x_2, ..., x_r) and (y_1, y_2, ..., y_r). The first r − (n mod r) substrings have length ⌊n/r⌋ and the last n mod r substrings have length ⌈n/r⌉. Starting from the Hamming distance dis_Ham(x_1, y_1) of the first block, we proceed toward the Hamming distance dis_Ham(x_r, y_r) of the r-th block. If, at the i-th block, dis_Ham(x_1, y_1) + ... + dis_Ham(x_i, y_i) > t, the two data items are not similar and the server stops computing the Hamming distance of the remaining blocks.
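Both optimizations can be sketched compactly. The sketch below assumes tags are held as Python integers of n bits and that the index is sorted in ascending (rather than descending) order of 1bits so that the standard `bisect` module can be used; the function names are hypothetical.

```python
from bisect import bisect_left

def ones(x: int) -> int:
    """1bits(x): number of 1 bits in x."""
    return bin(x).count("1")

def split_lengths(n: int, r: int):
    """First r - (n mod r) blocks get floor(n/r) bits, the rest one bit more."""
    base, extra = divmod(n, r)
    return [base] * (r - extra) + [base + 1] * extra

def similar(x: int, y: int, n: int, r: int, t: int) -> bool:
    """Tag cutting: accumulate block distances, abort once the sum exceeds t."""
    total, shift = 0, n
    for length in split_lengths(n, r):
        shift -= length                       # blocks are taken from the front
        mask = (1 << length) - 1
        total += ones(((x >> shift) ^ (y >> shift)) & mask)
        if total > t:
            return False
    return True

def candidates(index, query: int, t: int):
    """Hamming distance reduction over index = sorted((ones(tag), tag))."""
    lo = bisect_left(index, (ones(query) - t, -1))
    hi = bisect_left(index, (ones(query) + t + 1, -1))
    return [tag for _, tag in index[lo:hi]]
```

A lookup would first call `candidates` to narrow the search to tags whose 1bits value is within t of the query, then call `similar` on each candidate. The filter is only a necessary condition, so every candidate still needs the block-wise check.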

To further improve the security of our scheme, we also design a similar-data message-locked encryption deduplication scheme based on an auxiliary server and one based on similarity tags. The auxiliary-server scheme resists offline brute-force attacks by incorporating an RSA-based blind signature scheme. Suppose the server of our system uses the RSA key generation algorithm, which takes the parameter e as input and outputs N and d such that e·d ≡ 1 (mod φ(N)), where N is the product of two large primes. ((N, e), (N, d)) is the resulting public/private key pair. Each legitimate user first registers with the key server. Given the key server's public key and the plaintext data w, the user selects a random number r, computes fk_w with the algorithm FKG(r_2, w), and then obtains k_w and P_w with the algorithm RKG(r_3, fk_w). The user then computes x ← H(k_w)·r^e mod N and sends x to the key server. On receiving x, the key server computes y ← x^d mod N and returns y to the user. The user computes z ← y·r^(−1) mod N and checks whether z^e mod N = H(k_w). If so, z is returned; otherwise ⊥ is returned. Using the pseudorandom generator, z is used to compute the private encryption key of the plaintext w and the similar-data verification tag t_w = h(G(z)). In the auxiliary-server scheme, the key server learns nothing about the encryption key.

In the similarity-tag scheme, the query tag of each data item (for example, data w) is obtained by TAG(1^λ,) → t_w. More precisely, the tag has the form shown in formula image BDA0001553757070000141 (not reproduced in this text), where g is a generator of a bilinear group, h is a collision-resistant hash function and r is a random number. Suppose user C holds data w'. User C first computes fk' ← FKG(r_2, w') and then runs the key regeneration algorithm to compute the value shown in formula image BDA0001553757070000142. Finally, for each record, it is checked whether cloud server S holds the value in formula image BDA0001553757070000143 and the tag in formula image BDA0001553757070000144; specifically, server S verifies whether the equality in formula image BDA0001553757070000145 holds. After the matching tag is found, user C runs the data proof-of-ownership protocol with the server.
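The blind-signature exchange with the key server can be illustrated with toy numbers. The sketch below assumes the standard RSA blinding x = H(k_w)·r^e mod N, SHA-256 reduced mod N as H, and two small Mersenne primes as the modulus factors; these parameters are far too small for real use and are chosen only so the example runs instantly. All names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy RSA key of the key server (assumption: illustrative sizes only).
p, q = 2**31 - 1, 2**61 - 1            # both are Mersenne primes
N, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))      # e*d = 1 mod phi(N); needs Python 3.8+

def H(data: bytes) -> int:
    """Hash into Z_N (SHA-256 reduced mod N, a stand-in for H)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def blind(k_w: bytes):
    """User: pick r coprime to N and send x = H(k_w) * r^e mod N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (H(k_w) * pow(r, e, N)) % N, r

def sign(x: int) -> int:
    """Key server: y = x^d mod N, computed without seeing H(k_w)."""
    return pow(x, d, N)

def unblind(y: int, r: int, k_w: bytes):
    """User: z = y * r^-1 mod N, accepted only if z^e mod N == H(k_w)."""
    z = (y * pow(r, -1, N)) % N
    return z if pow(z, e, N) == H(k_w) else None
```

Here `unblind(sign(x), r, k_w)` yields z = H(k_w)^d mod N, the same value for every user who holds the same k_w, while a tampered reply from the server fails the check and returns `None`.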

The application effect of the present invention is described in detail below with reference to experiments.

The system of the present invention is implemented in 3000 lines of C++ code on top of a MySQL database system. The free GMP library is used to implement the SimHash algorithm. The cryptographic hash algorithm (SHA-256) and the XOR encryption algorithm are implemented with the OpenSSL library. The client and server applications run on two separate computers running Linux 14.04, with the following hardware configuration: 1.70 GHz Intel i5-3317U CPU and 4 GB of memory. To run the experiments over a local area network, the present invention implements the communication between client and server and places the two machines in the same area. The bandwidth of the wired connection between server and client is set to 10 Mbps. To measure the performance of the system on real data sets, the present invention uses the Amazon movie review text data set, which contains 7,911,684 text files of about 1-15 KB each, and an image data set of more than 14 million images.

Figure BDA0001553757070000151 (image of Table 1, not reproduced in this text)

Table 1. Computation time of different algorithms

The test results of the similar-data message-locked encryption system are shown in Table 1. In detail: FKG is a similarity hash function, implemented here with 64-bit SimHash and 64-bit PHash (SimHash can only process text files, whereas PHash applies to both text and image files). For fixed-length text data (1 KB), the average computation times of SimHash and PHash are 1386 μs and 5312 μs respectively. For a JPEG image (10 KB), the average computation time of PHash is 6439 μs. RKG takes 261 μs and 885 μs for 64-bit and 256-bit lengths respectively, and REP takes 124 μs and 368 μs. ENC and DEC encrypt and decrypt data with XOR; performing ENC and DEC on a 1 KB bit string takes 338 μs. Like FKG, TAG is implemented with a similarity hash function (the 64-bit SimHash algorithm in the scheme of the present invention). In addition, the computation times of FKG, ENC, DEC and TAG depend on the input size; FIG. 5 and FIG. 6 give the corresponding simulation results. Their computation times grow linearly with the size of the input data.

Finally, the similar-data deduplication system of the present invention is tested with text data and image data. As shown in FIG. 7, one experiment runs on a text database of 100,000 records, each with a 256-bit tag. As shown in FIG. 8, the other runs on an image database of 1000 records, each with a 64-bit tag. In both tests, the client holding the plaintext interacts with the server to check whether a similar ciphertext exists in the database. If the server database contains no similar duplicate, the client uploads its ciphertext. Otherwise, the server performs similar-data deduplication with the user and sends the link of the similar data to the user.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (3)

1. A similar data message-locked encryption deduplication method in a cloud environment, characterized in that the method uses a similarity-preserving hash function and achieves deduplication of similar data through a key extraction method based on an error-correcting code and a secure symmetric encryption algorithm based on a pseudorandom generator; tag query efficiency is improved by Hamming distance reduction and tag cutting optimization;
the client of the method applies a similarity-preserving hash algorithm to generate the deduplication tag and the similar-data key of a plaintext; using the similarity-preserving hash, similar plaintext data are mapped to similar tags and similar-data keys of a specific length; similar data within a specific Hamming distance always yield the same random encryption key; a first user selects some auxiliary parameters and computes the random key of a plaintext w'; the auxiliary parameters are stored on the cloud server; when a subsequent user holds the tag t_w of similar plaintext data w (w ≈ w') and wants to perform similar-data deduplication, the cloud server sends the auxiliary parameters to the subsequent user, who generates the key k_w by running a key regeneration algorithm; if the Hamming distance between file w and file w' is less than a specified value, for example dis(f_w, f_w') < t, the key regeneration algorithm outputs the same random key k_w' = k_w;
The similar message locking encryption scheme of the similar data message locking encryption deduplication method in the cloud environment is formed by six polynomial time algorithms (FKG, KG, REP, ENC, DEC, TAG):
FKG(1λ,r2,w)→fkw: the method is a similar key generation algorithm based on a similar reserved hash function and is used for enabling a user to calculate summary information of data; with a security parameter lambda, a random number r2∈{0,1}λThe similar abstract fk of a file is output by taking the file w as inputw
$\mathrm{RKG}(1^{\lambda}, r_3, fk_w) \to \{k_w, P_w\}$: a key generation algorithm used by the user to compute the encryption key and auxiliary parameters of the data; $x$ is a public parameter; the RKG algorithm uses the sketch algorithm $\mathrm{SS}\{r_3, w\} \to P_w$ of the secure sketch and the extraction algorithm $\mathrm{Ext}(w, x) \to \{K_w\}$ of the fuzzy extractor to generate the auxiliary parameters $P = (x, s)$ and a random encryption key $K_w$, where $r_3$ is a random parameter used to generate a random codeword $C(r_3)$, $C(\cdot)$ is a code generation algorithm, and the code $C$ is used by the SS algorithm of the secure sketch;
$\mathrm{REP}(fk_{w'}, P_w) \to k_w$: a key regeneration algorithm run by the user; it takes the auxiliary parameters $P_w$ and the fuzzy digest $fk_{w'}$ of the file as input, and outputs the private key $k_w$ if and only if $fk_{w'}$ and $fk_w$ are similar; otherwise it outputs a random value;
$\mathrm{ENC}(k_w, w) \to c_w$: an encryption algorithm run by the user to encrypt the data and obtain the corresponding ciphertext; it takes a file $w$ and the private key $k_w$ as input and returns the ciphertext
(formula image FDA0002640830650000021)
where $G(k_w) \to \{0,1\}^{|w|}$ is a pseudo-random generator that takes $k_w$ as input and outputs a pseudo-random encryption key $G(k_w)$ of length $|w|$;
$\mathrm{DEC}(k_w, c_w) \to w$: a decryption algorithm run by the user to recover the plaintext of the input data; it takes the ciphertext $c_w$ and the private key $k_w$ as input and returns the plaintext
(formula image FDA0002640830650000022);
$\mathrm{TAG}(1^{\lambda}, r_1, w) \to t_w$: a tag generation algorithm, implemented with a similarity-preserving hash function and run by the user to compute the digest of the input data; it takes a security parameter $\lambda$, a random number $r_1$ and data $w$ as input and returns the data tag $t_w$.
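The key-related algorithms of claim 1 can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the patented construction: the error-correcting code is a simple repetition code over bits, the extractor Ext is replaced by HMAC-SHA256 keyed with the public seed x, the pseudo-random generator G is replaced by SHAKE-256, and encryption is assumed to XOR the plaintext with G(k_w) (the claim's formula is an image that is not reproduced in the text). All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

REP = 5  # assumed repetition factor: majority vote fixes up to 2 flips per block


def encode(msg_bits):
    """Repetition code: repeat every message bit REP times."""
    return [b for b in msg_bits for _ in range(REP)]


def decode(code_bits):
    """Majority vote inside each block of REP bits."""
    return [int(sum(code_bits[i:i + REP]) > REP // 2)
            for i in range(0, len(code_bits), REP)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def ext(bits, seed: bytes) -> bytes:
    """Stand-in for Ext(w, x): HMAC-SHA256 keyed with the public seed x."""
    return hmac.new(seed, bytes(bits), hashlib.sha256).digest()


def rkg(fk):
    """RKG sketch: returns (k_w, P_w) with P_w = (x, s)."""
    if len(fk) % REP:
        raise ValueError("digest length must be a multiple of REP")
    r3 = [secrets.randbelow(2) for _ in range(len(fk) // REP)]
    c = encode(r3)                 # random codeword C(r3)
    s = xor(fk, c)                 # sketch s = fk - c (XOR over bits)
    x = secrets.token_bytes(16)    # public extractor seed
    return ext(fk, x), (x, s)


def rep(fk2, helper):
    """REP sketch: regenerates k_w from a nearby digest and the helper data."""
    x, s = helper
    c = encode(decode(xor(fk2, s)))   # c' = fk' - s, decoded to the nearest codeword
    return ext(xor(c, s), x)          # recovered digest is c + s


def prg(key: bytes, n: int) -> bytes:
    """Stand-in for G(k_w): n pseudo-random bytes from SHAKE-256."""
    return hashlib.shake_256(key).digest(n)


def enc(key: bytes, w: bytes) -> bytes:
    """Assumed form of ENC: XOR the plaintext with G(k_w)."""
    return bytes(a ^ b for a, b in zip(w, prg(key, len(w))))


dec = enc  # XOR with the same keystream undoes the encryption
```

In this sketch a digest that is too far from the original does not yield a fresh random value as the claim states; it yields the extractor output of a wrongly recovered digest, which is simply a different key.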
2. The message-locked encryption and deduplication method for similar data in a cloud environment according to claim 1, characterized in that the method comprises the following steps:
the client generates a deduplication tag of the plaintext using the similarity-preserving hash algorithm and sends it to the cloud server, and the cloud server determines whether similar data are already stored on it;
if the cloud server holds no similar data, the user is required to generate a similar-data key and auxiliary information for similar-key recovery, and sends the encrypted ciphertext data and the auxiliary information to the cloud server;
if the cloud server holds similar data, it returns the auxiliary information for similar-key recovery to the user; the user encrypts the data with the recovered similar key and uses the resulting ciphertext to perform similar-data ownership verification with the server; if the verification succeeds, the cloud server allows the user to access the data.
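The server-side branch in the steps above might be sketched as follows. The class `CloudServer`, its methods `query` and `upload`, the integer tag representation and the `threshold` value are all hypothetical. The ownership verification step is left out on purpose, because the claim does not specify how it is carried out.

```python
from dataclasses import dataclass, field


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer tags."""
    return bin(a ^ b).count("1")


@dataclass
class CloudServer:
    threshold: int                              # tags closer than this count as similar
    store: dict = field(default_factory=dict)   # tag -> (auxiliary info, ciphertext)

    def find_similar(self, tag: int):
        """Linear scan for a stored tag within the similarity threshold."""
        for stored_tag in self.store:
            if hamming(stored_tag, tag) < self.threshold:
                return stored_tag
        return None

    def query(self, tag: int):
        """Tell the client whether to upload fresh data or recover a key."""
        hit = self.find_similar(tag)
        if hit is None:
            return ("upload", None)       # no similar data: client generates key and helper
        helper, _ciphertext = self.store[hit]
        return ("recover", helper)        # similar data exists: return auxiliary info

    def upload(self, tag: int, helper, ciphertext: bytes) -> None:
        """Store the first uploader's auxiliary info and ciphertext under the tag."""
        self.store[tag] = (helper, ciphertext)
```

A linear scan is used only to keep the sketch short; the tag query optimizations mentioned in claim 1 would replace `find_similar`.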
3. The message-locked encryption and deduplication method for similar data in a cloud environment as claimed in claim 1, characterized in that the method uses an $[n, k, 2t+1]_F$ error-correcting code $C$; the idea of the Hamming-distance-based secure sketch is to use the error-correcting code to correct errors in the data $w$: on input $w$, a codeword $c \in C$ is selected uniformly at random, and $s = \mathrm{SS}(w) = w - c$ is the shift needed to move from $c$ to $w$; to compute $\mathrm{Rec}(w', s)$, $c' = w' - s$ is computed, $c'$ is decoded to obtain $c$, and $w$ is recovered as $w = c + s$.
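The code-offset secure sketch described in claim 3 can be illustrated with a small concrete code. The sketch below assumes the binary field, where both subtraction and addition are XOR, and uses the [7, 4, 3] Hamming code (so t = 1) as a stand-in for the unspecified $[n, k, 2t+1]_F$ code; the function names are hypothetical.

```python
import secrets


def syndrome(word):
    """XOR of the 1-based positions of the set bits; 0 for a valid codeword."""
    syn = 0
    for i, bit in enumerate(word):
        if bit:
            syn ^= i + 1
    return syn


def random_codeword():
    """Pick 4 random data bits and set the parity bits (positions 1, 2, 4)."""
    word = [0] * 7
    for pos in (3, 5, 6, 7):
        word[pos - 1] = secrets.randbelow(2)
    syn = syndrome(word)
    for pos in (1, 2, 4):
        word[pos - 1] = 1 if syn & pos else 0   # cancels the syndrome
    return word


def decode(word):
    """Nearest-codeword decoding: flip the single position named by the syndrome."""
    fixed = list(word)
    syn = syndrome(fixed)
    if syn:
        fixed[syn - 1] ^= 1
    return fixed


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def ss(w):
    """SS(w): s = w - c for a uniformly random codeword c."""
    return xor(w, random_codeword())


def rec(w2, s):
    """Rec(w', s): decode c' = w' - s to c, then return w = c + s."""
    c = decode(xor(w2, s))
    return xor(c, s)


w = [1, 0, 1, 1, 0, 0, 1]
s = ss(w)
noisy = list(w)
noisy[4] ^= 1            # one bit of error, within t = 1
print(rec(noisy, s) == w)
```

With this code any single flipped bit in $w'$ is corrected; two or more flips exceed the code's correction radius and recovery returns a different word.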
CN201810055819.6A 2018-01-20 2018-01-20 Similar data message locking, encrypting and de-duplicating method in cloud environment and cloud storage system Active CN108400970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810055819.6A CN108400970B (en) 2018-01-20 2018-01-20 Similar data message locking, encrypting and de-duplicating method in cloud environment and cloud storage system

Publications (2)

Publication Number Publication Date
CN108400970A CN108400970A (en) 2018-08-14
CN108400970B CN108400970B (en) 2020-10-02

Family

ID=63094066

Country Status (1)

Country Link
CN (1) CN108400970B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant