CN113037732B

CN113037732B - Multi-user security encryption de-duplication method based on wide area network scene

Info

Publication number: CN113037732B
Application number: CN202110222902.XA
Authority: CN
Inventors: 田臣; 张渊; 张紫薇
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2022-09-23
Anticipated expiration: 2041-02-26
Also published as: CN113037732A

Abstract

The invention discloses a secure multi-user encrypted deduplication method for wide area network (WAN) scenarios. A sender preprocesses the plaintext message to be sent and compares the result against its local cache, replacing the portions that can be deduplicated. The sender then establishes an end-to-end trusted connection and transmits the processed, encrypted message to the receiver. The receiver decrypts what it receives to obtain the processed message, then interacts with the deduplication agent of its local area network (LAN) to fetch the convergently encrypted ciphertexts of the deduplicated portions, decrypts them to recover the corresponding plaintext, and combines that plaintext with the received portions that were not deduplicated to reconstruct the original message. After each transmission, the sender and the receiver each synchronize updates with the deduplication agent of their own LAN. The method applies to encrypted traffic, relieves transmission pressure on the WAN, saves WAN bandwidth, and preserves the security and privacy of user data.

Description

A secure multi-user encryption and deduplication method based on wide area network

Technical Field

The invention relates to redundant traffic elimination in wide area network (WAN) communication, and in particular to a secure multi-user encrypted deduplication method for WAN scenarios.

Background Art

A WAN redundant traffic elimination system involves two or more local area networks (LANs) connected by WAN links. In such a system, a deduplication agent (for example, a gateway router) is deployed on an edge node of each LAN. Before a message enters the WAN, the sender cooperates with its local deduplication agent to remove redundancy from it. The sender first splits the message into blocks according to its data characteristics, and then looks for blocks in the outgoing message that match blocks already sent. All sent blocks are stored in the agent's cache, which can be called a dictionary; the dictionary also stores the fingerprint of each block. Every matching block in the message is then replaced by its fingerprint, which completes deduplication and reduces the amount of data that must cross the WAN. Eliminating redundant or duplicate data in WAN traffic improves network efficiency and saves bandwidth, so redundant traffic elimination systems are highly valuable to large enterprises, Internet service providers and network equipment vendors.

Although redundant traffic elimination systems have been deployed successfully on unencrypted traffic, applying them to encrypted traffic while supporting multi-user deduplication remains a major challenge. Existing secure transport protocols such as TLS and IPSec use end-to-end encryption, which requires that a message be encrypted by the sender before it enters the network and be decrypted only by the receiver after it leaves the network. A deduplication agent sitting between sender and receiver sees only meaningless encrypted bytes and cannot search for matching parts as it would in unencrypted traffic. Multi-user deduplication complicates the agent's task further, because a block must be matched not only against what the sender has already sent but also against the history of other users.

Beyond these problems, a malicious attacker may launch a poisoning attack during transmission. In a multi-user redundant traffic elimination system, the agent deduplicates against a global cache that all users help update and maintain. A malicious user can insert wrong content into the cache, so that later users who rely on the cache to restore deduplicated messages obtain incorrect content. Because a LAN contains many users, the probability of a poisoning attack is high, and even a single user controlled by an attacker causes system-wide harm to all users. A simple idea is for the sender to attach an error detection code to the message so that the receiver can check whether the recovered message is consistent with it. This does not solve the problem, however, because the tampered content remains in the agent's cache and continues to affect later users. The only remedy is to remove the tampered content from the agent's cache, which is itself difficult: the user must convince the agent that a given cached block has been tampered with, without compromising the confidentiality of the message.

Summary of the Invention

To address the shortcomings of the prior art, namely that WAN transmission carries duplicate traffic while existing redundancy elimination (RE) systems operate only on unencrypted traffic, the invention provides a secure multi-user encrypted deduplication method for WAN scenarios. By performing cross-user deduplication in an encrypted environment, it protects message security while reducing bandwidth overhead. It also lets the system defend against poisoning attacks by malicious users, which existing deduplication systems have not studied.

To achieve the above object, the invention adopts the following technical solution:

A secure multi-user encrypted deduplication method for WAN scenarios, comprising the following steps:

S1: for each original message to be sent, the sender splits the message into an array of data blocks according to its data characteristics;

S2: for each data block, the sender computes a corresponding key and a unique fingerprint that distinguishes the block from other blocks;

S3: the sender compares the fingerprint of each data block against all fingerprints in its local cache and replaces each block that can be deduplicated with the corresponding key;

S4: the sender encrypts the deduplicated message and sends it to the receiver, and updates the set of <fingerprint, ciphertext> pairs for all sent blocks to the sender's local cache and to the deduplication agent of the sender's LAN;

S5: with the help of the deduplication agent of the receiver's LAN, the receiver uses the substituted keys to obtain the deduplicated blocks, merges them with the decrypted blocks sent by the sender, and recovers the original message; it also updates the set of <fingerprint, ciphertext> pairs for all sent blocks to the receiver and to the deduplication agent of the receiver's LAN.

To refine the above technical solution, the following specific measures are also adopted:

Further, in step S1, the sender uses a content-defined chunking (CDC) algorithm to split the original message into an array of data blocks according to its data characteristics. Blocks are delimited by content, and the length of each block lies between a defined minimum and maximum.

Further, in step S4, the sender uploads the set of <fingerprint, ciphertext> pairs for all sent blocks to the deduplication agent of the sender's LAN, and then updates its local cache with the full set of fingerprints held by the agent.

The sender's local cache maintains only a fingerprint cache table, which holds the fingerprints of all blocks sent or received by all users in the sender's LAN. The sender's deduplication agent maintains both that fingerprint cache table and a corresponding ciphertext table, which stores the ciphertext of the block for every fingerprint in the fingerprint cache table.

Further, in step S5, the receiver uploads the <fingerprint, ciphertext> pairs of the received blocks to the deduplication agent of the receiver's LAN; the agent merges all fingerprint sets and returns the merged result to the receiver.

Further, in step S3, comparing the fingerprint of each data block against all fingerprints in the local cache and replacing deduplicable blocks with the corresponding key means the following:

For each data block, the sender looks up the fingerprint table in its local cache to see whether the block's fingerprint is stored there. If a matching fingerprint exists, the block is judged to have been transmitted before and may be deduplicated, and the key generated for that block in step S2 replaces the block. If no fingerprint matches, the original block is kept.

Further, the receiver and the sender are connected over TLS.

Further, in step S5, the process by which the receiver, with the help of the deduplication agent of its LAN, uses the substituted keys to obtain the deduplicated blocks, merges them with the blocks sent by the sender, and recovers the original message comprises the following steps:

S51: the receiver receives the TLS-encrypted message and decrypts it to obtain the keys that stand in for the blocks deduplicated in step S3;

S52: for each deduplicated block, the receiver computes the block's fingerprint from the received key and uploads the fingerprint to the deduplication agent of its LAN, which returns the ciphertext of the block corresponding to that fingerprint;

S53: the receiver decrypts the ciphertext to obtain the deduplicated block, replaces each key in the received message with the corresponding block, and thereby restores the complete original message.

Further, the sender encrypts the data blocks with a convergent encryption algorithm.

Further, the encrypted deduplication method also comprises the following step:

S6: the receiver uses a guard decryption algorithm to detect whether a ciphertext has been poisoned, uses a key verification algorithm to confirm whether the current user is honest, and removes the poisoned ciphertext.

The data blocks are processed with MLEvd encryption, which is based on a bilinear map e: G₁ × G₂ → G₃. For any symmetric deterministic encryption scheme SDE = (SK, SE, SD), where SK is the key generation algorithm, SE the encryption algorithm and SD the decryption algorithm, the following MLEvd algorithms are constructed:

A. Parameter generation PG(1^λ) → P:

Given the security parameter 1^λ, the algorithm generates groups G₁ = <g₁> and G₂ = <g₂> of prime order p, their bilinear map e, and the target group G₃. H₁: {0, 1}* → G₁, H₂: {0, 1}* → G₂, h: [codomain missing in source] and H: {0, 1}* → K are cryptographically secure hash functions. The returned public parameters P contain {e, g₁, g₂, p, H₁, H₂, h, H}.

B. Fingerprint generation FG(P, m) → F:

For the input plaintext m, the fingerprint is F = [formula missing in source].

C. Key generation KG(P, m) → (K(t), t):

Sample a random number t, compute the key K(t) = [formula missing in source], and output (K(t), t).

D. Encryption ENC(P, m) → C:

Obtain (K(t), t) by calling KG, then compute [formula missing in source] and

c = SE(m, H(K(t)))

and output the ciphertext C = (c, T).

E. Decryption DEC(C, K(t)) → m′:

Compute the corresponding plaintext m′ = SD(c, H(K(t))).

F. Guard decryption GDEC(C, K(t)) → {m′, ⊥}:

Decrypt as in DEC, then check whether T^(h(m′)) = K(t). If the check fails, the ciphertext is faulty and ⊥ is returned.

G. Key verification KV(C, F, K) → {0, 1}:

Check whether e(F, T) = e(g₁, K). Return 1 if it holds, meaning the key is correct; otherwise return 0.
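The fingerprint, key and T formulas did not survive in this text, but the two checks in algorithms F and G constrain them. The derivation below is one reading that makes both checks hold for an honest ciphertext. The three forms in the first line are assumptions, not the patent's stated definitions.

```latex
\begin{align*}
T &= g_2^{\,t}, \qquad K(t) = T^{\,h(m)} = g_2^{\,t\,h(m)}, \qquad F = g_1^{\,h(m)}
  && \text{(assumed forms)} \\
T^{\,h(m')} &= K(t) \quad \text{whenever } m' = m
  && \text{(guard decryption check, F)} \\
e(F, T) &= e\bigl(g_1^{\,h(m)},\, T\bigr) = e(g_1, T)^{h(m)}
         = e\bigl(g_1,\, T^{\,h(m)}\bigr) = e\bigl(g_1, K(t)\bigr)
  && \text{(key verification check, G, by bilinearity)}
\end{align*}
```

Under this reading the agent can run the pairing check with only F, T and K, while computing K(t) from F alone would require recovering h(m) from g₁^(h(m)).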

The beneficial effects of the invention are:

(1) The invention deduplicates WAN traffic while keeping data secure, greatly reducing redundant network traffic. Against poisoning attacks by malicious users, it proposes MLEvd, which supports dynamic keys and key verification. These features are used to verify user-uploaded content and thereby prevent poisoning attacks, which greatly improves the security and reliability of the system.

(2) By reducing the traffic carried over the WAN, the invention helps teams in different regions communicate faster and reduces the load on limited WAN capacity.

Brief Description of the Drawings

FIG. 1 is a flowchart of the secure multi-user encrypted deduplication method for WAN scenarios according to the invention.

FIG. 2 is a schematic diagram of the model of one application embodiment.

Detailed Description

The invention is now described in further detail with reference to the accompanying drawings.

Note that terms such as "upper", "lower", "left", "right", "front" and "rear" are used only for clarity of description and do not limit the scope in which the invention may be practiced. Changes or adjustments to these relative relationships that do not substantially alter the technical content are also regarded as within the practicable scope of the invention.

With reference to FIG. 1, the invention provides a secure multi-user encrypted deduplication method for WAN scenarios, comprising the following steps:

S1: for each original message to be sent, the sender splits the message into an array of data blocks according to its data characteristics.

In step S1, the sender uses a content-defined chunking (CDC) algorithm to split the original message into an array of data blocks according to its data characteristics. Blocks are delimited by content, and the length of each block lies between a defined minimum and maximum.
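The chunking in step S1 can be sketched as follows. The patent does not say which CDC variant is used, so the function name `cdc_chunks`, the window hash (SHA-256 over the last few bytes), the mask and the size limits are all illustrative assumptions. A production system would typically use a rolling hash rather than rehashing every window.

```python
import hashlib

def cdc_chunks(data: bytes, min_size: int = 64, max_size: int = 1024,
               window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Toy content-defined chunking: cut where a hash of the trailing
    window matches a bit pattern, subject to min/max block lengths."""
    chunks = []
    start = 0
    while start < len(data):
        hard_end = min(start + max_size, len(data))
        cut = hard_end  # fall back to the maximum length (or end of data)
        for i in range(start + min_size, hard_end):
            win = data[max(i - window, 0):i]
            fp = int.from_bytes(hashlib.sha256(win).digest()[:4], "big")
            if fp & mask == 0:  # boundary depends on content, not offset
                cut = i
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks
```

Because a boundary depends only on nearby bytes, identical content tends to produce identical blocks even when it appears at a different offset in another message, which is what makes later fingerprint matching effective.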

S2: for each data block, the sender computes a corresponding key and a unique fingerprint that distinguishes the block from other blocks.

The key in step S2 is derived from the data block, and the fingerprint is a unique identifier of the block that distinguishes it from other blocks. Given the key, the receiver can find the ciphertext of the block and decrypt it to obtain the block.
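A minimal sketch of step S2, following the DETs definitions given later in this description (K_i = h(m_i) and F_i = h(K_i)). The choice of SHA-256 for h and the function names are assumptions; the patent only requires a cryptographically secure one-way hash.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Assumed instantiation of the one-way hash h.
    return hashlib.sha256(data).digest()

def key_and_fingerprint(block: bytes) -> tuple[bytes, bytes]:
    key = h(block)        # K_i = h(m_i): derived from the block itself
    fingerprint = h(key)  # F_i = h(K_i): safe to share with the agent
    return key, fingerprint
```

Deriving the fingerprint from the key rather than from the block means that anyone holding the key can compute the fingerprint, while the agent, who sees only fingerprints, cannot work back to the key.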

S3: the sender compares the fingerprint of each data block against all fingerprints in its local cache and replaces each block that can be deduplicated with the corresponding key.

In step S3, for each data block the sender looks up the fingerprint table in its local cache to see whether the block's fingerprint appears there. If it does, the block is considered to have been transmitted before and can be deduplicated, so the sender replaces the block with the key generated in step S2. If no fingerprint matches, the original block is kept.
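The lookup-and-replace logic of step S3 can be sketched as below. The `KeyRef` wrapper, which marks an entry as a key rather than a raw block, is a hypothetical representation; the patent does not specify how the two kinds of entry are distinguished on the wire. SHA-256 is again an assumed hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class KeyRef:
    """Hypothetical marker: this entry carries a key, not a block."""
    key: bytes

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def deduplicate(blocks: list[bytes],
                local_fingerprints: set[bytes]) -> list[Union[bytes, KeyRef]]:
    out: list[Union[bytes, KeyRef]] = []
    for block in blocks:
        key = h(block)
        fingerprint = h(key)
        if fingerprint in local_fingerprints:
            out.append(KeyRef(key))  # seen before: send only the key
        else:
            out.append(block)        # new content: send the block itself
    return out
```

With a 32-byte key replacing each known block, the saving grows with the block size chosen in step S1.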

S4: the sender encrypts the deduplicated message and sends it to the receiver, and updates the set of <fingerprint, ciphertext> pairs for all sent blocks to the sender's local cache and to the deduplication agent of the sender's LAN. The sender and receiver communicate over a trusted connection, such as TLS, which ensures end-to-end security. The ciphertext in each pair is produced by encrypting the block with the key generated in step S2. The encryption is convergent encryption, which differs from conventional encryption: the key is derived from the plaintext, so the same plaintext always encrypts to the same ciphertext.
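The defining property of convergent encryption, identical plaintext yielding identical ciphertext, can be illustrated with a toy cipher. The SHA-256 counter keystream below is a stand-in chosen only so the sketch runs with the standard library; it is an assumption, not the cipher used by the patent, and is not suitable for real use.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Toy deterministic keystream: SHA-256(key || counter), concatenated.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def convergent_encrypt(block: bytes) -> tuple[bytes, bytes]:
    key = hashlib.sha256(block).digest()  # key derived from the plaintext
    stream = _keystream(key, len(block))
    ciphertext = bytes(a ^ b for a, b in zip(block, stream))
    return key, ciphertext

def convergent_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))
```

Two users who encrypt the same block independently obtain the same ciphertext, so the agent can store one copy without ever seeing the plaintext or the key.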

The update in step S4 works as follows: (1) the sender maintains a fingerprint cache table recording the fingerprints of all blocks sent or received by all users in the sender's LAN; (2) the sender's local deduplication agent maintains a fingerprint cache table and a corresponding ciphertext table. The fingerprint cache table records the fingerprints of the blocks that have passed through the agent, and the ciphertext table stores the ciphertext of the block for each fingerprint in the fingerprint cache table.
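The two caches described above might be modelled as follows. Class and method names are hypothetical, and keeping the first ciphertext uploaded for a fingerprint is a design assumption; the text only says that pairs are inserted.

```python
class DedupAgent:
    """LAN agent: holds the fingerprint table and the ciphertext table."""

    def __init__(self) -> None:
        self.fingerprints: set[bytes] = set()
        self.ciphertexts: dict[bytes, bytes] = {}

    def upload(self, pairs: list[tuple[bytes, bytes]]) -> set[bytes]:
        for fingerprint, ciphertext in pairs:
            self.fingerprints.add(fingerprint)
            # Assumption: the first upload for a fingerprint wins.
            self.ciphertexts.setdefault(fingerprint, ciphertext)
        return set(self.fingerprints)  # global table returned to the user

    def fetch(self, fingerprint: bytes):
        return self.ciphertexts.get(fingerprint)


class UserCache:
    """User side: fingerprints only, no ciphertexts."""

    def __init__(self) -> None:
        self.fingerprints: set[bytes] = set()

    def sync(self, agent: DedupAgent, pairs: list[tuple[bytes, bytes]]) -> None:
        self.fingerprints = agent.upload(pairs)
```

Because every user's local table is refreshed from the agent's global table, a block sent by one user becomes deduplicable for every other user in the same LAN after their next synchronization.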

In step S5, the receiver's recovery proceeds as follows. After receiving the TLS-encrypted message, the receiver first decrypts it to obtain the message as deduplicated in step S3. Then, for each deduplicated block, the receiver computes the block's fingerprint from the received key and uploads the fingerprint to the deduplication agent of its LAN. The agent returns the ciphertext of the block corresponding to that fingerprint; the receiver decrypts it to obtain the block and replaces the key in the message with the block, thereby restoring the complete original message.

In a preferred example, the encrypted deduplication method further comprises the following step:

S6: the receiver uses a guard decryption algorithm to detect whether a ciphertext has been poisoned, uses a key verification algorithm to confirm whether the current user is honest, and removes the poisoned ciphertext. Specifically, the invention proposes the DETp protocol to prevent poisoning attacks. After steps similar to those of DETs, DETp adds poisoned-message detection and recovery steps: guard decryption detects whether a ciphertext has been poisoned, and key verification confirms that the current user is honest so that the poisoned ciphertext can be removed.
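The guard decryption check of step S6 can be illustrated with a toy that uses ordinary modular exponentiation instead of a pairing group. Everything concrete here is an assumption made so the sketch runs with the standard library: the modulus P = 2^127 − 1, the base G = 3, the forms T = G^t and K = T^(h(m)), and the XOR keystream cipher standing in for SE/SD. Key verification (algorithm G) needs a bilinear pairing and is not shown.

```python
import hashlib

P = 2**127 - 1  # toy modulus (a Mersenne prime); NOT the patent's groups
G = 3           # toy base element

def h_int(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % (P - 1)

def H(k: int) -> bytes:
    # Hash a group element down to a symmetric key.
    return hashlib.sha256(k.to_bytes(16, "big")).digest()

def xor_stream(data: bytes, key: bytes) -> bytes:
    # Toy deterministic cipher; encryption and decryption are the same.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def enc(m: bytes, t: int):
    T = pow(G, t, P)         # assumed form of T
    K = pow(T, h_int(m), P)  # assumed form of the dynamic key K(t)
    return (xor_stream(m, H(K)), T), K

def gdec(C, K: int):
    c, T = C
    m = xor_stream(c, H(K))
    # Guard: the recovered plaintext must reproduce the key from T.
    return m if pow(T, h_int(m), P) == K else None
```

A poisoned ciphertext decrypts to some other plaintext m′, and T^(h(m′)) then fails to match K(t), so the receiver gets ⊥ (here `None`) instead of silently accepting wrong data.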

The invention provides two protocols for cross-user deduplication in an encrypted setting. The work comprises the following:

(1) A problem model for encrypted deduplication across a WAN is extracted, and the adversary model and security notions are defined.

(2) The DETs deduplication protocol is designed, a cross-user protocol that deduplicates first and then encrypts.

(3) Message-locked encryption is improved: the proposed MLEvd additionally supports dynamic keys and key verification.

(4) The DETp deduplication protocol is designed to prevent poisoning attacks by malicious users.

The problem model is shown in FIG. 2. The invention considers a company with two branches in different cities; the model also extends to more branches. Each branch has its own local area network (LAN) connecting its internal users. For example, the left branch has a user Alice and the right branch has a user Bob. The two branches are connected by a public wide area network (WAN). In each branch, an agent resides on the gateway node and helps the user nodes of the two LANs deduplicate their WAN traffic. For example, the agent of the left branch is Carol and the agent of the right branch is David.

The invention proposes the DETs deduplication protocol, which comprises: splitting a message into blocks; generating a fingerprint and an encryption key for each block with a cryptographically secure one-way hash function; using the fingerprint to decide whether a block is a duplicate and deduplicating the duplicates; end-to-end secure transmission over TLS; recovery of the deduplicated message on receipt; and synchronization between users and agents.

In addition, the invention proposes MLEvd encryption based on a bilinear map e: G₁ × G₂ → G₃. For any symmetric deterministic encryption scheme SDE = (SK, SE, SD), where SK is the key generation algorithm, SE the encryption algorithm and SD the decryption algorithm, the invention constructs the following seven MLEvd algorithms:

A. Parameter generation PG(1^λ) → P:

Given the security parameter 1^λ, the algorithm generates groups G₁ = <g₁> and G₂ = <g₂> of prime order p, their bilinear map e, and the target group G₃. H₁: {0, 1}* → G₁, H₂: {0, 1}* → G₂, h: [codomain missing in source] and H: {0, 1}* → K are cryptographically secure hash functions. The returned public parameters P contain {e, g₁, g₂, p, H₁, H₂, h, H}.

B. Fingerprint generation FG(P, m) → F:

For the input plaintext m, the fingerprint is F = [formula missing in source].

C. Key generation KG(P, m) → (K(t), t):

Sample a random number t, compute the key K(t) = [formula missing in source], and output (K(t), t).

D. Encryption ENC(P, m) → C:

Compute c = SE(m, H(K(t))) and output the ciphertext C = (c, T).

E. Decryption DEC(C, K(t)) → m′:

Compute the corresponding plaintext m′ = SD(c, H(K(t))).

F. Guard decryption GDEC(C, K(t)) → {m′, ⊥}:

Decrypt as in DEC, then check whether T^(h(m′)) = K(t); if the check fails, return ⊥.

G. Key verification KV(C, F, K) → {0, 1}:

Check whether e(F, T) = e(g₁, K); return 1 if it holds and 0 otherwise.

As shown in FIG. 2, the invention considers a company with two branches in different cities; the model also extends to more branches. Each branch has its own local area network (LAN) connecting its internal users, and the two branches are connected by a public wide area network (WAN). In each branch, an agent resides on the gateway node and helps the user nodes of the two LANs deduplicate their WAN traffic. For security, users communicate with each other over TLS connections, which provide strong end-to-end security. Users and agents cooperate to run the DETs (DETp) protocol securely, achieving deduplication over encrypted connections.

The DETs protocol proposed by the invention comprises the following steps:

1. Chunking: for each message M to be sent, Alice runs the CDC chunking algorithm to split M into a sequence of blocks:

M = {m₁, m₂, ..., m_n}.

2. Key and fingerprint generation: for each block m_i, Alice computes its key

K_i = h(m_i)

and its fingerprint

F_i = h(K_i)

where h is a cryptographically secure one-way hash function.

3. Deduplication: for each block, Alice checks whether its fingerprint exists in her local fingerprint table T_La. If it does, the block has already been sent and can be deduplicated: only the key is sent instead of the full block, which greatly reduces traffic. With this key, the receiver can find the right ciphertext and decrypt it. After deduplication, Alice sends M′ = {m₁′, m₂′, ..., m_n′}, where m_i′ = K_i if F_i ∈ T_La, and m_i′ = m_i otherwise.

4. End-to-end transmission: Alice and Bob establish a TLS connection, and Alice sends M′ to Bob over it. The TLS connection guarantees that only Bob can read the message.

5. Message recovery: after receiving M′, Bob recovers the message with the help of David, the deduplication agent of Bob's LAN. For each K_i in M′, Bob computes its fingerprint F_i and downloads from David the encrypted block c_i corresponding to that fingerprint. Bob then decrypts:

m_i = DEC(c_i, K_i)

and replaces each K_i in M′ with m_i, which restores M.

6.Alice同步：对于每一个

Alice使用K_i加密m_i得到6.Alice synchronization: for each

Alice uses K _i to encrypt m _i to get

c_i＝ENC(m_i，K_i)c _i =ENC(m _i ,K _i )

Alice发送所有的<F_i，c_i>给他所在的局域网代理Carol。同时Alice从Carol那边下载全局的指纹表T_Gc，并且更新自己的局部指纹表T_La。Alice sends all <F _i , c _i > to Carol, her local area network agent. At the same time, Alice downloads the global fingerprint table T _Gc from Carol and updates her local fingerprint table T _La .

7.Carol同步：接收到<F_i，c_i>之后，Carol把他们插入到全局指纹表T_Gc，同时把最新的全局指纹表同步给Alice。7. Carol synchronization: After receiving <Fi _{, c i} _> , Carol inserts them into the global fingerprint table T _Gc and synchronizes the latest global fingerprint table to Alice.

8. Bob synchronization: for each received m_i, Bob computes K_i and encrypts m_i to obtain c_i; he also computes F_i and sends all pairs <F_i, c_i> to David, the deduplication agent of his local area network. At the same time, Bob downloads the global fingerprint table T_Gd from David and updates his local fingerprint table T_Lb.

9. David synchronization: after receiving the pairs <F_i, c_i>, David inserts them into the global fingerprint table T_Gd and synchronizes the latest global fingerprint table to Bob.
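The DTEs steps above (deduplication, message recovery, and the synchronization with Carol and David) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the key is assumed to be K_i = SHA-256(m_i), the fingerprint is assumed to be F_i = SHA-256(K_i), ENC/DEC are replaced by a toy XOR keystream cipher that is not secure, and the names `Agent`, `send` and `receive` are hypothetical. The TLS transport of step 4 is omitted.

```python
import hashlib

def key_of(block: bytes) -> bytes:
    # Assumption: convergent key K_i = SHA-256(m_i)
    return hashlib.sha256(block).digest()

def fingerprint_of(key: bytes) -> bytes:
    # Assumption: fingerprint F_i = SHA-256(K_i), derivable from the key alone
    return hashlib.sha256(key).digest()

def enc(block: bytes, key: bytes) -> bytes:
    # Toy deterministic XOR cipher standing in for ENC; not secure
    stream = b""
    counter = 0
    while len(stream) < len(block):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(block, stream))

dec = enc  # XOR with the same keystream undoes the encryption

class Agent:
    """LAN deduplication agent (Carol or David): fingerprint -> ciphertext."""
    def __init__(self):
        self.table = {}

    def sync(self, pairs):
        # Steps 7 and 9: insert <F_i, c_i> and return the global fingerprint set
        self.table.update(pairs)
        return set(self.table)

    def fetch(self, fingerprint):
        return self.table[fingerprint]

def send(blocks, local_fps, agent):
    """Steps 3 and 6: build M' and synchronize with the sender's agent."""
    message, new_pairs = [], {}
    for m in blocks:
        k = key_of(m)
        f = fingerprint_of(k)
        if f in local_fps:
            message.append(("key", k))   # duplicate: send only the key
        else:
            message.append(("raw", m))   # new block: send it in full
            new_pairs[f] = enc(m, k)
    local_fps |= agent.sync(new_pairs)   # update the local table in place
    return message

def receive(message, local_fps, agent):
    """Steps 5 and 8: restore M and synchronize with the receiver's agent."""
    blocks, new_pairs = [], {}
    for tag, payload in message:
        if tag == "key":
            m = dec(agent.fetch(fingerprint_of(payload)), payload)
        else:
            m = payload
            k = key_of(m)
            new_pairs[fingerprint_of(k)] = enc(m, k)
        blocks.append(m)
    local_fps |= agent.sync(new_pairs)
    return blocks

carol, david = Agent(), Agent()
alice_fps, bob_fps = set(), set()
first = send([b"hello", b"world"], alice_fps, carol)
restored_first = receive(first, bob_fps, david)
second = send([b"hello", b"again"], alice_fps, carol)
restored_second = receive(second, bob_fps, david)
```

In the second transmission the block `b"hello"` is already known to Alice's local table, so only its 32-byte key crosses the wide area network, and Bob rebuilds the block from the ciphertext held by David.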

Under the DTEs protocol, a user can upload an incorrectly encrypted message block, causing the receiving end to obtain wrong information. To prevent this, the present invention further proposes DTEp, whose protocol details are as follows:

M = {m₁, m₂, ..., m_n}.

2. Generate keys and fingerprints: to support efficient lookup, DTEp uses deterministic fingerprints.

For the key used in encryption and decryption, the present invention ensures that David can verify whether a key matches a fingerprint, yet cannot learn how to compute the key from the fingerprint. To this end, the key is:

3. Deduplication: for each information block, Alice checks whether the block's fingerprint exists in her local fingerprint table T_La. Unlike DTEs, for duplicate message blocks the hash value h(m_i) is sent. After deduplication, Alice sends the message M' = {m₁', m₂', ..., m_n'}, where m_i' is the hash value h(m_i) if the block is a duplicate, and the original block m_i otherwise.

5. Message recovery: after receiving M', Bob recovers the original message with the help of David. Specifically, for each h(m_i) in M', Bob computes its fingerprint F_i and key K_i(t), and downloads from David the encrypted message block C_i(t) corresponding to the fingerprint. Because DTEp supports dynamic keys, its ciphertext additionally carries a single-use random number (nonce):

C_i(t) = (c_i(t), T)

where:

c_i(t) = ENC(m_i, K_i(t))

After obtaining the ciphertext C_i(t), Bob computes the key:

K_i(t) = T^{h(m_i)}.

He then decrypts c_i(t) to obtain the plaintext:

m_i = DEC(c_i(t), K_i(t)).

Finally, to make sure the recovered message is correct, Bob can compute the fingerprint of the recovered block m_i and compare it with the fingerprint he received from Alice. If the two match, the message is correct. Otherwise the recovered message is wrong, and Bob must ask Alice to send the correct message, which must be the block itself rather than the hash value sent after deduplication.

6. Alice synchronization: for each data block m_i to be uploaded, Alice uses K_i to encrypt m_i and obtain c_i; she also generates a random number t, computes T, and obtains the ciphertext C_i(t).

Alice sends all pairs <F_i, C_i(t)> to Carol, the deduplication agent of her local area network. At the same time, Alice downloads the global fingerprint table T_Gc from Carol and updates her local fingerprint table T_La.

7. Carol synchronization: after receiving the pairs <F_i, C_i(t)>, Carol inserts them into the global fingerprint table T_Gc and synchronizes the latest global fingerprint table to Alice.

8. Bob synchronization: for each received m_i, Bob computes K_i and encrypts m_i to obtain C_i(t); he also computes F_i and sends all pairs <F_i, C_i(t)> to David, the deduplication agent of his local area network. At the same time, Bob downloads the global fingerprint table T_Gd from David and updates his local fingerprint table T_Lb.

9. David synchronization: after receiving the pairs <F_i, C_i(t)>, David inserts them into the global fingerprint table T_Gd and synchronizes the latest global fingerprint table to Bob.

10. Detecting and correcting poisoned data: if Bob finds that a ciphertext has been poisoned, he submits the correct <F_i, K_i(t)>.

David checks

e(F_i, T) = e(g₁, K_i(t))

If the equation holds, David accepts K_i(t) as the correct key and uses it to check the ciphertext he holds, in the same way as Bob does. If the ciphertext is wrong, he has Bob synchronize the correct ciphertext.
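The key check performed by David can be illustrated with a toy model of the bilinear groups. The sketch below is an assumption-laden illustration, not the construction of the invention: each group element g^a is represented directly by its exponent a modulo a prime P, so the pairing becomes a multiplication of exponents. This exposes discrete logarithms and is therefore insecure; a real deployment would need a pairing library. The forms F_i = g₁^{h(m_i)} and T = g₂^t are assumptions chosen so that the equation e(F_i, T) = e(g₁, K_i(t)) holds for K_i(t) = T^{h(m_i)}; the function names are hypothetical.

```python
import hashlib
import secrets

P = 2**127 - 1  # prime order of the toy groups (illustrative choice)

def h(block: bytes) -> int:
    # Hash a block to an exponent modulo P
    return int.from_bytes(hashlib.sha256(block).digest(), "big") % P

# Toy model: the element g^a is stored as its exponent a (mod P).
G1 = 1  # generator g1, i.e. g1^1
G2 = 1  # generator g2, i.e. g2^1

def power(elem: int, k: int) -> int:
    # (g^a)^k = g^(a*k)
    return (elem * k) % P

def pairing(a: int, b: int) -> int:
    # e(g1^a, g2^b) = e(g1, g2)^(a*b)
    return (a * b) % P

def fingerprint(block: bytes) -> int:
    return power(G1, h(block))        # assumed F_i = g1^h(m_i)

def fresh_nonce() -> int:
    t = secrets.randbelow(P - 1) + 1  # random t in [1, P-1]
    return power(G2, t)               # assumed T = g2^t

def derive_key(T: int, block_hash: int) -> int:
    return power(T, block_hash)       # K_i(t) = T^h(m_i)

def agent_accepts(F: int, T: int, K: int) -> bool:
    # David's check: e(F_i, T) == e(g1, K_i(t))
    return pairing(F, T) == pairing(G1, K)

block = b"some data block"
T = fresh_nonce()
good_key = derive_key(T, h(block))
bad_key = derive_key(T, h(b"a different block"))
ok = agent_accepts(fingerprint(block), T, good_key)
rejected = not agent_accepts(fingerprint(block), T, bad_key)
```

Under these assumptions both sides of the check equal e(g₁, g₂)^{t·h(m_i)} for an honest key, while a key derived from a different block fails the comparison.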

The above are only preferred embodiments of the present invention. The protection scope of the present invention is not limited to the above embodiments, and all technical solutions that follow the idea of the present invention fall within its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and refinements made without departing from the principle of the present invention shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A multi-user secure encryption and deduplication method based on a wide area network scenario, characterized in that the encryption and deduplication method comprises the following steps:

S1, for each original text message to be sent, the sender decomposes the original message into an array composed of a certain amount of data blocks according to the data characteristics;

S2, for each data block, the sender calculates the corresponding key and a unique fingerprint for distinguishing different data blocks;

S3, the sender compares the fingerprint of each data block with all the fingerprint information in the local cache, and uses the corresponding key to replace the deduplicated data block;

S4, the sender encrypts the deduplicated data block and sends it to the receiver, and at the same time updates the set of <fingerprint, ciphertext> corresponding to all sent data blocks to the sender's local cache and the sender's local area network deduplication agent;

S5, with the help of the deduplication agent of the receiving end's local area network, the receiving end uses the substitute key to obtain the deduplicated data block, merges the result with the decrypted data block information sent by the sending end, and restores the original text information; the set of <fingerprint, ciphertext> corresponding to the sent data blocks is updated to the receiving end and to the deduplication agent of the receiving end's local area network.

2. The multi-user secure encryption and deduplication method based on a wide area network scenario according to claim 1, characterized in that, in step S1, the sending end uses a CDC algorithm to decompose the original information, according to its data characteristics, into an array composed of a certain number of data blocks; the data blocks are divided based on content, and the length of each data block lies between a defined minimum value and a defined maximum value.

3. The multi-user secure encryption and deduplication method based on a wide area network scenario according to claim 1, characterized in that, in step S4, the sending end uploads the set of <fingerprint, ciphertext> corresponding to all sent data blocks to the deduplication agent of the sending end's local area network, and then updates the set of all fingerprint information held by the deduplication agent to the local cache of the sending end;

wherein the local cache of the sending end maintains, through a fingerprint cache table, only the fingerprint information corresponding to all data blocks sent or received by all users in the local area network where the sending end is located; the deduplication agent of the sending end maintains the same fingerprint cache table together with a corresponding ciphertext table, and the ciphertext table stores the ciphertexts of the data blocks corresponding to all fingerprints in the fingerprint cache table.

4. The multi-user secure encryption and deduplication method based on a wide area network scenario according to claim 1, characterized in that, in step S5, the receiving end uploads the <fingerprint, ciphertext> corresponding to the received data blocks to the deduplication agent of the receiving end's local area network, and that deduplication agent integrates all fingerprint information sets and returns the integration result to the receiving end.

5. The multi-user secure encryption and deduplication method based on a wide area network scenario according to claim 1, characterized in that, in step S3, the sending end compares the fingerprint of each data block with all fingerprint information in the local cache and uses the corresponding key to replace each data block that can be deduplicated.

For each data block, the sender searches the fingerprint table in the local cache to check whether the corresponding fingerprint is stored in it; if there is a matching fingerprint, the data block is determined to have been transmitted already and deduplication is allowed, and the key generated in step S2 for the data block replaces the original data block; if there is no matching fingerprint, the original data block is retained.

6. The secure encryption and deduplication method based on multiple users in a wide area network scenario according to claim 1, wherein the receiving end and the sending end are connected by TLS.

7. The multi-user secure encryption and deduplication method based on a wide area network scenario according to claim 1, characterized in that, in step S5, the process in which the receiving end, with the help of the deduplication agent of the receiving end's local area network, uses the substitute key to obtain the deduplicated data block and restores the original text information comprises the following steps:

S51, the receiving end receives the data block information encrypted by TLS, and decrypts to obtain the key corresponding to the data block after performing deduplication in step S3;

S52, for each deduplicated data block, the receiving end calculates the fingerprint information corresponding to the data block from the received key and uploads the calculated fingerprint information to the deduplication agent of the receiving end's local area network, so that the deduplication agent returns the ciphertext of the data block corresponding to the fingerprint information to the receiving end;

S53, the receiving end performs decryption to obtain the deduplicated data block, replaces the key in the received ciphertext information with the corresponding data block, and restores the original complete information.

8. The secure encryption and deduplication method based on multi-users in a wide area network scenario according to claim 1, wherein the encryption and deduplication method further comprises the following steps:

The receiving end uses the guard decryption algorithm to detect whether the ciphertext is poisoned, uses the key verification algorithm to confirm whether the current user is an honest user, and removes the poisoned ciphertext, specifically:

The data block is processed with MLEvd encryption based on a bilinear mapping e: G₁ × G₂ → G₃; for any symmetric deterministic encryption algorithm SDE = (SK, SE, SD), where SK is the key generation algorithm, SE is the encryption algorithm, and SD is the decryption algorithm, the following MLEvd algorithm is constructed:

A, parameter generation algorithm PG(1^λ) → P:

For the input security parameter 1^λ, the parameter generation algorithm generates groups G₁ = <g₁> and G₂ = <g₂> of prime order p, together with their corresponding bilinear mapping e;