CN117313869B - A privacy-preserving inference method for large models based on model segmentation - Google Patents
- Publication number
- CN117313869B (Application CN202311418709.9A)
- Authority
- CN
- China
- Prior art keywords
- model
- client
- server
- layers
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present invention discloses a privacy-preserving inference method for large models based on model segmentation, belonging to the fields of computer artificial intelligence and large-model security. Model segmentation: the Encoder and Decoder of the original large model are deployed on the client, while the middle part of the large model remains local to the server. Model compression: the server compresses the middle layers and sends them to the client, where they form a small model with the basic functions of the original large model. Model fine-tuning: the client fine-tunes the model through a loss function. Model inference: the client sends the trained Encoder to the server according to the protocol, inference produces an intermediate result, which is then combined with the Decoder to obtain the final result. The invention balances model performance and privacy protection, effectively defends against reconstruction attacks without degrading model quality, prevents both leakage of the large model's parameters and leakage of user data, and is computationally efficient, requiring no large amount of computing resources on the client.
Description
Technical Field
The present invention relates to the fields of computer artificial intelligence and large-model security, and in particular to a privacy-preserving inference method for large models based on model segmentation.
Background Art
Privacy-preserving inference for large models based on model segmentation has important applications in the fields of artificial intelligence and large-model security. Model segmentation refers to dividing a complete neural network model into two or more sub-modules, which are then processed separately to complete different tasks; its core idea is to make the model easier to understand, optimize, and debug through modular design. Model segmentation originated in the 1990s, when early work mainly used simple serial connections between modules; in the 21st century, more complex tree structures and multi-branch connections were proposed. Today, large artificial-intelligence models are developing rapidly. A large model is a neural network with an extremely large parameter count, typically billions to hundreds of billions of parameters, that acquires general language or visual capabilities through pre-training on massive data. Early large models include language models such as the GPT series and BERT, and Vision Transformer-style vision models. In the past two years the parameter counts have exploded, producing models at the tens-of-billions scale and beyond, such as GPT-3 and Switch Transformer, and with further growth in computing power the parameter counts are expected to keep increasing rapidly.
Today, with the development of deep learning and large models, model segmentation is widely used in natural language processing, computer vision, and other fields. By lowering training difficulty through modular design and improving a model's ability to adapt to new tasks, model segmentation is an important technique for realizing transfer learning. As transfer learning has continued to develop, the emerging Offsite-Tuning technique adopts a privacy-preserving approach in which the data owner does not need to share sensitive data with the model owner. Traditional transfer-learning methods may require data owners to share their data and pay high fees so that the model owner can perform full fine-tuning, whereas Offsite-Tuning reduces the need to share data, and thus the cost, by sending a lightweight adapter and an emulator to the data owner. For large foundation models, Offsite-Tuning is also more computationally efficient: the data owner only needs to fine-tune the adapter locally, without access to the full model weights, saving substantial computing time and resources.
However, transfer learning also has security problems, such as adversarial-example attacks on the source or target model, model leakage through model extraction, and the risk of privacy leakage of source-domain and target-domain sample data. The present invention therefore proposes a privacy-preserving inference method for large models based on model segmentation to solve these problems.
Summary of the Invention
The purpose of the present invention is to provide a privacy-preserving inference method for large models based on model segmentation that balances model performance and privacy protection, effectively defends against reconstruction attacks without degrading model quality, prevents both leakage of the large model's parameters and leakage of user data, and is computationally efficient, requiring no large amount of computing resources on the client.
To achieve the above object, the present invention adopts the following technical solution:
A privacy-preserving inference method for large models based on model segmentation comprises the following steps:
S1, model segmentation: deploy the first n Encoder layers and the last n Decoder layers of the original large model on the client, and keep the middle part of the large model local to the server;

S2, model compression: the server compresses the middle layers and sends them to the client, where they are combined into a small model with the basic functions of the original large model;

S3, model fine-tuning: the client fine-tunes the model through a loss function;

S4, model inference: the client sends the trained first n Encoder layers to the server according to the protocol, inference is performed to obtain an intermediate result, which is then combined with the local last n Decoder layers to obtain the final result.
Preferably, in step S1, the client's raw data is used to train the n model layers and never leaves the client; by completing the training of the first n layers and the last n layers of the large model on the client, a trained Encoder and Decoder are obtained.
Preferably, in step S2, the remaining middle layers from step S1 are compressed into an emulator module that provides approximate gradient directions during adaptation. The emulator module is sent from the server to the client and combined with the Encoder and Decoder deployed on the client to form a complete small model.
Preferably, in step S3, the small model obtained in step S2 is fine-tuned through a loss function L, where L1 is the loss function of the original large-model task and L2 is the cosine distance between the intermediate features f2 of the small model and the corresponding intermediate features f'2 of the original large model.
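The formula for L is not reproduced in the text above. A plausible reconstruction from the surrounding description — assuming a balancing coefficient λ and an additive combination, neither of which is stated in the source — is:

```latex
L = L_1 + \lambda L_2,
\qquad
L_2 = 1 - \frac{f_2 \cdot f_2'}{\lVert f_2 \rVert \, \lVert f_2' \rVert}
```

Here the first term preserves task accuracy and the second term is the cosine distance between the small model's intermediate features and those of the original large model.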
Preferably, in step S4, after task fine-tuning on the client, the Encoder is combined with the server side and jointly trained with the remaining part of the large model, so that the model parameters on the server and the client stay consistent; finally, the server sends the trained intermediate result back to the client, where it is combined with the trained Encoder to refine the model and obtain the final output.
Therefore, by adopting the above privacy-preserving inference method for large models based on model segmentation, the present invention achieves the following beneficial effects:

1. By deploying part of the large model locally on the client and training it there, the invention protects both the server-side model's privacy and the client-side user's data privacy.

2. By using a model-compression-based fine-tuning method locally on the client, the middle part of the large model is compressed into an emulator that assists the client in fine-tuning the Encoder and Decoder locally; at the same time, a loss function balancing model performance and privacy protection ensures that fine-tuning does not degrade performance while defending against reconstruction attacks.
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Unless otherwise defined, technical or scientific terms used in the present invention shall have the meanings commonly understood by a person of ordinary skill in the art to which the present invention belongs. The words "first", "second", and the like do not denote any order, quantity, or importance, but merely distinguish different components. "Include", "comprise", and similar words mean that the element or item preceding the word covers the elements or items listed after it and their equivalents, without excluding other elements or items. The terms "arranged", "installed", and "connected" are to be understood broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediary, or an internal communication between two elements. "Up", "down", "left", "right", and the like indicate only relative positional relationships, which may change accordingly when the absolute position of the described object changes.
Embodiment

As shown in FIG. 1, the present invention provides a privacy-preserving inference method for large models based on model segmentation, comprising the following steps:
1. (Step S1) Model segmentation: deploy the first n Encoder layers and the last n Decoder layers of the original large model on the client, and keep the middle part of the large model local to the server.

The original large model may be a Transformer, a Seq2Seq model, or the like; the first n layers are denoted w1, w2, etc., and the last n layers wn-1, wn, etc.

By completing the training of the first n layers and the last n layers on the client, a trained Encoder and Decoder are obtained. This strategy ensures that client data is invisible to the server during training: the client's raw data never leaves the client and is used only to train these n key model layers. This not only protects sensitive data from exposure during network transmission but also reduces the risk of data leakage. Because only the first n and last n layers need to be trained on the client, the required computing resources are greatly reduced compared with the entire large model; training can proceed easily even on resource-limited devices, without a large computing cluster, and this efficient use of computing resources lowers the cost and complexity of the training process.
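The split described in step S1 can be sketched as follows. This is a minimal toy illustration (the `partition_model` helper and the named-layer representation are assumptions for exposition, not the patent's code):

```python
# Step S1 sketch: partition a stack of layers into a client-side Encoder
# (first n layers), a server-side middle part, and a client-side Decoder
# (last n layers).

def partition_model(layers, n):
    """Split a list of layers into (encoder, middle, decoder)."""
    if len(layers) < 2 * n:
        raise ValueError("model must have more than 2*n layers")
    encoder = layers[:n]    # deployed and trained on the client
    middle = layers[n:-n]   # stays local to the server
    decoder = layers[-n:]   # deployed and trained on the client
    return encoder, middle, decoder

# Example: a 12-layer model w1..w12 with n = 2.
layers = [f"w{i}" for i in range(1, 13)]
encoder, middle, decoder = partition_model(layers, 2)
print(encoder)  # ['w1', 'w2']
print(decoder)  # ['w11', 'w12']
```

Only `encoder` and `decoder` ever touch client data; `middle` never leaves the server, which is the source of the privacy guarantee described above.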
2. (Step S2) Model compression: the server compresses the middle layers and sends them to the client, where they form a small model with the basic functions of the original large model.

1) The server compresses the remaining middle layers R from step S1 into an emulator module E that provides approximate gradient directions during adaptation; this part contains the main functional information of the original model and is fixed and non-trainable.

2) The emulator module is constructed on the server side, where the middle layers are processed by a carefully designed compression algorithm that minimizes the emulator's size while retaining the key functional characteristics of the model. The compression is lossy: it preserves the model's main information while shrinking the emulator module so that it can be transmitted efficiently over the network.

3) The emulator module is sent from the server to the client, where it is combined with the client-deployed Encoder and Decoder of the original large model to form a complete small model. This small model has sufficient performance and functionality for the specific task; with its assistance, the user adjusts the Encoder and Decoder on their own data.
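The construction in step S2 can be sketched as follows. The patent does not fix a specific compression algorithm, so uniform layer dropping — the scheme used by Offsite-Tuning-style emulators — is an assumption here, as are the helper names:

```python
# Step S2 sketch: lossily compress the server-side middle layers into a
# small emulator, then assemble the client-side small model around it.

def compress_to_emulator(middle_layers, keep_every=2):
    """Keep every `keep_every`-th middle layer (a lossy compression)."""
    return middle_layers[::keep_every]

def assemble_small_model(encoder, emulator, decoder):
    """Client-side small model: trainable Encoder/Decoder around a frozen emulator."""
    return {"encoder": encoder,    # trainable on the client
            "emulator": emulator,  # frozen; provides approximate gradients
            "decoder": decoder}    # trainable on the client

middle = [f"w{i}" for i in range(3, 11)]   # w3..w10, continuing the S1 example
emulator = compress_to_emulator(middle)
small_model = assemble_small_model(["w1", "w2"], emulator, ["w11", "w12"])
print(emulator)  # ['w3', 'w5', 'w7', 'w9']
```

Because the emulator is a lossy stand-in, the client never receives the full middle-layer weights, which is what keeps the large model's parameters private.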
3. (Step S3) Model fine-tuning: the client fine-tunes the model through the loss function.

The small model obtained in step S2 is fine-tuned using a loss function L that combines two terms, where L1 is the loss function of the original large-model task and L2 is the cosine distance between the intermediate features f2 of the small model and the corresponding intermediate features f'2 of the original large model.

1) The task loss L1 depends on the nature of the specific task; it may be cross-entropy loss, maximum-likelihood loss, or another task-appropriate loss. L1 ensures that the fine-tuned small model retains high performance on the original task, i.e. that model performance does not degrade; this part of the loss function ensures the model's effectiveness on the task.

2) The cosine distance L2 measures the similarity of the model's intermediate feature spaces. Minimizing the L2 term defends against reconstruction attacks, in which an attacker tries to reconstruct the original data from intermediate features; introducing L2 strengthens the model's security and ensures that its intermediate representations do not easily leak sensitive information.
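The two-term loss in step S3 can be computed as in the sketch below. The weighting coefficient `lam` and the additive combination are assumptions (the source only states that L combines L1 and L2):

```python
import numpy as np

def cosine_distance(a, b):
    """L2 term: cosine distance between two intermediate feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_loss(task_loss, f_small, f_large, lam=0.1):
    """L = L1 + lam * L2: task loss plus the privacy-motivated feature term."""
    return task_loss + lam * cosine_distance(f_small, f_large)

f2 = np.array([1.0, 0.0, 1.0])        # intermediate feature of the small model
f2_prime = np.array([1.0, 0.0, 1.0])  # corresponding feature of the large model
# Identical features give L2 = 0, so only the task loss remains.
print(combined_loss(0.5, f2, f2_prime))
```

With identical features the second term vanishes and the call returns the bare task loss of 0.5; orthogonal features would add the full `lam` penalty.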
4. (Step S4) Model inference: the client sends the trained first n Encoder layers to the server according to the protocol, performs inference to obtain an intermediate result, and combines it with the local last n Decoder layers to obtain the final result.

The client sends the fine-tuned, fully trained Encoder to the server; the Encoder has been task-fine-tuned to suit the specific application scenario or task requirements. The server combines with this Encoder and trains jointly with the remaining part of the large model. This process keeps the model parameters on the server and the client consistent, preserving the model's performance and functionality to the greatest extent.

The server then sends the intermediate results back to the client; these include the server's further training of the model and any performance improvements. The client combines these intermediate results with its own trained Encoder to further refine the model and obtain the final output. Step S4 is the last link in the whole process and ensures that, through server-client collaboration, the model reaches its best performance.
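The client-server round trip of step S4 can be sketched end to end. This is an illustrative assumption in which each "layer" is modeled as a simple numeric function, so the data flow — client Encoder, server middle part, client Decoder — is visible at a glance:

```python
# Step S4 sketch: split inference across client and server.

def run(layers, x):
    """Apply a list of layer functions in sequence."""
    for layer in layers:
        x = layer(x)
    return x

# Hypothetical tiny "layers" standing in for real network layers.
client_encoder = [lambda x: x * 2, lambda x: x + 1]   # first n layers (client)
server_middle = [lambda x: x * 3]                     # middle part (server)
client_decoder = [lambda x: x - 1, lambda x: x // 2]  # last n layers (client)

def private_inference(x):
    h = run(client_encoder, x)     # client: raw input never leaves here
    h = run(server_middle, h)      # server: sees only intermediate features
    return run(client_decoder, h)  # client: decodes the final result

print(private_inference(5))  # ((5*2 + 1)*3 - 1) // 2 = 16
```

The server only ever sees the intermediate value `h`, never the raw input or the final output, mirroring the privacy argument of the method.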
Therefore, the present invention adopts a privacy-preserving inference method for large models based on model segmentation that balances model performance and privacy protection and effectively defends against reconstruction attacks without degrading model quality; at the same time, model segmentation prevents both leakage of the large model's parameters and leakage of user data, and the method is computationally efficient, requiring no large amount of computing resources on the client.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may still be modified or equivalently replaced, and such modifications or equivalent replacements do not cause the modified technical solution to depart from the spirit and scope of the technical solution of the present invention.
Claims (3)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202311418709.9A (CN117313869B) | 2023-10-30 | 2023-10-30 | A privacy-preserving inference method for large models based on model segmentation |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN117313869A | 2023-12-29 |
| CN117313869B | 2024-04-05 |
Family
ID=89242714
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202311418709.9A (CN117313869B, active) | A privacy-preserving inference method for large models based on model segmentation | 2023-10-30 | 2023-10-30 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN117313869B (en) |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN118446337A * | 2024-06-19 | 2024-08-06 | Beijing Wodong Tianjun Information Technology Co., Ltd. | Model training method, information processing method and system, server and storage medium |
Citations (13)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN110942147A * | 2019-11-28 | 2020-03-31 | Alipay (Hangzhou) Information Technology Co., Ltd. | Neural network model training and prediction method and device based on multi-party secure computation |
| CN111832729A * | 2020-07-06 | 2020-10-27 | Southeast Digital Economy Development Institute | Distributed deep learning inference deployment method for protecting data privacy |
| CN114140478A * | 2022-01-30 | 2022-03-04 | University of Electronic Science and Technology of China | Federated learning method, system, device and medium for medical image segmentation |
| CN114723057A * | 2022-03-31 | 2022-07-08 | Beijing Institute of Technology | A neural network collaborative inference method for multi-access edge computing systems |
| CN114912132A * | 2022-05-11 | 2022-08-16 | Nanjing University | A method for privacy-preserving convolutional neural network inference based on model transformation |
| CN115775010A * | 2022-11-23 | 2023-03-10 | Information and Communication Branch of State Grid Jiangsu Electric Power Co., Ltd. | Electric power data sharing method based on horizontal federated learning |
| WO2023050754A1 * | 2021-09-30 | 2023-04-06 | Tsinghua University | Model training method and apparatus for private data set |
| CN116167084A * | 2023-02-24 | 2023-05-26 | Beijing University of Technology | Federated learning model training privacy protection method and system based on hybrid strategy |
| KR20230084407A * | 2021-12-03 | 2023-06-13 | Industry-Academic Cooperation Foundation, Yonsei University | An artificial intelligence-based privacy-preserving distribution method for vertically, horizontally and multi-partitioned data and a device thereof |
| CN116579418A * | 2023-05-18 | 2023-08-11 | Hangzhou Dianzi University | Privacy data protection method for model segmentation optimization in a federated edge learning environment |
| CN116582242A * | 2023-04-14 | 2023-08-11 | Nanjing University | A secure federated learning method based on a mixed ciphertext-plaintext learning mode |
| CN116739079A * | 2023-05-10 | 2023-09-12 | Zhejiang University | An adaptive privacy-preserving federated learning method |
| CN116805082A * | 2023-08-23 | 2023-09-26 | Nanjing University | A split learning method to protect client privacy data |
Non-Patent Citations (4)

| Title |
| --- |
| Jiuyun Xu et al., "IFTS: A Location Privacy Protection Method Based on Initial and Final Trajectory Segments", 2021, vol. 9, pp. 18112-18122. * |
| Ren Kui et al., "A survey of attacks and defenses against data leakage in artificial-intelligence models", Chinese Journal of Network and Information Security, 2021, 7(1): 1-10. * |
| Zhou Jun et al., "A survey of security and privacy protection in federated learning", Journal of Xihua University (Natural Science Edition), 2020, (4): 21-29. * |
| Zhang Xiaoyu et al., "Differentially private publication of heterogeneous multi-attribute data based on attribute partitioning", Computer Systems & Applications, 2022, 31(10): 225-235. * |
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |