TW202139045A - Privacy protection-based target service model determination


Info

Publication number
TW202139045A
Authority
TW
Taiwan
Prior art keywords
model
sub
business
business model
initial
Application number
TW110110604A
Other languages
Chinese (zh)
Other versions
TWI769754B (en)
Inventor
濤 熊
Original Assignee
大陸商支付寶(杭州)信息技術有限公司
Application filed by 大陸商支付寶(杭州)信息技術有限公司
Publication of TW202139045A
Application granted
Publication of TWI769754B



Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

Embodiments of the description provide a privacy protection-based method and device for determining a target service model. The method comprises: initially training a selected complex service model to obtain an initial service model; pruning the initial service model and training the pruned service model with its parameters reset to the initialized state, so as to check whether the pruned model parameters were unimportant from the start; and selecting a target service model from the multiple sub-models thus obtained by means of differential privacy. In this way, a privacy-preserving compressed model can be obtained, providing privacy protection for the model on top of model compression.

Description

Method and device for determining a target business model based on privacy protection

One or more embodiments of this specification relate to the field of computer technology, and in particular to a computer-implemented method and device for determining a target business model based on privacy protection.

With the development of machine learning technology, deep neural networks (DNNs), which mimic the way the human brain processes information, achieve better results than simple linear models and are therefore favored by practitioners. A deep neural network is a neural network with at least one hidden layer; it can model complex nonlinear systems and thus increases model capacity. Because of its complex network structure, a deep neural network also has a very large feature and parameter space; a single network may contain millions of parameters. It is therefore desirable to find model compression methods that reduce the data volume and complexity of the model. To this end, conventional techniques usually use training samples to fit the millions of parameters of a deep neural network and then delete, or "prune", unnecessary weights to shrink the network to a more manageable size. Reducing model size helps minimize memory, inference, and computation requirements. In some business scenarios, the weights of a neural network can sometimes be cut by as much as 99%, yielding a much smaller and sparser network. However, pruning after training is complete incurs a high computational cost, since a large amount of "wasted" computation is performed. This motivates searching, among the sub-networks of the original neural network, for one that satisfies the requirements as well as possible and training that sub-network instead. At the same time, with conventional techniques, the original data is easier to recover from simpler neural networks. A method is therefore needed that protects data privacy while also compressing the model to enable real-time computation and on-device deployment, improving model performance in several respects.

One or more embodiments of this specification describe a method and device for determining a target business model based on privacy protection, so as to solve one or more of the problems mentioned in the background.

According to a first aspect, a method for determining a target business model based on privacy protection is provided, where the target business model is used to process given business data to obtain corresponding business prediction results. The method includes: determining, in a predetermined manner, initial values for the model parameters of a selected business model, thereby initializing the selected business model; training the initialized business model on multiple training samples until the model parameters converge, obtaining an initial business model; determining multiple sub-models of the initial business model by pruning it, where each sub-model has model parameters and model indicators determined by retraining as follows: resetting the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model, feeding the training samples into the pruned business model in turn, and adjusting the model parameters based on a comparison of the sample labels with the outputs of the pruned business model; and selecting a target business model from the sub-models by a first differential privacy method, based on the model indicators of the sub-models.

In one embodiment, determining multiple sub-models of the initial business model by pruning includes: pruning the initial business model according to its model parameters to obtain a first pruned model; taking the first pruned model, with retrained model parameters, as a first sub-model; and iteratively pruning the first sub-model to obtain subsequent sub-models until an end condition is met.

In one embodiment, the end condition includes at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the size of the last sub-model falling below a set size threshold.

In one embodiment, pruning proceeds in ascending order of model parameter magnitude in one of the following ways: pruning a predetermined proportion of the model parameters, pruning a predetermined number of the model parameters, or pruning until the model does not exceed a predetermined size.

In one embodiment, the first differential privacy method is the exponential mechanism, and selecting the target business model from the sub-models includes: determining an availability coefficient for each sub-model from its model indicators; using the exponential mechanism to determine a sampling probability for each sub-model from the availability coefficients; and sampling among the sub-models according to these probabilities, taking the sampled sub-model as the target business model.

In one embodiment, the method further includes: training the target business model on multiple training samples by a second differential privacy method, so that the trained target business model makes privacy-preserving business predictions on given business data.

In one embodiment, the training samples include a first batch, sample i of the first batch corresponds to a loss obtained after processing by the target business model, and training the target business model by the second differential privacy method includes: determining the original gradient of the loss for sample i; adding noise to the original gradient by the second differential privacy method to obtain a noisy gradient; and using the noisy gradient to adjust the model parameters of the target business model with the goal of minimizing the loss for sample i.

In one embodiment, the second differential privacy method is the addition of Gaussian noise, and adding noise to the original gradient includes: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining the Gaussian noise used to achieve differential privacy from a Gaussian distribution whose variance is positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the noisy gradient.

In one embodiment, the business data includes at least one of pictures, audio files, and characters.

According to a second aspect, a device for determining a target business model based on privacy protection is provided, where the target business model is used to process given business data to obtain corresponding business prediction results. The device includes:
an initialization unit, configured to determine, in a predetermined manner, initial values for the model parameters of a selected business model, thereby initializing the selected business model;
an initial training unit, configured to train the initialized business model on multiple training samples until the model parameters converge, obtaining an initial business model;
a pruning unit, configured to determine multiple sub-models of the initial business model by pruning it, where each sub-model has model parameters and model indicators determined by retraining through the initialization unit and the initial training unit: the initialization unit resets the model parameters of the pruned business model to the initial values of the corresponding parameters in the initialized business model, and the initial training unit feeds the training samples into the pruned business model in turn and adjusts the model parameters based on a comparison of the sample labels with the outputs of the pruned business model; and
a determining unit, configured to select a target business model from the sub-models by a first differential privacy method, based on the model indicators of the sub-models.

According to a third aspect, a computer-readable storage medium is provided, storing a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, a computing device is provided, including a storage and a processor, where the storage stores executable code and the processor, when executing the executable code, implements the method of the first aspect.

With the method and device provided by the embodiments of this specification, a selected complex business model is first trained to obtain an initial business model; the initial business model is then pruned, and the pruned business model is trained with its parameters reset to the initialized state, to test whether the pruned-away model parameters were unimportant from the start. From the multiple sub-models thus obtained, a target business model is selected through differential privacy. In this way a privacy-preserving compressed model is obtained: on top of model compression, the model is given privacy protection.

The solutions provided in this specification are described below with reference to the drawings.

Fig. 1 is a schematic diagram of an implementation architecture according to the technical concept of this specification. Under this concept, a business model may be a machine learning model that performs business processing, such as classification or scoring, on business data. The business model shown in Fig. 1 is implemented as a neural network; in practice it may also be implemented in other ways, for example as a decision tree or a linear regression. The business data may be at least one of characters, audio files, images, animations, and so on, as determined by the specific business scenario and not limited here.

For example, the business model may be a machine learning model used by a lending platform to help assess the risk of a user's lending business; the business data may then be a single user's historical lending behavior, default records, user profile, and so on, and the business prediction result is the user's risk score.
For another example, the business model may be a model for classifying objects in pictures (such as a convolutional neural network); the business data may then be various pictures, and the business prediction result may be, for example, a first class (such as cars), a second class (bicycles), other classes, and so on.

The implementation architecture of this specification is particularly suitable when the business model is a relatively complex nonlinear model. The process of determining a target business model based on privacy protection may then be one of finding, within a complex initial business model, a slimmed-down sub-model whose model indicators meet the requirements.

Take a neural network business model as an example. As shown in Fig. 1, the initial neural network may be relatively complex, containing many features, weight parameters, and other parameters (such as constant parameters and auxiliary matrices). The model parameters of the initial neural network may be initialized in a predetermined manner, for example randomly or by setting them to predetermined values. Under this architecture, the initial neural network is first trained on multiple training samples until its model parameters (or loss function) converge. The initial neural network is then pruned to obtain multiple sub-networks. The pruning may proceed by a predetermined parameter proportion (such as 20%), a predetermined number of parameters (such as 1000), a predetermined size (such as at least 20 megabytes), and so on.

In conventional techniques, a sub-network obtained by pruning the initial neural network is usually handled by continuing training, pruning again on that basis, and training again; that is, the initial neural network is compressed step by step. Under the concept of the embodiments of this specification, by contrast, after the initial neural network is pruned, the parameters of the pruned sub-network are reset (restored to the initialized state), and the reset sub-network is then trained. The purpose is to check whether the pruned-away network structure was unnecessary from the start; this conclusion can be exhibited through the model's evaluation indicators, such as accuracy, recall, and convergence.

It is worth noting that pruning a neural network may include removing some of its neurons and/or removing some of the connections between neurons. In an optional implementation, which neurons to discard may be decided with reference to the weight parameters associated with each neuron. Weight parameters describe a neuron's importance. Taking a fully connected neural network as an example, the weights connecting a neuron to the neurons of the next layer may be averaged, or their maximum taken, to obtain a reference weight; the neurons are then discarded (pruned) in ascending order of reference weight, as sketched below.
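As a concrete illustration, the following is a minimal sketch of this reference-weight computation and magnitude-based neuron pruning for one fully connected layer. It assumes NumPy; the names (prune_neurons, prune_fraction) and the choice of the mean as the aggregation are illustrative, not part of the original disclosure.

# A minimal sketch of magnitude-based neuron pruning for one fully
# connected layer, assuming NumPy; all names are illustrative.
import numpy as np

def prune_neurons(weights: np.ndarray, prune_fraction: float) -> np.ndarray:
    """weights: (n_in, n_out) matrix of one layer; returns a boolean mask
    over the n_in input neurons, pruning the least important ones."""
    # Reference weight per neuron: mean absolute outgoing weight
    # (the maximum could be used instead, as the text notes).
    reference = np.abs(weights).mean(axis=1)
    n_prune = int(len(reference) * prune_fraction)
    # Discard neurons in ascending order of reference weight.
    pruned_idx = np.argsort(reference)[:n_prune]
    mask = np.ones(len(reference), dtype=bool)
    mask[pruned_idx] = False
    return mask

# Example: prune 20% of the 128 neurons of a layer with 64 outputs.
mask = prune_neurons(np.random.randn(128, 64), prune_fraction=0.2)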
Fig. 2 shows the sub-network pruning flow of a specific example under the implementation architecture of this specification. In Fig. 2, for the neural network remaining after pruning, the model parameters are reset to the initialized state and the network is retrained on the training samples, yielding the first sub-network; its network structure, evaluation indicators, and so on can be recorded at the same time. Then, as the left arrow indicates, the flow loops back to the pruning step: the first sub-network is pruned according to its trained model parameters, the parameters of the pruned network are reset to the initialized state, and it is retrained on the training samples as the second sub-network. The loop continues along the left arrow, and so on, until an Nth sub-network satisfying the end condition is obtained. The end condition here may be, for example, at least one of: the number of iterations reaching a predetermined number (such as a preset number N), the number of sub-models reaching a predetermined number (such as a preset number N), and the size of the last sub-model falling below a set size threshold (such as 100 megabytes).

In this way, multiple sub-networks of the initial neural network are obtained. In some optional implementations, the left arrow of Fig. 2 may instead return to the very top: after the first sub-network is obtained, the original neural network is reinitialized, the reinitialized network is trained and pruned, and the pruned sub-network is trained as the second sub-network, and so on, until the Nth sub-network is obtained. The sub-networks may then have different sizes; for example, the first sub-network may be 80% of the initial neural network, the second 60%, and so on. In this variant, some randomization may also be applied at each initialization: features or neurons are randomly sampled each time and a small portion (such as 1%) of the features and initialization parameters is discarded, slightly perturbing the initial neural network. Each initialization thus remains consistent with the original network up to small differences, which tests the effect of different neurons.

Continuing with Fig. 1, one of the sub-networks can be selected as the target neural network. According to one embodiment, to protect data privacy, the pruned sub-networks may be regarded as a set of sub-networks of the initial neural network, and one sub-network is randomly selected as the target neural network based on the principle of differential privacy. Determining the target business model based on privacy protection through differential privacy in this way better protects the privacy of the business model and/or the business data, and improves the practicality of the target neural network.

It should be understood that the architecture of Fig. 1 takes a neural network business model as an example. When the business model is another machine learning model, the neurons in the above description may be replaced by other model elements; for example, when the business model is a decision tree, neurons may be replaced by tree nodes, and so on.

The target neural network is used to make business predictions on business data and obtain corresponding business prediction results: for example, predicting the recognized object class from picture data, or predicting a user's financial lending risk from user behavior data. A sketch of the iterative prune-reset-retrain loop appears below.
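The following is a minimal sketch of the Fig. 2 loop (prune, rewind to the initial parameters, retrain, record), under the assumption that train() and metric() stand in for the application's own training and evaluation code; all names here are illustrative.

# A minimal sketch of the prune -> reset-to-initialization -> retrain
# loop of Fig. 2 over one weight matrix, assuming NumPy.
import numpy as np

def lottery_style_submodels(w_init, n_rounds, fraction, train, metric):
    """w_init: initial weights theta_0; train(w, mask) returns converged
    weights under the mask; metric(w, mask) returns e.g. accuracy.
    Returns one (mask, weights, score) record per pruning round."""
    mask = np.ones_like(w_init, dtype=bool)
    w = train(w_init, mask)              # initial training to convergence
    sub_models = []
    for _ in range(n_rounds):
        # Prune `fraction` of the surviving weights with smallest magnitude.
        threshold = np.quantile(np.abs(w[mask]), fraction)
        mask &= np.abs(w) > threshold
        # Rewind the surviving weights to their initial values ...
        w = np.where(mask, w_init, 0.0)
        # ... and retrain the pruned network from that state.
        w = train(w, mask)
        sub_models.append((mask.copy(), w.copy(), metric(w, mask)))
    return sub_models

# Example with trivial stand-ins:
# subs = lottery_style_submodels(np.random.randn(128, 64), 3, 0.2,
#                                train=lambda w, m: w * m,
#                                metric=lambda w, m: float(m.mean()))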
The specific flow for determining a target business model based on privacy protection is described in detail below.

Fig. 3 shows a flow for determining a target business model based on privacy protection according to an embodiment. The business model here may be a model that performs business processing, such as classification or scoring, on given business data, which may be text, images, speech, video, animation, or other types of data. The flow may be executed by any system, device, apparatus, platform, or server with computing capability.

As shown in Fig. 3, the method may include the following steps. Step 301: determine, in a predetermined manner, initial values for the model parameters of a selected business model, thereby initializing the selected business model. Step 302: train the initialized business model on multiple training samples until the model parameters converge, obtaining the initial business model. Step 303: determine multiple sub-models of the initial business model by pruning it, where each sub-model has model parameters and model indicators determined by retraining as follows: the model parameters of the pruned business model are reset to the initial values of the corresponding parameters in the initialized business model; the training samples are fed into the pruned business model in turn, and the model parameters are adjusted based on a comparison of the sample labels with the outputs of the pruned business model. Step 304: select a target business model from the sub-models by a first differential privacy method, based on the model indicators of the sub-models.

First, in step 301, initial values are determined in a predetermined manner for the model parameters of the selected business model, thereby initializing it. To train the selected business model, its parameters must first be initialized, that is, given initial values. When the selected business model is a neural network, the model parameters may be, for example, at least one of the neuron weights, constant parameters, auxiliary matrices, and so on. When it is a decision tree, the model parameters may be, for example, the weight parameters of the nodes and the connections between nodes and their weights; other forms of machine learning model have other parameters, not enumerated here. The initial values may be determined in a predetermined manner, for example drawn completely at random, drawn at random within a preset interval, or assigned set values. With these initial values, upon receiving business data, or features extracted from it, the business model can produce a corresponding business prediction result, such as a classification or a score. Because the later reset step rewinds parameters to exactly these values, the initialized state needs to be retained, as in the sketch below.
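A minimal sketch of this initialization, assuming NumPy: parameters are drawn within a preset interval and the initialized state theta_0 is kept so that step 303 can rewind pruned sub-models to exactly these values. The shapes and names are illustrative assumptions.

# A minimal sketch of step 301: random initialization within a preset
# interval, keeping a snapshot theta_0 for later rewinding.
import numpy as np

rng = np.random.default_rng(seed=0)

def init_params(shapes, low=-0.1, high=0.1):
    """Random initialization within a preset interval, per parameter tensor."""
    return {name: rng.uniform(low, high, size=shape)
            for name, shape in shapes.items()}

theta_0 = init_params({"layer1.weight": (128, 64), "layer2.weight": (64, 10)})
# theta_0 is kept unchanged; training operates on a copy of it.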
Next, in step 302, the initialized business model is trained on multiple training samples until the model parameters converge, obtaining the initial business model. Once the parameters have been initialized in step 301, the selected business model can run according to its logic and output a business prediction result whenever business data is received, so the initialized model can be trained on the training samples. Each training sample may have sample business data and a corresponding sample label. The training process may, for example, feed each piece of sample business data into the initialized model in turn and adjust the model parameters according to a comparison of the business prediction result output by the model with the corresponding label.

After adjustment on a certain number of training samples, the change in each model parameter becomes smaller and smaller until the parameter approaches a fixed value; that is, the model parameters converge. Convergence may be described by the fluctuation of the individual model parameters, or by the loss function: the loss is a function of the model parameters, so convergence of the loss reflects convergence of the parameters. For example, the parameters may be deemed converged when the maximum change of the loss function, or the fluctuation of the parameters, falls below a predetermined threshold. The selected business model has then completed the current stage of training, and the resulting model may be called the initial business model. This initial training may be performed in any suitable manner and is not detailed further here.

Then, in step 303, multiple sub-models of the initial business model are determined by pruning it. To obtain sub-models that can stand in for the initial business model, the initial business model may be pruned according to business requirements, yielding multiple sub-models, which may also be called candidate models. The pruning may be performed repeatedly on the initial business model itself, or applied cumulatively on top of already-pruned sub-models, as described above for the example of Fig. 2.

Pruning proceeds in ascending order of model parameter magnitude in one of the following ways: pruning a predetermined proportion (such as 20%) of the model parameters, pruning a predetermined number (such as 1000) of the model parameters, or pruning until the model does not exceed a predetermined size (such as 1000 megabytes).

Usually at least some of the model parameters, such as the weight parameters, indicate to some degree the importance of the model units (neurons, tree nodes, and so on). When pruning the business model to reduce the number of parameters, either the model units or the connections between them may be pruned. This is illustrated below with reference to Fig. 4, taking a neural network business model with neurons as the model units. One embodiment prunes the model by removing a predetermined number or proportion of model units.
For example, 100 neurons, or 10% of the neurons, may be pruned from each hidden layer of the neural network. As shown in Fig. 4, the importance of a neuron is described by the weights on the connections between neurons of different hidden layers (the connecting lines in Fig. 4), so the values of the weight parameters can decide which neurons to delete. Fig. 4 is a schematic of part of the hidden layers of a neural network. In Fig. 4, suppose that in the i-th hidden layer the weights on the connections between the neuron drawn with a dashed line and the neurons of the previous and following layers are all small; that neuron is then of low importance and can be pruned away.

Another embodiment prunes the model by removing a predetermined number or proportion of connecting edges. Still referring to Fig. 4, for each connecting edge in the neural network (such as the edge between neuron X1 and the dashed-line neuron of the i-th hidden layer), a small weight parameter indicates that the earlier neuron matters little to the later neuron, and the edge can be deleted. The resulting network is no longer fully connected: each neuron of a hidden layer acts only on the relatively important neurons of the next layer, and each neuron of the next layer attends only to the neurons of the previous layer that are important to it. The scale of the business model shrinks accordingly.

In other embodiments, pruning may remove connecting edges and model units at the same time, which is not detailed further here. Pruning model units and pruning connections are both concrete means of model pruning, and this specification does not restrict the means. Through such pruning, a predetermined proportion of the model parameters can be pruned away, a predetermined number of them can be pruned away, or the model can be pruned down to no more than a predetermined size.

How large a part of the business model to prune may be determined by predetermined pruning rules or by the required sub-model size. A pruning rule may be, for example: the sub-model has a predetermined size in bytes (such as 1000 megabytes); the sub-model is a predetermined proportion of the initial business model (such as 70%); the pruned sub-model is a predetermined proportion of the model before pruning (such as 90%); or edges whose weights fall below a predetermined threshold are pruned. In short, the pruned model discards model units and connecting edges of low importance and keeps those of high importance.

In obtaining a sub-model, on the one hand, the parameters of the partially cut initial business model need further adjustment, so the cut model must be trained further. On the other hand, it is necessary to verify whether the part pruned from the initial business model was unnecessary from the start.
Therefore, the model parameters of the pruned model can be reset to the initialized state and the model trained on the multiple training samples; the trained model is recorded as a sub-model of the initial business model. Note that, because the initial business model was trained until convergence, pruning part of it may mistakenly delete important model units and degrade performance, so the performance of a retrained sub-model is uncertain: if important units were deleted by mistake, the model parameters (or loss function) may fail to converge, converge more slowly, or yield lower accuracy. The performance indicators of each sub-model after training, such as accuracy, model size, and convergence, can therefore also be recorded.

Assume that N sub-models are obtained in step 303, where N is a positive integer. N may be a preset number of iterations (a predetermined number), a preset number of sub-models (a predetermined number), or the number reached under a set pruning condition. For example, when pruning is applied cumulatively on top of already-pruned sub-models, later sub-models are smaller, and the pruning condition may be that the last sub-model is smaller than a predetermined size threshold (such as 100 megabytes); pruning then ends when the sub-model size falls below the threshold, and N is the number of sub-models actually obtained.

Next, in step 304, a target business model is selected from the sub-models by a first differential privacy method, based on the model indicators of the sub-models.

Differential privacy is a technique from cryptography intended to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. Let M be a randomized algorithm and PM the set of all possible outputs of M. The algorithm M provides ε-differential privacy if, for any two neighboring data sets D and D' and any subset SM of PM, Pr[M(D) ∈ SM] ≤ e^ε × Pr[M(D') ∈ SM], where the parameter ε, called the privacy budget, balances the degree of privacy protection against accuracy. ε can usually be set in advance: the closer ε is to 0, the closer e^ε is to 1, the closer the algorithm's outputs on the two neighboring data sets D and D' are, and the stronger the privacy protection.

Step 304 thus amounts to balancing compression rate against model indicators. Classical implementations of differential privacy include the Laplace mechanism and the exponential mechanism. The Laplace mechanism is typically used to add noise to numeric values, while the exponential mechanism is better suited when perturbing a value is meaningless. Here one sub-model is selected from several as the target business model; since this is a selection among sub-models rather than a manipulation of their internals, numeric perturbation is meaningless, and the exponential mechanism is preferred.
As a specific example, the following describes in detail how the first differential privacy method selects the target business model from the sub-models when that method is the exponential mechanism.

The N sub-models determined in step 303 can be regarded as N entities, each corresponding to a value r_i, where i ranges, for example, from 1 to N; the values r_i form the output range R of the query function. The aim is to select one r_i from R and take its corresponding entity, that is, the corresponding sub-model, as the target business model. Let D denote the given data set (understood here as the training sample set); under the exponential mechanism, the function q(D, r_i) is called the availability function of the output value r_i.

For the sub-models, availability is closely tied to the model indicators. For example, when the indicators include the compression rate relative to the initial business model and the accuracy on a test sample set, a larger compression rate means a smaller sub-model and higher accuracy means a more desirable sub-model, so in a specific example the availability function can be positively correlated with the compression rate s_i and the accuracy z_i of the corresponding sub-model i. The value of the availability function for each sub-model may be recorded as that sub-model's availability coefficient, for example:
q(D, r_i) = s_i · z_i
In other specific examples the model indicators may include recall, F1 score, and so on, and the availability function may accordingly take other reasonable forms, not detailed further here.

Under the exponential mechanism for ε-differential privacy, for a given privacy cost ε (a preset value, such as 0.1), a given data set D, and an availability function q(D, r), the privacy protection mechanism A(D, q) satisfies ε-differential privacy if and only if

Pr[A(D, q) = r_i] ∝ exp(ε · q(D, r_i) / (2Δq))

where ∝ denotes "proportional to" and Δq is the sensitivity factor, the maximum change of the availability function caused by changing a single datum (a single training sample in the example above). Since both accuracy and compression rate take values between 0 and 1, the maximum change of q under a single-datum change is 1, so Δq is taken as 1 here. In other embodiments, where q is expressed differently, Δq may be determined in other ways, without limitation here.

In a specific example, the privacy protection mechanism A may sample according to sampling probabilities, with the sampling probability of sub-model i denoted p_i. For example, the sampling probability of the i-th sub-model may be

p_i = exp(ε · q(D, r_i) / (2Δq)) / Σ_j exp(ε · q(D, r_j) / (2Δq))
where j ranges over the sub-models. The exponential mechanism of differential privacy is thus built into the sampling probabilities of the sub-models, and sampling can be performed over the range R (that is, over the sub-models) according to each sub-model's sampling probability.

For the sampling, in one specific example the interval from 0 to 1 is divided into as many sub-intervals as there are values in R (the number of sub-models), the length of each sub-interval matching the corresponding sampling probability. A random number between 0 and 1 is generated by a preselected random algorithm, and the value in R (corresponding to one sub-model) whose sub-interval contains the random number is taken as the sampled target value; the sub-model corresponding to that value serves as the target business model. In another specific example, R is a continuous numeric interval divided into sub-intervals whose lengths are positively correlated with the sampling probabilities of the corresponding sub-models; a value is drawn at random over R, and the sub-model of the sub-interval into which it falls serves as the target business model.

Sampling the sub-models by probability through the exponential mechanism of differential privacy adds randomness to the selection of the target business model. It is therefore hard to infer the specific structure of a sub-model from the initial business model, making the target business model hard to guess and protecting the privacy of the target business model and the business data.

In determining the target business model, each sub-model receives only preliminary training, so that a suitable sub-model can be picked out as the final one; this avoids the heavy computation of fully training the huge initial business model and then deleting a large number of parameters. The selected target business model can therefore be trained further, the better to make business predictions on given business data and obtain prediction results (such as scores or classifications). One training process for the target business model feeds each training sample into the selected model and adjusts the model parameters according to a comparison of the output with the sample label.

Typically, when the output is a single value, the loss of this comparison can be measured by, for example, the difference or its absolute value; when the output is a vector or several values, the loss can be measured by, for example, variance or Euclidean distance. Once the loss is obtained, the model parameters can be adjusted with the goal of minimizing it. Optimization algorithms, such as gradient descent, may be used in this process to speed up the convergence of the model parameters (or the loss function). A sketch of the exponential-mechanism selection appears below.
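The following is a minimal sketch of this exponential-mechanism selection, assuming NumPy and the product-form availability coefficient q_i = s_i · z_i used in the example above; epsilon = 0.1 and the input arrays are illustrative.

# A minimal sketch of step 304: exponential-mechanism sampling of one
# sub-model from availability coefficients, assuming NumPy.
import numpy as np

def select_submodel(compression, accuracy, epsilon=0.1, sensitivity=1.0):
    """compression, accuracy: arrays of s_i and z_i over the N sub-models.
    Returns the index of the sampled sub-model."""
    q = np.asarray(compression) * np.asarray(accuracy)   # availability coefficients
    scores = np.exp(epsilon * q / (2.0 * sensitivity))   # exponential mechanism
    probs = scores / scores.sum()                        # sampling probabilities p_i
    # Draw one sub-model according to p_i (the 0-1 sub-interval scheme).
    return np.random.default_rng().choice(len(probs), p=probs)

# Example: three candidate sub-models.
idx = select_submodel(compression=[0.8, 0.6, 0.4], accuracy=[0.90, 0.93, 0.95])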
According to one possible design, to protect data privacy further, a differential privacy method can be introduced by adding interference noise to the loss gradient when adjusting the model parameters, so that the target business model is trained with privacy protection. The flow of Fig. 3 may then further include the following step. Step 305: train the target business model on multiple training samples by a second differential privacy method, so that the trained target business model makes business predictions on given business data. There are many ways to implement differential privacy; the purpose here is to add noise to the data, for example Gaussian noise or Laplace noise, without limitation.

In one implementation, for a first batch of samples fed into the target business model, the model parameters may be adjusted as follows: first, determine the original gradient of the loss for the first batch; next, add noise that achieves differential privacy to the original gradient, obtaining a noisy gradient; then adjust the model parameters of the target business model using the noisy gradient. The first batch may contain one training sample or several; with several, the loss of the batch may be, for example, the sum or the average of the per-sample losses.

As an example, suppose the original gradient obtained for the first batch is:
$$g_t(x_i) = \nabla_{\theta_t} \mathcal{L}(\theta_t, x_i)$$

where $t$ denotes the current round of iterative training, $x_i$ denotes the $i$-th sample in the first batch of samples, $g_t(x_i)$ denotes the loss gradient of the $i$-th sample in round $t$, $\theta_t$ denotes the model parameters at the beginning of round $t$, and $\mathcal{L}(\theta_t, x_i)$ denotes the loss function corresponding to the $i$-th sample.

As mentioned earlier, adding noise to the original gradient to achieve differential privacy can be done with, for example, Laplace noise or Gaussian noise. In one embodiment, taking Gaussian noise as the second method of differential privacy, the original gradient may first be clipped against a preset clipping threshold to obtain a clipped gradient; then, based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter), the Gaussian noise used to achieve differential privacy is determined; finally, the clipped gradient and the Gaussian noise are fused (for example, summed) to obtain the noise-containing gradient. In other words, the second method clips the original gradient on the one hand and superimposes noise on the clipped gradient on the other, thereby applying differential privacy processing with Gaussian noise to the loss gradient. For example, the original gradient is clipped as:
$$\bar{g}_t(x_i) = g_t(x_i) \Big/ \max\!\left(1, \frac{\lVert g_t(x_i)\rVert_2}{C}\right)$$

where $\bar{g}_t(x_i)$ denotes the clipped gradient of the $i$-th sample in round $t$, $C$ denotes the clipping threshold, and $\lVert g_t(x_i)\rVert_2$ denotes the second-order (L2) norm of $g_t(x_i)$. That is, when the gradient norm is less than or equal to the clipping threshold $C$, the original gradient is kept; when it is greater than $C$, the original gradient is scaled down in proportion to how much its norm exceeds $C$. Gaussian noise is then added to the clipped gradient to obtain the noise-containing gradient, for example:
$$\tilde{g}_t = \frac{1}{B}\left(\sum_{i=1}^{B} \bar{g}_t(x_i) + \mathcal{N}\!\left(0, \sigma^2 C^2 \mathbf{I}\right)\right)$$

where $B$ denotes the number of samples in the first batch; $\tilde{g}_t$ denotes the noise-containing gradient corresponding to the $B$ samples in round $t$; $\mathcal{N}(0, \sigma^2 C^2 \mathbf{I})$ denotes Gaussian noise whose probability density follows a Gaussian distribution with mean $0$ and variance $\sigma^2 C^2$; $\sigma$ denotes the aforementioned noise scaling coefficient, a preset hyperparameter that can be set as required; $C$ is the clipping threshold above; and $\mathbf{I}$ denotes an indicator function that takes the value $0$ or $1$; for example, it may be set to $1$ for even-numbered rounds of the multi-round training and $0$ for odd-numbered rounds. In the above formula, when the first batch contains multiple training samples, the noise-containing gradient is the average of the clipped per-sample gradients with Gaussian noise superimposed; when the first batch contains only one training sample, it is that sample's clipped gradient with Gaussian noise superimposed. Therefore, using the gradient obtained after adding Gaussian noise, and still aiming to minimize the loss corresponding to sample $i$, the model parameters can be adjusted as follows:
$$\theta_{t+1} = \theta_t - \eta_t\, \tilde{g}_t$$

where $\eta_t$ denotes the learning step size (learning rate) of round $t$, a preset hyperparameter such as 0.5 or 0.3, and $\theta_{t+1}$ denotes the adjusted model parameters obtained after round $t$ of training (on the first batch of samples). Since the noise added to the gradient satisfies differential privacy, the adjustment of the model parameters satisfies differential privacy as well.
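Putting the three formulas above together (per-sample clipping, Gaussian noise addition, and the parameter update), the following is a minimal sketch of one noisy update step in the style of DP-SGD. It is an illustration only; the parameter values in the usage example are hypothetical stand-ins, not values taken from the patent.

```python
import numpy as np

def dp_sgd_step(theta, per_sample_grads, C, sigma, eta, indicator=1.0, rng=None):
    """One noisy parameter update following the formulas above.

    theta: current model parameters, shape (d,).
    per_sample_grads: per-sample loss gradients g_t(x_i), shape (B, d).
    C: clipping threshold; sigma: noise scaling coefficient (hyperparameter);
    eta: learning rate; indicator: the 0/1 indicator from the noise formula.
    """
    if rng is None:
        rng = np.random.default_rng()
    B = per_sample_grads.shape[0]

    # Clip each per-sample gradient: g_bar = g / max(1, ||g||_2 / C).
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads / np.maximum(1.0, norms / C)

    # Sum the clipped gradients, add Gaussian noise with standard deviation
    # sigma * C (i.e. variance sigma^2 * C^2), and average over the batch.
    noise = indicator * rng.normal(0.0, sigma * C, size=theta.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / B

    # Gradient descent step: theta_{t+1} = theta_t - eta * noisy_grad.
    return theta - eta * noisy_grad

# Hypothetical usage with stand-in values (B = 8 samples, d = 10 parameters).
rng = np.random.default_rng(0)
theta = rng.normal(size=10)
grads = rng.normal(size=(8, 10))
theta_next = dp_sgd_step(theta, grads, C=1.0, sigma=1.1, eta=0.3, rng=rng)
```

Note that clipping bounds each sample's influence on the update by C, which is what lets the noise scale sigma * C be calibrated to the sensitivity of the batch gradient.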
The adjusted model parameters obtained by training rounds (including the first batch of samples). In the case that the gradient adds Gaussian noise to meet the differential privacy, the adjustment of the model parameters meets the differential privacy. Accordingly, after multiple rounds of iterative training, a target business model based on differential privacy can be obtained. Since Gaussian noise is added in the model training process, it is difficult to infer the model structure or reverse the business data from the data presented by the target business model. In this way, the effectiveness of privacy data protection can be further improved. The trained target business model can be used to make corresponding business predictions for the given business data. The business data here is the business data consistent with the type of training sample, such as the user’s financial-related data. The user’s loan risk prediction can be performed through the target business model. Method, first perform initial training on the selected complex business model to obtain the initial business model, then trim the initial business model, and train the trimmed business model with the parameters reset back to the initial state to test the trimming It does not matter whether the parameters of the model are from the beginning. For the multiple sub-models obtained, the target business model is selected through differential privacy. In this way, a compression model for privacy protection can be obtained, and on the basis of implementing model compression, privacy protection is provided for the model. According to another embodiment, an apparatus for determining a target business model based on privacy protection is also provided. Among them, the business model here can be a model used to perform business processing such as classification and scoring for a given business data. The business data here can be text, image, voice, video, animation and other types of data. The device can be installed in a system, equipment, device, platform or server with certain computing capabilities. Fig. 5 shows a schematic block diagram of an apparatus for determining a target service model based on privacy protection according to an embodiment. As shown in FIG. 
According to an embodiment of another aspect, an apparatus for determining a target business model based on privacy protection is also provided. The business model here may be a model used to perform business processing, such as classification or scoring, on given business data, where the business data may be text, images, audio, video, animation, or other types of data. The apparatus may be deployed in any system, device, platform, or server with computing capability. Fig. 5 shows a schematic block diagram of an apparatus for determining a target business model based on privacy protection according to an embodiment. As shown in Fig. 5, the apparatus 500 includes: an initialization unit 51, configured to determine, in a predetermined manner, an initial value for each model parameter of the selected business model, thereby initializing the selected business model; an initial training unit 52, configured to train the initialized selected business model with multiple training samples until the model parameters converge, to obtain the initial business model; a pruning unit 53, configured to determine multiple sub-models of the initial business model based on pruning of the initial business model, where each sub-model corresponds to model parameters and a model indicator determined through retraining by the initialization unit 51 and the initial training unit 52: the initialization unit 51 resets the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model, and the initial training unit 52 inputs multiple training samples into the pruned business model in turn and adjusts the model parameters based on the comparison between each sample label and the output of the pruned business model; and a determination unit 54, configured to select the target business model from the sub-models using the first method of differential privacy, based on the model indicator corresponding to each sub-model.

According to one embodiment, the pruning unit 53 may further be configured to: prune the initial business model according to its model parameters to obtain a first pruned model; take the first pruned model, together with the model parameters obtained through retraining, as a first sub-model; and iteratively prune the first sub-model to obtain subsequent sub-models until an end condition is met. In an embodiment, the end condition may include at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold. In an optional implementation, the pruning unit 53 prunes the model, in order of model parameters from smallest to largest, in one of the following ways: pruning a predetermined proportion of the model parameters, pruning a predetermined number of the model parameters, or pruning until the model scale does not exceed a predetermined size.

According to a possible design, the first method of differential privacy is the exponential mechanism, and the determination unit 54 may further be configured to: determine an availability coefficient for each sub-model according to the model indicator corresponding to that sub-model; determine a sampling probability for each sub-model from the availability coefficients using the exponential mechanism; and sample among the multiple sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model. In one embodiment, the apparatus 500 may further include a privacy training unit 55, configured to: train the target business model with multiple training samples based on the second method of differential privacy, so that the trained target business model is used to make business predictions on given business data while protecting data privacy.
In a further embodiment, the multiple training samples include a first batch of samples, and a sample i in the first batch corresponds to a loss obtained after processing by the target business model; the privacy training unit 55 is further configured to: determine the original gradient of the loss corresponding to sample i; add noise to the original gradient using the second method of differential privacy to obtain a noise-containing gradient; and adjust the model parameters of the target business model using the noise-containing gradient, with the goal of minimizing the loss corresponding to sample i. In a further embodiment, the second method of differential privacy is the addition of Gaussian noise, and the privacy training unit 55 may further be configured to: clip the original gradient based on a preset clipping threshold to obtain a clipped gradient; determine the Gaussian noise used to achieve differential privacy using a Gaussian distribution determined based on the clipping threshold, where the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimpose the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.

It is worth noting that the apparatus 500 shown in Fig. 5 is the apparatus embodiment corresponding to the method embodiment shown in Fig. 3; the corresponding descriptions in the method embodiment of Fig. 3 also apply to the apparatus 500 and are not repeated here. According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Fig. 3. According to an embodiment of yet another aspect, a computing device is also provided, including a storage and a processor, the storage storing executable code; when the processor executes the executable code, the method described in conjunction with Fig. 3 is implemented. Those skilled in the art should be aware that, in one or more of the foregoing examples, the functions described in the embodiments of this specification may be implemented by hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. The specific implementations described above further detail the purpose, technical solutions, and beneficial effects of the technical concept of this specification. It should be understood that the above are only specific implementations of the technical concept of this specification and are not intended to limit its protection scope; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the embodiments of this specification shall fall within the protection scope of the technical concept of this specification.

301~305: steps
X1: neuron
500: apparatus
51: initialization unit
52: initial training unit
53: pruning unit
54: determination unit
55: privacy training unit

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
[Fig. 1] is a schematic diagram of an implementation architecture for determining a target business model based on privacy protection under the technical concept of this specification;
[Fig. 2] shows, in a specific example, the process of determining multiple sub-networks based on pruning of an initial neural network;
[Fig. 3] is a flowchart of a method for determining a target business model based on privacy protection according to an embodiment;
[Fig. 4] is a schematic diagram of pruning a neural network in a specific example;
[Fig. 5] is a schematic block diagram of an apparatus for determining a target business model based on privacy protection according to an embodiment.

Claims (17)

1. A method for determining a target business model based on privacy protection, the target business model being used to process given business data to obtain a corresponding business prediction result, the method comprising: determining, in a predetermined manner, an initial value for each model parameter of a selected business model, thereby initializing the selected business model; training the initialized selected business model with multiple training samples until the model parameters converge, to obtain an initial business model; determining multiple sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model indicator determined through retraining in the following manner: resetting the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model, inputting multiple training samples into the pruned business model in turn, and adjusting the model parameters based on a comparison between each sample label and the output of the pruned business model; and selecting the target business model from the sub-models using a first method of differential privacy, based on the model indicator corresponding to each sub-model, wherein, in a case where the first method of differential privacy is an exponential mechanism: respective availability coefficients of the sub-models are determined according to the model indicators corresponding to the sub-models; respective sampling probabilities of the sub-models are determined from the availability coefficients using the exponential mechanism; and sampling is performed among the multiple sub-models according to the sampling probabilities, the sampled sub-model being taken as the target business model.
2. The method according to claim 1, wherein determining multiple sub-models of the initial business model based on pruning of the initial business model comprises: pruning the initial business model according to its model parameters to obtain a first pruned model; taking the first pruned model, together with the model parameters obtained through retraining, as a first sub-model; and iteratively pruning the first sub-model to obtain subsequent sub-models until an end condition is met.
3. The method according to claim 2, wherein the end condition comprises at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold.
4. The method according to claim 1 or 2, wherein the model is pruned, in order of model parameters from smallest to largest, in one of the following ways: pruning a predetermined proportion of the model parameters, pruning a predetermined number of the model parameters, or pruning until the model scale does not exceed a predetermined size.
5. The method according to claim 1, further comprising: training the target business model with multiple training samples based on a second method of differential privacy, so that the trained target business model is used to make business predictions on given business data while protecting data privacy.
6. The method according to claim 5, wherein the multiple training samples include a first batch of samples, a sample i in the first batch corresponding to a loss obtained after processing by the target business model, and wherein training the target business model with the multiple training samples based on the second method of differential privacy comprises: determining an original gradient of the loss corresponding to the sample i; adding noise to the original gradient using the second method of differential privacy to obtain a noise-containing gradient; and adjusting the model parameters of the target business model using the noise-containing gradient, with the goal of minimizing the loss corresponding to the sample i.
7. The method according to claim 6, wherein the second method of differential privacy is adding Gaussian noise, and adding noise to the original gradient using the second method of differential privacy to obtain the noise-containing gradient comprises: clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient; determining the Gaussian noise used to achieve differential privacy using a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimposing the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
8. The method according to claim 1, wherein the business data comprises at least one of pictures, audio files, and characters.
9. An apparatus for determining a target business model based on privacy protection, the target business model being used to process given business data to obtain a corresponding business prediction result, the apparatus comprising: an initialization unit, configured to determine, in a predetermined manner, an initial value for each model parameter of a selected business model, thereby initializing the selected business model; an initial training unit, configured to train the initialized selected business model with multiple training samples until the model parameters converge, to obtain an initial business model; a pruning unit, configured to determine multiple sub-models of the initial business model based on pruning of the initial business model, wherein each sub-model corresponds to model parameters and a model indicator determined through retraining by the initialization unit and the initial training unit: the initialization unit resets the model parameters of the pruned business model to the initial values of the corresponding model parameters in the initialized business model, and the initial training unit inputs multiple training samples into the pruned business model in turn and adjusts the model parameters based on a comparison between each sample label and the output of the pruned business model; and a determination unit, configured to select the target business model from the sub-models using a first method of differential privacy, based on the model indicator corresponding to each sub-model, wherein, in a case where the first method of differential privacy is an exponential mechanism, the determination unit is configured to: determine respective availability coefficients of the sub-models according to the model indicators corresponding to the sub-models; determine respective sampling probabilities of the sub-models from the availability coefficients using the exponential mechanism; and sample among the multiple sub-models according to the sampling probabilities, taking the sampled sub-model as the target business model.
10. The apparatus according to claim 9, wherein the pruning unit is further configured to: prune the initial business model according to its model parameters to obtain a first pruned model; take the first pruned model, together with the model parameters obtained through retraining, as a first sub-model; and iteratively prune the first sub-model to obtain subsequent sub-models until an end condition is met.
11. The apparatus according to claim 10, wherein the end condition comprises at least one of: the number of iterations reaching a predetermined number, the number of sub-models reaching a predetermined number, and the scale of the last sub-model being smaller than a set scale threshold.
12. The apparatus according to claim 9 or 10, wherein the pruning unit prunes the model, in order of model parameters from smallest to largest, in one of the following ways: pruning a predetermined proportion of the model parameters, pruning a predetermined number of the model parameters, or pruning until the model scale does not exceed a predetermined size.
13. The apparatus according to claim 9, further comprising a privacy training unit, configured to: train the target business model with multiple training samples based on a second method of differential privacy, so that the trained target business model is used to make business predictions on given business data while protecting data privacy.
14. The apparatus according to claim 13, wherein the multiple training samples include a first batch of samples, a sample i in the first batch corresponding to a loss obtained after processing by the target business model, and the privacy training unit is further configured to: determine an original gradient of the loss corresponding to the sample i; add noise to the original gradient using the second method of differential privacy to obtain a noise-containing gradient; and adjust the model parameters of the target business model using the noise-containing gradient, with the goal of minimizing the loss corresponding to the sample i.
15. The apparatus according to claim 14, wherein the second method of differential privacy is adding Gaussian noise, and the privacy training unit is further configured to: clip the original gradient based on a preset clipping threshold to obtain a clipped gradient; determine the Gaussian noise used to achieve differential privacy using a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superimpose the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
16. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-8.
17. A computing device, comprising a storage and a processor, wherein executable code is stored in the storage, and when the processor executes the executable code, the method of any one of claims 1-8 is implemented.
TW110110604A 2020-04-10 2021-03-24 Method and device for determining target business model based on privacy protection TWI769754B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010276685.8A CN111177792B (en) 2020-04-10 2020-04-10 Method and device for determining target business model based on privacy protection
CN202010276685.8 2020-04-10

Publications (2)

Publication Number Publication Date
TW202139045A true TW202139045A (en) 2021-10-16
TWI769754B (en) 2022-07-01

Family

ID=70655223

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110110604A TWI769754B (en) 2020-04-10 2021-03-24 Method and device for determining target business model based on privacy protection

Country Status (3)

Country Link
CN (2) CN113515770A (en)
TW (1) TWI769754B (en)
WO (1) WO2021204272A1 (en)

Also Published As

Publication number Publication date
CN111177792A (en) 2020-05-19
WO2021204272A1 (en) 2021-10-14
CN111177792B (en) 2020-06-30
TWI769754B (en) 2022-07-01
CN113515770A (en) 2021-10-19
