TW202123043A - Adversarial attack detection method and device - Google Patents

Adversarial attack detection method and device

Info

Publication number
TW202123043A
TW202123043A (application number TW109116402A)
Authority
TW
Taiwan
Prior art keywords
adversarial
input data
target model
sample space
adversarial sample
Prior art date
Application number
TW109116402A
Other languages
Chinese (zh)
Other versions
TWI743787B (en)
Inventor
宗志遠
Original Assignee
大陸商支付寶(杭州)信息技術有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 大陸商支付寶(杭州)信息技術有限公司
Publication of TW202123043A
Application granted
Publication of TWI743787B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/552: Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Alarm Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The description discloses an adversarial attack detection method and device. The method comprises: acquiring an adversarial sample space of a target model; collecting input data used to call the target model; determining whether the input data falls into the adversarial sample space; and calculating, according to the determination result, a detection parameter of the input data falling into the adversarial sample space during a detection period, and if the detection parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected. The solution enables effective detection of adversarial attacks without affecting normal use of the target model, thereby effectively reducing security risks such as privacy leakage and loss of funds, and ensuring data security.

Description

Adversarial attack monitoring method and device

This specification relates to the field of artificial intelligence, and in particular to a method and device for monitoring adversarial attacks.

With the continuous development of artificial intelligence, machine learning models are becoming more complex and more accurate. However, a more accurate model may be less robust, and weak robustness creates opportunities for attacks. Taking adversarial attacks as an example, an attacker makes subtle modifications to a sample to form an adversarial sample and feeds it to the model so that the model outputs a wrong prediction. Adversarial attacks may bring security risks. For example, in a scenario that relies on face recognition for identity authentication, an attacker constructs an adversarial sample and inputs it into the face recognition model; if the model recognizes the adversarial sample as a legitimate user, the attacker can pass identity authentication, bringing security risks such as leakage of private data and loss of funds.

In view of this, this specification provides a method and device for monitoring adversarial attacks. Specifically, this specification is implemented through the following technical solutions:

A method for monitoring adversarial attacks, comprising: acquiring an adversarial sample space of a target model; collecting input data used to call the target model; determining whether the input data falls into the adversarial sample space; and calculating, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected.

A device for monitoring adversarial attacks, comprising: an acquiring unit that acquires an adversarial sample space of a target model; a collection unit that collects input data used to call the target model; a judgment unit that determines whether the input data falls into the adversarial sample space; and a monitoring unit that calculates, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period and, when the monitoring parameter meets a preset attack condition, determines that an adversarial attack against the target model has been detected.

A device for monitoring adversarial attacks, comprising a processor and a memory for storing machine-executable instructions, wherein by reading and executing the machine-executable instructions stored in the memory and corresponding to the adversarial attack monitoring logic, the processor is caused to: acquire an adversarial sample space of a target model; collect input data used to call the target model; determine whether the input data falls into the adversarial sample space; and calculate, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter meets a preset attack condition, determine that an adversarial attack against the target model has been detected.

In an embodiment of this specification, the input data used to call a target model is collected, whether the input data falls into the adversarial sample space of the target model is determined, and a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period is calculated according to the determination result; if the monitoring parameter meets the attack condition, it is confirmed that an adversarial attack against the target model has been detected. The above method does not affect the normal use of the target model, can detect adversarial attacks in a timely manner, and effectively reduces security risks such as leakage of private data and loss of funds.

The exemplary embodiments will be described in detail here, and examples thereof are shown in the accompanying drawings.
When the following description refers to the accompanying drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. On the contrary, they are merely examples of devices and methods consistent with some aspects of this specification, as detailed in the scope of the appended claims.

The terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the specification. The singular forms "a", "said" and "the" used in this specification and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while" or "in response to determining".

With the continuous development of artificial intelligence, researchers keep designing deeper and more complex machine learning models so that the models output more accurate predictions. However, as model accuracy improves, model robustness may deteriorate, which makes the model vulnerable to attacks.

Taking adversarial attacks as an example, an adversarial sample is formed by making subtle modifications to a sample, and inputting the adversarial sample into the model can make the model output a wrong prediction. For example, in an image recognition model, such a subtle modification can be adding some disturbing noise to the image. After the modified image is input into the image recognition model, the model may recognize a picture of a puppy as a picture of a car, producing a completely wrong recognition result. Adversarial attacks can exist in fields such as image recognition, speech recognition and text recognition.

In some scenarios, adversarial attacks may bring security risks. For example, in a scenario that relies on face recognition for identity authentication, an attacker constructs an adversarial sample and inputs it into the face recognition model. If the face recognition model recognizes the adversarial sample as a legitimate user, the attacker can pass identity authentication, bringing security risks such as leakage of private data and loss of funds.

This specification provides a method and device for monitoring adversarial attacks.

Fig. 1 is a schematic flowchart of a method for monitoring adversarial attacks according to an exemplary embodiment of this specification. The method can be applied to an electronic device with a processor and memory, such as a server or server cluster; this specification imposes no special restrictions on this.
Referring to FIG. 1, the method for monitoring adversarial attacks may include the following steps:

Step 101: acquire the adversarial sample space of the target model.

In this specification, in the application-scenario dimension, the target model may be a speech recognition model, an image recognition model, a text recognition model, and so on; in the model-structure dimension, the target model may be a model based on a neural network, and so on. This specification imposes no special restrictions on this.

In this specification, the adversarial sample space may be pre-computed after the target model has been trained and before it is officially launched. Of course, the adversarial sample space may also be computed after the target model goes online; this specification imposes no special restrictions on this.

In this specification, adversarial samples can be obtained through attack testing, and the adversarial sample space can be generated from the adversarial samples.

In one example, the attack test may be a black-box test based on a boundary attack. A boundary attack first constructs a heavily perturbed adversarial sample to test the target model, then continuously reduces the perturbation while preserving the sample's adversarial property, finally obtaining an adversarial sample with little perturbation.

In practice, when generating an adversarial sample from an original image, a heavily perturbed adversarial sample can be generated first. For example, the pixel values of some pixels in the original image can be changed randomly and the modified image input into the target model; if the target model outputs a wrong prediction, the modified image is taken as an adversarial sample. After the adversarial sample is obtained, based on the spatial coordinates of the adversarial sample and of the original image, the adversarial sample can be randomly perturbed starting from itself and moving toward the original image, continuously reducing the distance between the perturbed sample and the original image while keeping the sample adversarial.

For example, the perturbed adversarial sample can be input into the target model. If the target model outputs a wrong prediction, the sample is still adversarial, and it can be further perturbed in the above direction to bring it even closer to the original image, finally obtaining the adversarial sample closest to the original image, that is, the adversarial sample with minimal perturbation. Using this method, multiple adversarial samples of the target model can be obtained. Adversarial samples can also be constructed through other methods; this specification imposes no special restrictions on this.

In another example, the attack test may be a white-box test based on a boundary attack. The steps of the white-box test follow those of the black-box test described above and are not repeated here. It is worth noting that a white-box test requires the complete target model file in advance, which may include the structure and parameters of the target model.

In this specification, the adversarial sample space of the target model can be determined based on the adversarial samples.
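As a minimal illustrative sketch of the boundary-attack procedure described above (an assumption for illustration, not the specification's own code), the following assumes a black-box `predict` function returning a class label for an image array in [0, 1]; the function name, step sizes and iteration count are all illustrative:

```python
import numpy as np

def boundary_attack(predict, original, true_label, steps=1000, rng=None):
    """Start from a heavily perturbed misclassified image and walk it back
    toward the original while it stays adversarial (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng(0)

    # 1. Find an initial, strongly perturbed adversarial sample.
    adv = rng.uniform(0.0, 1.0, size=original.shape)
    while predict(adv) == true_label:
        adv = rng.uniform(0.0, 1.0, size=original.shape)

    # 2. Repeatedly propose a point closer to the original; keep it only
    #    if the model still misclassifies it (i.e. it stays adversarial).
    for _ in range(steps):
        step_toward = 0.05 * (original - adv)                # move toward original
        noise = 0.01 * rng.standard_normal(original.shape)   # random perturbation
        candidate = np.clip(adv + step_toward + noise, 0.0, 1.0)
        if predict(candidate) != true_label:
            adv = candidate                                  # still adversarial: accept
    return adv  # a low-perturbation adversarial sample near the original
```

Running this repeatedly from different original images yields the multiple adversarial samples mentioned above.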
In one example, the spatial coordinates of each adversarial sample of the target model can be determined, and the adversarial sample space of the target model determined based on those coordinates.

Taking an image recognition model as the target model, suppose an adversarial sample is a 64*64-pixel color image. The sample has 64*64 pixels, each pixel has 3 channel values, so the sample has 64*64*3 = 12288 values in total. The spatial coordinates of the adversarial samples of this image recognition model therefore have 12288 dimensions, that is, the adversarial sample space has 12288 dimensions, and the value of each dimension is one channel value of the corresponding pixel of the adversarial sample. For example, the first dimension of the adversarial sample space may represent the first channel value of the first pixel of the adversarial sample; the second dimension may represent the second channel value of the first pixel; the third dimension may represent the third channel value of the first pixel; the fourth dimension may represent the first channel value of the second pixel; and so on.

The adversarial samples are clustered based on their spatial coordinates to obtain several adversarial sample clusters. The clustering algorithm may be the K-Means algorithm, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, and so on; this specification imposes no special restrictions on this. In this example, the adversarial sample clusters can be used as the adversarial sample space.

In another example, after several adversarial sample clusters are obtained, a corresponding convex envelope can be generated for each cluster, and the generated convex envelopes used as the adversarial sample space. The convex envelope can be computed with the Graham algorithm, the Melkman algorithm, the Andrew algorithm, and so on; this specification imposes no special restrictions on this.

Step 102: collect the input data used to call the target model.

After the model goes online, the target model can provide an API (Application Programming Interface) to callers, so that callers can call the target model through the API. The input data supplied when a caller calls the model is collected. For example, for an image recognition model the input data may be an image; for a speech recognition model the input data may be a segment of speech.

In one example, the input data of the target model can be collected in real time. For example, calls to the target model can be listened for, and when a call is detected, the input data supplied by the caller is obtained. In another example, the historical input data of the target model can be collected periodically at a preset time interval, and the time interval may be the monitoring period for adversarial attacks described below.

It is worth noting that step 101 may also follow step 102. For example, if step 102 periodically collects historical input data of the target model, the adversarial sample space of the target model may be acquired after the historical input data has been collected, and step 103 executed afterwards.
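The following is an illustrative sketch (an assumption, not the specification's code) of clustering adversarial samples and generating a convex envelope per cluster. Qhull-based convex hulls are only tractable in low dimension, so a real system would work in a reduced coordinate space; this toy example uses 2-D coordinates standing in for the flattened 12288-dimensional vectors, and all names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial import ConvexHull

def build_sample_space(samples, n_clusters=3):
    """Cluster adversarial-sample coordinates and build one convex
    envelope per cluster (illustrative, low-dimensional sketch)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(samples)
    hulls = []
    for k in range(n_clusters):
        cluster_points = samples[km.labels_ == k]
        hulls.append(ConvexHull(cluster_points))  # convex envelope of one cluster
    return km.cluster_centers_, hulls

# Toy usage: three synthetic clusters of 2-D "sample coordinates".
rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(loc, 0.2, size=(50, 2))
                     for loc in ([0, 0], [3, 3], [0, 3])])
centers, hulls = build_sample_space(samples)
```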
Step 103: determine whether the input data falls into the adversarial sample space.

In one example, the spatial coordinates of the input data can be determined, and whether those coordinates fall into the adversarial sample space of the target model can be judged.

In one example, the spatial coordinates can be input into a preset fitting function, and whether the coordinates fall into any convex envelope judged from the output. For example, if the spatial coordinates are x and the fitting function is F, x can be input into F to obtain F(x); if F(x) < 0, the coordinates are determined to fall into the convex envelope, otherwise they are determined not to. If the spatial coordinates fall into any convex envelope, they fall into the adversarial sample space of the target model.

In another example, the distance between the input data and each adversarial sample cluster can be computed from the spatial coordinates, and whether that distance is smaller than a preset distance threshold judged. For example, the distance between the input data and the center point of each adversarial sample cluster can be used as the distance between the input data and the corresponding cluster. If there is an adversarial sample cluster whose distance to the input data is smaller than the preset distance threshold, it is confirmed that the input data falls into the adversarial sample space. The distance threshold may be predetermined.

Step 104: calculate, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within the monitoring period, and when the monitoring parameter meets a preset attack condition, determine that an adversarial attack against the target model has been detected.

In one example, the monitoring parameter is the number of input data items that fall into the adversarial sample space, and the attack condition is that this number reaches a number threshold. In practice, within a preset monitoring period, whether the number of input data items falling into the adversarial sample space reaches the number threshold can be monitored. If the number threshold is reached, an adversarial attack against the target model is determined to have been detected.

The number threshold may be determined as follows: take, as the number threshold, the average number of input data items that fell into the adversarial sample space over several historical monitoring periods of the target model. For example, assuming the monitoring period is 2 hours and the average number of input data items falling into the adversarial sample space per two-hour window over the last 3 days is 200, then 200 can be used as the number threshold. It is worth noting that since callers' demand for the target model may differ across time periods within a day, differentiated number thresholds can also be determined per monitoring period. As another example, to allow for error, the above number threshold can be multiplied by a preset error coefficient and the resulting value used as the final number threshold. As yet another example, the number threshold can also be set manually.
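The sketch below illustrates the two membership tests of step 103 and the count-threshold derivation of step 104, under the same toy 2-D assumptions as the sketch above. scipy's `Delaunay.find_simplex` plays the role of the fitting function F (non-negative inside the envelope, negative outside, so the sign convention is mirrored); every name here is an illustrative assumption:

```python
import numpy as np
from scipy.spatial import Delaunay

def falls_in_any_hull(x, hulls):
    """Convex-envelope test: does point x lie inside any cluster's hull?
    (Rebuilding the Delaunay triangulation per call is inefficient but
    keeps the sketch short; a real system would cache it.)"""
    return any(Delaunay(h.points[h.vertices]).find_simplex(x) >= 0
               for h in hulls)

def near_any_cluster(x, centers, distance_threshold):
    """Distance test: is x closer than the threshold to any cluster center?"""
    return bool(np.min(np.linalg.norm(centers - x, axis=1)) < distance_threshold)

def number_threshold(historical_counts, error_coefficient=1.0):
    """Step 104 (count variant): average count over historical monitoring
    periods, optionally scaled by a preset error coefficient."""
    return float(np.mean(historical_counts)) * error_coefficient
```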
In another example of step 104, the monitoring parameter may be the proportion of input data falling into the adversarial sample space, and the attack condition may be that this proportion reaches a proportion threshold. In practice, within a preset monitoring period, whether the ratio of the number of input data items falling into the adversarial sample space to the number of all input data items within that period reaches the proportion threshold can be monitored. If the proportion threshold is reached, an adversarial attack against the target model is confirmed to have been detected. The proportion threshold can be determined in the same way as the number threshold above, which is not repeated here.

As can be seen from the above description, in an embodiment of this specification, an attack test can first be performed on the target model to obtain several adversarial samples, and the adversarial sample space computed from them. When monitoring the target model for adversarial attacks, the input data used to call the target model can be collected, whether the input data falls into the pre-computed adversarial sample space determined, and a monitoring parameter of the input data falling into the adversarial sample space within the monitoring period calculated according to the determination result. If the monitoring parameter meets the attack condition, an adversarial attack against the target model is considered detected. The method of this embodiment does not affect the normal use of the target model and can still detect adversarial attacks.

Fig. 2 is a schematic flowchart of another method for monitoring adversarial attacks according to an exemplary embodiment of this specification. The method can be applied to an electronic device with a processor and memory, such as a server or server cluster; this specification imposes no special restrictions on this.

Referring to FIG. 2, the method may include the following steps:

Step 201: acquire the adversarial sample space of the target model.

Step 202: collect the input data used to call the target model.

Step 203: determine whether the input data falls into the adversarial sample space.

Step 204: calculate, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within the monitoring period, and when the monitoring parameter meets a preset attack condition, determine that an adversarial attack against the target model has been detected.

For steps 201 to 204, refer to steps 101 to 104, which are not repeated here.

Step 205: send alarm information.

When the monitoring parameter meets the preset attack condition and an adversarial attack against the target model is determined to have been detected, alarm information can also be sent. In one example, the alarm information may include the current monitoring period, the number or proportion of input data items that fell into the adversarial sample space, and so on. For example, the alarm information may read: "223 suspicious input data items detected within 10 minutes; a suspected adversarial attack exists." If the number of input data items falling into the adversarial sample space keeps rising, the number or proportion of suspicious input data can be updated and the alarm continued.
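As an illustrative sketch of the per-period monitoring and alarm flow of steps 203 to 205 (an assumed structure, not code from the specification), the monitor below tracks how many inputs fall into the adversarial sample space, compares the proportion against a threshold, and formats an alarm message like the example above; the per-caller counter anticipates the caller-identifier variant described next:

```python
from collections import Counter

class AdversarialAttackAlarm:
    """Illustrative monitoring-period bookkeeping; names and thresholds
    are assumptions."""

    def __init__(self, period_label, ratio_threshold):
        self.period_label = period_label        # e.g. "10 minutes"
        self.ratio_threshold = ratio_threshold  # preset attack condition
        self.suspicious_callers = Counter()     # per-caller suspicious counts
        self.suspicious = 0
        self.total = 0

    def observe(self, falls_in_space, caller_id=None):
        """Record one model call and whether its input fell into the space."""
        self.total += 1
        if falls_in_space:
            self.suspicious += 1
            if caller_id is not None:
                self.suspicious_callers[caller_id] += 1

    def maybe_alarm(self):
        """Return an alarm message if the attack condition is met, else None."""
        if self.total == 0 or self.suspicious / self.total < self.ratio_threshold:
            return None
        msg = (f"{self.suspicious} suspicious inputs detected within "
               f"{self.period_label}; suspected adversarial attack.")
        if self.suspicious_callers:
            caller, count = self.suspicious_callers.most_common(1)[0]
            msg += f" Top caller: {caller} ({count} suspicious inputs)."
        return msg
```

The counters would be reset at each monitoring-period boundary, matching the periodic evaluation described above.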
In another example, the alarm information may also include the identifier of the target-model caller corresponding to the input data; the identifier may be the caller's ID, name, IP address, and so on. For example, the alarm information may read: "223 suspicious input data items detected within 10 minutes; a suspected adversarial attack exists. 80% of the suspicious input data comes from user A." The caller identifier can be obtained from the call logs produced when the target model is called.

In another example, the alarm information may also include the target model's prediction results for the input data that fell into the adversarial sample space, so as to judge whether the adversarial attack succeeded. For example, if an attacker attempts to input a perturbed image of an illegitimate user into the target model so that the target model predicts a legitimate user, the alarm information may read: "223 suspicious input data items detected within 10 minutes; a suspected adversarial attack exists. Among them, 220 input data items produced the output 'illegitimate user' and 2 produced the output 'legitimate user'." Whether the adversarial attack succeeded can then be judged from the prediction results output by the target model.

As can be seen from the above description, in another embodiment of this specification, after an adversarial attack against the target model is detected, alarm information can also be sent. The alarm information can show the number of attacks and the attack results, and can be traced back to the attack source; measures can subsequently be taken based on the alarm information to defend against the attack, for example intercepting calls from suspicious callers, thereby effectively reducing security risks such as leakage of private data and loss of funds.

The monitoring method of this specification is described below with a specific embodiment. The method can be applied to a server. Referring to FIG. 3 and FIG. 4, the method can be divided into two processes: performing an attack test on the target model to obtain the adversarial sample space, and monitoring the input data of the target model to monitor adversarial attacks.

Fig. 3 is a schematic flowchart of a method for acquiring the adversarial sample space of a target model according to an exemplary embodiment of this specification. In this embodiment, the target model is a face recognition model used for user identity authentication.

Step 301: call the face recognition model. In this embodiment, the documentation of the calling method and the calling interface of the face recognition model need to be obtained.

Step 302: perform a black-box test based on a boundary attack on the face recognition model to obtain several adversarial samples. An attack test is performed on the face recognition model. In this embodiment, the attack test is a black-box test based on a boundary attack: a heavily perturbed face image is first constructed as an adversarial sample and input into the face recognition model; guided by the model's outputs, the perturbation of the adversarial sample is continuously reduced while preserving its adversarial property, finally producing several adversarial samples with little perturbation.
In this embodiment, the perturbation of the adversarial sample may be adding noise to the face image, adjusting the pixel values of specific pixels, and so on.

Step 303: determine the adversarial sample space of the face recognition model based on the adversarial samples, the adversarial sample space being convex envelopes. The spatial coordinates of the adversarial samples are determined and clustered with the K-Means algorithm based on those coordinates to obtain several adversarial sample clusters. A corresponding convex envelope is generated for each cluster with the Graham algorithm, and the generated convex envelopes are used as the adversarial sample space of the face recognition model.

Fig. 4 is a schematic flowchart of another method for monitoring adversarial attacks according to an exemplary embodiment of this specification.

Step 401: deploy the face recognition model.

Step 402: acquire the adversarial sample space of the face recognition model.

Step 403: collect the input images used to call the face recognition model. In this embodiment, the input images of the face recognition model are collected in real time.

Step 404: determine whether an input image falls into the adversarial sample space. In this embodiment, the coordinates of the input image are computed, and whether the coordinates fall into any convex envelope is judged with a preset fitting function.

Step 405: calculate, according to the determination results, the proportion of input images falling into the adversarial sample space within the monitoring period. In this embodiment, the input images of the face recognition model are collected in real time within the preset monitoring period; each time an input image is collected, step 404 is executed. If the input image falls into the adversarial sample space, the count of suspicious input images is incremented by 1; if it does not, the count of safe input images is incremented by 1.

Step 406: if the proportion reaches the proportion threshold, determine that an adversarial attack against the face recognition model has been detected. In this embodiment, the proportion threshold can be obtained from the historical input data of the face recognition model. For example, statistics show that over the past 30 days the average hourly proportion of input images falling into the convex envelopes was 0.05, so the proportion threshold is set to 0.05, with a monitoring period of 1 hour. Within the monitoring period, whether the proportion of input images falling into the convex envelopes exceeds the proportion threshold of 0.05 can be judged in real time. For example, the number of suspicious input images observed in step 405 can be divided by the sum of the numbers of suspicious and safe input images; if the resulting proportion of suspicious input images is greater than 0.05, an adversarial attack is confirmed to have been detected.

Step 407: send alarm information. After an adversarial attack is detected, alarm information can be sent. In this embodiment, the alarm information may include the current monitoring period, the proportion of input images falling into the convex envelopes, the identifiers of the face recognition model's callers, and so on. The following table exemplarily shows an example of the alarm information:
[Table (original image not reproduced): example alarm information, listing the current monitoring period, the proportion of suspected adversarial input images, and the callers with the most calls together with their call counts.]
The table above shows, for the current monitoring period, the proportion of input images suspected of adversarial attacks, the identifiers of the callers with the most calls, and the corresponding numbers of calls, comprehensively reflecting the attack status of the face recognition model in the current monitoring period. Taking the alarm information in the table as an example, user A submitted the most suspicious input images in the current monitoring period; to prevent adversarial attacks, user A's call requests can subsequently be intercepted, for example within a preset time period.

As can be seen from the above description, the adversarial attack monitoring method provided in this specification can be used to monitor adversarial attacks on a face recognition model. When an adversarial attack against the face recognition model is confirmed to have been detected, defensive strategies such as intercepting calls can be adopted in time, effectively reducing security risks such as leakage of private data and loss of funds.

Corresponding to the foregoing embodiments of the monitoring method, this specification also provides embodiments of a device for monitoring adversarial attacks. The device embodiments can be applied on a server and can be implemented through software, through hardware, or through a combination of software and hardware. Taking a software implementation as an example, as a device in the logical sense, it is formed by the processor of the server where it is located reading the corresponding computer program instructions from non-volatile memory into main memory and running them. In terms of hardware, FIG. 5 is a hardware structure diagram of the server where the monitoring device of this specification is located; besides the processor, main memory, network interface and non-volatile memory shown in FIG. 5, the electronic device where the device of the embodiment is located usually also includes other hardware according to the actual function of the server, which is not described again here.

Fig. 6 is a block diagram of a device for monitoring adversarial attacks according to an exemplary embodiment of this specification. Referring to FIG. 6, the device 600 can be applied in the server shown in FIG. 5 and includes an acquiring unit 610, a collection unit 620, a judgment unit 630 and a monitoring unit 640. The acquiring unit 610 acquires the adversarial sample space of a target model; the collection unit 620 collects the input data used to call the target model; the judgment unit 630 determines whether the input data falls into the adversarial sample space; and the monitoring unit 640 calculates, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period and, when the monitoring parameter meets a preset attack condition, determines that an adversarial attack against the target model has been detected.

Optionally, the judgment unit 630: determines the spatial coordinates of the input data; judges whether the spatial coordinates fall into any convex envelope; and if so, determines that the input data falls into the adversarial sample space.
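Purely as an illustrative sketch of how the four units of device 600 might map onto software components (an assumed structure, not the patented implementation), the judgment logic is injected as a predicate and the monitoring state, for example the alarm sketch above, is injected as a collaborator:

```python
class MonitoringDevice600:
    """Unit names mirror FIG. 6 (610/620/630/640); all signatures are
    illustrative assumptions."""

    def __init__(self, acquire_space, collect_input, judge, monitor):
        self.sample_space = acquire_space()  # acquiring unit 610
        self.collect_input = collect_input   # collection unit 620: call -> input data
        self.judge = judge                   # judgment unit 630: (input, space) -> bool
        self.monitor = monitor               # monitoring unit 640 state

    def handle_call(self, call, caller_id=None):
        x = self.collect_input(call)                 # unit 620: collect input
        falls_in = self.judge(x, self.sample_space)  # unit 630: membership test
        self.monitor.observe(falls_in, caller_id)    # unit 640: update parameters
        return self.monitor.maybe_alarm()            # attack condition check
```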
Optionally, the judgment unit 630: determines the spatial coordinates of the input data; judges, according to the spatial coordinates, whether the distance between the input data and an adversarial sample cluster is smaller than a threshold; and if so, determines that the input data falls into the adversarial sample space.

Optionally, the device further includes an alarm unit that sends alarm information.

For the implementation of the functions and roles of the units of the above device, see the implementation of the corresponding steps in the above method, which is not repeated here. Since the device embodiments basically correspond to the method embodiments, the relevant parts can refer to the description of the method embodiments. The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this specification, which those of ordinary skill in the art can understand and implement without creative work.

The systems, devices, modules or units described in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which may specifically be a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.

Corresponding to the foregoing embodiments of the monitoring method, this specification also provides a device for monitoring adversarial attacks, comprising a processor and a memory for storing machine-executable instructions. The processor and the memory are usually connected to each other through an internal bus. In other possible implementations, the device may also include an external interface to communicate with other devices or components.

In this embodiment, by reading and executing the machine-executable instructions stored in the memory and corresponding to the adversarial attack monitoring logic, the processor is caused to: acquire the adversarial sample space of a target model; collect the input data used to call the target model; determine whether the input data falls into the adversarial sample space; and calculate, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter meets a preset attack condition, determine that an adversarial attack against the target model has been detected.

Optionally, when determining the adversarial sample space of the target model, the processor is caused to: perform an attack test on the target model to obtain at least one adversarial sample of the target model; and determine the adversarial sample space of the target model based on the adversarial samples.
Optionally, when performing the attack test, the processor is caused to: perform a black-box test based on a boundary attack; or perform a white-box test based on a boundary attack.

Optionally, when determining the adversarial sample space of the target model based on the adversarial samples, the processor is caused to: determine the spatial coordinates of each adversarial sample; cluster the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generate a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.

Optionally, when determining whether the input data falls into the adversarial sample space, the processor is caused to: determine the spatial coordinates of the input data; judge whether the spatial coordinates fall into any convex envelope; and if so, determine that the input data falls into the adversarial sample space.

Optionally, when determining whether the input data falls into the adversarial sample space, the processor is caused to: determine the spatial coordinates of the input data; judge, according to the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is smaller than a distance threshold; and if so, determine that the input data falls into the adversarial sample space.

Optionally, after determining that an adversarial attack against the target model has been detected, the processor is further caused to: send alarm information.

Corresponding to the foregoing embodiments of the monitoring method, this specification also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the following steps: acquiring the adversarial sample space of a target model; collecting the input data used to call the target model; determining whether the input data falls into the adversarial sample space; and calculating, according to the determination result, a monitoring parameter of the input data falling into the adversarial sample space within a monitoring period, and when the monitoring parameter meets a preset attack condition, determining that an adversarial attack against the target model has been detected.

Optionally, the adversarial sample space of the target model is determined by: performing an attack test on the target model to obtain at least one adversarial sample of the target model; and determining the adversarial sample space of the target model based on the adversarial samples.

Optionally, the attack test includes: a black-box test based on a boundary attack; or a white-box test based on a boundary attack.

Optionally, determining the adversarial sample space of the target model based on the adversarial samples includes: determining the spatial coordinates of each adversarial sample; clustering the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generating a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.

Optionally, determining whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging whether the spatial coordinates fall into any convex envelope; and if so, determining that the input data falls into the adversarial sample space.
Optionally, determining whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging, according to the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is smaller than a distance threshold; and if so, determining that the input data falls into the adversarial sample space.

Optionally, the monitoring parameter is the number of input data items falling into the adversarial sample space, and the attack condition is that the number reaches a number threshold.

Optionally, the monitoring parameter is the proportion of input data falling into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.

Optionally, after determining that an adversarial attack against the target model has been detected, the method further includes: sending alarm information.

The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The above are only preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this specification shall fall within the scope of protection of this specification.

600: device; 610: acquiring unit; 620: collection unit; 630: judgment unit; 640: monitoring unit

[Fig. 1] is a schematic flowchart of a method for monitoring adversarial attacks according to an exemplary embodiment of this specification.
[Fig. 2] is a schematic flowchart of another method for monitoring adversarial attacks according to an exemplary embodiment of this specification.
[Fig. 3] is a schematic flowchart of a method for acquiring the adversarial sample space of a target model according to an exemplary embodiment of this specification.
[Fig. 4] is a schematic flowchart of another method for monitoring adversarial attacks according to an exemplary embodiment of this specification.
[Fig. 5] is a schematic structural diagram of a device for monitoring adversarial attacks according to an exemplary embodiment of this specification.
[Fig. 6] is a block diagram of a device for monitoring adversarial attacks according to an exemplary embodiment of this specification.

Claims (19)

1. A method for monitoring adversarial attacks, including: obtaining the adversarial sample space of a target model; collecting the input data used to invoke the target model; judging whether the input data falls into the adversarial sample space; and computing, from the judgment results, a monitoring parameter for the input data that falls into the adversarial sample space within a monitoring period, and determining that an adversarial attack against the target model has been detected when the monitoring parameter satisfies a preset attack condition.

2. The method of claim 1, wherein the adversarial sample space of the target model is determined by: performing an attack test on the target model to obtain at least one adversarial sample of the target model; and determining the adversarial sample space of the target model based on the adversarial samples.

3. The method of claim 2, wherein the attack test includes: a black-box test based on a boundary attack; or a white-box test based on a boundary attack.

4. The method of claim 2, wherein determining the adversarial sample space of the target model based on the adversarial samples includes: determining the spatial coordinates of each adversarial sample; clustering the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generating a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.

5. The method of claim 4, wherein judging whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging whether the spatial coordinates fall within any convex envelope; and if so, determining that the input data falls into the adversarial sample space.

6. The method of claim 4, wherein judging whether the input data falls into the adversarial sample space includes: determining the spatial coordinates of the input data; judging, based on the spatial coordinates, whether the distance between the input data and any adversarial sample cluster is less than a distance threshold; and if so, determining that the input data falls into the adversarial sample space.

7. The method of claim 1, wherein the monitoring parameter is the quantity of input data that falls into the adversarial sample space, and the attack condition is that the quantity reaches a quantity threshold.

8. The method of claim 1, wherein the monitoring parameter is the proportion of input data that falls into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.
9. The method of claim 1, wherein after determining that an adversarial attack against the target model has been detected, the method further includes: sending alarm information.

10. A device for monitoring adversarial attacks, including: an acquisition unit that obtains the adversarial sample space of a target model; a collection unit that collects the input data used to invoke the target model; a judgment unit that judges whether the input data falls into the adversarial sample space; and a monitoring unit that computes, from the judgment results, a monitoring parameter for the input data that falls into the adversarial sample space within a monitoring period, and determines that an adversarial attack against the target model has been detected when the monitoring parameter satisfies a preset attack condition.

11. The device of claim 10, wherein the adversarial sample space of the target model is determined by: performing an attack test on the target model to obtain at least one adversarial sample of the target model; and determining the adversarial sample space of the target model based on the adversarial samples.

12. The device of claim 11, wherein the attack test includes: a black-box test based on a boundary attack; or a white-box test based on a boundary attack.

13. The device of claim 11, wherein determining the adversarial sample space of the target model based on the adversarial samples includes: determining the spatial coordinates of each adversarial sample; clustering the adversarial samples based on the spatial coordinates to obtain several adversarial sample clusters; and generating a corresponding convex envelope for each adversarial sample cluster as the adversarial sample space.

14. The device of claim 13, wherein the judgment unit: determines the spatial coordinates of the input data; judges whether the spatial coordinates fall within any convex envelope; and if so, determines that the input data falls into the adversarial sample space.

15. The device of claim 13, wherein the judgment unit: determines the spatial coordinates of the input data; judges, based on the spatial coordinates, whether the distance between the input data and an adversarial sample cluster is less than a distance threshold; and if so, determines that the input data falls into the adversarial sample space.

16. The device of claim 10, wherein the monitoring parameter is the quantity of input data that falls into the adversarial sample space, and the attack condition is that the quantity reaches a quantity threshold.

17. The device of claim 10, wherein the monitoring parameter is the proportion of input data that falls into the adversarial sample space, and the attack condition is that the proportion reaches a proportion threshold.
18. The device of claim 10, further including: an alarm unit that sends alarm information.

19. A device for monitoring adversarial attacks, including: a processor; and a memory for storing machine-executable instructions; wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the adversarial-attack monitoring logic, the processor is caused to: obtain the adversarial sample space of a target model; collect the input data used to invoke the target model; judge whether the input data falls into the adversarial sample space; and compute, from the judgment results, a monitoring parameter for the input data that falls into the adversarial sample space within a monitoring period, and determine that an adversarial attack against the target model has been detected when the monitoring parameter satisfies a preset attack condition.
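Claims 3 and 12 name boundary-attack-based black-box and white-box tests. The following is a greatly simplified, from-scratch sketch of a boundary-attack-style black-box probe: it assumes only label-query access to the model, and its fixed step sizes and acceptance rule are simplifications relative to published boundary-attack algorithms.

```python
import numpy as np

def boundary_probe(predict, x_orig, x_start, true_label,
                   steps=500, noise_scale=0.05, contract=0.05, seed=0):
    """Walk a misclassified point toward the original input while staying
    misclassified, using only label queries; the result lies near the
    decision boundary and can seed the adversarial sample space."""
    rng = np.random.default_rng(seed)
    x_adv = x_start                                    # must already be misclassified
    for _ in range(steps):
        candidate = x_adv + noise_scale * rng.standard_normal(x_adv.shape)
        candidate += contract * (x_orig - candidate)   # small step toward the source
        if predict(candidate) != true_label:           # accept only if still adversarial
            x_adv = candidate
    return x_adv
```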
TW109116402A 2019-12-06 2020-05-18 Monitoring method and device for resisting attack TWI743787B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911242921.8 2019-12-06
CN201911242921.8A CN111046379B (en) 2019-12-06 2019-12-06 Anti-attack monitoring method and device

Publications (2)

Publication Number Publication Date
TW202123043A true TW202123043A (en) 2021-06-16
TWI743787B TWI743787B (en) 2021-10-21

Family

ID=70233551

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109116402A TWI743787B (en) 2019-12-06 2020-05-18 Monitoring method and device for resisting attack

Country Status (3)

Country Link
CN (1) CN111046379B (en)
TW (1) TWI743787B (en)
WO (1) WO2021109695A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI810993B (en) * 2022-01-06 2023-08-01 鴻海精密工業股份有限公司 Model generating apparatus and method

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046379B (en) * 2019-12-06 2021-06-18 支付宝(杭州)信息技术有限公司 Anti-attack monitoring method and device
CN112200380B (en) * 2020-10-23 2023-07-25 支付宝(杭州)信息技术有限公司 Method and device for optimizing risk detection model
CN113313404B (en) * 2021-06-15 2022-12-06 支付宝(杭州)信息技术有限公司 Method and device for generating countermeasure sample
CN113505886A (en) * 2021-07-08 2021-10-15 深圳市网联安瑞网络科技有限公司 Countermeasure sample generation method, system, terminal and medium based on fuzzy test
CN113760358B (en) * 2021-08-30 2023-08-01 河北大学 Antagonistic sample generation method for source code classification model
CN113486875B (en) * 2021-09-08 2021-12-07 浙江大学 Cross-domain face representation attack detection method and system based on word separation and self-adaptation
CN114240951B (en) * 2021-12-13 2023-04-07 电子科技大学 Black box attack method of medical image segmentation neural network based on query
CN114419346B (en) * 2021-12-31 2022-09-30 北京瑞莱智慧科技有限公司 Model robustness detection method, device, equipment and medium
CN114639375B (en) * 2022-05-09 2022-08-23 杭州海康威视数字技术股份有限公司 Intelligent voice recognition security defense method and device based on audio slice adjustment
CN116071797B (en) * 2022-12-29 2023-09-26 北华航天工业学院 Sparse face comparison countermeasure sample generation method based on self-encoder

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103703487B (en) * 2011-07-25 2016-11-02 国际商业机器公司 Information identifying method and system
CN104065622B (en) * 2013-03-20 2018-10-19 腾讯科技(深圳)有限公司 The safe early warning method and device of the network equipment
DE112014006880T5 (en) * 2014-08-22 2017-05-04 Nec Corporation Analysis device, analysis method and computer-readable storage medium
CN106155298B (en) * 2015-04-21 2019-11-08 阿里巴巴集团控股有限公司 The acquisition method and device of man-machine recognition methods and device, behavioural characteristic data
US10108850B1 (en) * 2017-04-24 2018-10-23 Intel Corporation Recognition, reidentification and security enhancements using autonomous machines
CN108615048B (en) * 2018-04-04 2020-06-23 浙江工业大学 Defense method for image classifier adversity attack based on disturbance evolution
CN110213208B (en) * 2018-05-09 2021-11-09 腾讯科技(深圳)有限公司 Method and device for processing request and storage medium
CN109165671A (en) * 2018-07-13 2019-01-08 上海交通大学 Confrontation sample testing method based on sample to decision boundary distance
CN109450946A (en) * 2018-12-27 2019-03-08 浙江大学 A kind of unknown attack scene detection method based on alert correlation analysis
CN109961444B (en) * 2019-03-01 2022-12-20 腾讯科技(深圳)有限公司 Image processing method and device and electronic equipment
CN110175513B (en) * 2019-04-15 2021-01-08 浙江工业大学 Guideboard recognition attack defense method based on multi-target path optimization
CN110321790B (en) * 2019-05-21 2023-05-12 华为技术有限公司 Method for detecting countermeasure sample and electronic equipment
CN110516695A (en) * 2019-07-11 2019-11-29 南京航空航天大学 Confrontation sample generating method and system towards Medical Images Classification
CN111046379B (en) * 2019-12-06 2021-06-18 支付宝(杭州)信息技术有限公司 Anti-attack monitoring method and device

Also Published As

Publication number Publication date
CN111046379B (en) 2021-06-18
CN111046379A (en) 2020-04-21
WO2021109695A1 (en) 2021-06-10
TWI743787B (en) 2021-10-21

Similar Documents

Publication Publication Date Title
TWI743787B (en) Monitoring method and device for resisting attack
US11184401B2 (en) AI-driven defensive cybersecurity strategy analysis and recommendation system
US11310268B2 (en) Systems and methods using computer vision and machine learning for detection of malicious actions
Moustafa et al. A hybrid feature selection for network intrusion detection systems: Central points
US20220078210A1 (en) System and method for collaborative cybersecurity defensive strategy analysis utilizing virtual network spaces
CN111565390B (en) Internet of things equipment risk control method and system based on equipment portrait
WO2016123522A1 (en) Anomaly detection using adaptive behavioral profiles
CN110912874A (en) Method and system for effectively identifying machine access behaviors
CN110889117B (en) Method and device for defending model attack
CN111953665B (en) Server attack access identification method and system, computer equipment and storage medium
CN109313541A (en) For showing and the user interface of comparison attacks telemetering resource
CN110874638A (en) Behavior analysis-oriented meta-knowledge federation method, device, electronic equipment and system
CN114238885A (en) User abnormal login behavior identification method and device, computer equipment and storage medium
CN112528281A (en) Poisoning attack detection method, device and equipment for federal learning
CN113596064B (en) Analysis control method and system for security platform
US20220374524A1 (en) Method and system for anamoly detection in the banking system with graph neural networks (gnns)
CN112560085B (en) Privacy protection method and device for business prediction model
CN112769815B (en) Intelligent industrial control safety monitoring and protecting method and system
González-Landero et al. ABS‐DDoS: an agent‐based simulator about strategies of both DDoS attacks and their defenses, to achieve efficient data forwarding in sensor networks and IoT devices
CN109992964A (en) A kind of data prevention method based on industry internet, device and storage medium
RU2799571C1 (en) Method of recognizing the call as unwanted
RU2820019C1 (en) Call classification method
CN112016123B (en) Verification method and device of privacy protection algorithm and electronic equipment
CN110975295B (en) Method, device, equipment and storage medium for determining abnormal player
CN111431796B (en) Instant messaging early warning method and device, computing equipment and storage medium