The following describes the solutions provided in this specification with reference to the accompanying drawings.
As mentioned above, current API call processes carry a risk of leaking private data. This risk is especially pressing when the requesting party is a cross-border requester (such as a cross-border merchant). Specifically, some large domestic enterprises (such as Alibaba) have expanded their business overseas, so there are a large number of overseas merchants, and cross-border data calls have become the norm. Because the software and hardware environments and business scenarios of overseas merchants differ from those in China, the existing data protection architecture inevitably has gaps, which can lead to leakage of users' private data. Furthermore, different overseas merchants usually have different IT architectures, which makes the API call system complex, hard to sort out, and easy for malicious actors to exploit, resulting in the leakage of private data (such as sensitive data of domestic users).
In addition, because of the large number of APIs and the difficulty of avoiding flaws in API development and management, the data actually output by an API may differ from the data the requester actually requested, or from the data the requester is authorized to use. For example, an API that a requester has no right to call may be called illegally by that requester due to omissions in API permission management and may output users' personal sensitive information, causing the users' privacy to be leaked.
As another example, a requester may have the right to call a certain API, but its contract data with the service platform covers only part of the data content (such as user gender) out of the full data that the API can output (such as user gender, user address, and user mobile phone number). However, when calling that API, in addition to passing the input parameters corresponding to that part of the data content, the requester also passes input parameters corresponding to other data content in the full data (such as user address). Due to omissions in API permission management, the data the API returns to the requester (such as user gender and user address) then exceeds the contracted data range (such as user gender).
As yet another example, because of outdated field settings in the API interface called by the requester (for instance, business staff concatenated the user's mobile phone number and ID card number into a single field), the range of data output by the API interface (such as the user's mobile phone number and ID card number) is inconsistent with the requester's contracted data range (such as the user's mobile phone number).
Based on the above, the inventor proposes a risk assessment method and apparatus for privacy data leakage. In one embodiment, FIG. 1 shows a schematic diagram of an implementation scenario of the risk assessment method according to an embodiment. As shown in FIG. 1, requester personnel can send an API call request (or request message) to the service platform through the requester client; correspondingly, the service platform can generate a corresponding system log according to the request message and return an API call response (or response message) to the requester client. It can be understood that the gateway can record the request messages and response messages to produce corresponding network traffic records (or network traffic logs).
Thus, the risk assessment apparatus can obtain the system logs and network traffic records from the gateway and analyze the obtained network traffic records to obtain analysis data; on the other hand, the risk assessment apparatus can also obtain, from the service platform, the permission data of the requester for calling APIs. Further, the risk assessment apparatus can compare the system logs with the permission data, compare the analysis data with the permission data, and combine the two comparison results to assess the risk of privacy data leakage caused by the requester's API calls, so that the requester's violating or abnormal calling behavior can be detected in time.
The following describes the implementation steps of the above risk assessment method in conjunction with specific embodiments.
First, it should be noted that terms such as "first", "second", and "third" in the embodiments of this specification are used only to distinguish things of the same kind and have no other limiting effect.
FIG. 2 shows a flowchart of a risk assessment method for privacy data leakage according to an embodiment. The execution subject of the method can be any apparatus, device, platform, or server cluster with computing and processing capabilities; for example, the execution subject can be the risk assessment apparatus shown in FIG. 1, or, as another example, the above-mentioned service platform.
As shown in Figure 2, the method may include the following steps:
Step S210: obtain several system logs and several network traffic records generated by the requester requesting to call the privacy data of a target object stored in the service platform, where each system log is generated based on a request message for calling an API sent by the requester to the service platform and includes several first target APIs determined according to the request message, first parameters input for the several first target APIs, and several first privacy categories corresponding to the first parameters; and each network traffic record includes at least the response message returned by the service platform for the corresponding request message. Step S220: analyze the several network traffic records to obtain analysis data, which includes at least several second privacy categories corresponding to API output data. Step S230: obtain, from the service platform, the permission data of the requester for calling APIs, where the permission data includes the set of APIs the requester has the right to call, the parameter set composed of the parameters the requester has the right to pass in for that API set, and the privacy category set corresponding to the parameter set. Step S240: compare the several system logs with the permission data to obtain a first comparison result, and compare the analysis data with the permission data to obtain a second comparison result. Step S250: assess, based on at least the first comparison result and the second comparison result, the privacy data leakage risk of the requester calling APIs.
The above steps are as follows:
First, in step S210, obtain a number of system logs and a number of network traffic records generated by the requesting party requesting to call the private data of the target object stored in the service platform.
In one embodiment, the requester may be an individual, an organization, or an enterprise, which can log in to the service platform through an account registered on the service platform and initiate API call requests while using the service platform. In one example, the requester may be a cross-border merchant, and the service platform may be a cross-border merchant system or a cross-border merchant open platform. It can be understood that the service platform can store basic attribute information of a large number of service objects, as well as service data generated by those service objects while using the service. For example, a service object fills in some registration information when registering on the service platform, and its use of the service generates order data, review information, and so on. In the embodiments of this specification, the service object targeted by the data the requester requests to call is referred to as the target object. In one embodiment, the above-mentioned privacy data may include the full data stored in the service platform.
The generation of system logs and network traffic records is introduced below. In one embodiment, the requester sends a request message for calling an API to the service platform; after receiving the request message, the service platform makes a business record based on it, generates a corresponding system log, generates a response message for the request message, and returns the response message to the requester. It can be understood that, at the physical layer, the communication between the requester and the service platform passes through a gateway. Specifically, the request message sent by the requester is first uploaded to the gateway and then forwarded by the gateway to the service platform; during this upstream process the gateway can record the request message. Likewise, the response message returned by the service platform to the requester is first sent down to the gateway and then forwarded by the gateway to the requester; during this downstream process the gateway can record the response message. The recorded request message and the corresponding response message can together form one network traffic record.
Regarding the generation of the above system logs, it should first be noted that the service platform stores configuration information of the API services it can provide. In one embodiment, the configuration information includes the name of each API, the full set of parameters that can be passed to each API, and, for each parameter, the data meaning (e.g., mobile phone number) of the data it is used to call (e.g., 13800001111). Further, after receiving a request message, the service platform can determine, according to the stored configuration information, the target API involved in the request message, the parameters input for the target API, and the data meanings corresponding to those parameters, and then generate a system log. It should be noted that, in the embodiments of this specification, a privacy-related data meaning is referred to as a privacy category; specific examples include the user's mobile phone number, company switchboard number, ID card number, user name, and so on.
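For concreteness, the sketch below shows one possible, simplified shape of the API configuration information and of a system log record derived from a request message. The structure and names (API_CONFIG, first_target_api, etc.) are illustrative assumptions, not a format defined by this specification.

```python
# Hypothetical illustration of the API configuration information stored by the
# service platform and of a system log generated from one request message.
API_CONFIG = {
    "getUserInfo": {                          # API name
        "params": {
            "phone": "mobile phone number",   # parameter -> data meaning (privacy category)
            "gender": "gender",
            "add": "address",
        }
    }
}

def build_system_log(api_name: str, passed_params: list[str]) -> dict:
    """Build a system log entry from a request message, using the stored config info."""
    meanings = API_CONFIG[api_name]["params"]
    return {
        "first_target_api": api_name,
        "first_params": passed_params,
        "first_privacy_categories": [meanings[p] for p in passed_params if p in meanings],
    }

# Example: a request that calls getUserInfo with the "phone" parameter.
print(build_system_log("getUserInfo", ["phone"]))
# {'first_target_api': 'getUserInfo', 'first_params': ['phone'],
#  'first_privacy_categories': ['mobile phone number']}
```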
As mentioned above, in one embodiment, the above-mentioned privacy data may include the full data stored in the service platform. In that case, this step may include: obtaining the multiple system logs and multiple network traffic records generated by the requester calling the APIs provided by the service platform, as the above-mentioned several system logs and several network traffic records.
In another embodiment, the risk assessment can be focused on certain privacy categories; specifically, multiple privacy categories of interest can be preset. On this basis, after the multiple system logs and multiple network traffic records generated by the requester's API calls are obtained, they need to be filtered according to the preset multiple privacy categories to obtain the several system logs and several network traffic records.
In a specific embodiment, the above filtering may include: matching the multiple system logs against the multiple privacy categories, and taking the successfully matched system logs as the several system logs. As described above, each system log includes the API determined from the corresponding request message, the parameters the request passes to that API, and the meaning of the callable data corresponding to those parameters. The multiple privacy categories can therefore be matched against the data meanings of the parameters in the multiple system logs, so that system logs whose data meanings include any of the multiple privacy categories are matched and included in the above-mentioned several system logs.
In another specific embodiment, the above filtering may further include: screening the several network traffic records out of the multiple network traffic records by using filter items preset based on the multiple privacy categories, where the form of the filter items includes at least one of the following: a custom UDF function, a key field, and a regular expression. It should be understood that a network traffic record includes a request message and the corresponding response message, and the data meanings of the fields contained in the request and response messages are often unclear; this differs from a system log, which contains data meanings determined from the request message based on the API configuration information. It is therefore difficult to filter the traffic records by matching the multiple privacy categories directly.
The above filter items can be preset based on the multiple privacy categories. In one example, they may include a regular expression set for mobile phone numbers, used to match field values with the following characteristics: the first digit is 1 and the first three digits belong to an existing carrier prefix (such as China Mobile prefixes 138, 139, etc.), so that network traffic records containing such field values are included in the above-mentioned several network traffic records. In another example, they may include a User-Defined Function (UDF) set for ID card numbers, used to match field values that conform to the ID card number encoding rules, so that network traffic records containing such field values are included in the several network traffic records. In yet another example, they may include a key field set for user names; for instance, the API parameter used to retrieve user names (such as User_name) can be set as a key field, so that network traffic records containing that key field are included in the several network traffic records.
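As a minimal sketch of such filter items, the following shows a regular-expression filter item for mobile phone numbers and a key-field filter item; the specific pattern and field names are illustrative assumptions, and the ID-card UDF is omitted.

```python
import re

# Regular expression for mobile numbers: starts with 1, second digit 3-9, 11 digits
# in total (an illustrative pattern, not an exhaustive carrier-prefix list).
PHONE_RE = re.compile(r"\b1[3-9]\d{9}\b")

# Key field set for user names: the API parameter used to retrieve user names.
KEY_FIELDS = {"User_name"}

def traffic_record_matches(record: str) -> bool:
    """Return True if a raw network traffic record should be kept for assessment."""
    if PHONE_RE.search(record):                           # regular-expression filter item
        return True
    return any(key in record for key in KEY_FIELDS)       # key-field filter item

print(traffic_record_matches("GET /api?User_name=abc"))               # True (key field)
print(traffic_record_matches('resp: {"phone": "13800001111"}'))       # True (regex)
print(traffic_record_matches('resp: {"status": "ok"}'))               # False
```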
In the above step S210, a number of system logs and a number of network traffic records generated by the requesting party requesting to call the private data of the target object can be obtained.
Next, in step S220, analysis processing is performed on the plurality of network traffic records to obtain analysis data, which includes at least a plurality of second privacy categories corresponding to the API output data.
In one embodiment, this step may include: first analyzing the several network traffic records to obtain the API output data, where the API output data includes multiple fields. It can be understood that the API output data is obtained by analyzing the response messages in the network traffic records. Several third privacy categories corresponding to several privacy fields among the multiple fields are then determined; specifically, this can be implemented by machine learning, regular-expression matching, and the like. In a specific embodiment, the several third privacy categories corresponding to the several privacy fields among the multiple fields may be determined based on a pre-trained natural language processing model. In one example, the natural language processing model may include models such as Transformer and BERT. In one example, it may be determined that the several privacy fields include Li Qingshen, Sihai Co., Ltd., Zhenzhong Building on Qingnian Road in Beijing, etc., and that the corresponding third privacy categories include user name, company name, address, etc. In another specific embodiment, the several third privacy categories corresponding to the several privacy fields among the multiple fields may be determined based on multiple preset regular matching rules. In one example, a field named "phone" may be determined to be a privacy field whose corresponding third privacy category is mobile phone number. In another example, a field whose value contains "@" may be determined to be a privacy field whose corresponding third privacy category is email address. In this way, the several third privacy categories can be determined.
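A minimal sketch of the regular-matching variant of this step is shown below; it classifies output fields by field name or value pattern. The rule table and category labels are illustrative assumptions, and a pre-trained NLP model could be substituted for the rules.

```python
import re

# Illustrative regular matching rules: field-name or field-value rule -> third privacy category.
NAME_RULES = {"phone": "mobile phone number"}
VALUE_RULES = [(re.compile(r".+@.+\..+"), "email address")]

def third_privacy_categories(api_output: dict) -> dict:
    """Map each privacy field in the API output data to a third privacy category."""
    categories = {}
    for name, value in api_output.items():
        if name in NAME_RULES:
            categories[name] = NAME_RULES[name]
            continue
        for pattern, category in VALUE_RULES:
            if isinstance(value, str) and pattern.fullmatch(value):
                categories[name] = category
                break
    return categories

print(third_privacy_categories({"phone": "13800001111", "contact": "58978@ali.cn", "status": "ok"}))
# {'phone': 'mobile phone number', 'contact': 'email address'}
```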
Further, in one specific implementation, the above several third privacy categories can simply be taken as the several second privacy categories. In another specific embodiment, verification is performed on the several third privacy categories based on the field values of the several privacy fields, and the third privacy categories that pass verification are included in the several second privacy categories. In one example, the several privacy fields include an arbitrary first field corresponding to a first category among the several third privacy categories; accordingly, the verification may include: matching the first field against multiple pre-stored legal field values corresponding to the first category, and determining that the first category passes verification if the match succeeds. In a specific example, suppose the first category is user name, the first field is "Ou Cha", and the multiple legal field values include the names of multiple real-name-authenticated users; it can then be checked whether Ou Cha exists among those user names, and if so, user name is included in the several second privacy categories.
In another example, the verification may further include: classifying the first field using a pre-trained classification model for the first category, and determining that the first category passes verification if the classification result indicates that the first field belongs to the first category. In a specific example, suppose the first category is email address and the first field is "remember to come for dinner tomorrow, @Xiaohua"; the classification result then indicates that the first field is not an email address. Suppose instead that the first field is 58978@ali.cn; the classification result then indicates that the first field is an email address, and email address is included in the several second privacy categories. In this way, on the basis of the determined several third privacy categories, the several second privacy categories can be obtained through further verification, which ensures the accuracy of the determined second privacy categories and in turn makes the subsequently obtained risk assessment result for privacy data leakage more accurate.
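The whitelist-matching form of this verification can be sketched as follows; the registry of real-name-authenticated user names is a hypothetical stand-in for whatever legal field values the platform actually stores.

```python
# Hypothetical registry of pre-stored legal field values per privacy category.
LEGAL_VALUES = {
    "user name": {"Ou Cha", "Li Qingshen"},   # real-name-authenticated user names
}

def verify_category(field_value: str, category: str) -> bool:
    """Keep a third privacy category only if the field value matches a stored legal value."""
    return field_value in LEGAL_VALUES.get(category, set())

# 'user name' passes verification and is included in the second privacy categories.
print(verify_category("Ou Cha", "user name"))       # True
print(verify_category("random text", "user name"))  # False
```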
From the above, the several second privacy categories corresponding to the API output data contained in the response messages can be obtained. On the other hand, optionally, the request messages contained in the network traffic records can also be parsed. It should be noted that the above system logs are generated at the application layer, while network traffic records are produced at a lower layer; in engineering practice, when parsing network traffic records it is difficult to obtain the complete API configuration information stored in the service platform for precise parsing, so other parsing approaches often need to be considered. In one embodiment, the analysis data further includes several second target APIs obtained by parsing the request messages and second parameters input for the several second target APIs. The APIs and parameters parsed here are less precise and relatively rough compared with the API names and parameters contained in the system logs.
In a specific embodiment, the several second target APIs can be parsed from the several network traffic records using API parsing rules preset based on multiple APIs, where the API parsing rules are defined in at least one of the following forms: a custom UDF function, a key field, and a regular expression. In another specific embodiment, the several second parameters can be parsed from the several network traffic records using parameter parsing rules preset based on multiple parameters, where the parameter parsing rules are defined in at least one of the following forms: a custom UDF function, a key field, and a regular expression. It should be noted that, for the custom UDF functions, key fields, and regular expressions involved in the API parsing rules and parameter parsing rules, reference can be made to the description of the filter items in the foregoing embodiments, which is not repeated here.
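For illustration only, one possible form of such parsing rules extracts the second target API and second parameters from the request line of a recorded request; the URL layout assumed here is loosely based on the example API names given later in this specification and is not prescribed by it.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_line(raw_request: str) -> tuple[str, list[str]]:
    """Roughly extract the second target API (URL path) and second parameters
    (query keys) from a recorded request, without the platform's API config."""
    url = raw_request.split()[1]          # e.g. "GET http://user.cn/data/?id=00&phone=... HTTP/1.1"
    parts = urlsplit(url)
    second_target_api = f"{parts.scheme}://{parts.netloc}{parts.path}"
    second_params = list(parse_qs(parts.query).keys())
    return second_target_api, second_params

print(parse_request_line("GET http://user.cn/data/?id=00&phone=13800001111 HTTP/1.1"))
# ('http://user.cn/data/', ['id', 'phone'])
```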
In the above, the analysis data can be obtained by analyzing a number of network traffic records. On the other hand, step S230 may be executed to obtain the permission data of the requester to call the API from the service platform.
Specifically, the above permission data includes the set of APIs the requester has the right to call, the parameter set composed of the parameters the requester has the right to pass in for that API set, and the privacy category set corresponding to the parameter set. In one example, the API set may include the names of one or more APIs, such as http://yiteng.cn/data/?id=91 and https://niuqi.cn/data/?id=8. In one example, the parameters in the parameter set may include gender, phone, and add. In one example, the privacy categories in the privacy category set may include gender, phone number, and address.
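A possible in-memory shape of such permission data, using the example values above, is sketched below; representing the three parts as Python sets is an assumption made for the set-membership comparisons discussed later.

```python
# Illustrative permission data for one requester, built from the examples above.
PERMISSION_DATA = {
    "api_set": {
        "http://yiteng.cn/data/?id=91",
        "https://niuqi.cn/data/?id=8",
    },
    "param_set": {"gender", "phone", "add"},
    "privacy_category_set": {"gender", "phone number", "address"},
}
```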
In one embodiment, the above service platform includes a user authorization system, a contract system, an API management system, and the like. It should be understood that the user authorization system can store the portion of privacy data that individual or enterprise users have authorized the service platform to provide externally; the contract system can store the range of data, negotiated between the requester and the service platform, that the requester may request from the service platform; and the API management system includes information such as the interface documentation of the APIs the service platform can provide for the requester to call. On this basis, relevant data can be obtained from these systems respectively and, after consolidation, included in the above permission data.
In this way, the permission data for the requester to call the API can be obtained from the service platform.
Then, in step S240, the several system logs are compared with the permission data to obtain a first comparison result, and the analysis data is compared with the permission data to obtain a second comparison result.
On the one hand, in one embodiment, obtaining the first comparison result may include: judging whether the several first target APIs belong to the API set to obtain a first judgment result, which is included in the first comparison result. It should be understood that, for the several first target APIs contained in each of the several system logs, it is necessary to judge whether they belong to the API set in the permission data. In a specific embodiment, suppose the target APIs in the several system logs include http://user.cn/data/?id=00, and the API set includes http://user.cn/data/?id=00 and http://company.cn/data/?id=66; the comparison then shows that all target APIs in the several system logs belong to the API set, the number of target APIs not belonging to the API set is 0, and the first judgment result can therefore be determined as 0.
In another embodiment, obtaining the first comparison result described above may further include: judging whether the first parameter belongs to the parameter set, and obtaining a second judgment result, which is included in the first comparison result. It needs to be understood that for the first parameter included in each system log in a number of system logs, it is necessary to determine whether it belongs to the parameter set in the permission data. In an example, it is assumed that the parameters in the above-mentioned several system logs include phone and IDnumber, and the above-mentioned parameter set includes phone. Through comparison, it can be determined that IDnumber does not belong to the parameter set, and thus the second judgment result can be determined as 1.
In yet another embodiment, obtaining the first comparison result may further include: judging whether the several first privacy categories belong to the privacy category set to obtain a third judgment result, which is included in the first comparison result. It should be understood that, for the several first privacy categories contained in each of the several system logs, it is necessary to judge whether they belong to the privacy category set in the permission data. In one example, suppose the first privacy categories in the several system logs include mobile phone number and ID card number, and the privacy category set includes mobile phone number; the comparison then shows that ID card number does not belong to the privacy category set, and the third judgment result can therefore be determined as 1.
From the above, the first judgment result, the second judgment result, and the third judgment result can be obtained as the first comparison result.
On the other hand, in one embodiment, obtaining the second comparison result may include: judging whether the several second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result. In another embodiment, it may further include: judging whether the above several second target APIs belong to the API set to obtain a fifth judgment result, which is included in the second comparison result. In yet another embodiment, it may further include: judging whether the above second parameters belong to the parameter set to obtain a sixth judgment result, which is included in the second comparison result.
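Taken together, the six judgment results amount to counting, for each comparison, how many observed items fall outside the corresponding permitted set. The minimal sketch below follows that reading (matching the 0 and 1 values in the examples above); treating every judgment result as such a count is an assumption.

```python
def count_violations(observed: set, permitted: set) -> int:
    """Number of observed items that fall outside the permitted set."""
    return len(observed - permitted)

# First comparison result: system logs vs. permission data (judgment results 1-3).
first_result = [
    count_violations({"http://user.cn/data/?id=00"},
                     {"http://user.cn/data/?id=00", "http://company.cn/data/?id=66"}),   # -> 0
    count_violations({"phone", "IDnumber"}, {"phone"}),                                  # -> 1
    count_violations({"mobile phone number", "ID card number"}, {"mobile phone number"}),# -> 1
]
# The second comparison result (judgment results 4-6) is built the same way from the analysis data.
print(first_result)  # [0, 1, 1]
```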
Above, the first comparison result and the second comparison result can be obtained. Then, in step S250, based on at least the first comparison result and the second comparison result, the privacy data leakage risk of the requester calling the API is evaluated.
In one embodiment, this step may include: inputting the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result indicating the privacy data leakage risk. In a more specific embodiment, the first risk assessment model may use machine learning algorithms such as decision trees, random forests, AdaBoost, and neural networks. In a more specific embodiment, the first prediction result may be a risk classification level, such as high, medium, or low. In another more specific embodiment, the first prediction result may be a risk assessment score, such as 20 or 85. It should be noted that the training process of the first risk assessment model is similar to its use process, so the training process is not described again.
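As an illustrative sketch only (the specification fixes neither a feature encoding nor a library), the comparison results could be flattened into a feature vector and fed to a scikit-learn random forest; the toy training data below is invented purely so the snippet runs end to end.

```python
from sklearn.ensemble import RandomForestClassifier

# Feature vector: the six judgment results (counts of out-of-permission items).
# Toy training samples and labels, invented only to make the example runnable.
X_train = [
    [0, 0, 0, 0, 0, 0],   # no violations
    [0, 1, 1, 1, 0, 0],   # some violations
    [2, 3, 3, 2, 1, 1],   # many violations
]
y_train = ["low", "medium", "high"]   # first prediction result as a risk level

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(X_train, y_train)

first_and_second_results = [0, 1, 1, 1, 0, 0]
print(model.predict([first_and_second_results])[0])   # e.g. "medium"
```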
In another embodiment, this step may include: first, determining indicator values of monitoring indicators according to the several system logs and several network traffic records, where the monitoring indicators are preset for the requester's API calling behavior; next, comparing the indicator values with pre-obtained historical indicator values of the requester to obtain a third comparison result; and then assessing, based on the first comparison result, the second comparison result, and the third comparison result, the privacy data leakage risk of the requester calling APIs.
In a specific embodiment, the above monitoring indicators may include one or more of the following: the number of request messages sent by the requester to the service platform per unit time, the number of target objects corresponding to the privacy data the requester requests to call per unit time, and the number of privacy categories corresponding to the privacy data the requester requests to call per unit time. In one example, the unit time may be a year, a month, a week, a day, an hour, a minute, and so on. In a specific example, the monitoring indicators may include the number of user IDs (which can be parsed from the input parameters of the request messages) contained in the requester's call requests per day.
In a specific embodiment, the above historical indicator values may be determined from historical system logs and historical network traffic records generated by the requester's calls to privacy data. In one example, the monitoring indicators may include the number of request messages sent by the requester per minute; suppose the historical indicator value for this count is 20 and the currently determined indicator value is 100, then 4 (that is, (100-20)/20) can be taken as the comparison result for this count and included in the above third comparison result.
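In other words, the example computes the relative deviation of the current indicator value from its historical baseline, as the small sketch below shows; applying relative deviation to every monitored indicator is an assumption generalized from this single example.

```python
def relative_deviation(current: float, historical: float) -> float:
    """Relative deviation of the current indicator value from the historical value."""
    return (current - historical) / historical

# Requests per minute: historical value 20, current value 100.
print(relative_deviation(100, 20))   # 4.0, included in the third comparison result
```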
In a specific embodiment, whether privacy data leakage has occurred can be judged, in combination with preset evaluation rules, according to the first comparison result, the second comparison result, and the third comparison result. In one example, the evaluation rules may include: if the privacy categories exceeding the permission range in the comparison results include the user's ID card number, it is determined that privacy data leakage has occurred in the requester's API calls. In another specific embodiment, the first comparison result, the second comparison result, and the third comparison result may be input together into a pre-trained second risk assessment model to obtain a second prediction result indicating the privacy data leakage risk. In a more specific embodiment, the second risk assessment model may use machine learning algorithms such as decision trees, random forests, AdaBoost, and neural networks. In a more specific embodiment, the second prediction result may be a risk classification level, such as extremely high, high, medium, low, or extremely low. In another more specific embodiment, the second prediction result may be a risk assessment score, such as 15 or 90. It should be noted that the training process of the second risk assessment model is similar to its use process, so the training process is not described again. In this way, the data leakage risk of the requester's calls can be assessed based on the above three comparison results.
In summary, in the risk assessment method for privacy data leakage provided by the embodiments of this specification, the system logs and network traffic records generated by the requester's API calls, as well as the requester's permission data for calling APIs, are obtained; the network traffic records are analyzed to obtain analysis data; the analysis data is compared with the permission data, and the system logs are compared with the permission data; and the two comparison results are combined to assess the risk of privacy data leakage caused by the requester's API calls, so that the requester's violating or abnormal calling behavior can be detected and discovered in time. Further, the obtained system logs and the analyzed network traffic records can be used to determine the indicator values of the monitoring indicators set for the requester's behavior, and these indicator values can be compared with historical indicator values, thereby further improving the accuracy and usability of the risk assessment result.
According to an embodiment of another aspect, this specification further discloses a risk assessment apparatus. Specifically, FIG. 3 shows a structural diagram of a risk assessment apparatus for privacy data leakage according to an embodiment. As shown in FIG. 3, the apparatus 300 may include:
a first obtaining unit 310, configured to obtain several system logs and several network traffic records generated by the requester requesting to call the privacy data of a target object stored in the service platform, where each system log is generated based on a request message for calling an API sent by the requester to the service platform and includes several first target APIs determined according to the request message, first parameters input for the several first target APIs, and several first privacy categories corresponding to the first parameters, and each network traffic record includes at least the response message returned by the service platform for the corresponding request message; a parsing unit 320, configured to parse the several network traffic records to obtain analysis data, which includes at least several second privacy categories corresponding to API output data; a second obtaining unit 330, configured to obtain, from the service platform, the permission data of the requester for calling APIs, where the permission data includes the set of APIs the requester has the right to call, the parameter set composed of the parameters the requester has the right to pass in for that API set, and the privacy category set corresponding to the parameter set; a comparison unit 340, configured to compare the several system logs with the permission data to obtain a first comparison result, and to compare the analysis data with the permission data to obtain a second comparison result; and an evaluation unit 350, configured to assess, based on at least the first comparison result and the second comparison result, the privacy data leakage risk of the requester calling APIs.
In one embodiment, the first obtaining unit 310 specifically includes: an obtaining subunit 311, configured to obtain multiple system logs and multiple network traffic records generated by the requester calling the APIs provided by the service platform; and a filtering subunit 312, configured to filter the multiple system logs and multiple network traffic records based on multiple preset privacy categories to obtain the several system logs and several network traffic records.
In a specific embodiment, the filtering subunit 312 is specifically configured to: match the multiple system logs against the multiple privacy categories and take the successfully matched system logs as the several system logs; and screen the several network traffic records out of the multiple network traffic records by using filter items preset based on the multiple privacy categories, where the form of the filter items includes at least one of the following: a custom UDF function, a key field, and a regular expression.
In one embodiment, the network traffic records further include the request messages, and the analysis data further includes several second target APIs obtained by parsing the request messages and second parameters input for the several second target APIs.
In a specific embodiment, the parsing unit 320 is further configured to: parse the several second target APIs from the several network traffic records using API parsing rules preset based on multiple APIs, where the API parsing rules are defined in at least one of the following forms: a custom UDF function, a key field, and a regular expression; and parse the several second parameters from the several network traffic records using parameter parsing rules preset based on multiple parameters, where the parameter parsing rules are defined in at least one of the following forms: a custom UDF function, a key field, and a regular expression.
In one embodiment, the parsing unit 320 specifically includes: a parsing subunit 321, configured to parse the several network traffic records to obtain the API output data, where the API output data includes multiple fields; and a determining subunit 322, configured to determine several third privacy categories corresponding to several privacy fields among the multiple fields. The parsing unit specifically further includes: an inclusion subunit 323, configured to take the several third privacy categories as the several second privacy categories; or a verification subunit 324, configured to perform verification on the several third privacy categories based on the field values of the several privacy fields and include the third privacy categories that pass verification in the several second privacy categories.
In a specific embodiment, the determining subunit 322 is specifically configured to: determine the several third privacy categories corresponding to the several privacy fields among the multiple fields based on a pre-trained natural language processing model; or determine the several third privacy categories corresponding to the several privacy fields among the multiple fields based on multiple preset regular matching rules.
In another specific embodiment, the several privacy fields include an arbitrary first field corresponding to a first category among the several third privacy categories, and the verification subunit 324 is specifically configured to: match the first field against multiple pre-stored legal field values corresponding to the first category, and determine that the first category passes verification if the match succeeds; or classify the first field using a pre-trained classification model for the first category, and determine that the first category passes verification if the classification result indicates that the first field belongs to the first category.
In one embodiment, the comparison unit 340 is specifically configured to: judge whether the several first target APIs belong to the API set to obtain a first judgment result, which is included in the first comparison result; judge whether the first parameters belong to the parameter set to obtain a second judgment result, which is included in the first comparison result; judge whether the several first privacy categories belong to the privacy category set to obtain a third judgment result, which is included in the first comparison result; and judge whether the several second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result.
In one embodiment, the comparison unit 340 is further configured to: judge whether the several second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result; judge whether the several second target APIs belong to the API set to obtain a fifth judgment result, which is included in the second comparison result; and judge whether the second parameters belong to the parameter set to obtain a sixth judgment result, which is included in the second comparison result.
In one embodiment, the evaluation unit 350 is specifically configured to: input the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result indicating the privacy data leakage risk.
In one embodiment, the evaluation unit 350 specifically includes: a processing subunit 351, configured to determine indicator values of monitoring indicators according to the several system logs and several network traffic records, where the monitoring indicators are preset for the requester's API calling behavior; a comparison subunit 352, configured to compare the indicator values with pre-obtained historical indicator values of the requester to obtain a third comparison result; and an evaluation subunit 353, configured to assess, based on the first comparison result, the second comparison result, and the third comparison result, the privacy data leakage risk of the requester calling APIs.
In a specific embodiment, the monitoring indicators include one or more of the following: the number of request messages sent by the requester to the service platform per unit time, the number of target objects corresponding to the privacy data the requester requests to call per unit time, and the number of privacy categories corresponding to the privacy data the requester requests to call per unit time.
In another specific embodiment, the evaluation subunit 353 is specifically configured to: judge, in combination with preset evaluation rules, whether privacy data leakage has occurred according to the first comparison result, the second comparison result, and the third comparison result; or input the first comparison result, the second comparison result, and the third comparison result together into a pre-trained second risk assessment model to obtain a second prediction result indicating the privacy data leakage risk.
In summary, in the risk assessment apparatus for privacy data leakage provided by the embodiments of this specification, the system logs and network traffic records generated by the requester's API calls, as well as the requester's permission data for calling APIs, are obtained; the network traffic records are analyzed to obtain analysis data; the analysis data is compared with the permission data, and the system logs are compared with the permission data; and the two comparison results are combined to assess the risk of privacy data leakage caused by the requester's API calls, so that the requester's violating or abnormal calling behavior can be detected and discovered in time. Further, the obtained system logs and the analyzed network traffic records can be used to determine the indicator values of the monitoring indicators set for the requester's behavior, and these indicator values can be compared with historical indicator values, thereby further improving the accuracy and usability of the risk assessment result.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
According to an embodiment of yet another aspect, there is also provided a computing device, including a memory and a processor, where the memory stores executable code, and the processor, when executing the executable code, implements the method described in conjunction with FIG. 2.
Those skilled in the art should be aware that in one or more of the above examples, the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
The specific embodiments described above further describe the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.