TWI734466B - Risk assessment method and device for leakage of privacy data - Google Patents

Risk assessment method and device for leakage of privacy data

Info

Publication number
TWI734466B
Authority
TW
Taiwan
Prior art keywords
privacy
comparison result
data
api
network traffic
Prior art date
Application number
TW109115224A
Other languages
Chinese (zh)
Other versions
TW202121329A (en)
Inventor
鄧圓
Original Assignee
大陸商支付寶(杭州)信息技術有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商支付寶(杭州)信息技術有限公司
Publication of TW202121329A
Application granted
Publication of TWI734466B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6263 Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of this specification provide a risk assessment method for the leakage of private data. The method includes: first, obtaining a number of system logs and a number of network traffic records generated when a requester requests to retrieve private data of a target object stored in a service platform, where each system log is generated based on a request message, sent by the requester to the service platform, for calling an API, and each network traffic record includes at least the response message returned by the service platform for that request message; next, parsing the network traffic records to obtain parsed data; then, obtaining from the service platform the permission data governing the requester's API calls; next, comparing the system logs with the permission data to obtain a first comparison result, and comparing the parsed data with the permission data to obtain a second comparison result; and finally, assessing, based at least on the first comparison result and the second comparison result, the risk of private data leakage from the requester's API calls.

Description

Risk assessment method and device for leakage of privacy data

One or more embodiments of this specification relate to the technical field of data and information security, and in particular to a risk assessment method and device for the leakage of private data.

An API (Application Programming Interface) is convenient to call and highly versatile, and has gradually become the main way of delivering Internet services. API calls have therefore become a key area of concern for preventing data leakage.

The data stored by a service platform usually includes basic information about the objects it serves (such as individuals or enterprises) and the service data generated while they use the service. With the authorization of the served objects, the service platform can offer API call services based on this data to data consumers (such as research institutions or merchants). Normally, a data consumer (also called a requester) can obtain through API calls only the data it is permitted to use. However, the software and hardware environments, IT architectures, and business scenarios of different requesters (including requesters in different regions, such as cross-border merchants) often differ considerably. This makes the API call system complex and easy for bad actors to exploit, resulting in data leakage, which poses a major challenge for protecting the data exposed through API calls. Because leaked data is likely to include private data such as users' personal information, guarding against such leaks is all the more urgent.

A reasonable and reliable solution is therefore needed that can assess, promptly and accurately, the risk of data leakage arising from API calls, especially leakage of private data, so that such leakage can be effectively prevented.

One or more embodiments of this specification describe a risk assessment method and device for the leakage of private data, which can promptly and accurately assess the risk of private data leakage arising from API calls, so as to effectively prevent such leakage.

According to a first aspect, a risk assessment method for the leakage of private data is provided. The method includes: obtaining a number of system logs and a number of network traffic records generated when a requester requests to retrieve private data of a target object stored in a service platform, where each system log is generated based on a request message, sent by the requester to the service platform, for calling an API, and includes a number of first target APIs determined from the request message, the first parameters input for those first target APIs, and a number of first privacy categories corresponding to the first parameters, and where each network traffic record includes at least the response message returned by the service platform for that request message; parsing the network traffic records to obtain parsed data, which includes at least a number of second privacy categories corresponding to the API output data; obtaining from the service platform the permission data governing the requester's API calls, the permission data including the set of APIs the requester is entitled to call, the parameter set made up of the parameters the requester is entitled to pass to that API set, and the privacy category set corresponding to the parameter set; comparing the system logs with the permission data to obtain a first comparison result, and comparing the parsed data with the permission data to obtain a second comparison result; and assessing, based at least on the first comparison result and the second comparison result, the risk of private data leakage from the requester's API calls.

In one embodiment, obtaining the system logs and network traffic records includes: obtaining multiple system logs and multiple network traffic records generated when the requester calls APIs provided by the service platform; and filtering them based on multiple preset privacy categories to obtain the said system logs and network traffic records.

In a specific embodiment, the filtering includes: matching the multiple system logs against the multiple privacy categories and taking the successfully matched system logs as the said system logs; and using filter items set in advance based on the multiple privacy categories to select the said network traffic records from the multiple network traffic records, where a filter item takes at least one of the following forms: a user-defined function (UDF), a key field, or a regular-expression item.

In one embodiment, parsing the network traffic records to obtain parsed data includes: parsing the network traffic records to obtain the API output data, which includes multiple fields; determining a number of third privacy categories corresponding to the privacy fields among those fields; and either taking the third privacy categories as the second privacy categories, or verifying the third privacy categories based on the field values of the privacy fields and including those that pass verification in the second privacy categories.

In a specific embodiment, the third privacy categories are determined based on a pre-trained natural language processing model, or based on multiple preset regular-expression matching rules.

In a specific embodiment, the privacy fields include an arbitrary first field, which corresponds to a first category among the third privacy categories. Verifying the third privacy categories based on the field content of the privacy fields includes: matching the first field against multiple pre-stored legal field values corresponding to the first category and, if the match succeeds, determining that the first category passes verification; or classifying the first field with a classification model pre-trained for the first category and, if the classification result indicates that the first field belongs to the first category, determining that the first category passes verification.

In one embodiment, assessing the risk based at least on the first and second comparison results includes: inputting the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result that indicates the risk of private data leakage.

In one embodiment, assessing the risk based at least on the first and second comparison results includes: determining, from the system logs and network traffic records, the value of a monitoring indicator that is preset for the requester's API call behavior; comparing the requester's previously obtained historical indicator value with that indicator value to obtain a third comparison result; and assessing the risk of private data leakage from the requester's API calls based on the first, second, and third comparison results.

In a specific embodiment, assessing the risk based on the first, second, and third comparison results includes: judging, according to preset assessment rules and the three comparison results, whether private data leakage has occurred; or inputting the three comparison results together into a pre-trained second risk assessment model to obtain a second prediction result that indicates the risk of private data leakage.

According to a second aspect, a risk assessment device for the leakage of private data is provided. The device includes: a first obtaining unit, configured to obtain a number of system logs and a number of network traffic records generated when a requester requests to retrieve private data of a target object stored in a service platform, where each system log is generated based on a request message, sent by the requester to the service platform, for calling an API, and includes a number of first target APIs determined from the request message, the first parameters input for those first target APIs, and a number of first privacy categories corresponding to the first parameters, and where each network traffic record includes at least the response message returned by the service platform for that request message; a parsing unit, configured to parse the network traffic records to obtain parsed data, which includes at least a number of second privacy categories corresponding to the API output data; a second obtaining unit, configured to obtain from the service platform the permission data governing the requester's API calls, the permission data including the set of APIs the requester is entitled to call, the parameter set made up of the parameters the requester is entitled to pass to that API set, and the privacy category set corresponding to the parameter set; a comparison unit, configured to compare the system logs with the permission data to obtain a first comparison result, and to compare the parsed data with the permission data to obtain a second comparison result; and an assessment unit, configured to assess, based at least on the first and second comparison results, the risk of private data leakage from the requester's API calls.

According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.

According to a fourth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and the processor implements the method of the first aspect when executing that code.

In summary, in the risk assessment method and device provided by the embodiments of this specification, the system logs and network traffic records generated by the requester's API calls are obtained together with the permission data governing those calls. The network traffic is parsed to obtain parsed data, the parsed data is compared with the permission data, the system logs are compared with the permission data, and the two comparison results are combined to assess the risk that the requester's API calls leak private data, so that violations and abnormal call behavior by the requester are detected promptly. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the value of a monitoring indicator set for the requester's behavior, and that value can be compared with its historical value, which further improves the accuracy and usability of the risk assessment result.

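The solutions provided in this specification are described below with reference to the accompanying drawings.

As noted above, API calls currently carry a risk of leaking private data. Detecting this risk is particularly urgent when the requester is a cross-border requester (such as a cross-border merchant). Specifically, some large domestic enterprises (such as Alibaba) have expanded their business overseas, so there are many overseas merchants and cross-border data calls have become routine. Because the software and hardware environments and business scenarios of overseas merchants differ from those in China, the existing data protection architecture inevitably has gaps, which can lead to leakage of users' private data. Moreover, the IT architectures of different overseas merchants usually differ, which makes the API call system complex, hard to sort out, and easy for bad actors to exploit, leading to leakage of private data (such as sensitive data of domestic users).

In addition, because the number of APIs is large and vulnerabilities in API development and management are hard to avoid, the data actually output by an API may differ from the data the requester actually requested or is permitted to use. For example, an API that a requester has no right to call may, owing to an oversight in API permission management, be called illegally by that requester and output users' sensitive personal information, leaking users' private data.

As another example, a requester has the right to call a certain API, but its contract with the service platform covers only part (such as user gender) of the full data that the API can output (such as user gender, user address, and user mobile phone number). When calling the API, however, the requester passes in not only the input parameters corresponding to that part of the data but also input parameters corresponding to other data in the full set (such as user address). Owing to an oversight in API permission management, the data the API returns to the requester (such as user gender and user address) exceeds the contracted scope (such as user gender).

As a further example, the API called by the requester has some old field settings that were never updated (for example, a staff member concatenated the user's mobile phone number and ID number into a single field), so that the scope of the data output by the API (such as mobile phone number and ID number) is inconsistent with the requester's contracted scope (such as mobile phone number).

Based on the above, the inventor proposes a risk assessment method and device for the leakage of private data. FIG. 1 is a schematic diagram of an implementation scenario of the risk assessment method according to one embodiment. As shown in FIG. 1, a person on the requester side can send an API call request (also called a request message) to the service platform through the requester's client. The service platform generates a corresponding system log from the request message and returns an API call response (also called a response message) to the requester's client. The gateway can record the request message and the response message to produce a corresponding network traffic record (also called a network traffic log).

The risk assessment device can thus obtain the system logs and network traffic records from the gateway and parse the network traffic records to obtain parsed data; it can also obtain from the service platform the permission data governing the requester's API calls. The device can then compare the system logs with the permission data, compare the parsed data with the permission data, and combine the two comparison results to assess the risk that the requester's API calls leak private data, thereby promptly detecting violations and abnormal call behavior by the requester.

The implementation steps of the above risk assessment method are described below with reference to specific embodiments.

First, note that terms such as "first", "second", and "third" in the embodiments of this specification are used only to distinguish items of the same kind and have no other limiting effect.

FIG. 2 is a flowchart of a risk assessment method for the leakage of private data according to one embodiment. The method can be executed by any apparatus, device, platform, or server cluster with computing and processing capability; for example, the executing entity may be the risk assessment device shown in FIG. 1 or the service platform described above. As shown in FIG. 2, the method may include the following steps:

Step S210: obtain a number of system logs and a number of network traffic records generated when a requester requests to retrieve private data of a target object stored in a service platform, where each system log is generated based on a request message, sent by the requester to the service platform, for calling an API, and includes a number of first target APIs determined from the request message, the first parameters input for those first target APIs, and a number of first privacy categories corresponding to the first parameters, and where each network traffic record includes at least the response message returned by the service platform for that request message.

Step S220: parse the network traffic records to obtain parsed data, which includes at least a number of second privacy categories corresponding to the API output data.

Step S230: obtain from the service platform the permission data governing the requester's API calls, the permission data including the set of APIs the requester is entitled to call, the parameter set made up of the parameters the requester is entitled to pass to that API set, and the privacy category set corresponding to the parameter set.

Step S240: compare the system logs with the permission data to obtain a first comparison result, and compare the parsed data with the permission data to obtain a second comparison result.

Step S250: assess, based at least on the first comparison result and the second comparison result, the risk of private data leakage from the requester's API calls.

These steps are described in detail below.

First, in step S210, a number of system logs and a number of network traffic records generated when the requester requests to retrieve private data of a target object stored in the service platform are obtained.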
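In one embodiment, the requester may be an individual, an institution, or an enterprise. It can log in to the service platform with an account registered there and initiate API call requests while using the platform. In one example, the requester is a cross-border merchant and the service platform is a cross-border merchant system or a cross-border merchant open platform. The service platform can store basic attribute information about a large number of served objects, as well as the service data they generate while using the service. For example, a served object fills in registration information when registering with the service platform, and generates order data, review information, and so on when using the service. In the embodiments of this specification, the served object whose data the requester requests is called the target object. In one embodiment, the private data may include the full data stored in the service platform.

The generation of system logs and network traffic records is described next. In one embodiment, the requester sends a request message for calling an API to the service platform. On receiving it, the service platform makes a business record based on the request message, generates a corresponding system log, generates a response message for the request message, and returns the response message to the requester. At the physical layer, communication between the requester and the service platform passes through a gateway. Specifically, a request message sent by the requester is first uploaded to the gateway and then forwarded by the gateway to the service platform; during this upstream leg the gateway can record the request message. The response message that the service platform returns is likewise first delivered to the gateway and then sent by the gateway to the requester; during this downstream leg the gateway can record the response message. The recorded request message and its corresponding response message together form one network traffic record.

Regarding the generation of system logs, note first that the service platform stores configuration information for the API services it can provide. In one embodiment, the configuration information includes the name of each API, the full set of parameters that can be passed to each API, and, for each parameter, the data meaning (such as mobile phone number) of the data (such as 13800001111) that the parameter is used to retrieve. After receiving a request message, the service platform can use the stored configuration information to determine the target API contained in the request message, the parameters input for the target API, and the data meanings corresponding to those parameters, and then generate the system log. In the embodiments of this specification, a privacy-related data meaning is called a privacy category; examples include user mobile phone number, company switchboard number, ID number, and user name.

As stated above, in one embodiment the private data may include the full data stored in the service platform. In that case this step may include: obtaining the multiple system logs and multiple network traffic records generated when the requester calls the APIs provided by the service platform, and using them as the said system logs and network traffic records.

In another embodiment, the risk assessment can focus on certain privacy categories; specifically, multiple privacy categories of interest can be set in advance. After the multiple system logs and multiple network traffic records generated by the requester's API calls are obtained, they are filtered according to the preset privacy categories to obtain the said system logs and network traffic records.

In a specific embodiment, the filtering may include: matching the multiple system logs against the multiple privacy categories and taking the successfully matched system logs as the said system logs. As described above, each system log includes the API determined from the corresponding request message, the parameters the request passes to that API, and the meaning of the data retrievable with those parameters. The privacy categories can therefore be matched against the data meanings of the parameters in the system logs, and any system log whose data meanings include one of the privacy categories is included in the said system logs.

In another specific embodiment, the filtering may further include: using filter items set in advance based on the multiple privacy categories to select the said network traffic records from the multiple network traffic records, where a filter item takes at least one of the following forms: a user-defined function (UDF), a key field, or a regular-expression item. A network traffic record contains a request message and the corresponding response message, and the data meanings of the fields in those messages are often not explicit, unlike a system log, which contains data meanings determined from the request message using the API configuration information. Filtering network traffic records by matching the privacy categories directly is therefore difficult.

The filter items can be preset based on the multiple privacy categories. In one example, they include a regular-expression item for mobile phone numbers that matches field values whose first digit is 1 and whose first three digits are an existing network prefix (such as the China Mobile prefixes 138 and 139), so that network traffic records containing such a field value are included in the said network traffic records. In one example, they include a user-defined function (UDF) for ID numbers that matches field values conforming to the ID number coding rules, so that network traffic records containing such a field value are included in the said network traffic records. In another example, they include a key field for user names; for instance, the API parameter used to retrieve the user name (such as User_name) is set as a key field, so that network traffic records containing the key field are included in the said network traffic records.

Thus, in step S210, the system logs and network traffic records generated when the requester requests to retrieve the target object's private data can be obtained.

Next, in step S220, the network traffic records are parsed to obtain parsed data, which includes at least a number of second privacy categories corresponding to the API output data.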
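The three kinds of filter item described for step S210 (a regular-expression item, a UDF, and a key field) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `PHONE_ITEM`, `is_valid_id_number`, and `keep_record` are hypothetical, the prefix list is a deliberately tiny sample, and the UDF assumes that "ID number" means the 18-character resident identity number, of which it checks only the final check character rather than the full coding rules.

```python
import re

# Illustrative subset of mobile network prefixes; a real list would be longer.
KNOWN_PREFIXES = ("138", "139")

# Regular-expression item: an 11-digit number whose first three digits are a known prefix.
PHONE_ITEM = re.compile(
    r"(?<![0-9])(?:" + "|".join(KNOWN_PREFIXES) + r")[0-9]{8}(?![0-9])"
)

# Candidate ID numbers: 17 digits followed by a digit or X.
ID_CANDIDATE = re.compile(r"(?<![0-9A-Za-z])[0-9]{17}[0-9Xx](?![0-9A-Za-z])")
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CHARS = "10X98765432"

# Key field: the API parameter used to retrieve user names (from the text).
KEY_FIELDS = ("User_name",)


def is_valid_id_number(candidate: str) -> bool:
    """UDF-style check: verify the check character of an 18-character ID number."""
    if len(candidate) != 18 or not candidate[:17].isdecimal():
        return False
    total = sum(int(digit) * weight for digit, weight in zip(candidate[:17], ID_WEIGHTS))
    return ID_CHECK_CHARS[total % 11] == candidate[17].upper()


def keep_record(record_text: str) -> bool:
    """Return True if any filter item matches the raw network traffic record."""
    if PHONE_ITEM.search(record_text):
        return True
    if any(is_valid_id_number(m.group()) for m in ID_CANDIDATE.finditer(record_text)):
        return True
    return any(key in record_text for key in KEY_FIELDS)
```

Because the filter works on the raw record text, it needs no knowledge of the API configuration, which matches the reason the text gives for using filter items on network traffic instead of direct category matching.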
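In one embodiment, this step may include first parsing the network traffic records to obtain the API output data, which includes multiple fields. The API output data is obtained by parsing the response messages in the network traffic records. A number of third privacy categories corresponding to the privacy fields among those fields are then determined, which can be done by machine learning, regular-expression matching, and similar means. In a specific embodiment, the third privacy categories are determined based on a pre-trained natural language processing model; in one example, the model may be Transformer, BERT, or a similar model. In one example, the privacy fields are determined to include 李情深 (a personal name), 似海有限公司 (a company name), and 北京市青年路珍重大廈 (a street address), and the corresponding third privacy categories are user name, enterprise name, and address. In another specific embodiment, the third privacy categories are determined based on multiple preset regular-expression matching rules. In one example, a field named "phone" is determined to be a privacy field whose third privacy category is mobile phone number. In another example, a field whose value contains "@" and [missing in the source] is determined to be a privacy field whose third privacy category is email address. In this way, a number of third privacy categories can be determined.

Further, in a specific embodiment, the third privacy categories are taken as the second privacy categories. In another specific embodiment, the third privacy categories are verified based on the field values of the privacy fields, and those that pass verification are included in the second privacy categories. In one example, the privacy fields include an arbitrary first field, corresponding to a first category among the third privacy categories, and the verification may include: matching the first field against multiple pre-stored legal field values corresponding to the first category and, if the match succeeds, determining that the first category passes verification. In a specific example, suppose the first category is user name, the first field is "歐茶", and the legal field values are the names of users who have completed real-name authentication. Those names are searched for 歐茶, and if it is found, user name is included in the second privacy categories.

In another example, the verification may include: classifying the first field with a classification model pre-trained for the first category and, if the classification result indicates that the first field belongs to the first category, determining that the first category passes verification. In a specific example, suppose the first category is email address. If the first field is "明天記得來吃飯,@小花" ("Remember to come for dinner tomorrow, @Xiaohua"), the classification result indicates that the field is not an email address; if the first field is 58978@ali.cn, the classification result indicates that it is an email address, and email address is included in the second privacy categories. In this way, the second privacy categories are obtained by further verifying the third privacy categories, which ensures their accuracy and in turn makes the subsequent risk assessment result more accurate.

The above yields the second privacy categories corresponding to the API output data in the response messages. Optionally, the request messages in the network traffic records can also be parsed. Note that system logs are generated at the application layer, whereas network traffic records are produced at a lower layer; in engineering practice, it is difficult to obtain the complete API configuration information stored in the service platform in order to parse network traffic records precisely. Other parsing approaches therefore often have to be considered. In one embodiment, the parsed data also includes a number of second target APIs obtained by parsing the request messages, and the second parameters input for those second target APIs. The APIs and parameters parsed here are coarser and less precise than the API names and parameters in the system logs.

In a specific embodiment, the second target APIs are parsed from the network traffic records using API parsing rules set in advance based on multiple APIs, where the API parsing rules are defined in at least one of the following forms: a UDF, a key field, or a regular-expression item. In another specific embodiment, the second parameters are parsed from the network traffic records using parameter parsing rules set in advance based on multiple parameters, where the parameter parsing rules are defined in at least one of the following forms: a UDF, a key field, or a regular-expression item. For the UDFs, key fields, and regular-expression items involved in these rules, see the description of filter items in the preceding embodiments; it is not repeated here.

The above parsing of the network traffic records yields the parsed data. Separately, step S230 can be performed: obtaining from the service platform the permission data governing the requester's API calls.

Specifically, the permission data includes the set of APIs the requester is entitled to call, the parameter set made up of the parameters the requester is entitled to pass to that API set, and the privacy category set corresponding to the parameter set. In one example, the API set includes the names of one or more APIs, such as http://yiteng.cn/data/?id=91 and https://niuqi.cn/data/?id=8. In one example, the parameters in the parameter set include gender, phone, and add. In one example, the privacy categories in the privacy category set include gender, telephone, and address.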
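The two-stage labelling described for step S220 (assign candidate "third" privacy categories, then verify them to obtain the "second" privacy categories) can be sketched as follows. This is a simplified illustration under stated assumptions: the rule table `NAME_RULES`, the function names, and the category labels are hypothetical, and a plain regular expression stands in for the pre-trained classification model that the text describes for the email category.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical field-name rules mapping a field name to a candidate category.
NAME_RULES = {"phone": "mobile phone number", "user_name": "user name"}

# Simple stand-in for the per-category classification model described in the text.
EMAIL_SHAPE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def candidate_category(name: str, value: str) -> Optional[str]:
    """Assign a candidate ("third") privacy category, or None for a non-privacy field."""
    if name.lower() in NAME_RULES:
        return NAME_RULES[name.lower()]
    if "@" in value:
        return "email address"
    return None


def passes_verification(category: str, value: str,
                        legal_values: Dict[str, Set[str]]) -> bool:
    """Verify a candidate category against the field value."""
    if category == "email address":
        return EMAIL_SHAPE.fullmatch(value) is not None
    if category in legal_values:
        # Match against pre-stored legal field values for this category.
        return value in legal_values[category]
    # No verifier configured: accept the candidate category unchanged.
    return True


def second_privacy_categories(fields: Dict[str, str],
                              legal_values: Dict[str, Set[str]]) -> Set[str]:
    """Return the verified ("second") privacy categories found in the API output fields."""
    verified = set()
    for name, value in fields.items():
        category = candidate_category(name, value)
        if category is not None and passes_verification(category, value, legal_values):
            verified.add(category)
    return verified
```

With the examples from the text, a field containing "明天記得來吃飯,@小花" is first labelled as a candidate email address and then rejected by verification, while 58978@ali.cn is accepted.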
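In one embodiment, the service platform includes a user authorization system, a contract system, an API management system, and so on. The user authorization system can store the portion of private data that individual or enterprise users have authorized the service platform to provide externally. The contract system can store the scope of data that, as agreed between the requester and the service platform, the requester may request from the platform. The API management system holds information such as the interface documentation of the APIs that the service platform can offer to requesters. The relevant data can be obtained from each of these systems, consolidated, and included in the permission data.

In this way, the permission data governing the requester's API calls can be obtained from the service platform.

Then, in step S240, the system logs are compared with the permission data to obtain a first comparison result, and the parsed data is compared with the permission data to obtain a second comparison result.

On the one hand, in one embodiment, obtaining the first comparison result may include: judging whether the first target APIs belong to the API set to obtain a first judgment result, which is included in the first comparison result. Every first target API in every system log must be checked against the API set in the permission data. In a specific embodiment, suppose the target APIs in the system logs include http://user.cn/data/?id=00 and the API set includes http://user.cn/data/?id=00 and http://company.cn/data/?id=66. Comparison shows that all target APIs in the system logs belong to the API set, so the number of APIs outside the set is 0 and the first judgment result is set to 0.

In another embodiment, obtaining the first comparison result may further include: judging whether the first parameters belong to the parameter set to obtain a second judgment result, which is included in the first comparison result. Every first parameter in every system log must be checked against the parameter set in the permission data. In one example, suppose the parameters in the system logs include phone and IDnumber and the parameter set includes phone. Comparison shows that IDnumber does not belong to the parameter set, so the second judgment result is set to 1.

In yet another embodiment, obtaining the first comparison result may further include: judging whether the first privacy categories belong to the privacy category set to obtain a third judgment result, which is included in the first comparison result. Every first privacy category in every system log must be checked against the privacy category set in the permission data. In one example, suppose the first privacy categories in the system logs include mobile phone number and ID number and the privacy category set includes mobile phone number. Comparison shows that ID number does not belong to the privacy category set, so the third judgment result is set to 1.

The first, second, and third judgment results obtained above together form the first comparison result.

On the other hand, in one embodiment, obtaining the second comparison result may include: judging whether the second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result. In another embodiment, it may further include: judging whether the second target APIs belong to the API set to obtain a fifth judgment result, which is included in the second comparison result. In yet another embodiment, it may further include: judging whether the second parameters belong to the parameter set to obtain a sixth judgment result, which is included in the second comparison result.

The above yields the first comparison result and the second comparison result. Next, in step S250, the risk of private data leakage from the requester's API calls is assessed based at least on the first comparison result and the second comparison result.

In one embodiment, this step may include: inputting the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result that indicates the risk of private data leakage. In a more specific embodiment, the first risk assessment model may use a machine learning algorithm such as a decision tree, random forest, AdaBoost, or neural network. In a more specific embodiment, the first prediction result is a risk level, such as high, medium, or low. In another more specific embodiment, the first prediction result is a risk assessment score, such as 20 or 85. Using the first risk assessment model is similar to training it, so the training process is not described further.

In another embodiment, this step may include: first, determining, from the system logs and network traffic records, the value of a monitoring indicator that is preset for the requester's API call behavior; next, comparing the requester's previously obtained historical indicator value with that indicator value to obtain a third comparison result; then, assessing the risk of private data leakage from the requester's API calls based on the first, second, and third comparison results.

In a specific embodiment, the monitoring indicators may include one or more of the following: the number of request messages the requester sends to the service platform per unit time; the number of target objects whose private data the requester requests per unit time; and the number of privacy categories corresponding to the private data the requester requests per unit time. In one example, the unit time may be a year, month, week, day, hour, minute, and so on. In a specific example, the monitoring indicator may be the number of user IDs (which can be parsed from the input parameters of the request messages) included in the requester's call requests each day.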
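The set-membership checks of step S240 can be sketched as below. The sketch assumes, following the worked examples in the text, that each judgment result is the count of items found outside the permitted set; the class names, function names, and dictionary keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class PermissionData:
    apis: FrozenSet[str]        # APIs the requester may call
    params: FrozenSet[str]      # parameters the requester may pass in
    categories: FrozenSet[str]  # privacy categories covered by those parameters


@dataclass(frozen=True)
class SystemLog:
    apis: FrozenSet[str]        # first target APIs
    params: FrozenSet[str]      # first parameters
    categories: FrozenSet[str]  # first privacy categories


def count_outside(items: Iterable[str], allowed: FrozenSet[str]) -> int:
    """Number of distinct items that are not in the permitted set."""
    return len(set(items) - allowed)


def first_comparison(logs: List[SystemLog], permission: PermissionData) -> Dict[str, int]:
    """First, second, and third judgment results, pooled over all system logs."""
    apis = set().union(*(log.apis for log in logs))
    params = set().union(*(log.params for log in logs))
    categories = set().union(*(log.categories for log in logs))
    return {
        "apis_outside": count_outside(apis, permission.apis),
        "params_outside": count_outside(params, permission.params),
        "categories_outside": count_outside(categories, permission.categories),
    }


def second_comparison(parsed_categories: Iterable[str],
                      permission: PermissionData) -> Dict[str, int]:
    """Fourth judgment result: second privacy categories outside the permitted set."""
    return {"categories_outside": count_outside(parsed_categories, permission.categories)}
```

For the example in the text, a log with parameters phone and IDnumber checked against a parameter set containing only phone gives `params_outside == 1`. The resulting dictionaries could then serve as the input features of a rule set or a trained risk model.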
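In a specific embodiment, the historical indicator value may be determined from the historical system logs and historical network traffic records generated by the requester's earlier calls for private data. In one example, the monitoring indicators include the number of request messages the requester sends per minute. Suppose the historical value of this count is 20 and the currently determined value is 100; then 4, that is (100-20)/20, is taken as the comparison result for this count and included in the third comparison result.

In a specific embodiment, whether private data leakage has occurred can be judged according to preset assessment rules and the first, second, and third comparison results. In one example, the assessment rules include: if the privacy categories found by comparison to exceed the permitted scope include the user ID number, determine that the requester's API calls have caused private data leakage. In another specific embodiment, the first, second, and third comparison results are input together into a pre-trained second risk assessment model to obtain a second prediction result that indicates the risk of private data leakage. In a more specific embodiment, the second risk assessment model may use a machine learning algorithm such as a decision tree, random forest, AdaBoost, or neural network. In a more specific embodiment, the second prediction result is a risk level, such as very high, high, medium, low, or very low. In another more specific embodiment, the second prediction result is a risk assessment score, such as 15 or 90. Using the second risk assessment model is similar to training it, so the training process is not described further. In this way, the risk of data leakage from the requester's calls can be assessed based on the three comparison results.

In summary, in the risk assessment method provided by the embodiments of this specification, the system logs and network traffic records generated by the requester's API calls are obtained together with the permission data governing those calls. The network traffic is parsed to obtain parsed data, the parsed data is compared with the permission data, the system logs are compared with the permission data, and the two comparison results are combined to assess the risk that the requester's API calls leak private data, so that violations and abnormal call behavior by the requester are detected promptly. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the value of a monitoring indicator set for the requester's behavior, and that value can be compared with its historical value, which further improves the accuracy and usability of the risk assessment result.

According to an embodiment of another aspect, this specification also discloses an assessment device. FIG. 3 is a structural diagram of a risk assessment device for the leakage of private data according to one embodiment. As shown in FIG. 3, the device 300 may include:

A first obtaining unit 310, configured to obtain a number of system logs and a number of network traffic records generated when a requester requests to retrieve private data of a target object stored in a service platform, where each system log is generated based on a request message, sent by the requester to the service platform, for calling an API, and includes a number of first target APIs determined from the request message, the first parameters input for those first target APIs, and a number of first privacy categories corresponding to the first parameters, and where each network traffic record includes at least the response message returned by the service platform for that request message.

A parsing unit 320, configured to parse the network traffic records to obtain parsed data, which includes at least a number of second privacy categories corresponding to the API output data.

A second obtaining unit 330, configured to obtain from the service platform the permission data governing the requester's API calls, the permission data including the set of APIs the requester is entitled to call, the parameter set made up of the parameters the requester is entitled to pass to that API set, and the privacy category set corresponding to the parameter set.

A comparison unit 340, configured to compare the system logs with the permission data to obtain a first comparison result, and to compare the parsed data with the permission data to obtain a second comparison result.

An assessment unit 350, configured to assess, based at least on the first comparison result and the second comparison result, the risk of private data leakage from the requester's API calls.

In one embodiment, the first obtaining unit 310 specifically includes: an obtaining subunit 311, configured to obtain multiple system logs and multiple network traffic records generated when the requester calls APIs provided by the service platform; and a filtering subunit 312, configured to filter the multiple system logs and multiple network traffic records based on multiple preset privacy categories to obtain the said system logs and network traffic records.

In a specific embodiment, the filtering subunit 312 is specifically configured to: match the multiple system logs against the multiple privacy categories and take the successfully matched system logs as the said system logs; and use filter items set in advance based on the multiple privacy categories to select the said network traffic records from the multiple network traffic records, where a filter item takes at least one of the following forms: a UDF, a key field, or a regular-expression item.

In one embodiment, the network traffic record further includes the request message, and the parsed data further includes a number of second target APIs obtained by parsing the request message and the second parameters input for those second target APIs.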
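The third comparison result and the rule-based judgment described for step S250 can be illustrated with a short sketch. The relative-change formula follows the (100-20)/20 example in the text. The function names and the `deviation_limit` threshold are hypothetical additions for illustration; the text itself gives only the ID-number rule as an example of an assessment rule.

```python
from typing import Iterable


def relative_change(current: float, historical: float) -> float:
    """Relative deviation of the current indicator value from its historical value."""
    if historical <= 0:
        raise ValueError("historical indicator value must be positive")
    return (current - historical) / historical


def leak_suspected(out_of_scope_categories: Iterable[str],
                   deviation: float,
                   deviation_limit: float = 3.0) -> bool:
    """Rule-based judgment combining the comparison results.

    Rule from the text: an out-of-scope ID number means leakage.
    Assumed extra rule: a deviation above deviation_limit is also flagged.
    """
    if "ID number" in set(out_of_scope_categories):
        return True
    return deviation > deviation_limit
```

For the example in the text, `relative_change(100, 20)` gives 4.0, which exceeds the assumed limit of 3.0 and would be flagged by this sketch even when no out-of-scope category was found.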
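In a specific embodiment, the parsing unit 320 is further configured to: parse the second target APIs from the network traffic records using API parsing rules set in advance based on multiple APIs, where the API parsing rules are defined in at least one of the following forms: a UDF, a key field, or a regular-expression item; and parse the second parameters from the network traffic records using parameter parsing rules set in advance based on multiple parameters, where the parameter parsing rules are defined in at least one of the following forms: a UDF, a key field, or a regular-expression item.

In one embodiment, the parsing unit 320 specifically includes: a parsing subunit 321, configured to parse the network traffic records to obtain the API output data, which includes multiple fields; and a determining subunit 322, configured to determine a number of third privacy categories corresponding to the privacy fields among those fields. The parsing unit further includes: an inclusion subunit 323, configured to take the third privacy categories as the second privacy categories; or a verification subunit 324, configured to verify the third privacy categories based on the field values of the privacy fields and to include those that pass verification in the second privacy categories.

In a specific embodiment, the determining subunit 322 is specifically configured to: determine the third privacy categories based on a pre-trained natural language processing model; or determine the third privacy categories based on multiple preset regular-expression matching rules.

In another specific embodiment, the privacy fields include an arbitrary first field, which corresponds to a first category among the third privacy categories, and the verification subunit 324 is specifically configured to: match the first field against multiple pre-stored legal field values corresponding to the first category and, if the match succeeds, determine that the first category passes verification; or classify the first field with a classification model pre-trained for the first category and, if the classification result indicates that the first field belongs to the first category, determine that the first category passes verification.

In one embodiment, the comparison unit 340 is specifically configured to: judge whether the first target APIs belong to the API set to obtain a first judgment result, which is included in the first comparison result; judge whether the first parameters belong to the parameter set to obtain a second judgment result, which is included in the first comparison result; judge whether the first privacy categories belong to the privacy category set to obtain a third judgment result, which is included in the first comparison result; and judge whether the second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result.

In one embodiment, the comparison unit 340 is further configured to: judge whether the second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result; judge whether the second target APIs belong to the API set to obtain a fifth judgment result, which is included in the second comparison result; and judge whether the second parameters belong to the parameter set to obtain a sixth judgment result, which is included in the second comparison result.

In one embodiment, the assessment unit 350 is specifically configured to: input the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result that indicates the risk of private data leakage.

In one embodiment, the assessment unit 350 specifically includes: a processing subunit 351, configured to determine, from the system logs and network traffic records, the value of a monitoring indicator that is preset for the requester's API call behavior; a comparison subunit 352, configured to compare the requester's previously obtained historical indicator value with that indicator value to obtain a third comparison result; and an assessment subunit 353, configured to assess the risk of private data leakage from the requester's API calls based on the first, second, and third comparison results.

In a specific embodiment, the monitoring indicators include one or more of the following: the number of request messages the requester sends to the service platform per unit time; the number of target objects whose private data the requester requests per unit time; and the number of privacy categories corresponding to the private data the requester requests per unit time.

In another specific embodiment, the assessment subunit 353 is specifically configured to: judge, according to preset assessment rules and the first, second, and third comparison results, whether private data leakage has occurred; or input the first, second, and third comparison results together into a pre-trained second risk assessment model to obtain a second prediction result that indicates the risk of private data leakage.

In summary, in the risk assessment device provided by the embodiments of this specification, the system logs and network traffic records generated by the requester's API calls are obtained together with the permission data governing those calls. The network traffic is parsed to obtain parsed data, the parsed data is compared with the permission data, the system logs are compared with the permission data, and the two comparison results are combined to assess the risk that the requester's API calls leak private data, so that violations and abnormal call behavior by the requester are detected promptly. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the value of a monitoring indicator set for the requester's behavior, and that value can be compared with its historical value, which further improves the accuracy and usability of the risk assessment result.

According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described with reference to FIG. 2.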
根據再一方面的實施例,還提供一種計算設備,包括記憶體和處理器,所述記憶體中儲存有可執行代碼,所述處理器執行所述可執行代碼時,實現結合圖2所述的方法。 本領域技術人員應該可以意識到,在上述一個或多個示例中,本發明所描述的功能可以用硬體、軟體、韌體或它們的任意組合來實現。當使用軟體實現時,可以將這些功能儲存在電腦可讀媒體中或者作為電腦可讀媒體上的一個或多個指令或代碼進行傳輸。 以上所述的具體實施方式,對本發明的目的、技術方案和有益效果進行了進一步詳細說明,所應理解的是,以上所述僅為本發明的具體實施方式而已,並不用於限定本發明的保護範圍,凡在本發明的技術方案的基礎之上,所做的任何修改、等同替換、改進等,均應包括在本發明的保護範圍之內。 The following describes the solutions provided in this specification with reference to the accompanying drawings. As mentioned earlier, there is a risk of leaking private information during the current API call process. In a scenario where the requesting party is a cross-border requesting party (such as a cross-border merchant), it is particularly urgent to detect the risk of privacy data leakage. Specifically, some large domestic companies (such as Alibaba) have expanded their business scope overseas, so there are a large number of overseas merchants, and cross-border transfer of information has become the norm. The software and hardware environment and business scenarios of overseas merchants are different from those in China, and the existing data protection structure is inevitably insufficient, resulting in the leakage of user privacy data. Furthermore, the IT architectures of different overseas merchants are usually different, making the API call system complex, difficult to sort out, and easy to be used by criminals, resulting in the leakage of private data (such as sensitive data of domestic users). In addition, due to the large number of APIs and the difficulty of avoiding API development and management vulnerabilities, there may be differences between the content of the data actually output by the API and the data actually requested by the requester or the data for which the requester has usage rights. 
For example, because of omissions in API permission management, a requester may illegally call an API that it has no right to call, and the API outputs a user's sensitive personal information, leaking the user's privacy. As another example, a requester has the right to call a certain API, but its contract with the service platform covers only part of the data (such as user gender) out of the full set of data that the API can output (such as user gender, user address, and user mobile phone number). However, when the requester calls the API, in addition to the input parameters corresponding to that part of the data, it also passes in parameters corresponding to other data in the full set (such as user address). Because of omissions in API permission management and the like, the data that the API returns to the requester (such as user gender and user address) exceeds the contracted data range (such as user gender). As yet another example, the API called by the requester has outdated field settings that were never updated (for example, business personnel concatenated the user's mobile phone number and ID number into one field), so that the range of data output by the API (such as the user's mobile phone number and ID number) is inconsistent with the requester's contracted data range (such as the user's mobile phone number).

Based on the above, the inventor proposes a risk assessment method and device for the leakage of privacy data. In one embodiment, FIG. 1 shows a schematic diagram of an implementation scenario of a risk assessment method according to an embodiment. As shown in FIG. 1, requester personnel can send an API call request (or request message) to the service platform through the requester client.
Correspondingly, the service platform can generate a corresponding system log according to the request message and return an API call response (or response message) to the requester client. It can be understood that the gateway can record the request messages and response messages to generate corresponding network traffic records (or network traffic logs). The risk assessment device can thus obtain system logs and network traffic records from the gateway and parse the obtained network traffic records to obtain analysis data; in addition, the risk assessment device can obtain, from the service platform, the permission data for the requester to call APIs. Further, the risk assessment device can compare the system logs with the permission data, compare the analysis data with the permission data, and combine the two comparison results to assess the risk of privacy data leakage caused by the requester calling APIs. In this way, violations and abnormal calling behavior by the requester can be detected in time.

The following describes the implementation steps of the above risk assessment method in conjunction with specific embodiments. First of all, it should be noted that terms such as "first", "second", and "third" in the embodiments of this specification are used only to distinguish similar things and have no other limiting effect.

FIG. 2 shows a flowchart of a risk assessment method for privacy data leakage according to an embodiment. The method may be executed by any apparatus, device, platform, or server cluster with computing and processing capabilities. For example, the execution subject may be the risk assessment device shown in FIG. 1. As another example, the execution subject may be the above-mentioned service platform.
As shown in FIG. 2, the method may include the following steps:

Step S210: obtain several system logs and several network traffic records generated when the requester requests to call privacy data of a target object stored in the service platform, where each system log is generated based on a request message for calling an API that the requester sends to the service platform, and includes several first target APIs determined according to the request message, first parameters input for the several first target APIs, and several first privacy categories corresponding to the first parameters; and each network traffic record includes at least the response message returned by the service platform for the request message.

Step S220: parse the several network traffic records to obtain analysis data, which includes at least several second privacy categories corresponding to the API output data.

Step S230: obtain, from the service platform, the permission data for the requester to call APIs, where the permission data includes the API set that the requester has the right to call, a parameter set composed of the parameters that the requester has the right to pass into the API set, and the privacy category set corresponding to the parameter set.

Step S240: compare the several system logs with the permission data to obtain a first comparison result, and compare the analysis data with the permission data to obtain a second comparison result.

Step S250: based on at least the first comparison result and the second comparison result, assess the privacy data leakage risk of the requester calling the API.

The above steps are described in detail as follows. First, in step S210, several system logs and several network traffic records generated when the requester requests to call the privacy data of the target object stored in the service platform are obtained.
In one embodiment, the requester may be an individual, an organization, an enterprise, or the like, which may log in to the service platform through an account registered on the service platform and initiate API call requests while using the service platform. In an example, the requester may be a cross-border merchant, and the service platform may be a cross-border merchant system or a cross-border merchant open platform.

It can be understood that the service platform can store the basic attribute information of a large number of service objects, as well as the service data that these service objects generate while using the service. For example, a service object fills in registration information when registering on the service platform, and its use of the service generates order data, evaluation information, and so on. In the embodiments of this specification, the service object to which the data requested by the requester relates is referred to as the target object. In one embodiment, the above-mentioned privacy data may include all of the data stored in the service platform.

The generation of system logs and network traffic records is introduced below. In one embodiment, the requester may send a request message for calling an API to the service platform. After receiving the request message, the service platform makes a business record based on the request message to generate a corresponding system log, generates a response message for the request message, and returns the response message to the requester. It can be understood that, at the physical layer, the communication between the requester and the service platform passes through the gateway. Specifically, the request message sent by the requester is first uploaded to the gateway and then forwarded by the gateway to the service platform; during this upstream process, the gateway can record the request message.
In addition, the response message returned by the service platform to the requester is first sent to the gateway and then forwarded by the gateway to the requester. During this downstream process, the gateway can record the response message, and the recorded request message and the corresponding response message together form a network traffic record.

Regarding the generation of the system log, it should first be noted that the service platform stores configuration information for the API services it can provide. In one embodiment, the configuration information includes the name of each API, the full set of parameters that can be passed in to each API, and the data meaning (such as mobile phone number) of the data (such as 13800001111) that each parameter is used to retrieve. After receiving the request message, the service platform can therefore determine, according to the stored configuration information, the target API included in the request message, the parameters input for the target API, and the data meanings corresponding to these parameters, and then generate a system log. It should be noted that, in the embodiments of this specification, a data meaning that relates to privacy is referred to as a privacy category. Specifically, privacy categories may include the user's mobile phone number, company switchboard number, ID number, user name, and so on.

As mentioned above, in one embodiment, the privacy data may include all of the data stored in the service platform. In this case, this step may include: obtaining multiple system logs and multiple network traffic records generated by the requester calling the APIs provided by the service platform, and using them as the above-mentioned several system logs and several network traffic records. In another embodiment, the risk assessment can focus on certain privacy categories. Specifically, multiple privacy categories that need attention can be preset.
Based on this, after the multiple system logs and multiple network traffic records generated by the requester calling the API are obtained, they need to be filtered according to the preset multiple privacy categories to obtain the several system logs and several network traffic records.

In a specific embodiment, the filtering may include: matching the multiple privacy categories against the multiple system logs, and using the successfully matched system logs as the several system logs. As noted above, each system log includes the API determined according to the corresponding request message, the parameters that the request passes into the API, and the data meaning of the data that those parameters can retrieve. The multiple privacy categories can therefore be matched against the data meanings corresponding to the parameters in the multiple system logs, and any system log whose data meanings include any of the multiple privacy categories is classified into the several system logs.

In another specific embodiment, the filtering may further include: filtering the several network traffic records out of the multiple network traffic records by using filtering items set in advance based on the multiple privacy categories, where the form of the filtering items includes at least one of the following: custom UDF functions, key fields, and regular items (regular-expression patterns). It should be understood that a network traffic record includes the request message and the corresponding response message, and the data meaning of the fields in these messages is often ambiguous. This differs from a system log, which includes the data meanings determined from the request message based on the API configuration information.
Therefore, it is difficult to filter network traffic records by matching the multiple privacy categories directly. The filtering items can instead be preset based on the multiple privacy categories. In one example, they can include a regular item set for mobile phone numbers that matches field values with the following characteristics: the first digit is 1, and the first three digits belong to an existing network number segment (such as the China Mobile segments 138 and 139), so that network traffic records containing such a field value are classified into the several network traffic records. In another example, they can include a user-defined function (UDF) set for ID card numbers, which matches field values that conform to the ID card number encoding rules, so that network traffic records containing such a field value are classified into the several network traffic records. In yet another example, they can include a key field set for the user's name. For example, an API parameter used to retrieve the user's name (such as User_name) can be set as a key field, so that network traffic records that include the key field are classified into the several network traffic records.

Through step S210 above, the several system logs and several network traffic records generated when the requester requests to call the privacy data of the target object can be obtained. Next, in step S220, the several network traffic records are parsed to obtain analysis data, which includes at least several second privacy categories corresponding to the API output data. In one embodiment, this step may include: first parsing the several network traffic records to obtain the API output data, which includes multiple fields. It can be understood that the API output data is obtained by parsing the response messages in the network traffic records.
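The filtering described for step S210 above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the record layout (`request`/`response` dictionaries, a `privacy_categories` list in each system log), the function names, and the choice of the 18-character resident ID check-character rule as the "ID card number encoding rule" are all assumptions made for illustration. Only the segments 138 and 139 and the key field `User_name` come from the text.

```python
import re
from typing import Dict, Iterable, List, Set

# Regular item: an 11-digit number whose first three digits are a known
# segment. Only 138 and 139 come from the text; a real list would be longer.
MOBILE_SEGMENTS = ("138", "139")
MOBILE_RE = re.compile(r"(?<!\d)(?:" + "|".join(MOBILE_SEGMENTS) + r")\d{8}(?!\d)")

# Key field: a parameter name whose mere presence marks the record.
KEY_FIELDS = {"User_name"}

# Candidate ID numbers: 17 digits followed by a digit or X.
ID_CANDIDATE_RE = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")
_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
_CHECK_CODES = "10X98765432"


def is_id_number(value: str) -> bool:
    """UDF (assumed rule): validate the check character of an 18-character ID."""
    if not re.fullmatch(r"\d{17}[\dXx]", value):
        return False
    total = sum(int(d) * w for d, w in zip(value[:17], _WEIGHTS))
    return _CHECK_CODES[total % 11] == value[17].upper()


def filter_system_logs(logs: Iterable[dict], watched: Set[str]) -> List[dict]:
    """Keep logs whose recorded data meanings include a watched privacy category."""
    return [log for log in logs if watched & set(log["privacy_categories"])]


def traffic_record_matches(record: Dict[str, Dict[str, str]]) -> bool:
    """Return True if any request or response field hits a filtering item."""
    fields = {**record.get("request", {}), **record.get("response", {})}
    if not KEY_FIELDS.isdisjoint(fields):
        return True
    for value in fields.values():
        if MOBILE_RE.search(value):
            return True
        if any(is_id_number(c) for c in ID_CANDIDATE_RE.findall(value)):
            return True
    return False
```

System logs can be matched on category names directly because they already carry data meanings, whereas traffic records have to be matched on field names and value shapes, which is why the sketch uses two different functions.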
Then, several third privacy categories corresponding to several privacy fields among the multiple fields are determined. Specifically, this can be implemented by means such as machine learning or regular matching.

In a specific embodiment, the several third privacy categories corresponding to the several privacy fields among the multiple fields may be determined based on a pre-trained natural language processing model. In an example, the natural language processing model may be a model such as Transformer or BERT. In an example, it can be determined that the several privacy fields have values such as Li Qingshen, Sihai Co., Ltd., and Beijing Qingnian Road Zhenzhong Building, and that the corresponding third privacy categories include user name, company name, address, and so on.

In another specific embodiment, the several third privacy categories corresponding to the several privacy fields among the multiple fields may be determined based on multiple preset regular matching rules. In an example, it can be determined that the field named "phone" is a privacy field, and that the corresponding third privacy category is mobile phone number. In another example, it can be determined that a field whose field value includes "@" is a privacy field, and that the corresponding third privacy category is email address.

In this way, the several third privacy categories can be determined. Further, in a specific embodiment, the several third privacy categories can be used directly as the several second privacy categories. In another specific embodiment, the several third privacy categories are verified based on the field values of the several privacy fields, and the third privacy categories that pass verification are classified into the several second privacy categories. In an example, the several privacy fields include an arbitrary first field, which corresponds to a first category among the several third privacy categories.
Accordingly, the verification may include: matching the first field against multiple pre-stored legal field values corresponding to the first category and, if the match succeeds, determining that the first category passes verification. In a specific example, suppose that the first category is user name and the first field is "欧茶" (Oucha). The multiple legal field values include the names of multiple users who have completed real-name authentication, so these names can be searched for Oucha; if it is found, user name is classified into the several second privacy categories.

In another example, the verification may further include: classifying the first field by using a pre-trained classification model for the first category, and determining that the first category passes verification if the classification result indicates that the first field belongs to the first category. In a specific example, suppose that the first category is email address. If the first field is "remember to eat tomorrow, @小花", the classification result indicates that the first field is not an email address; if the first field is 58978@ali.cn, the classification result indicates that the first field is an email address, and email address is classified into the several second privacy categories.

In this way, after the several third privacy categories are determined, they are further verified to obtain the several second privacy categories. This ensures the accuracy of the determined second privacy categories and thus makes the subsequent risk assessment for privacy data leakage more accurate. Through the above, the several second privacy categories corresponding to the API output data included in the response messages can be obtained.
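The two-stage procedure above (propose third privacy categories, then verify them against field values) can be sketched as follows. This is a simplified illustration under stated assumptions: the field name `name`, the function names, and the contents of `VERIFIED_NAMES` are hypothetical, and a plain pattern check stands in for the pre-trained classification model, which the text does not specify.

```python
import re
from typing import Callable, Dict, Set

# Stand-in for the pre-stored legal field values of the user-name category.
VERIFIED_NAMES = {"欧茶"}

# Stand-in for a trained email classifier: a plain pattern check.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_email(value: str) -> bool:
    return EMAIL_RE.fullmatch(value) is not None


def candidate_categories(output: Dict[str, str]) -> Dict[str, str]:
    """Map each privacy field name to a candidate (third) privacy category."""
    found = {}
    for name, value in output.items():
        if name == "phone":                 # rule from the text
            found[name] = "mobile phone number"
        elif name == "name":                # hypothetical field name
            found[name] = "user name"
        elif "@" in value:                  # rule from the text
            found[name] = "email address"
    return found


def second_privacy_categories(
    output: Dict[str, str],
    verifiers: Dict[str, Callable[[str], bool]],
) -> Set[str]:
    """Keep candidate categories whose field value passes verification.

    A category without a verifier is accepted as is, which corresponds to
    the embodiment that uses the third categories directly.
    """
    verified = set()
    for name, category in candidate_categories(output).items():
        check = verifiers.get(category)
        if check is None or check(output[name]):
            verified.add(category)
    return verified
```

With `verifiers = {"user name": VERIFIED_NAMES.__contains__, "email address": is_email}`, a field holding "remember to eat tomorrow, @小花" is proposed as an email address but rejected at verification, while 58978@ali.cn is kept.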
On the other hand, optionally, the request messages included in the network traffic records can also be parsed. It should be noted that the system logs are generated at the application layer, whereas the network traffic records are generated at a lower layer. In engineering practice, it is difficult for the analysis of network traffic records to obtain the complete API configuration information stored in the service platform and thus to parse accurately. Other parsing methods therefore often need to be considered.

In one embodiment, the analysis data further includes several second target APIs obtained by parsing the request messages and second parameters input for the several second target APIs. The APIs and parameters parsed here are less accurate and coarser than the API names and parameters included in the system logs.

In a specific embodiment, API parsing rules set in advance based on multiple APIs can be used to parse the several second target APIs from the several network traffic records, where the API parsing rules may be defined in at least one of the following forms: custom UDF functions, key fields, and regular items. In another specific embodiment, parameter parsing rules set in advance based on multiple parameters can be used to parse the several second parameters from the several network traffic records, where the parameter parsing rules may be defined in at least one of the following forms: custom UDF functions, key fields, and regular items. For the custom UDF functions, key fields, and regular items involved in the API parsing rules and parameter parsing rules, refer to the descriptions of the filtering items in the foregoing embodiments; they are not repeated here.

Through the above, the analysis data can be obtained by parsing the several network traffic records.
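As a rough illustration of parsing the request side, the sketch below applies one regular item as an API parsing rule and a list of key fields as a parameter parsing rule. The raw request format, the URL pattern, and the function name are assumptions; the text does not describe how request messages are laid out. The parameter names `phone`, `IDnumber`, `gender`, and `add` are taken from examples elsewhere in the description.

```python
import re
from typing import Set, Tuple

# API parsing rule (regular item): URLs shaped like the description's example
# API names, e.g. http://user.cn/data/?id=00. The pattern is an assumption.
API_RULE = re.compile(r"https?://[^/\s]+/data/\?id=\d+")

# Parameter parsing rule (key fields): parameter names to look for.
PARAM_KEY_FIELDS = ("phone", "IDnumber", "gender", "add")


def parse_request(raw_request: str) -> Tuple[Set[str], Set[str]]:
    """Return (second target APIs, second parameters) found in a raw request.

    The match is purely textual, so it is coarser than what the system log
    records from the platform's API configuration.
    """
    apis = set(API_RULE.findall(raw_request))
    params = {
        key
        for key in PARAM_KEY_FIELDS
        if re.search(r'"' + re.escape(key) + r'"\s*:', raw_request)
    }
    return apis, params
```

Because no API configuration is available at this layer, the sketch can only recognize names it was told about in advance, which mirrors the text's remark that these results are less accurate than the system log.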
On the other hand, step S230 may be executed to obtain, from the service platform, the permission data for the requester to call APIs. Specifically, the permission data includes the API set that the requester has the right to call, a parameter set composed of the parameters that the requester has the right to pass into the API set, and the privacy category set corresponding to the parameter set. In an example, the API set may include the names of one or more APIs, such as http://yiteng.cn/data/?id=91 and https://niuqi.cn/data/?id=8. In an example, the parameters in the parameter set may include gender, phone, and add. In an example, the privacy categories in the privacy category set may include gender, phone number, and address.

In one embodiment, the service platform includes a user authorization system, a contract system, an API management system, and the like. It should be understood that the user authorization system can store the part of the privacy data that individual or enterprise users have authorized the service platform to provide externally. The contract system can store the data range, agreed between the requester and the service platform, that the requester may request from the service platform. The API management system includes information such as the interface documents of the APIs that the service platform makes available to the requester. Based on this, the relevant data can be obtained from each of these systems and then consolidated into the permission data. In this way, the permission data for the requester to call APIs can be obtained from the service platform.

Then, in step S240, the several system logs are compared with the permission data to obtain a first comparison result, and the analysis data is compared with the permission data to obtain a second comparison result.
On the one hand, in an embodiment, obtaining the first comparison result may include: determining whether the several first target APIs belong to the API set to obtain a first judgment result, which is included in the first comparison result. It should be understood that, for the first target APIs included in each of the several system logs, it is necessary to determine whether they belong to the API set in the permission data. In a specific embodiment, suppose that the target APIs of the several system logs include http://user.cn/data/?id=00, and that the API set includes http://user.cn/data/?id=00 and http://company.cn/data/?id=66. Through comparison, it can be determined that the target APIs in the several system logs all belong to the API set, that is, the number of target APIs that do not belong to the API set is 0, so the first judgment result can be determined as 0.

In another embodiment, obtaining the first comparison result may further include: determining whether the first parameters belong to the parameter set to obtain a second judgment result, which is included in the first comparison result. It should be understood that, for the first parameters included in each of the several system logs, it is necessary to determine whether they belong to the parameter set in the permission data. In an example, suppose that the parameters in the several system logs include phone and IDnumber, and that the parameter set includes phone. Through comparison, it can be determined that IDnumber does not belong to the parameter set, so the second judgment result can be determined as 1.

In another embodiment, obtaining the first comparison result may further include: determining whether the several first privacy categories belong to the privacy category set to obtain a third judgment result, which is included in the first comparison result.
It should be understood that, for the first privacy categories included in each of the several system logs, it is necessary to determine whether they belong to the privacy category set in the permission data. In an example, suppose that the first privacy categories in the several system logs include mobile phone number and ID number, and that the privacy category set includes mobile phone number. Through comparison, it can be determined that ID number does not belong to the privacy category set, so the third judgment result can be determined as 1. From the above, the first judgment result, the second judgment result, and the third judgment result can be obtained and together serve as the first comparison result.

On the other hand, in an embodiment, obtaining the second comparison result may include: determining whether the several second privacy categories belong to the privacy category set to obtain a fourth judgment result, which is included in the second comparison result. In another embodiment, it may further include: determining whether the several second target APIs belong to the API set to obtain a fifth judgment result, which is included in the second comparison result. In yet another embodiment, it may further include: determining whether the second parameters belong to the parameter set to obtain a sixth judgment result, which is included in the second comparison result. Through the above, the first comparison result and the second comparison result can be obtained.

Then, in step S250, based on at least the first comparison result and the second comparison result, the privacy data leakage risk of the requester calling the API is assessed. In an embodiment, this step may include: inputting the first comparison result and the second comparison result into a pre-trained first risk assessment model to obtain a first prediction result indicating the privacy data leakage risk.
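The comparisons of step S240 described above amount to set-membership checks. The following minimal sketch follows the worked examples in the text, where each judgment result is the number of items that fall outside the permitted set. The class name, dictionary keys, and record layout are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class PermissionData:
    apis: FrozenSet[str]        # API set the requester may call
    params: FrozenSet[str]      # parameters the requester may pass in
    categories: FrozenSet[str]  # privacy categories of those parameters


def outside(items: Iterable[str], allowed: FrozenSet[str]) -> int:
    """Count distinct items that do not belong to the allowed set."""
    return len(set(items) - allowed)


def first_comparison(system_logs: List[dict], perm: PermissionData) -> Dict[str, int]:
    """First, second, and third judgment results, taken over all system logs."""
    return {
        "api": outside((a for log in system_logs for a in log["apis"]), perm.apis),
        "param": outside((p for log in system_logs for p in log["params"]), perm.params),
        "category": outside(
            (c for log in system_logs for c in log["categories"]), perm.categories
        ),
    }


def second_comparison(analysis: dict, perm: PermissionData) -> Dict[str, int]:
    """Fourth, fifth, and sixth judgment results, taken from the analysis data."""
    return {
        "category": outside(analysis["categories"], perm.categories),
        "api": outside(analysis.get("apis", ()), perm.apis),
        "param": outside(analysis.get("params", ()), perm.params),
    }
```

With the values from the text (permitted parameter `phone`, logged parameters `phone` and `IDnumber`), the `param` entry of the first comparison result is 1 and the `api` entry is 0.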
In a more specific embodiment, the first risk assessment model may use machine learning algorithms such as decision trees, random forests, AdaBoost, and neural networks. In a more specific embodiment, the first prediction result may be a risk classification level, such as high, medium, or low. In another more specific embodiment, the first prediction result may be a risk assessment score, such as 20 or 85. It should be noted that using the first risk assessment model is similar to training it, so the training process is not described separately.

In another embodiment, this step may include: first, determining the indicator value of a monitoring indicator based on the several system logs and several network traffic records, where the monitoring indicator is preset for the requester's API calling behavior; then, comparing a pre-obtained historical indicator value of the requester with the indicator value to obtain a third comparison result; and then, based on the first comparison result, the second comparison result, and the third comparison result, assessing the privacy data leakage risk of the requester calling the API.

In a specific embodiment, the monitoring indicators may include one or more of the following: the number of request messages sent by the requester to the service platform per unit time, the number of target objects corresponding to the privacy data that the requester requests to call per unit time, and the number of privacy categories corresponding to the privacy data that the requester requests to call per unit time. In an example, the unit time can be a year, a month, a week, a day, an hour, a minute, and so on. In a specific example, the monitoring indicator may include the number of user IDs (which can be parsed from the input parameters of the request messages) included in the requester's call requests each day.
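The three monitoring indicators listed above can be computed by bucketing call events per unit time. The sketch below uses one day as the unit time; the event layout (`time`, `user_ids`, `categories`) and the function name are assumptions made for illustration, not part of the description.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable


def daily_indicators(events: Iterable[dict]) -> Dict[str, Dict[str, int]]:
    """Per day: request count, distinct target objects, distinct privacy categories."""
    buckets = defaultdict(lambda: {"requests": 0, "objects": set(), "categories": set()})
    for event in events:
        bucket = buckets[event["time"].date().isoformat()]
        bucket["requests"] += 1
        bucket["objects"].update(event["user_ids"])
        bucket["categories"].update(event["categories"])
    return {
        day: {
            "requests": b["requests"],
            "target_objects": len(b["objects"]),
            "privacy_categories": len(b["categories"]),
        }
        for day, b in buckets.items()
    }
```

Target objects and privacy categories are counted as distinct values within the day, so a requester that repeatedly asks for the same user does not inflate the target-object indicator.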
In a specific embodiment, the historical indicator value may be determined based on historical system logs and historical network traffic records generated by the requester calling privacy data. In an example, the monitoring indicator may include the number of request messages sent by the requester per minute. Assuming that the historical indicator value for this number is 20 and the currently determined indicator value is 100, the comparison result for this number can be determined as 4, that is, (100-20)/20, and included in the third comparison result.

In a specific embodiment, preset evaluation rules can be applied to determine, according to the first comparison result, the second comparison result, and the third comparison result, whether privacy data leakage has occurred. In an example, the evaluation rules may include: if the privacy categories that exceed the permission range in the comparison results include the user ID number, it is determined that the requester's API call has caused privacy data leakage.

In another specific embodiment, the first comparison result, the second comparison result, and the third comparison result may be input together into a pre-trained second risk assessment model to obtain a second prediction result indicating the privacy data leakage risk. In a more specific embodiment, the second risk assessment model may use machine learning algorithms such as decision trees, random forests, AdaBoost, and neural networks. In a more specific embodiment, the second prediction result may be a risk classification level, such as extremely high, high, medium, low, or extremely low. In another more specific embodiment, the second prediction result may be a risk assessment score, such as 15 or 90. It should be noted that using the second risk assessment model is similar to training it, so the training process is not described separately.
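The third comparison result and the rule-based judgment above can be sketched as follows. The relative-change formula follows the worked example in the text, (100-20)/20 = 4. The function names, the handling of a zero historical value, and the `max_change` threshold rule are assumptions added for illustration; the only rule taken from the text is the one about ID numbers.

```python
from typing import Dict, Set


def relative_change(current: float, historical: float) -> float:
    """Relative deviation of the current indicator value from its history."""
    if historical == 0:
        # The text does not say how to treat an empty history (assumption).
        raise ValueError("historical indicator value must be non-zero")
    return (current - historical) / historical


def third_comparison(
    current: Dict[str, float], historical: Dict[str, float]
) -> Dict[str, float]:
    """Relative change for every monitoring indicator present in `current`."""
    return {name: relative_change(value, historical[name]) for name, value in current.items()}


def leak_by_rules(
    out_of_scope_categories: Set[str],
    third: Dict[str, float],
    max_change: float = 3.0,  # hypothetical threshold, not from the text
) -> bool:
    """Apply two example rules; a real rule set would cover more cases."""
    if "ID number" in out_of_scope_categories:  # rule from the text
        return True
    return any(change > max_change for change in third.values())
```

For instance, `third_comparison({"requests_per_minute": 100}, {"requests_per_minute": 20})` yields a change of 4.0, which exceeds the hypothetical threshold of 3.0 used here.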
In this way, based on the above three comparison results, the risk of leakage of the data called by the requester can be assessed.

In summary, in the risk assessment method for privacy data leakage provided by the embodiments of this specification, the system logs and network traffic records generated by the requester calling APIs and the permission data for the requester to call APIs are obtained; the network traffic is parsed to obtain analysis data; the analysis data is compared with the permission data, and the system logs are compared with the permission data; and the two comparison results are combined to assess the risk of privacy data leakage caused by the requester calling APIs, so that violations and abnormal calling behavior by the requester are detected and discovered in time. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the indicator value of a monitoring indicator set for the requester's behavior, and this indicator value can then be compared with the historical indicator value, further improving the accuracy and usability of the risk assessment result.

According to an embodiment of another aspect, this specification also discloses an assessment device. Specifically, FIG. 3 shows a structural diagram of a risk assessment device for privacy data leakage according to an embodiment. As shown in FIG.
3, the apparatus 300 may include:

The first obtaining unit 310, configured to obtain several system logs and several network traffic records generated when the requester requests to call privacy data of a target object stored in the service platform, where each system log is generated based on a request message for calling an API that the requester sends to the service platform, and includes several first target APIs determined according to the request message, first parameters input for the several first target APIs, and several first privacy categories corresponding to the first parameters; and each network traffic record includes at least the response message returned by the service platform for the request message.

The parsing unit 320, configured to parse the several network traffic records to obtain analysis data, which includes at least several second privacy categories corresponding to the API output data.

The second obtaining unit 330, configured to obtain, from the service platform, the permission data for the requester to call APIs, where the permission data includes the API set that the requester has the right to call, a parameter set composed of the parameters that the requester has the right to pass into the API set, and the privacy category set corresponding to the parameter set.

The comparison unit 340, configured to compare the several system logs with the permission data to obtain a first comparison result, and to compare the analysis data with the permission data to obtain a second comparison result.

The evaluation unit 350, configured to assess the privacy data leakage risk of the requester calling the API based on at least the first comparison result and the second comparison result.
In one embodiment, the first obtaining unit 310 specifically includes: an obtaining subunit 311, configured to obtain multiple system logs and multiple network traffic records generated when the requesting party calls APIs provided by the service platform; and a filtering subunit 312, configured to filter the multiple system logs and the multiple network traffic records based on multiple preset privacy categories, to obtain the above-mentioned number of system logs and number of network traffic records.
In a specific embodiment, the filtering subunit 312 is specifically configured to: match the multiple system logs using the multiple privacy categories, and take the successfully matched system logs as the number of system logs; and screen out the number of network traffic records from the multiple network traffic records using filter items set in advance based on the multiple privacy categories, where the filter items take at least one of the following forms: user-defined functions (UDFs), key fields, and regular-expression items.
In one embodiment, each network traffic record further includes the request message, and the parsed data further includes a number of second target APIs obtained by parsing the request message and second parameters input for the number of second target APIs.
In a specific embodiment, the parsing unit 320 is further configured to: parse out the number of second target APIs from the number of network traffic records using API parsing rules set in advance based on multiple APIs, where the API parsing rules are defined in at least one of the following forms: user-defined functions (UDFs), key fields, and regular-expression items; and parse out the second parameters from the number of network traffic records using parameter parsing rules set in advance based on multiple parameters, where the parameter parsing rules are defined in at least one of the following forms: user-defined functions (UDFs), key fields, and regular-expression items.
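The three forms of filter item named for the filtering subunit 312 (user-defined function, key field, regular expression) can all be modeled as predicates over a traffic record. The sketch below assumes a hypothetical record shape (a dict with `api` and `body` keys) and hypothetical filter definitions; none of these names appear in the specification.

```python
import re
from typing import Callable, Dict, List

# Hypothetical record shape: {"api": <API name>, "body": <raw message text>}.
Record = Dict[str, str]
FilterItem = Callable[[Record], bool]


def key_field_filter(field_names: List[str]) -> FilterItem:
    """Keep records whose body mentions any of the given key field names."""
    return lambda rec: any(name in rec.get("body", "") for name in field_names)


def regex_filter(pattern: str) -> FilterItem:
    """Keep records whose body matches the given regular expression."""
    compiled = re.compile(pattern)
    return lambda rec: compiled.search(rec.get("body", "")) is not None


def udf_filter(func: Callable[[Record], bool]) -> FilterItem:
    """Wrap an arbitrary user-defined predicate as a filter item."""
    return func


def filter_records(records: List[Record], items: List[FilterItem]) -> List[Record]:
    """Keep a record if at least one filter item matches it."""
    return [rec for rec in records if any(item(rec) for item in items)]


# Illustrative filter items for a hypothetical "phone number" privacy category.
FILTER_ITEMS = [
    key_field_filter(["id_number", "phone"]),
    regex_filter(r"\b1\d{10}\b"),
    udf_filter(lambda rec: rec.get("api", "").startswith("user.")),
]

RAW_RECORDS = [
    {"api": "user.profile", "body": '{"name": "A"}'},
    {"api": "shop.list", "body": '{"phone": "13800000000"}'},
    {"api": "shop.list", "body": '{"items": 3}'},
]

kept = filter_records(RAW_RECORDS, FILTER_ITEMS)
```

Treating every form as the same callable type keeps the filtering loop unaware of how each item was defined; whether a record must match any item or all items is a design choice the specification leaves open, and "any" is assumed here.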
In one embodiment, the parsing unit 320 specifically includes: a parsing subunit 321, configured to parse the number of network traffic records to obtain the API output data, where the API output data includes multiple fields; and a determining subunit 322, configured to determine a number of third privacy categories corresponding to a number of privacy fields among the multiple fields. The parsing unit 320 specifically further includes: an inclusion subunit 323, configured to take the number of third privacy categories as the number of second privacy categories; or a verification subunit 324, configured to verify the number of third privacy categories based on the field values of the number of privacy fields, and to include the third privacy categories that pass verification in the number of second privacy categories.
In a specific embodiment, the determining subunit 322 is specifically configured to: determine, based on a pre-trained natural language processing model, the number of third privacy categories corresponding to the number of privacy fields among the multiple fields; or determine, based on multiple preset regular-expression matching rules, the number of third privacy categories corresponding to the number of privacy fields among the multiple fields.
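The regular-expression option for the determining subunit 322 can be illustrated as a small rule table applied to field names. This is a sketch only: the rule table, the category labels, and the choice to match on field names rather than values are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: (privacy category, pattern applied to the field name).
CATEGORY_RULES = [
    ("phone_number", re.compile(r"phone|mobile", re.IGNORECASE)),
    ("id_number", re.compile(r"id_?card|id_?no|identity", re.IGNORECASE)),
    ("email", re.compile(r"e?mail", re.IGNORECASE)),
]


def categorize_field(field_name: str) -> Optional[str]:
    """Return the first category whose rule matches the field name, else None."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(field_name):
            return category
    return None


def third_privacy_categories(api_output: Dict[str, str]) -> Dict[str, str]:
    """Map each privacy field found in the API output to its candidate category."""
    result = {}
    for name in api_output:
        category = categorize_field(name)
        if category is not None:
            result[name] = category
    return result


# Fictitious API output data with three fields, two of them privacy fields.
api_output = {
    "userMobile": "13800000000",
    "idCardNo": "110101199001011234",
    "nickname": "A",
}
candidates = third_privacy_categories(api_output)
```

Fields that match no rule, such as `nickname` here, are simply not treated as privacy fields.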
In another specific embodiment, the number of privacy fields includes an arbitrary first field, which corresponds to a first category among the number of third privacy categories; wherein the verification subunit 324 is specifically configured to: match the first field against multiple pre-stored valid field values corresponding to the first category, and, if the match succeeds, determine that the first category passes verification; or classify the first field using a pre-trained classification model for the first category, and, if the classification result indicates that the first field belongs to the first category, determine that the first category passes verification.
In one embodiment, the comparison unit 340 is specifically configured to: determine whether the number of first target APIs belong to the API set, to obtain a first judgment result, which is included in the first comparison result; determine whether the first parameters belong to the parameter set, to obtain a second judgment result, which is included in the first comparison result; determine whether the number of first privacy categories belong to the privacy category set, to obtain a third judgment result, which is included in the first comparison result; and determine whether the number of second privacy categories belong to the privacy category set, to obtain a fourth judgment result, which is included in the second comparison result.
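The first verification option for the verification subunit 324, matching a field's value against pre-stored valid values for its candidate category, can be sketched as a lookup. The table of valid values and the category names below are hypothetical, and the classification-model option is not shown.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical pre-stored valid field values per privacy category.
VALID_VALUES: Dict[str, FrozenSet[str]] = {
    "gender": frozenset({"male", "female"}),
    "blood_type": frozenset({"A", "B", "AB", "O"}),
}


def verify_categories(fields: Dict[str, str], candidates: Dict[str, str]) -> Set[str]:
    """Return the candidate categories whose field value is a known valid value.

    fields:     field name -> field value from the API output data
    candidates: field name -> candidate (third) privacy category
    Assumption: a category with no stored valid values is treated as unverified.
    """
    verified = set()
    for name, category in candidates.items():
        valid = VALID_VALUES.get(category)
        if valid is not None and fields.get(name) in valid:
            verified.add(category)
    return verified


fields = {"sex": "female", "type": "standard"}
candidates = {"sex": "gender", "type": "blood_type"}
second_categories = verify_categories(fields, candidates)
```

Here the field `type` was tentatively labeled as a blood type, but its value is not a valid blood type, so that category is dropped before the comparison with the permission data.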
In one embodiment, the comparison unit 340 is further configured to: determine whether the number of second privacy categories belong to the privacy category set, to obtain a fourth judgment result, which is included in the second comparison result; determine whether the number of second target APIs belong to the API set, to obtain a fifth judgment result, which is included in the second comparison result; and determine whether the second parameters belong to the parameter set, to obtain a sixth judgment result, which is included in the second comparison result.
In one embodiment, the evaluation unit 350 is specifically configured to: jointly input the first comparison result and the second comparison result into a pre-trained first risk assessment model to obtain a first prediction result, which indicates the privacy data leakage risk.
In one embodiment, the evaluation unit 350 specifically includes: a processing subunit 351, configured to determine the indicator values of monitoring indicators according to the number of system logs and the number of network traffic records, where the monitoring indicators are set in advance for the requesting party's API calling behavior; a comparison subunit 352, configured to compare pre-acquired historical indicator values of the requesting party with the indicator values to obtain a third comparison result; and an evaluation subunit 353, configured to assess the privacy data leakage risk of the requesting party's API calls based on the first comparison result, the second comparison result, and the third comparison result.
In a specific embodiment, the monitoring indicators include one or more of the following: the number of request messages sent by the requesting party to the service platform per unit time; the number of target objects corresponding to the privacy data that the requesting party requests to call per unit time; and the number of privacy categories corresponding to the privacy data that the requesting party requests to call per unit time.
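The three monitoring indicators and their comparison with historical values can be sketched as simple counts over the requests seen in one unit of time. The `RequestEvent` shape, the indicator names, and the ratio-based threshold are assumptions for illustration; the specification states only that current values are compared with historical ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RequestEvent:
    """Hypothetical summary of one request message within the unit of time."""
    target_id: str                 # target object whose privacy data is requested
    categories: Tuple[str, ...]    # privacy categories requested


def indicator_values(events: List[RequestEvent]) -> Dict[str, int]:
    """Compute the three monitoring indicators for one unit of time."""
    targets = {e.target_id for e in events}
    categories = {c for e in events for c in e.categories}
    return {
        "requests": len(events),
        "targets": len(targets),
        "categories": len(categories),
    }


def compare_with_history(
    current: Dict[str, int], history: Dict[str, float], ratio: float = 3.0
) -> Dict[str, bool]:
    """Third comparison result: True marks an indicator exceeding ratio x its history.

    The ratio rule is an assumed stand-in for whatever comparison is actually used.
    """
    return {name: value > ratio * history.get(name, 0.0) for name, value in current.items()}


events = [
    RequestEvent("u1", ("phone_number",)),
    RequestEvent("u2", ("phone_number", "id_number")),
    RequestEvent("u2", ("address",)),
]
current = indicator_values(events)
history = {"requests": 1.0, "targets": 1.0, "categories": 0.5}
third_result = compare_with_history(current, history)
```

With these fictitious numbers only the category count is flagged, which corresponds to a requesting party suddenly asking for a wider range of privacy categories than it has historically.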
In another specific embodiment, the evaluation subunit 353 is specifically configured to: determine, in combination with preset evaluation rules and according to the first comparison result, the second comparison result, and the third comparison result, whether privacy data leakage occurs; or jointly input the first comparison result, the second comparison result, and the third comparison result into a pre-trained second risk assessment model to obtain a second prediction result, which indicates the privacy data leakage risk.
In summary, in the risk assessment device for privacy data leakage provided by the embodiments of this specification, the system logs and network traffic records generated when the requesting party calls the API are obtained, together with the permission data governing the requesting party's API calls. The network traffic records are parsed to obtain parsed data; the parsed data is compared with the permission data, and the system logs are compared with the permission data; and the two comparison results are combined to assess the risk of privacy data leakage caused by the requesting party's API calls, so that violations and abnormal calling behavior by the requesting party can be detected in time. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the indicator values of monitoring indicators set for the requesting party's behavior, and these indicator values can then be compared with historical indicator values, thereby further improving the accuracy and usability of the risk assessment result.
According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
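The rule-based option described above for the evaluation subunit 353 can be illustrated with a very small preset rule that combines the three comparison results. The rule itself, the three-level output, and the dictionary shapes are assumptions for illustration; the specification does not define the evaluation rules, and the model-based option is not shown.

```python
from typing import Dict


def assess_leakage_risk(
    first: Dict[str, bool], second: Dict[str, bool], third: Dict[str, bool]
) -> str:
    """Combine the three comparison results with a hypothetical preset rule.

    first, second: judgment name -> True if the checked items are permitted
    third:         indicator name -> True if the indicator is abnormal
    """
    violations = [k for result in (first, second) for k, ok in result.items() if not ok]
    anomalies = [k for k, abnormal in third.items() if abnormal]
    if violations and anomalies:
        return "high"      # out-of-permission access together with abnormal behavior
    if violations or anomalies:
        return "medium"    # one kind of signal only
    return "low"


risk = assess_leakage_risk(
    first={"apis_permitted": True, "params_permitted": True},
    second={"output_categories_permitted": False},
    third={"requests": False, "categories": True},
)
```

A rule table like this is easy to audit, whereas the pre-trained model option trades that transparency for the ability to weigh the judgment results jointly.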
According to yet another embodiment, a computing device is also provided, including a memory and a processor, where executable code is stored in the memory, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
The specific embodiments described above further describe the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention; any modification, equivalent replacement, improvement, etc. made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

300: Risk assessment device
310: First obtaining unit
311: Obtaining subunit
312: Filtering subunit
320: Parsing unit
321: Parsing subunit
322: Determining subunit
323: Inclusion subunit
324: Verification subunit
330: Second obtaining unit
340: Comparison unit
350: Evaluation unit
351: Processing subunit
352: Comparison subunit
353: Evaluation subunit

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
[FIG. 1] shows a schematic diagram of an implementation scenario of a risk assessment method according to an embodiment;
[FIG. 2] shows a flowchart of a risk assessment method for privacy data leakage according to an embodiment;
[FIG. 3] shows a structural diagram of a risk assessment device for privacy data leakage according to an embodiment.

Claims (30)

1. A risk assessment method for privacy data leakage, comprising: obtaining a number of system logs and a number of network traffic records generated when a requesting party requests to call privacy data of a target object stored in a service platform; wherein each system log is generated based on a request message for calling an API sent by the requesting party to the service platform, and includes a number of first target APIs determined according to the request message, first parameters input for the number of first target APIs, and a number of first privacy categories corresponding to the first parameters; and each network traffic record includes at least a response message returned by the service platform for the request message; parsing the number of network traffic records to obtain parsed data, the parsed data including at least a number of second privacy categories corresponding to API output data; obtaining, from the service platform, permission data for the requesting party's API calls, the permission data including an API set that the requesting party has the right to call, a parameter set composed of parameters that the requesting party has the right to pass in for the API set, and a privacy category set corresponding to the parameter set; comparing the number of system logs with the permission data to obtain a first comparison result, and comparing the parsed data with the permission data to obtain a second comparison result; and assessing, based at least on the first comparison result and the second comparison result, the privacy data leakage risk of the requesting party's API calls.
2. The method according to claim 1, wherein obtaining the number of system logs and the number of network traffic records generated when the requesting party requests to call the privacy data of the target object stored in the service platform comprises: obtaining multiple system logs and multiple network traffic records generated when the requesting party calls APIs provided by the service platform; and filtering the multiple system logs and the multiple network traffic records based on multiple preset privacy categories to obtain the number of system logs and the number of network traffic records.
3. The method according to claim 2, wherein filtering the multiple system logs and the multiple network traffic records to obtain the number of system logs and the number of network traffic records comprises: matching the multiple system logs using the multiple privacy categories, and taking the successfully matched system logs as the number of system logs; and screening out the number of network traffic records from the multiple network traffic records using filter items set in advance based on the multiple privacy categories, the filter items taking at least one of the following forms: user-defined function (UDF), key field, and regular-expression item.
4. The method according to claim 1, wherein each network traffic record further includes the request message, and the parsed data further includes a number of second target APIs obtained by parsing the request message and second parameters input for the number of second target APIs.
5. The method according to claim 4, wherein parsing the number of network traffic records to obtain the parsed data comprises: parsing out the number of second target APIs from the number of network traffic records using API parsing rules set in advance based on multiple APIs, the API parsing rules being defined in at least one of the following forms: user-defined function (UDF), key field, and regular-expression item; and parsing out the second parameters from the number of network traffic records using parameter parsing rules set in advance based on multiple parameters, the parameter parsing rules being defined in at least one of the following forms: user-defined function (UDF), key field, and regular-expression item.
6. The method according to claim 1, wherein parsing the number of network traffic records to obtain the parsed data comprises: parsing the number of network traffic records to obtain the API output data, the API output data including multiple fields; determining a number of third privacy categories corresponding to a number of privacy fields among the multiple fields; and taking the number of third privacy categories as the number of second privacy categories; or verifying the number of third privacy categories based on field values of the number of privacy fields, and including the third privacy categories that pass verification in the number of second privacy categories.
7. The method according to claim 6, wherein determining the number of third privacy categories corresponding to the number of privacy fields among the multiple fields comprises: determining, based on a pre-trained natural language processing model, the number of third privacy categories corresponding to the number of privacy fields among the multiple fields; or determining, based on multiple preset regular-expression matching rules, the number of third privacy categories corresponding to the number of privacy fields among the multiple fields.
8. The method according to claim 6, wherein the number of privacy fields includes an arbitrary first field, which corresponds to a first category among the number of third privacy categories; and wherein verifying the number of third privacy categories based on the field values of the number of privacy fields comprises: matching the first field against multiple pre-stored valid field values corresponding to the first category, and, if the match succeeds, determining that the first category passes verification; or classifying the first field using a pre-trained classification model for the first category, and, if the classification result indicates that the first field belongs to the first category, determining that the first category passes verification.
9. The method according to claim 1, wherein comparing the number of system logs with the permission data to obtain the first comparison result comprises: determining whether the number of first target APIs belong to the API set, to obtain a first judgment result, which is included in the first comparison result; determining whether the first parameters belong to the parameter set, to obtain a second judgment result, which is included in the first comparison result; and determining whether the number of first privacy categories belong to the privacy category set, to obtain a third judgment result, which is included in the first comparison result; and wherein comparing the parsed data with the permission data to obtain the second comparison result comprises: determining whether the number of second privacy categories belong to the privacy category set, to obtain a fourth judgment result, which is included in the second comparison result.
10. The method according to claim 4, wherein comparing the parsed data with the permission data to obtain the second comparison result comprises: determining whether the number of second privacy categories belong to the privacy category set, to obtain a fourth judgment result, which is included in the second comparison result; determining whether the number of second target APIs belong to the API set, to obtain a fifth judgment result, which is included in the second comparison result; and determining whether the second parameters belong to the parameter set, to obtain a sixth judgment result, which is included in the second comparison result.
11. The method according to claim 1, wherein assessing, based at least on the first comparison result and the second comparison result, the privacy data leakage risk of the requesting party's API calls comprises: jointly inputting the first comparison result and the second comparison result into a pre-trained first risk assessment model to obtain a first prediction result, which indicates the privacy data leakage risk.
12. The method according to claim 1, wherein assessing, based at least on the first comparison result and the second comparison result, the privacy data leakage risk of the requesting party's API calls comprises: determining indicator values of monitoring indicators according to the number of system logs and the number of network traffic records, the monitoring indicators being set in advance for the requesting party's API calling behavior; comparing pre-acquired historical indicator values of the requesting party with the indicator values to obtain a third comparison result; and assessing the privacy data leakage risk of the requesting party's API calls based on the first comparison result, the second comparison result, and the third comparison result.
13. The method according to claim 12, wherein the monitoring indicators include one or more of the following: the number of request messages sent by the requesting party to the service platform per unit time; the number of target objects corresponding to the privacy data that the requesting party requests to call per unit time; and the number of privacy categories corresponding to the privacy data that the requesting party requests to call per unit time.
14. The method according to claim 12, wherein assessing the privacy data leakage risk of the requesting party's API calls based on the first comparison result, the second comparison result, and the third comparison result comprises: determining, in combination with preset evaluation rules and according to the first comparison result, the second comparison result, and the third comparison result, whether privacy data leakage occurs; or jointly inputting the first comparison result, the second comparison result, and the third comparison result into a pre-trained second risk assessment model to obtain a second prediction result, which indicates the privacy data leakage risk.
15. A risk assessment device for privacy data leakage, comprising: a first obtaining unit, configured to obtain a number of system logs and a number of network traffic records generated when a requesting party requests to call privacy data of a target object stored in a service platform; wherein each system log is generated based on a request message for calling an API sent by the requesting party to the service platform, and includes a number of first target APIs determined according to the request message, first parameters input for the number of first target APIs, and a number of first privacy categories corresponding to the first parameters; and each network traffic record includes at least a response message returned by the service platform for the request message; a parsing unit, configured to parse the number of network traffic records to obtain parsed data, the parsed data including at least a number of second privacy categories corresponding to API output data; a second obtaining unit, configured to obtain, from the service platform, permission data for the requesting party's API calls, the permission data including an API set that the requesting party has the right to call, a parameter set composed of parameters that the requesting party has the right to pass in for the API set, and a privacy category set corresponding to the parameter set; a comparison unit, configured to compare the number of system logs with the permission data to obtain a first comparison result, and to compare the parsed data with the permission data to obtain a second comparison result; and an evaluation unit, configured to assess, based at least on the first comparison result and the second comparison result, the privacy data leakage risk of the requesting party's API calls.
16. The device according to claim 15, wherein the first obtaining unit specifically includes: an obtaining subunit, configured to obtain multiple system logs and multiple network traffic records generated when the requesting party calls APIs provided by the service platform; and a filtering subunit, configured to filter the multiple system logs and the multiple network traffic records based on multiple preset privacy categories to obtain the number of system logs and the number of network traffic records.
17. The device according to claim 16, wherein the filtering subunit is specifically configured to: match the multiple system logs using the multiple privacy categories, and take the successfully matched system logs as the number of system logs; and screen out the number of network traffic records from the multiple network traffic records using filter items set in advance based on the multiple privacy categories, the filter items taking at least one of the following forms: user-defined function (UDF), key field, and regular-expression item.
18. The device according to claim 15, wherein each network traffic record further includes the request message, and the parsed data further includes a number of second target APIs obtained by parsing the request message and second parameters input for the number of second target APIs.
19. The device according to claim 18, wherein the parsing unit is further configured to: parse out the number of second target APIs from the number of network traffic records using API parsing rules set in advance based on multiple APIs, the API parsing rules being defined in at least one of the following forms: user-defined function (UDF), key field, and regular-expression item; and parse out the second parameters from the number of network traffic records using parameter parsing rules set in advance based on multiple parameters, the parameter parsing rules being defined in at least one of the following forms: user-defined function (UDF), key field, and regular-expression item.
20. The device according to claim 15, wherein the parsing unit specifically includes: a parsing subunit, configured to parse the number of network traffic records to obtain the API output data, the API output data including multiple fields; and a determining subunit, configured to determine a number of third privacy categories corresponding to a number of privacy fields among the multiple fields; and wherein the parsing unit specifically further includes: an inclusion subunit, configured to take the number of third privacy categories as the number of second privacy categories; or a verification subunit, configured to verify the number of third privacy categories based on field values of the number of privacy fields, and to include the third privacy categories that pass verification in the number of second privacy categories.
21. The device according to claim 20, wherein the determining subunit is specifically configured to: determine, based on a pre-trained natural language processing model, the number of third privacy categories corresponding to the number of privacy fields among the multiple fields; or determine, based on multiple preset regular-expression matching rules, the number of third privacy categories corresponding to the number of privacy fields among the multiple fields.
22. The device according to claim 20, wherein the number of privacy fields includes an arbitrary first field, which corresponds to a first category among the number of third privacy categories; and wherein the verification subunit is specifically configured to: match the first field against multiple pre-stored valid field values corresponding to the first category, and, if the match succeeds, determine that the first category passes verification; or classify the first field using a pre-trained classification model for the first category, and, if the classification result indicates that the first field belongs to the first category, determine that the first category passes verification.
23. The device according to claim 15, wherein the comparison unit is specifically configured to: determine whether the number of first target APIs belong to the API set, to obtain a first judgment result, which is included in the first comparison result; determine whether the first parameters belong to the parameter set, to obtain a second judgment result, which is included in the first comparison result; determine whether the number of first privacy categories belong to the privacy category set, to obtain a third judgment result, which is included in the first comparison result; and determine whether the number of second privacy categories belong to the privacy category set, to obtain a fourth judgment result, which is included in the second comparison result.
The device according to claim 18, wherein the comparison unit is further configured to: determine whether the several second privacy categories belong to the privacy category set, obtaining a fourth judgment result that is included in the second comparison result; determine whether the several second target APIs belong to the API set, obtaining a fifth judgment result that is included in the second comparison result; and determine whether the second parameter belongs to the parameter set, obtaining a sixth judgment result that is included in the second comparison result.
The device according to claim 15, wherein the evaluation unit is specifically configured to: input the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result indicating the privacy data leakage risk.
The device according to claim 15, wherein the evaluation unit specifically includes: a processing subunit configured to determine, from the several system logs and the several network traffic records, an indicator value of a monitoring indicator, the monitoring indicator being preset for the requester's API calling behavior; a comparison subunit configured to compare a pre-obtained historical indicator value of the requester with the indicator value to obtain a third comparison result; and an evaluation subunit configured to assess the privacy data leakage risk of the requester calling the API based on the first comparison result, the second comparison result, and the third comparison result.
The device according to claim 26, wherein the monitoring indicator includes one or more of the following: the number of request messages sent by the requester to the service platform per unit time; the number of target objects corresponding to the privacy data that the requester requests to call per unit time; and the number of privacy categories corresponding to the privacy data that the requester requests to call per unit time.
The device according to claim 26, wherein the evaluation subunit is specifically configured to: determine, in combination with a preset evaluation rule and according to the first comparison result, the second comparison result, and the third comparison result, whether privacy data leakage has occurred; or input the first comparison result, the second comparison result, and the third comparison result together into a pre-trained second risk assessment model to obtain a second prediction result indicating the privacy data leakage risk.
A computer-readable storage medium storing a computer program, wherein, when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1-14.
A computing device comprising a memory and a processor, wherein the memory stores executable code and the processor, when executing the executable code, implements the method according to any one of claims 1-14.
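
The monitoring indicators and the rule-based evaluation described in the claims can be sketched as follows. The record layout (`ts`, `target_id`, `categories`), the threshold `ratio`, and the rule in `leak_suspected` are assumptions introduced for illustration; the claims do not specify how the historical comparison or the preset evaluation rule is defined.

```python
def indicator_values(records, window_start, window_end):
    """Compute three monitoring indicators over records with window_start <= ts < window_end."""
    in_window = [r for r in records if window_start <= r["ts"] < window_end]
    return {
        "requests": len(in_window),
        "target_objects": len({r["target_id"] for r in in_window}),
        "privacy_categories": len({c for r in in_window for c in r["categories"]}),
    }


def compare_with_history(current, history, ratio=3.0):
    """Third comparison result: flag each indicator that exceeds `ratio` times its historical value."""
    return {name: current[name] > ratio * history[name] for name in current}


def leak_suspected(first, second, third):
    """Hypothetical preset rule: any failed membership judgment or any abnormal indicator."""
    return (not all(first.values())) or (not all(second.values())) or any(third.values())


# Hypothetical request records merged from system logs and network traffic.
records = [
    {"ts": 0, "target_id": "u1", "categories": ["phone"]},
    {"ts": 10, "target_id": "u2", "categories": ["phone", "email"]},
    {"ts": 70, "target_id": "u3", "categories": ["phone"]},  # outside the window below
]
current = indicator_values(records, 0, 60)
history = {"requests": 1, "target_objects": 1, "privacy_categories": 1}
third = compare_with_history(current, history)
risky = leak_suspected({"apis_allowed": True}, {"output_categories_allowed": True}, third)
```

A fixed multiple of the historical value is only one possible comparison. The claims equally allow feeding the three comparison results into a pre-trained risk assessment model instead of a hand-written rule.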
TW109115224A 2019-11-19 2020-05-07 Risk assessment method and device for leakage of privacy data TWI734466B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911131676.3A CN110851872B (en) 2019-11-19 2019-11-19 Risk assessment method and device for private data leakage
CN201911131676.3 2019-11-19

Publications (2)

Publication Number Publication Date
TW202121329A TW202121329A (en) 2021-06-01
TWI734466B TWI734466B (en) 2021-07-21

Family

ID=69602179

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109115224A TWI734466B (en) 2019-11-19 2020-05-07 Risk assessment method and device for leakage of privacy data

Country Status (3)

Country Link
CN (1) CN110851872B (en)
TW (1) TWI734466B (en)
WO (1) WO2021098274A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851872B (en) * 2019-11-19 2021-02-23 支付宝(杭州)信息技术有限公司 Risk assessment method and device for private data leakage
CN112163222A (en) * 2020-10-10 2021-01-01 哈尔滨工业大学(深圳) Malicious software detection method and device
CN113360916A (en) * 2021-06-18 2021-09-07 奇安信科技集团股份有限公司 Risk detection method, device, equipment and medium for application programming interface
CN114301844B (en) * 2021-12-30 2024-04-19 天翼物联科技有限公司 Flow control method and system for Internet of things capability open platform and related components thereof
CN114154132B (en) * 2022-02-10 2022-05-20 北京华科软科技有限公司 Data sharing method based on service system
CN115296933B (en) * 2022-10-08 2022-12-23 国家工业信息安全发展研究中心 Industrial production data risk level assessment method and system
CN116170331B (en) * 2023-04-23 2023-08-04 远江盛邦(北京)网络安全科技股份有限公司 API asset management method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI355168B (en) * 2007-12-07 2011-12-21 Univ Nat Chiao Tung Application classification method in network traff
CN103533546A (en) * 2013-10-29 2014-01-22 无锡赛思汇智科技有限公司 Implicit user verification and privacy protection method based on multi-dimensional behavior characteristics
US9552478B2 (en) * 2010-05-18 2017-01-24 AO Kaspersky Lab Team security for portable information devices
TWI596498B (en) * 2016-11-02 2017-08-21 FedMR-based botnet reconnaissance method
CN109598146A (en) * 2018-12-07 2019-04-09 百度在线网络技术(北京)有限公司 Privacy risk appraisal procedure and device
CN109753808A (en) * 2018-11-19 2019-05-14 中国科学院信息工程研究所 A kind of privacy compromise methods of risk assessment and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346566A (en) * 2013-07-31 2015-02-11 腾讯科技(深圳)有限公司 Method, device, terminal, server and system for detecting privacy authority risks
CN103716313B (en) * 2013-12-24 2016-07-13 中国科学院信息工程研究所 A kind of user privacy information guard method and system
CN103761472B (en) * 2014-02-21 2017-05-24 北京奇虎科技有限公司 Application program accessing method and device based on intelligent terminal
CN109716345B (en) * 2016-04-29 2023-09-15 普威达有限公司 Computer-implemented privacy engineering system and method
US10887291B2 (en) * 2016-12-16 2021-01-05 Amazon Technologies, Inc. Secure data distribution of sensitive data across content delivery networks
CN106845236A (en) * 2017-01-18 2017-06-13 东南大学 A kind of application program various dimensions privacy leakage detection method and system for iOS platforms
CN109145603A (en) * 2018-07-09 2019-01-04 四川大学 A kind of Android privacy leakage behavioral value methods and techniques based on information flow
CN110334537B (en) * 2019-05-31 2023-01-13 华为技术有限公司 Information processing method and device and server
CN110851872B (en) * 2019-11-19 2021-02-23 支付宝(杭州)信息技术有限公司 Risk assessment method and device for private data leakage

Also Published As

Publication number Publication date
CN110851872B (en) 2021-02-23
CN110851872A (en) 2020-02-28
WO2021098274A1 (en) 2021-05-27
TW202121329A (en) 2021-06-01
