WO2021098274A1 - Method and apparatus for evaluating risk of leakage of private data - Google Patents


Info

Publication number
WO2021098274A1
Authority
WO
WIPO (PCT)
Prior art keywords
privacy
comparison result
data
api
network traffic
Application number
PCT/CN2020/105106
Other languages
French (fr)
Chinese (zh)
Inventor
邓圆 (Deng Yuan)
Original Assignee
支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Application filed by 支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6263 Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies

Definitions

  • API: Application Programming Interface
  • Fig. 1 shows a schematic diagram of an implementation scenario of a risk assessment method according to an embodiment.
  • Fig. 2 shows a flowchart of a risk assessment method for privacy data leakage according to an embodiment.
  • Fig. 3 shows a structural diagram of a risk assessment device for privacy data leakage according to an embodiment.
  • Because some APIs called by the requester retain outdated field settings that were never updated (for example, business personnel splicing the user's mobile phone number and ID number into a single field), the range of data output by the API (for example, the user's mobile phone number and ID number) may be inconsistent with the data range in the requester's contract (for example, only the user's mobile phone number).
  • Requester personnel can send an API call request (or request message) to the service platform through the requester client; correspondingly, the service platform can generate a system log according to the request message and return an API call response (or response message) to the requester client.
  • The gateway can record the request message and the response message and generate a corresponding network traffic record (also called a network traffic log).
  • The foregoing filtering may further include selecting the several network traffic records from the multiple network traffic records by means of filter items preset based on the multiple privacy categories. Each filter item takes at least one of the following forms: a custom UDF (user-defined function), a key field, or a regular expression.
  • The network traffic record includes the request message and the corresponding response message.
  • The semantics of the fields in the request and response messages are often ambiguous, unlike the system log, which contains data semantics determined from the request message based on API configuration information. It is therefore difficult to filter network traffic records by directly matching them against the multiple privacy categories.
  • This step may include inputting the first comparison result and the second comparison result into a pre-trained first risk assessment model to obtain a first prediction result indicating the risk of privacy data leakage.
  • The first risk assessment model may use machine learning algorithms such as decision trees, random forests, AdaBoost, or neural networks.
  • The first prediction result may be a risk level, such as high, medium, or low.
  • Alternatively, the first prediction result may be a risk assessment score, such as 20 or 85. It should be noted that the process of using the first risk assessment model is similar to its training process, so the training process is not described again.
  • The aforementioned historical indicator values may be determined from historical system logs and historical network traffic records generated when the requester previously invoked privacy data.
  • For example, the monitoring indicator may include the number of request messages sent by the requester per minute. If the historical value of this indicator is 20 and the currently determined value is 100, the comparison result for this indicator can be computed as (100 - 20)/20 = 4; this result belongs to the third comparison result mentioned above.
  • The filtering subunit 312 is specifically configured to: match the multiple system logs against the multiple privacy categories and take the successfully matched system logs as the several system logs; and select the several network traffic records from the multiple network traffic records using filter items preset based on the multiple privacy categories, where each filter item takes at least one of the following forms: a custom UDF function, a key field, or a regular expression.
  • The parsing unit 320 specifically includes a parsing subunit 321 configured to parse the several network traffic records to obtain the API output data, which includes multiple fields;
  • a determining subunit 322 configured to determine several third privacy categories corresponding to several privacy fields among the multiple fields;
  • The parsing unit further includes: a subunit 323 configured to use the several third privacy categories as the several second privacy categories; or a verification subunit 324 configured to verify the several third privacy categories based on the field values of the several privacy fields and to include the verified third privacy categories in the several second privacy categories.
  • A computer-readable storage medium having a computer program stored thereon, where, when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with FIG. 2.
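The field-splicing scenario above (a mobile phone number and an ID number concatenated into one output field) is the kind of mismatch the second comparison is meant to surface. The following is a minimal sketch, assuming the parsed API output is a flat name-to-value mapping; the two regular expressions are simplified, hypothetical stand-ins for the patent's category determination, and the names `PATTERNS`, `categories_in_value`, and `second_comparison` are illustrative only. Searching inside each value (rather than matching the whole value) lets both categories surface from a single spliced field.

```python
from __future__ import annotations

import re

# Simplified, hypothetical patterns. The digit lookarounds keep a phone number
# from matching inside a longer digit run such as an ID number.
PATTERNS = {
    "id_number": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),  # 18-character ID number
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),           # 11-digit mobile number
}


def categories_in_value(value: object) -> set[str]:
    """Return every privacy category whose pattern occurs anywhere in the value."""
    text = str(value)
    return {cat for cat, pattern in PATTERNS.items() if pattern.search(text)}


def second_comparison(output_fields: dict[str, object],
                      permitted_categories: set[str]) -> dict[str, set[str]]:
    """Compare the privacy categories found in the API output with the permitted set."""
    found: set[str] = set()
    for value in output_fields.values():
        found |= categories_in_value(value)
    return {
        "output_categories": found,
        "excess_categories": found - permitted_categories,  # output beyond the contract
    }
```

For a spliced field such as `{"user_info": "13800138000_110101199003071234"}` and a contract covering only `{"phone"}`, the excess set contains `"id_number"`, which is exactly the contract mismatch described above.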

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A method for evaluating the risk of leakage of private data. The method comprises: first, acquiring several system logs and several network traffic records generated when a requester requests to call private data of a target object stored in a service platform, wherein each system log is generated on the basis of a request message sent by the requester to the service platform for calling an API, and each network traffic record comprises at least a response message returned by the service platform for the request message (210); next, parsing the several network traffic records to obtain parsed data (220); then, acquiring, from the service platform, permission data of the requester for calling the API (230); next, comparing the several system logs with the permission data to obtain a first comparison result, and comparing the parsed data with the permission data to obtain a second comparison result (240); and finally, evaluating, at least on the basis of the first comparison result and the second comparison result, the risk of private data leakage associated with the requester calling the API (250).

Description

Risk assessment method and device for privacy data leakage

Technical field
One or more embodiments of this specification relate to the technical field of data information security, and in particular to a risk assessment method and device for privacy data leakage.
Background
An API (Application Programming Interface) is convenient to call and highly versatile, and APIs have gradually become the main way of providing Internet services. API calls have therefore also become a key focus area for preventing data leakage.
The data stored by a service platform usually includes basic information about the objects it serves (such as individuals or enterprises), as well as service data generated while the services are used. With authorization from the served objects, the service platform can provide API call services based on these data to data demanders (such as research institutions or merchants). Normally, a data demander (or requester) can obtain through API calls only the data it is permitted to use. However, the software and hardware environments, IT architectures, and business scenarios of different requesters (including requesters located in different regions, such as cross-border merchants) often differ substantially. This makes the API call system complex and easy for malicious actors to exploit, causing data leakage, which poses a great challenge to data protection for API calls. In particular, since leaked data is likely to include private data such as users' personal information, preventing data leakage is increasingly urgent.
Therefore, a reasonable and reliable solution is needed that can assess, in a timely and accurate manner, the risk of data leakage caused by API calls, and especially the risk of private data leakage, so as to effectively prevent the leakage of private data.
Summary of the invention
One or more embodiments of this specification describe a risk assessment method and device for privacy data leakage, which can assess, in a timely and accurate manner, the risk of privacy data leakage caused by API calls, so as to effectively prevent the leakage of privacy data.
According to a first aspect, a risk assessment method for privacy data leakage is provided. The method includes: obtaining several system logs and several network traffic records generated when a requester requests to call privacy data of a target object stored in a service platform, where each system log is generated based on a request message for calling an API sent by the requester to the service platform and includes several first target APIs determined from the request message, first parameters input to the several first target APIs, and several first privacy categories corresponding to the first parameters, and each network traffic record includes at least the response message returned by the service platform for that request message. The several network traffic records are parsed to obtain parsed data, which includes at least several second privacy categories corresponding to the API output data. The requester's permission data for calling APIs is obtained from the service platform; the permission data includes the set of APIs the requester is authorized to call, the parameter set composed of the parameters that may be passed in to that API set, and the privacy category set corresponding to the parameter set. The several system logs are compared with the permission data to obtain a first comparison result, and the parsed data is compared with the permission data to obtain a second comparison result. Based on at least the first comparison result and the second comparison result, the privacy data leakage risk of the requester calling the API is assessed.
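The first comparison can be pictured as set differences between what the system logs record and what the permission data allows. The following is a minimal sketch, assuming the permission data and each system log can be represented as the hypothetical `PermissionData` and `SystemLog` records below; the field names and the shape of the returned result are illustrative assumptions, not the patent's data format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PermissionData:
    """Hypothetical container for the requester's permission data."""
    api_set: set[str]               # APIs the requester may call
    param_set: set[str]             # parameters that may be passed in to those APIs
    privacy_category_set: set[str]  # privacy categories covered by those parameters


@dataclass
class SystemLog:
    """Hypothetical shape of one system log entry."""
    target_apis: list[str]          # first target APIs resolved from the request
    params: dict[str, str]          # first parameters (name -> value)
    privacy_categories: set[str]    # first privacy categories of those parameters


def first_comparison(logs: list[SystemLog], perm: PermissionData) -> dict[str, set[str]]:
    """Collect every API, parameter, and privacy category seen in the logs
    that falls outside the requester's permission data."""
    result: dict[str, set[str]] = {
        "unauthorized_apis": set(),
        "unauthorized_params": set(),
        "unauthorized_categories": set(),
    }
    for log in logs:
        result["unauthorized_apis"] |= set(log.target_apis) - perm.api_set
        result["unauthorized_params"] |= set(log.params) - perm.param_set
        result["unauthorized_categories"] |= log.privacy_categories - perm.privacy_category_set
    return result
```

With a requester permitted only `getUserPhone` (a hypothetical API name) with parameter `user_id`, a log that calls `getUserIdCard` and passes an extra `name` parameter shows up in all three difference sets. Empty sets mean the logged calls stayed within the permission data.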
In one embodiment, obtaining the several system logs and several network traffic records generated when the requester requests to call the privacy data of the target object stored in the service platform includes: obtaining multiple system logs and multiple network traffic records generated by the requester calling APIs provided by the service platform; and filtering the multiple system logs and multiple network traffic records based on multiple preset privacy categories to obtain the several system logs and several network traffic records.
In a specific embodiment, filtering the multiple system logs and multiple network traffic records to obtain the several system logs and several network traffic records includes: matching the multiple system logs against the multiple privacy categories and taking the successfully matched system logs as the several system logs; and selecting the several network traffic records from the multiple network traffic records using filter items preset based on the multiple privacy categories, where each filter item takes at least one of the following forms: a custom UDF (user-defined function), a key field, or a regular expression.
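The two filtering paths can be sketched as follows, assuming system logs carry their resolved privacy categories while traffic records carry only a raw response body. Modelling every filter item as a predicate on that body lets key-field, regular-expression, and custom-UDF items be mixed in one list. The preset categories, field names, patterns, and the example UDF are all hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical preset privacy categories.
PRIVACY_CATEGORIES = {"phone", "id_number", "bank_card"}


def filter_system_logs(logs: list[dict]) -> list[dict]:
    """Keep system logs whose recorded privacy categories intersect the preset ones."""
    return [log for log in logs if PRIVACY_CATEGORIES & set(log["privacy_categories"])]


# A filter item is any predicate on the raw response body.
FilterItem = Callable[[str], bool]


def key_field_item(name: str) -> FilterItem:
    """Key-field item: the JSON key appears in the body."""
    return lambda body: f'"{name}"' in body


def regex_item(pattern: str) -> FilterItem:
    """Regular-expression item: the pattern occurs somewhere in the body."""
    compiled = re.compile(pattern)
    return lambda body: compiled.search(body) is not None


def udf_bank_card(body: str) -> bool:
    """Example custom UDF (hypothetical rule): the body mentions a bank card field."""
    return "bankcard" in body.lower()


FILTER_ITEMS: list[FilterItem] = [
    key_field_item("mobile"),                  # hypothetical field name
    regex_item(r"(?<!\d)\d{17}[\dXx](?!\d)"),  # simplified 18-character ID number
    udf_bank_card,
]


def filter_traffic_records(records: list[dict]) -> list[dict]:
    """Keep traffic records whose response body satisfies at least one filter item."""
    return [r for r in records if any(item(r["response_body"]) for item in FILTER_ITEMS)]
```

Treating all three forms as predicates is a design choice of this sketch. It keeps the traffic filter open to new rules without changing `filter_traffic_records`.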
In one embodiment, parsing the several network traffic records to obtain parsed data includes: parsing the several network traffic records to obtain the API output data, which includes multiple fields; determining several third privacy categories corresponding to several privacy fields among the multiple fields; and either using the several third privacy categories as the several second privacy categories, or verifying the several third privacy categories based on the field values of the several privacy fields and including the verified third privacy categories in the several second privacy categories.
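The parsing step turns each response message into a set of named fields. The following is a minimal sketch, assuming (hypothetically) that response bodies are JSON. Nested objects and lists are flattened into dotted paths, so every leaf value becomes one field that later steps can classify.

```python
from __future__ import annotations

import json
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {dotted.path: scalar_value}."""
    fields: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            fields.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            fields.update(flatten(value, f"{prefix}{index}."))
    else:
        fields[prefix[:-1]] = obj  # strip the trailing dot from the path
    return fields


def parse_traffic_record(record: dict) -> dict[str, Any]:
    """Extract API output data (flattened fields) from one record's response body."""
    return flatten(json.loads(record["response_body"]))
```

A body such as `{"data": {"mobile": "13800138000", "tags": ["a", "b"]}, "code": 0}` yields the fields `data.mobile`, `data.tags.0`, `data.tags.1`, and `code`.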
In a specific embodiment, determining the several third privacy categories corresponding to the several privacy fields among the multiple fields includes: determining them with a pre-trained natural language processing model; or determining them based on multiple preset regular-expression matching rules.
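The regular-expression option can be sketched as a rule table mapping each category to a pattern that a field value must match in full. The rules below are simplified, hypothetical examples rather than production validators, and the NLP-model option is not shown.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical preset rules: privacy category -> pattern on the whole field value.
REGEX_RULES = {
    "phone": re.compile(r"1\d{10}"),
    "id_number": re.compile(r"\d{17}[\dXx]"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
}


def classify_fields(fields: dict[str, Any]) -> dict[str, str]:
    """Return {field_name: third_privacy_category} for fields whose value fully
    matches a rule; the first matching rule wins, and non-matching fields are skipped."""
    result: dict[str, str] = {}
    for name, value in fields.items():
        text = str(value)
        for category, pattern in REGEX_RULES.items():
            if pattern.fullmatch(text):
                result[name] = category
                break
    return result
```

Fields whose values match no rule (such as a numeric status code) are simply skipped, so they never become privacy fields.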
In a specific embodiment, the several privacy fields include an arbitrary first field corresponding to a first category among the several third privacy categories, and verifying the several third categories based on the field contents of the several privacy fields includes: matching the first field against multiple pre-stored legal field values corresponding to the first category, and determining that the first category passes verification if the match succeeds; or classifying the first field with a pre-trained classification model for the first category, and determining that the first category passes verification if the classification result indicates that the first field belongs to the first category.
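Both verification options can be sketched in one function. The following is a minimal sketch, assuming enumerable categories are checked against hypothetical pre-stored legal values, while other categories use a per-category predicate that stands in for a pre-trained classification model. Treating a category with no verifier as unverified is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

# Hypothetical pre-stored legal values for enumerable privacy categories.
LEGAL_VALUES = {
    "gender": {"male", "female", "M", "F"},
    "marital_status": {"single", "married", "divorced", "widowed"},
}


def verify_categories(
    fields: dict[str, Any],
    candidates: dict[str, str],
    classifiers: Optional[dict[str, Callable[[str], bool]]] = None,
) -> dict[str, str]:
    """Keep a candidate (third) category only if its field value passes verification.

    fields: {field_name: value}; candidates: {field_name: third_category}.
    classifiers: {category: predicate}, stand-ins for pre-trained models.
    """
    classifiers = classifiers or {}
    verified: dict[str, str] = {}
    for name, category in candidates.items():
        value = str(fields[name])
        if category in LEGAL_VALUES:
            ok = value in LEGAL_VALUES[category]        # legal-value matching
        elif category in classifiers:
            ok = bool(classifiers[category](value))     # classifier verdict
        else:
            ok = False  # assumption: no verifier available -> not verified
        if ok:
            verified[name] = category
    return verified
```

A free-text note such as "married to the job" that was tagged `marital_status` fails legal-value matching and is therefore not counted as a second privacy category.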
In one embodiment, assessing the privacy data leakage risk of the requester calling the API based on at least the first comparison result and the second comparison result includes: inputting the first comparison result and the second comparison result together into a pre-trained first risk assessment model to obtain a first prediction result indicating the privacy data leakage risk.
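Feeding the two comparison results into a model requires encoding them as features. The following is a minimal sketch, assuming the comparison results are dictionaries of sets (keys as in the code comments). The "model" here is a logistic score with hand-picked, hypothetical weights that stands in for a trained first risk assessment model; it is not a trained decision tree, random forest, AdaBoost, or neural network. It outputs both a 0-100 score and a high/medium/low level, with thresholds that are also assumptions.

```python
from __future__ import annotations

import math


def to_features(first_result: dict[str, set], second_result: dict[str, set]) -> list[float]:
    """Encode the comparison results as counts (hypothetical encoding).

    first_result keys: unauthorized_apis, unauthorized_params, unauthorized_categories.
    second_result keys: excess_categories (output categories beyond permission).
    """
    return [
        float(len(first_result["unauthorized_apis"])),
        float(len(first_result["unauthorized_params"])),
        float(len(first_result["unauthorized_categories"])),
        float(len(second_result["excess_categories"])),
    ]


# Stand-in for a pre-trained model: hypothetical weights, not learned from data.
WEIGHTS = [1.0, 0.5, 1.5, 2.0]
BIAS = -2.0


def predict_risk(features: list[float]) -> tuple[int, str]:
    """Return (risk score in 0..100, risk level)."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    score = 100 / (1 + math.exp(-z))
    level = "high" if score >= 70 else "medium" if score >= 30 else "low"
    return round(score), level
```

In practice, the weights would be learned from labelled historical cases. The feature encoding is the part that carries over to any of the model families named in this description.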
In one embodiment, assessing the privacy data leakage risk of the requester calling the API based on at least the first comparison result and the second comparison result includes: determining, from the several system logs and several network traffic records, the values of monitoring indicators preset for the requester's API call behavior; comparing pre-obtained historical indicator values of the requester with these indicator values to obtain a third comparison result; and assessing the privacy data leakage risk of the requester calling the API based on the first, second, and third comparison results.
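One possible monitoring indicator is the request rate. The following is a minimal sketch, assuming request timestamps are available from the system logs. It defines "requests per minute" (hypothetically) as the average over minutes with at least one request, and expresses the third comparison result as the relative change from the historical value. For example, a historical value of 20 and a current value of 100 give (100 - 20)/20 = 4.

```python
from __future__ import annotations

import math
from collections import Counter
from datetime import datetime


def requests_per_minute(timestamps: list[datetime]) -> float:
    """Indicator value (hypothetical definition): average request messages per
    minute, over the minutes in which at least one request was sent."""
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sum(per_minute.values()) / len(per_minute) if per_minute else 0.0


def relative_change(current: float, historical: float) -> float:
    """Third comparison result for one indicator: (current - historical) / historical."""
    if historical == 0:
        return math.inf if current > 0 else 0.0
    return (current - historical) / historical
```

Guarding the zero-history case avoids a division error for a requester with no prior traffic. Any activity from such a requester is treated as an unbounded increase.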
In a specific embodiment, assessing the privacy data leakage risk of the requester calling the API based on the first, second, and third comparison results includes: determining, in combination with preset evaluation rules, whether privacy data leakage has occurred according to the first, second, and third comparison results; or inputting the first, second, and third comparison results together into a pre-trained second risk assessment model to obtain a second prediction result indicating the privacy data leakage risk.
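The rule-based option can be sketched as a short list of checks over the three comparison results. It returns a verdict together with the reasons that triggered it. The result shapes (dictionaries of sets, plus indicator name to relative change) and the `max_change` threshold of 3.0 are hypothetical.

```python
from __future__ import annotations


def rule_based_leak_check(
    first_result: dict[str, set],
    second_result: dict[str, set],
    third_result: dict[str, float],
    max_change: float = 3.0,  # hypothetical threshold on relative change
) -> tuple[bool, list[str]]:
    """Apply preset evaluation rules; return (leak_suspected, reasons)."""
    reasons: list[str] = []
    if first_result["unauthorized_apis"]:
        reasons.append("called APIs outside the permitted API set")
    if first_result["unauthorized_categories"]:
        reasons.append("requested privacy categories outside the permitted set")
    if second_result["excess_categories"]:
        reasons.append("API output contains privacy categories outside the permitted set")
    spikes = sorted(name for name, change in third_result.items() if change > max_change)
    if spikes:
        reasons.append("indicators far above history: " + ", ".join(spikes))
    return bool(reasons), reasons
```

Returning the triggering reasons alongside the verdict makes the rule path easy to audit, which is one practical advantage it has over the model-based alternative.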
According to a second aspect, a risk assessment device for privacy data leakage is provided. The device includes: a first obtaining unit configured to obtain several system logs and several network traffic records generated when a requester requests to call privacy data of a target object stored in a service platform, where each system log is generated based on a request message for calling an API sent by the requester to the service platform and includes several first target APIs determined from the request message, first parameters input to the several first target APIs, and several first privacy categories corresponding to the first parameters, and each network traffic record includes at least the response message returned by the service platform for that request message; a parsing unit configured to parse the several network traffic records to obtain parsed data, which includes at least several second privacy categories corresponding to the API output data; a second obtaining unit configured to obtain, from the service platform, the requester's permission data for calling APIs, the permission data including the set of APIs the requester is authorized to call, the parameter set composed of the parameters that may be passed in to that API set, and the privacy category set corresponding to the parameter set; a comparison unit configured to compare the several system logs with the permission data to obtain a first comparison result, and to compare the parsed data with the permission data to obtain a second comparison result; and an evaluation unit configured to assess the privacy data leakage risk of the requester calling the API based on at least the first comparison result and the second comparison result.
According to a third aspect, a computer-readable storage medium is provided on which a computer program is stored. When the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided that includes a memory and a processor. The memory stores executable code, and the processor implements the method of the first aspect when it executes that code.
In summary, the risk assessment method and apparatus for private data leakage provided in the embodiments of this specification work as follows.

First, they obtain the system logs and network traffic records generated by the requester's API calls, together with the permission data for those calls. Next, the network traffic is parsed to obtain parsed data. The parsed data is then compared with the permission data, and the system logs are also compared with the permission data. The two comparison results are combined to assess the risk that the requester's API calls leak private data. This allows violating or abnormal calls by the requester to be detected promptly.

Furthermore, the obtained system logs and the parsed network traffic records can be used to compute values of monitoring indicators defined for the requester's behavior. Comparing these values with historical indicator values further improves the accuracy and usability of the risk assessment result.
Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an implementation scenario of a risk assessment method according to an embodiment.
Fig. 2 is a flowchart of a risk assessment method for private data leakage according to an embodiment.
Fig. 3 is a structural diagram of a risk assessment apparatus for private data leakage according to an embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
As mentioned above, API calls currently carry a risk of leaking private data. Detecting this risk is especially urgent when the requester is cross-border, such as a cross-border merchant.

Some large domestic enterprises (such as Alibaba) have expanded their business overseas. They therefore serve a large number of overseas merchants, and cross-border data calls have become routine. The software and hardware environments and business scenarios of overseas merchants differ from those in China, so existing data-protection architectures inevitably have gaps that can leak users' private data.

Moreover, different overseas merchants usually run different IT architectures. This makes the API call system complex and hard to map out. It is also easy for criminals to exploit, which can lead to leakage of private data such as sensitive data of domestic users.
In addition, the number of APIs is large, and vulnerabilities in API development and management are hard to avoid. As a result, the data an API actually outputs may differ from the data the requester actually requested, or from the data the requester is authorized to use. For example, a gap in API permission management may let a requester illegally call an API it is not authorized to call. That API may then output users' sensitive personal information, leaking user privacy.
As another example, suppose a requester is authorized to call an API. Its contract with the service platform, however, covers only part of the full data that API can output. For instance, the API can output user gender, user address, and user mobile number, but the contract covers only user gender. When calling the API, the requester passes in the input parameters for the contracted data and also parameters for other data, such as user address. Because of gaps in API permission management, the data the API returns (user gender and user address) exceeds the contracted scope (user gender only).
As a further example, an API called by the requester may keep old field settings that were never updated. For instance, business staff may have concatenated the user's mobile number and ID number into a single field. The data scope the API outputs (mobile number and ID number) then no longer matches the requester's contracted data scope (mobile number only).
Based on the above, the inventor proposes a risk assessment method and apparatus for private data leakage. Fig. 1 shows a schematic diagram of an implementation scenario of the risk assessment method according to an embodiment.

As shown in Fig. 1, staff on the requester side send an API call request (also called a request message) to the service platform through a requester client. The service platform generates a system log from the request message and returns an API call response (also called a response message) to the requester client. The gateway records the request and response messages, producing the corresponding network traffic records (also called network traffic logs).
The risk assessment apparatus can then obtain the system logs and network traffic records from the gateway and parse the network traffic records to obtain parsed data. It can also obtain, from the service platform, the permission data for the requester's API calls.

The apparatus then compares the system logs with the permission data, and compares the parsed data with the permission data. It combines the two comparison results to assess the risk that the requester's API calls leak private data, so that violating or abnormal calls by the requester are detected in time.
The implementation steps of the above risk assessment method are described below with reference to specific embodiments.
First, note that terms such as "first", "second", and "third" in the embodiments of this specification only distinguish similar items and have no other limiting effect.
Fig. 2 shows a flowchart of a risk assessment method for private data leakage according to an embodiment. The method can be executed by any apparatus, device, platform, or server cluster with computing and processing capabilities. For example, the executing entity may be the risk assessment apparatus shown in Fig. 1, or the service platform described above.
As shown in Fig. 2, the method may include the following steps S210 to S250.
Step S210: obtain a number of system logs and a number of network traffic records generated when the requester requests to call private data of a target object stored in the service platform. Each system log is generated from a request message, sent by the requester to the service platform, for calling an API. It includes several first target APIs determined from the request message, the first parameters input to those first target APIs, and several first privacy categories corresponding to the first parameters. Each network traffic record includes at least the response message returned by the service platform for that request message.
Step S220: parse the network traffic records to obtain parsed data, which includes at least several second privacy categories corresponding to the API output data.
Step S230: obtain from the service platform the permission data for the requester's API calls. This data includes the set of APIs the requester is authorized to call, the set of parameters the requester is authorized to pass to those APIs, and the set of privacy categories corresponding to that parameter set.
Step S240: compare the system logs with the permission data to obtain a first comparison result, and compare the parsed data with the permission data to obtain a second comparison result.
Step S250: assess, based at least on the first comparison result and the second comparison result, the risk of private data leakage from the requester's API calls.
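As a rough illustration of how the two comparisons in S240 feed S250, the Python sketch below reduces each view to a set of privacy categories. The function names, return values, and the combination rule in `assess` are assumptions made for illustration; the specification does not fix a scoring rule at this point. The example mirrors the concatenated-field case described earlier, where the system log looks compliant but the API output does not.

```python
def out_of_scope(observed, permitted):
    """Privacy categories observed in one view that the permission data does not cover."""
    return set(observed) - set(permitted)


def assess(log_categories, output_categories, permitted_categories):
    """Hypothetical S240/S250: compare both views with the permission data, then combine."""
    first = out_of_scope(log_categories, permitted_categories)      # system logs (S210) vs. S230
    second = out_of_scope(output_categories, permitted_categories)  # parsed traffic (S220) vs. S230
    # Illustrative rule: either mismatch is a risk signal. The two are reported separately
    # because they suggest different causes (over-requesting vs. over-returning).
    if first:
        return "high", "requested beyond permission", sorted(first | second)
    if second:
        return "high", "API output beyond permission", sorted(second)
    return "low", "within permission", []
```

For instance, `assess({"mobile number"}, {"mobile number", "ID number"}, {"mobile number"})` returns `("high", "API output beyond permission", ["ID number"])`. Only the traffic-side comparison exposes the leak, which is why both comparisons are kept.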
These steps are detailed as follows. First, in step S210, the system logs and network traffic records are obtained. They are generated when the requester requests to call private data of a target object stored in the service platform.
In one embodiment, the requester may be an individual, an institution, an enterprise, or the like. It logs in to the service platform with an account registered there and initiates API call requests while using the platform. In one example, the requester is a cross-border merchant, and the service platform is a cross-border merchant system or a cross-border merchant open platform.

The service platform can store basic attribute information for a large number of service objects, along with the service data those objects generate while using the service. For example, a service object fills in registration information when it registers on the platform, and it generates order data, reviews, and similar information when it uses the service.

In the embodiments of this specification, the service object whose data the requester requests is called the target object. In one embodiment, the private data may include all of the data stored in the service platform.
The generation of system logs and network traffic is described next. In one embodiment, the requester sends a request message for calling an API to the service platform. On receiving it, the service platform records the business operation and generates a corresponding system log. It also generates a response message and returns it to the requester.

At the physical layer, communication between the requester and the service platform passes through a gateway. The requester's request message is first uploaded to the gateway and then forwarded to the service platform; during this uplink, the gateway can record the request message. Likewise, the service platform's response message is first delivered to the gateway and then forwarded to the requester; during this downlink, the gateway can record the response message. A recorded request message and its corresponding response message together form one network traffic record.
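A minimal sketch of the gateway-side pairing described above follows. It assumes every request and its response share a request identifier (`request_id`); the source does not say how the gateway matches them. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrafficRecord:
    request_id: str
    request: str
    response: Optional[str] = None


class GatewayRecorder:
    """Hypothetical gateway-side recorder: one record per request/response pair."""

    def __init__(self) -> None:
        self._pending: Dict[str, TrafficRecord] = {}
        self.records: List[TrafficRecord] = []

    def on_uplink(self, request_id: str, request: str) -> None:
        # The request passes the gateway on its way to the service platform.
        self._pending[request_id] = TrafficRecord(request_id, request)

    def on_downlink(self, request_id: str, response: str) -> Optional[TrafficRecord]:
        # The response passes the gateway on its way back; close the matching record.
        record = self._pending.pop(request_id, None)
        if record is None:
            return None  # no recorded request with this ID; ignored in this sketch
        record.response = response
        self.records.append(record)
        return record
```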
Regarding system log generation, note first that the service platform stores configuration information for the API services it provides. In one embodiment, this configuration information includes:
- the name of each API;
- the full set of parameters that can be passed to each API;
- for each parameter, the meaning of the data it retrieves (for example, the value 13800001111 means a mobile number).

After receiving a request message, the service platform uses the stored configuration information to determine the target API in the request, the parameters input to that API, and the data meanings of those parameters. It then generates the system log. In the embodiments of this specification, a privacy-related data meaning is called a privacy category. Examples include user mobile number, company switchboard number, ID number, and user name.
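The following sketch shows one way a system log entry could be built from the stored API configuration. It assumes the configuration maps each API name to its parameters, and each parameter to the data meaning it retrieves. `API_CONFIG`, the API path `/user/profile`, and the parameter names are hypothetical; the privacy categories reuse the examples listed above.

```python
# Hypothetical configuration: API name -> {parameter name: data meaning it retrieves}.
API_CONFIG = {
    "/user/profile": {
        "phone": "user mobile number",
        "id_no": "ID number",
        "page_size": "page size",
    },
}

# Data meanings treated as privacy categories (examples taken from the text).
PRIVACY_CATEGORIES = {"user mobile number", "company switchboard number", "ID number", "user name"}


def build_system_log(api, params):
    """Resolve one request (API name plus parameter names) against the configuration."""
    known = API_CONFIG.get(api, {})
    meanings = [known[p] for p in params if p in known]
    return {
        "api": api,
        "params": sorted(params),
        # Keep only the data meanings that count as privacy categories.
        "privacy_categories": sorted({m for m in meanings if m in PRIVACY_CATEGORIES}),
    }
```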
As noted above, in one embodiment the private data may include all of the data stored in the service platform. In that case, this step may include obtaining every system log and network traffic record generated by the requester's calls to the platform's APIs, and using them as the system logs and network traffic records above.
In another embodiment, the risk assessment focuses on certain privacy categories, and the categories of interest are preset. After the system logs and network traffic records generated by the requester's API calls are obtained, they are filtered according to the preset privacy categories. The filtered logs and records are the ones used in subsequent steps.
In a specific embodiment, the filtering may include matching the system logs against the preset privacy categories and keeping the logs that match. As described above, each system log includes three things: the API determined from the corresponding request message, the parameters the request passes to that API, and the meaning of the data each parameter retrieves. The preset privacy categories can therefore be matched against the data meanings of the parameters in each log. Any log whose data meanings include at least one preset category is kept.
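Once each log carries its resolved privacy categories, filtering system logs by preset privacy categories reduces to a set intersection. The sketch assumes logs are dictionaries with a `privacy_categories` key (a hypothetical shape). It keeps any log that mentions at least one watched category.

```python
def filter_system_logs(logs, watched_categories):
    """Keep logs whose parameter data meanings include at least one watched privacy category."""
    watched = set(watched_categories)
    return [log for log in logs if watched & set(log["privacy_categories"])]
```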
In another specific embodiment, the filtering may also select network traffic records using filter items defined in advance from the preset privacy categories. A filter item takes at least one of these forms: a custom UDF (user-defined function), a key field, or a regular expression.

Filter items are needed because a network traffic record contains only the request message and its response, and the data meaning of their fields is often unclear. Unlike a system log, a traffic record does not carry data meanings resolved from the API configuration information. Matching privacy categories directly against traffic records is therefore difficult.
The filter items are defined in advance from the privacy categories. Examples:
- A regular expression for mobile numbers matches field values whose first digit is 1 and whose first three digits form an existing network prefix (such as China Mobile prefixes 138 and 139). Network traffic records containing such a value are kept.
- A user-defined function (UDF) for ID numbers matches field values that follow the ID number encoding rules. Records containing such a value are kept.
- A key field for user names, such as the API parameter used to retrieve a user name (for example, User_name), is defined. Records containing that key field are kept.
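The three kinds of filter item can be sketched in Python as follows, with a record kept if any filter item matches:
- The prefix list is deliberately short and only illustrative.
- The ID-number UDF checks the 18-character check digit defined in GB 11643-1999. This is an assumption about which "encoding rules" are meant.
- The key field reuses the `User_name` example.

```python
import re

# Illustrative prefixes only; a real list of network prefixes would be much longer.
MOBILE_PREFIXES = ("138", "139")
MOBILE_RE = re.compile(r"(?<!\d)(?:%s)\d{8}(?!\d)" % "|".join(MOBILE_PREFIXES))

# Candidate 18-character ID numbers: 17 digits plus a digit or X as check character.
ID_CANDIDATE_RE = re.compile(r"(?<!\d)\d{17}[\dXx](?![\dXx])")
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK = "10X98765432"  # check character indexed by (weighted sum mod 11)

KEY_FIELDS = ("User_name",)


def is_valid_id_number(value):
    """UDF-style check of the GB 11643 check character of an 18-character ID number."""
    total = sum(int(d) * w for d, w in zip(value[:17], ID_WEIGHTS))
    return ID_CHECK[total % 11] == value[17].upper()


def matches_filter(record_text):
    """True if any filter item (regex, UDF, or key field) matches the raw record text."""
    if MOBILE_RE.search(record_text):
        return True
    if any(is_valid_id_number(m) for m in ID_CANDIDATE_RE.findall(record_text)):
        return True
    return any(key in record_text for key in KEY_FIELDS)
```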
Step S210 thus yields the system logs and network traffic records generated when the requester requests to call the private data of the target object.
Next, in step S220, the network traffic records are parsed to obtain parsed data. The parsed data includes at least several second privacy categories corresponding to the API output data.
In one embodiment, this step first parses the network traffic records to obtain the API output data, which contains multiple fields. Specifically, it is the response messages in the records that are parsed. The step then determines several third privacy categories for the privacy fields among those fields, for example by machine learning or regular-expression matching.

In a specific embodiment, a pre-trained natural language processing model, such as a Transformer or BERT model, determines the third privacy categories of the privacy fields. For example, the privacy fields may be found to include "Li Qingshen", "Sihai Co., Ltd.", and "Zhenzhong Building, Qingnian Road, Beijing", with the corresponding third privacy categories user name, company name, and address.

In another specific embodiment, multiple preset regular-expression rules determine the third privacy categories. For example, a field named "phone" is determined to be a privacy field whose third privacy category is mobile number. Likewise, a field whose value contains "@" is determined to be a privacy field whose third privacy category is email address. In this way, several third privacy categories are determined.
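Below is a minimal sketch of the rule-based path. It assumes the API output is a flat JSON object, and each rule inspects a field name or value to assign a third privacy category. The rule table and function name are hypothetical and cover only the two example rules above. Note that the "@" rule also fires on a chat message such as "@Xiaohua", which is why the verification step that follows is useful.

```python
import json

# Hypothetical rule table: (predicate over field name and value, third privacy category).
RULES = [
    (lambda name, value: name.lower() == "phone", "mobile number"),
    (lambda name, value: isinstance(value, str) and "@" in value, "email address"),
]


def third_privacy_categories(response_body):
    """Map each privacy field in a flat JSON response to the first rule that matches it."""
    fields = json.loads(response_body)
    found = {}
    for name, value in fields.items():
        for predicate, category in RULES:
            if predicate(name, value):
                found[name] = category
                break
    return found
```

For example, `third_privacy_categories('{"phone": "13800001111", "note": "@Xiaohua", "count": 3}')` tags `note` as an email address even though it is not one.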
Further, in one specific implementation, the third privacy categories can be used directly as the second privacy categories. In another specific embodiment, the third privacy categories are first verified against the values of the privacy fields, and only those that pass become second privacy categories.

For example, let a first field be any of the privacy fields, with a corresponding first category among the third privacy categories. Verification may then match the first field against multiple pre-stored legal field values for the first category. If the match succeeds, the first category passes verification.

Concretely, suppose the first category is user name and the first field is "Ou Cha" (欧茶). The legal field values include the names of users who have completed real-name authentication. The check looks for "Ou Cha" among those names; if it is found, user name is added to the second privacy categories.
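The legal-value check can be sketched as a lookup in a pre-stored set per category. The shape of the inputs and the `legal_values` store are assumptions. In practice the store would hold values such as real-name-authenticated user names.

```python
def verify_with_legal_values(fields, candidates, legal_values):
    """
    fields: privacy field name -> field value from the API output
    candidates: privacy field name -> third privacy category
    legal_values: privacy category -> set of pre-stored legal values (hypothetical store)
    Returns the categories that pass verification, i.e. the second privacy categories.
    """
    passed = set()
    for name, category in candidates.items():
        # A category passes if at least one of its fields holds a known legal value.
        if fields.get(name) in legal_values.get(category, set()):
            passed.add(category)
    return passed
```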
In another example, verification classifies the first field with a pre-trained classification model for the first category. If the classification result indicates that the first field belongs to the first category, the first category passes verification.

For instance, suppose the first category is email address. If the first field is "Remember to come for dinner tomorrow, @Xiaohua", the classification result indicates that it is not an email address. If the first field is 58978@ali.cn, the result indicates that it is an email address, and email address is added to the second privacy categories.

Deriving the second privacy categories by verifying the third privacy categories in this way ensures their accuracy. It also makes the subsequent risk assessment of private data leakage more accurate.
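The classifier-based check is sketched below. A plain pattern match stands in for the pre-trained classification model, purely to keep the example runnable; a real deployment would call a trained model for each category. `CLASSIFIERS` and `verify_with_classifier` are hypothetical names.

```python
import re
from typing import Callable, Dict

# Stand-in for a trained email-address classifier (pattern check, not a learned model).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def email_classifier(text: str) -> bool:
    """Return True only if the whole field value looks like a single email address."""
    return EMAIL_RE.fullmatch(text.strip()) is not None


# Hypothetical registry: privacy category -> classifier deciding membership.
CLASSIFIERS: Dict[str, Callable[[str], bool]] = {"email address": email_classifier}


def verify_with_classifier(field_value: str, category: str) -> bool:
    """A third privacy category passes if its classifier accepts the field value."""
    classifier = CLASSIFIERS.get(category)
    return classifier is not None and classifier(field_value)
```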
This yields the second privacy categories for the API output data in the response messages. Optionally, the request messages in the network traffic records can also be parsed.

System logs are generated at the application layer, whereas network traffic records are produced at a lower layer. In engineering practice, a traffic parser can rarely access the complete API configuration information stored in the service platform, so it cannot parse the records precisely, and other parsing approaches are usually needed.

In one embodiment, the parsed data also includes several second target APIs and the second parameters input to them, both obtained by parsing the request messages. These APIs and parameters are coarser and less precise than the API names and parameters in the system logs.
In a specific embodiment, the second target APIs are parsed from the network traffic records using API parsing rules defined in advance for multiple APIs. In another specific embodiment, the second parameters are parsed from the records using parameter parsing rules defined in advance for multiple parameters. Both kinds of rule take at least one of these forms: a custom UDF, a key field, or a regular expression. For details on these forms, see the description of filter items in the foregoing embodiments; they are not repeated here.
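As an illustration of a regular-expression parsing rule, the sketch below extracts a coarse second target API and second parameters from the request line of a raw HTTP request, using Python's `re` and `urllib.parse`. It rests on two assumptions: the traffic record stores the request as raw HTTP text, and the URL path identifies the API. Parameters carried in a request body are not handled.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical regex rule: the request line "METHOD target HTTP/x.y" of a raw HTTP request.
REQUEST_LINE_RE = re.compile(r"^(?:GET|POST|PUT|DELETE)\s+(\S+)\s+HTTP/\d\.\d", re.MULTILINE)


def parse_request(raw_request):
    """Return (second target API, sorted parameter names), or None if no request line is found."""
    match = REQUEST_LINE_RE.search(raw_request)
    if match is None:
        return None
    target = urlsplit(match.group(1))
    # The API is identified by its path only; query-string keys are taken as parameters.
    params = sorted(parse_qs(target.query, keep_blank_values=True))
    return target.path, params
```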
Parsing the network traffic records in this way yields the parsed data. Separately, step S230 can be performed to obtain from the service platform the permission data for the requester's API calls.
Specifically, the permission data includes:
- the set of APIs the requester is authorized to call, which may contain the names of one or more APIs, such as http://yiteng.cn/data/?id=91 and https://niuqi.cn/data/?id=8;
- the set of parameters the requester is authorized to pass to those APIs, which may include gender, phone, and add;
- the set of privacy categories corresponding to that parameter set, which may include gender, phone number, and address.
In one embodiment, the service platform includes a user authorization system, a contract system, an API management system, and so on:
- The user authorization system stores the portions of private data that individual or enterprise users have authorized the platform to provide externally.
- The contract system stores the data scope that the requester, by agreement with the service platform, may request from it.
- The API management system holds information such as the interface documents for the APIs the platform exposes to requesters.

The relevant data is obtained from each of these systems, consolidated, and combined into the permission data.
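The sketch below shows one possible way to consolidate the three sources into permission data. Taking the intersection of what the API documents expose, what the contract allows, and what users have authorized is an assumption; the specification says only that the data is collected from these systems and consolidated. Names and input shapes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionData:
    api_set: frozenset
    param_set: frozenset
    category_set: frozenset


def build_permission(api_docs, contract, authorized_categories):
    """
    api_docs: API name -> {parameter: privacy category}  (API management system)
    contract: {"apis": set of API names, "categories": set of categories}  (contract system)
    authorized_categories: categories users allowed the platform to share  (user authorization)
    """
    apis = frozenset(api_docs) & frozenset(contract["apis"])
    categories = frozenset(contract["categories"]) & frozenset(authorized_categories)
    # A parameter is permitted if it belongs to a permitted API and retrieves a permitted category.
    params = frozenset(
        p for api in apis for p, cat in api_docs[api].items() if cat in categories
    )
    return PermissionData(apis, params, categories)
```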
This is how the permission data for the requester's API calls is obtained from the service platform.
Then, in step S240, the system logs are compared with the permission data to obtain a first comparison result, and the parsed data is compared with the permission data to obtain a second comparison result.
In one embodiment, obtaining the first comparison result may include determining whether the first target APIs belong to the API set. This yields a first judgment result, which becomes part of the first comparison result. The check covers the first target APIs in every system log.

In a specific embodiment, suppose the target APIs in the system logs include http://user.cn/data/?id=00, and the API set includes http://user.cn/data/?id=00 and http://company.cn/data/?id=66. The comparison shows that every target API in the system logs belongs to the API set. The number outside the set is 0, so the first judgment result is 0.
In another embodiment, obtaining the first comparison result may further include: determining whether the first parameters belong to the parameter set to obtain a second determination result, which is included in the first comparison result. For the first parameters contained in each system log, it must be determined whether they belong to the parameter set in the permission data. For example, assume that the parameters in the system logs include phone and IDnumber, and that the parameter set includes phone. The comparison shows that IDnumber does not belong to the parameter set, so the second determination result is 1.
In yet another embodiment, obtaining the first comparison result may further include: determining whether the first privacy categories belong to the privacy category set to obtain a third determination result, which is included in the first comparison result. For the first privacy categories contained in each system log, it must be determined whether they belong to the privacy category set in the permission data. For example, assume that the first privacy categories in the system logs include mobile phone number and ID card number, and that the privacy category set includes mobile phone number. The comparison shows that ID card number does not belong to the privacy category set, so the third determination result is 1.
The first, second, and third determination results obtained above together form the first comparison result.
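The three membership checks can be sketched as set differences. This is a minimal sketch assuming a hypothetical log schema (each log is a dict with `apis`, `params`, and `categories` lists) and that each determination result is the number of distinct out-of-scope items, which is how the worked examples above arrive at 0, 1, and 1.

```python
def count_outside(observed, allowed):
    """Number of distinct observed items missing from the allowed set."""
    return len(set(observed) - set(allowed))


def first_comparison(system_logs, perm):
    """Return the (first, second, third) determination results.

    system_logs: list of dicts with 'apis', 'params', 'categories' lists.
    perm: dict with 'api_set', 'param_set', 'category_set'.
    """
    apis = [a for log in system_logs for a in log["apis"]]
    params = [p for log in system_logs for p in log["params"]]
    cats = [c for log in system_logs for c in log["categories"]]
    return (
        count_outside(apis, perm["api_set"]),
        count_outside(params, perm["param_set"]),
        count_outside(cats, perm["category_set"]),
    )


logs = [{"apis": ["http://user.cn/data/?id=00"],
         "params": ["phone", "IDnumber"],
         "categories": ["mobile phone number", "ID card number"]}]
perm = {"api_set": {"http://user.cn/data/?id=00", "http://company.cn/data/?id=66"},
        "param_set": {"phone"},
        "category_set": {"mobile phone number"}}
print(first_comparison(logs, perm))  # (0, 1, 1)
```

Counting distinct items reproduces the examples; counting occurrences instead would weight repeated violations more heavily.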
On the other hand, in one embodiment, obtaining the second comparison result may include: determining whether the second privacy categories belong to the privacy category set to obtain a fourth determination result, which is included in the second comparison result. In another embodiment, it may further include: determining whether the second target APIs belong to the API set to obtain a fifth determination result, which is included in the second comparison result. In yet another embodiment, it may further include: determining whether the second parameters belong to the parameter set to obtain a sixth determination result, which is included in the second comparison result.
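The second comparison applies the same membership checks to the data parsed from network traffic. The sketch below assumes a hypothetical `parsed` dict with `categories`, `apis`, and `params` lists, and returns the out-of-scope items themselves rather than counts, so that a later rule can see which specific category was exceeded. Whether to report counts or items is a design choice the text leaves open.

```python
def second_comparison(parsed, perm):
    """Return the fourth, fifth and sixth determination results as sorted
    lists of out-of-scope items (an empty list means within permission).

    parsed: dict with 'categories', 'apis', 'params' lists from traffic.
    perm: dict with 'api_set', 'param_set', 'category_set'.
    """
    return {
        "fourth": sorted(set(parsed["categories"]) - set(perm["category_set"])),
        "fifth": sorted(set(parsed["apis"]) - set(perm["api_set"])),
        "sixth": sorted(set(parsed["params"]) - set(perm["param_set"])),
    }
```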
The first and second comparison results are thus obtained. Next, in step S250, the risk of private data leakage from the requester's API calls is assessed based at least on the first comparison result and the second comparison result.
In one embodiment, this step may include: inputting the first and second comparison results together into a pre-trained first risk assessment model to obtain a first prediction result indicating the risk of private data leakage. In a more specific embodiment, the first risk assessment model may use a machine learning algorithm such as a decision tree, random forest, AdaBoost, or neural network. In a more specific embodiment, the first prediction result may be a risk level, such as high, medium, or low. In another more specific embodiment, the first prediction result may be a risk assessment score, such as 20 or 85. Because the training process of the first risk assessment model is similar to its use, the training process is not described further.
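The trained model is not specified, so the following only illustrates the interface: six determination results in, a risk level out. It hard-codes a two-split decision tree. The splits, their order, and the level names are hypothetical; in practice they would be learned from labeled historical calls by one of the algorithms listed above.

```python
def predict_risk_level(results):
    """Hand-written stand-in for a trained decision tree (splits are hypothetical).

    results: the six determination results (counts of out-of-scope items),
    ordered first..sixth: (api_log, param_log, category_log,
                           category_traffic, api_traffic, param_traffic).
    """
    api_log, param_log, cat_log, cat_traffic, api_traffic, param_traffic = results
    # Root split: private data actually returned outside the permitted categories.
    if cat_traffic > 0:
        return "high"
    # Second split: out-of-scope requests seen, but nothing out of scope returned.
    if api_log + param_log + cat_log + api_traffic + param_traffic > 0:
        return "medium"
    return "low"
```

Splitting first on the traffic-side category result reflects the idea that returned data is stronger evidence of leakage than requested data; a learned tree may well choose differently.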
In another embodiment, this step may include: first, determining, from the system logs and network traffic records, the values of monitoring indicators that are preset for the requester's API-calling behavior; next, comparing the requester's pre-obtained historical indicator values with these indicator values to obtain a third comparison result; and then assessing the risk of private data leakage from the requester's API calls based on the first, second, and third comparison results.
In a specific embodiment, the monitoring indicators may include one or more of the following: the number of request messages the requester sends to the service platform per unit time; the number of target objects corresponding to the private data the requester requests per unit time; and the number of privacy categories corresponding to the private data the requester requests per unit time. The unit time may be a year, month, week, day, hour, minute, and so on. In a specific example, a monitoring indicator may be the number of user IDs (which can be parsed from the input parameters of the request messages) contained in the requester's call requests each day.
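The three indicators can be computed by bucketing request messages into time windows. This is a sketch assuming a hypothetical input of `(timestamp, target_object_id, privacy_categories)` tuples, one per request message, already extracted from the logs and traffic records. Using a `strftime` pattern as the bucket key makes the unit time configurable: the default buckets by minute, and `"%Y-%m-%d"` buckets by day.

```python
from collections import defaultdict
from datetime import datetime


def indicators_per_window(requests, window="%Y-%m-%d %H:%M"):
    """Compute the three monitoring indicators for each time window.

    requests: iterable of (timestamp, target_object_id, privacy_categories),
              one entry per request message (hypothetical schema).
    Returns {window_key: (request_count, distinct_targets, distinct_categories)}.
    """
    buckets = defaultdict(lambda: [0, set(), set()])
    for ts, target, categories in requests:
        bucket = buckets[ts.strftime(window)]
        bucket[0] += 1             # request messages in this window
        bucket[1].add(target)      # distinct target objects
        bucket[2].update(categories)  # distinct privacy categories
    return {k: (n, len(t), len(c)) for k, (n, t, c) in buckets.items()}


reqs = [
    (datetime(2020, 1, 1, 10, 0, 5), "u1", ["mobile phone number"]),
    (datetime(2020, 1, 1, 10, 0, 40), "u2", ["mobile phone number", "ID card number"]),
    (datetime(2020, 1, 1, 10, 1, 0), "u1", ["mobile phone number"]),
]
print(indicators_per_window(reqs))
# {'2020-01-01 10:00': (2, 2, 2), '2020-01-01 10:01': (1, 1, 1)}
```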
In a specific embodiment, the historical indicator values may be determined from the historical system logs and historical network traffic records generated when the requester previously called private data. For example, the monitoring indicators may include the number of request messages the requester sends per minute. If the historical value of this indicator is 20 and the currently determined value is 100, the comparison result for this indicator is 4, i.e., (100-20)/20, which is included in the third comparison result.
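The worked example is a relative deviation from the historical value, which can be sketched as below. The function names are illustrative, and the handling of a zero historical value is an assumption, since the text does not cover that case.

```python
def relative_deviation(current, historical):
    """(current - historical) / historical, as in the worked example."""
    if historical == 0:
        # Assumption: the text does not say how to treat a zero baseline.
        return float("inf") if current > 0 else 0.0
    return (current - historical) / historical


def third_comparison(current, historical):
    """Relative deviation for every indicator present in both mappings."""
    return {name: relative_deviation(value, historical[name])
            for name, value in current.items() if name in historical}


print(third_comparison({"requests_per_minute": 100}, {"requests_per_minute": 20}))
# {'requests_per_minute': 4.0}
```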
In a specific embodiment, preset evaluation rules may be applied to the first, second, and third comparison results to determine whether private data leakage has occurred. For example, an evaluation rule may state: if the privacy categories that exceed the permission scope in the comparison results include the user's ID card number, the requester's API calls are determined to have leaked private data. In another specific embodiment, the first, second, and third comparison results may be input together into a pre-trained second risk assessment model to obtain a second prediction result indicating the risk of private data leakage. In a more specific embodiment, the second risk assessment model may use a machine learning algorithm such as a decision tree, random forest, AdaBoost, or neural network. In a more specific embodiment, the second prediction result may be a risk level, such as very high, high, medium, low, or very low. In another more specific embodiment, the second prediction result may be a risk assessment score, such as 15 or 90. Because the training process of the second risk assessment model is similar to its use, the training process is not described further. In this way, the risk of data leakage from the requester's calls can be assessed based on the three comparison results.
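A rule-based evaluation can be sketched as a list of checks that each contribute a reason. The first rule mirrors the ID card number example above; the second rule, which flags an indicator whose relative deviation exceeds a threshold, and its default threshold of 3.0 are hypothetical additions showing how the third comparison result could feed the same rule set.

```python
SENSITIVE = frozenset({"ID card number"})


def rule_based_leak(out_of_scope_categories, deviations, max_deviation=3.0):
    """Apply preset evaluation rules; return (leak_detected, reasons).

    out_of_scope_categories: privacy categories outside the permission data,
        taken from the first and second comparison results.
    deviations: {indicator: relative deviation} from the third comparison.
    """
    reasons = []
    # Rule from the text: an out-of-scope ID card number means a leak occurred.
    hit = sorted(set(out_of_scope_categories) & SENSITIVE)
    if hit:
        reasons.append("out-of-scope sensitive category: " + ", ".join(hit))
    # Hypothetical extra rule: an indicator far above its historical level.
    for name, dev in sorted(deviations.items()):
        if dev > max_deviation:
            reasons.append(f"{name}: relative deviation {dev:g} > {max_deviation:g}")
    return bool(reasons), reasons
```

Returning the reasons alongside the verdict keeps the outcome auditable, which a score from the second risk assessment model would not provide on its own.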
In summary, the risk assessment method for private data leakage provided by the embodiments of this specification obtains the system logs and network traffic records generated by the requester's API calls, together with the requester's permission data for calling APIs. The network traffic is parsed to obtain parsed data; the parsed data is compared with the permission data, and the system logs are compared with the permission data. The two comparison results are combined to assess the risk of private data leakage caused by the requester's API calls, so that violations and abnormal call behavior by the requester can be detected promptly. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the values of monitoring indicators set for the requester's behavior, and these values can be compared with historical values, further improving the accuracy and usability of the risk assessment result.
According to an embodiment of another aspect, this specification further discloses an assessment apparatus. Specifically, FIG. 3 is a structural diagram of a risk assessment apparatus for private data leakage according to one embodiment. As shown in FIG. 3, the apparatus 300 may include the following units.
The first acquisition unit 310 is configured to obtain several system logs and several network traffic records generated when the requester requests to call the private data of target objects stored in the service platform. Each system log is generated based on a request message for calling an API that the requester sends to the service platform, and includes the first target APIs determined from the request message, the first parameters input for the first target APIs, and the first privacy categories corresponding to the first parameters. Each network traffic record includes at least the response message returned by the service platform for that request message.
The parsing unit 320 is configured to parse the network traffic records to obtain parsed data, which includes at least the second privacy categories corresponding to the API output data.
The second acquisition unit 330 is configured to obtain, from the service platform, the requester's permission data for calling APIs. The permission data includes the API set the requester is authorized to call, the parameter set consisting of the parameters the requester is authorized to pass in for that API set, and the privacy category set corresponding to the parameter set.
The comparison unit 340 is configured to compare the system logs with the permission data to obtain a first comparison result, and to compare the parsed data with the permission data to obtain a second comparison result.
The evaluation unit 350 is configured to assess the risk of private data leakage from the requester's API calls based at least on the first comparison result and the second comparison result.
In one embodiment, the first acquisition unit 310 specifically includes: an acquisition subunit 311, configured to obtain multiple system logs and multiple network traffic records generated by the requester calling APIs provided by the service platform; and a filtering subunit 312, configured to filter the multiple system logs and network traffic records based on multiple preset privacy categories to obtain the several system logs and network traffic records.
In a specific embodiment, the filtering subunit 312 is specifically configured to: match the multiple system logs against the multiple privacy categories and take the successfully matched system logs as the several system logs; and use filter items preset based on the multiple privacy categories to select the several network traffic records from the multiple network traffic records, where the filter items take at least one of the following forms: user-defined functions (UDFs), key fields, and regular expressions.
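The three forms of filter item can be sketched as follows. The category-to-item mapping, the example patterns (an 11-digit number starting with 1 for a mobile phone number, 17 digits plus a digit or X for an ID card number), and the function names are hypothetical; for brevity the same matcher is applied to raw log lines and raw traffic records.

```python
import re

# Hypothetical filter items per privacy category. Each item is a key field
# (plain substring), a compiled regular expression, or a UDF (callable).
FILTER_ITEMS = {
    "mobile phone number": ["phone", re.compile(r"\b1\d{10}\b")],
    "ID card number": ["IDnumber", re.compile(r"\b\d{17}[\dXx]\b")],
}


def item_matches(item, text):
    """Apply one filter item to a raw log line or traffic record."""
    if isinstance(item, str):
        return item in text                     # key field
    if isinstance(item, re.Pattern):
        return item.search(text) is not None    # regular expression
    return bool(item(text))                     # user-defined function


def filter_records(records, filter_items=FILTER_ITEMS):
    """Keep records that match at least one filter item of any privacy category."""
    return [r for r in records
            if any(item_matches(i, r) for items in filter_items.values() for i in items)]


records = ["GET /data/?phone=13800138000", "GET /health",
           '{"IDnumber": "11010519491231002X"}']
print(filter_records(records))  # drops only "GET /health"
```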
In one embodiment, the network traffic records further include the request messages, and the parsed data further includes the second target APIs obtained by parsing the request messages and the second parameters input for the second target APIs.
In a specific embodiment, the parsing unit 320 is further configured to: parse the second target APIs from the network traffic records using API parsing rules preset based on multiple APIs, where the API parsing rules are defined in at least one of the following forms: UDFs, key fields, and regular expressions; and parse the second parameters from the network traffic records using parameter parsing rules preset based on multiple parameters, where the parameter parsing rules are defined in at least one of the following forms: UDFs, key fields, and regular expressions.
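A minimal sketch of regular-expression parsing rules for a raw HTTP request follows. It assumes that an API is identified as `http://<host><path>` and that parameters are the keys of the query string plus, for form-encoded bodies, the body. Both are assumptions: the example APIs earlier in this description keep the query string (`?id=00`) as part of the API, which would require a different rule.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical parsing rules written as regular expressions.
REQUEST_LINE = re.compile(r"^(?:GET|POST|PUT|DELETE)\s+(\S+)\s+HTTP/\d(?:\.\d)?", re.M)
HOST_HEADER = re.compile(r"^Host:\s*(\S+)", re.M | re.I)
FORM_TYPE = re.compile(r"^Content-Type:\s*application/x-www-form-urlencoded", re.M | re.I)


def parse_request(raw):
    """Extract (second target API, second parameters) from a raw HTTP request."""
    head, _, body = raw.partition("\r\n\r\n")
    line, host = REQUEST_LINE.search(head), HOST_HEADER.search(head)
    if not line or not host:
        return None, set()
    target = urlsplit(line.group(1))
    api = f"http://{host.group(1)}{target.path}"
    params = set(parse_qs(target.query, keep_blank_values=True))
    if FORM_TYPE.search(head):  # only parse bodies declared as form-encoded
        params |= set(parse_qs(body, keep_blank_values=True))
    return api, params


get = "GET /data/?id=00&phone=1 HTTP/1.1\r\nHost: user.cn\r\n\r\n"
print(parse_request(get))  # ('http://user.cn/data/', {'id', 'phone'}) in some set order
```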
In one embodiment, the parsing unit 320 specifically includes: a parsing subunit 321, configured to parse the network traffic records to obtain the API output data, which includes multiple fields; and a determination subunit 322, configured to determine the third privacy categories corresponding to the privacy fields among the multiple fields. The parsing unit further includes either an inclusion subunit 323, configured to take the third privacy categories as the second privacy categories, or a verification subunit 324, configured to verify the third privacy categories based on the field values of the privacy fields and include the third privacy categories that pass verification in the second privacy categories.
In a specific embodiment, the determination subunit 322 is specifically configured to determine the third privacy categories corresponding to the privacy fields among the multiple fields either based on a pre-trained natural language processing model, or based on multiple preset regular-expression matching rules.
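The regular-expression option can be sketched as below: flatten a JSON response into field paths, then assign a category when either the field name or the field value matches a rule. The rule table, the JSON response format, and the function names are assumptions for illustration; the natural language processing option is not shown.

```python
import json
import re

# Hypothetical matching rules: category -> (field-name pattern, value pattern).
RULES = {
    "mobile phone number": (re.compile(r"phone|mobile", re.I), re.compile(r"^1\d{10}$")),
    "ID card number": (re.compile(r"id_?(card|number)", re.I), re.compile(r"^\d{17}[\dXx]$")),
    "email": (re.compile(r"e-?mail", re.I), re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
}


def flatten(obj, prefix=""):
    """Yield (field_path, scalar_value) pairs from nested JSON data."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            yield from flatten(v, f"{prefix}.{k}" if prefix else k)
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            yield from flatten(v, f"{prefix}[{i}]")
    else:
        yield prefix, obj


def detect_categories(response_body):
    """Return {field_path: third privacy category} for fields matching a rule."""
    found = {}
    for path, value in flatten(json.loads(response_body)):
        name = path.rsplit(".", 1)[-1]
        for category, (name_re, value_re) in RULES.items():
            if name_re.search(name) or value_re.match(str(value)):
                found[path] = category
                break  # first matching rule wins
    return found
```

Matching on values as well as names catches privacy data returned under uninformative field names, at the cost of some false positives that the verification step can filter out.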
In another specific embodiment, the privacy fields include an arbitrary first field corresponding to a first category among the third privacy categories. The verification subunit 324 is specifically configured to: match the value of the first field against multiple pre-stored valid field values corresponding to the first category, and determine that the first category passes verification if the match succeeds; or classify the first field using a pre-trained classification model for the first category, and determine that the first category passes verification if the classification result indicates that the first field belongs to the first category.
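Both verification options can be sketched in one function. The valid-value table is hypothetical, and each pre-trained classification model is represented only by a callable that returns True or False for a field value. Trying stored values first and falling back to a classifier per category is an assumption; the text presents the two as alternatives.

```python
# Hypothetical pre-stored valid field values per privacy category.
VALID_VALUES = {"gender": {"male", "female"}}


def verify_categories(category_by_field, record, classifiers=None,
                      valid_values=VALID_VALUES):
    """Keep only the (field -> third privacy category) pairs that pass verification.

    category_by_field: output of the category-determination step.
    record: {field: value} from the API output data.
    classifiers: {category: callable(value) -> bool}, standing in for the
                 pre-trained per-category classification models.
    """
    classifiers = classifiers or {}
    verified = {}
    for field, category in category_by_field.items():
        value = str(record.get(field, ""))
        if category in valid_values:      # option 1: match stored valid values
            ok = value in valid_values[category]
        elif category in classifiers:     # option 2: per-category classifier
            ok = bool(classifiers[category](value))
        else:
            ok = False  # Assumption: categories with no verifier are not confirmed
        if ok:
            verified[field] = category
    return verified
```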
In one embodiment, the comparison unit 340 is specifically configured to: determine whether the first target APIs belong to the API set to obtain a first determination result, which is included in the first comparison result; determine whether the first parameters belong to the parameter set to obtain a second determination result, which is included in the first comparison result; determine whether the first privacy categories belong to the privacy category set to obtain a third determination result, which is included in the first comparison result; and determine whether the second privacy categories belong to the privacy category set to obtain a fourth determination result, which is included in the second comparison result.
In one embodiment, the comparison unit 340 is further configured to: determine whether the second privacy categories belong to the privacy category set to obtain a fourth determination result, which is included in the second comparison result; determine whether the second target APIs belong to the API set to obtain a fifth determination result, which is included in the second comparison result; and determine whether the second parameters belong to the parameter set to obtain a sixth determination result, which is included in the second comparison result.
In one embodiment, the evaluation unit 350 is specifically configured to input the first and second comparison results together into a pre-trained first risk assessment model to obtain a first prediction result indicating the risk of private data leakage.
In one embodiment, the evaluation unit 350 specifically includes: a processing subunit 351, configured to determine, from the system logs and network traffic records, the values of monitoring indicators preset for the requester's API-calling behavior; a comparison subunit 352, configured to compare the requester's pre-obtained historical indicator values with these indicator values to obtain a third comparison result; and an evaluation subunit 353, configured to assess the risk of private data leakage from the requester's API calls based on the first, second, and third comparison results.
In a specific embodiment, the monitoring indicators include one or more of the following: the number of request messages the requester sends to the service platform per unit time; the number of target objects corresponding to the private data the requester requests per unit time; and the number of privacy categories corresponding to the private data the requester requests per unit time.
In another specific embodiment, the evaluation subunit 353 is specifically configured to: apply preset evaluation rules to the first, second, and third comparison results to determine whether private data leakage has occurred; or input the first, second, and third comparison results together into a pre-trained second risk assessment model to obtain a second prediction result indicating the risk of private data leakage.
In summary, the risk assessment apparatus for private data leakage provided by the embodiments of this specification obtains the system logs and network traffic records generated by the requester's API calls, together with the requester's permission data for calling APIs. The network traffic is parsed to obtain parsed data; the parsed data is compared with the permission data, and the system logs are compared with the permission data. The two comparison results are combined to assess the risk of private data leakage caused by the requester's API calls, so that violations and abnormal call behavior by the requester can be detected promptly. Furthermore, the obtained system logs and the parsed network traffic records can be used to determine the values of monitoring indicators set for the requester's behavior, and these values can be compared with historical values, further improving the accuracy and usability of the risk assessment result.
According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored. When the computer program is executed in a computer, the computer is caused to perform the method described with reference to FIG. 2.
According to an embodiment of yet another aspect, a computing device is further provided, including a memory and a processor. The memory stores executable code, and when the processor executes the executable code, the method described with reference to FIG. 2 is implemented.
Those skilled in the art should be aware that, in one or more of the foregoing examples, the functions described in the present invention may be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (30)

  1. A risk assessment method for private data leakage, comprising:
    obtaining several system logs and several network traffic records generated when a requester requests to call private data of a target object stored in a service platform, wherein each system log is generated based on a request message for calling an API that the requester sends to the service platform, and includes several first target APIs determined according to the request message, first parameters input for the several first target APIs, and several first privacy categories corresponding to the first parameters; and each network traffic record includes at least a response message returned by the service platform for the request message;
    parsing the several network traffic records to obtain parsed data, which includes at least several second privacy categories corresponding to API output data;
    obtaining, from the service platform, permission data for the requester to call APIs, the permission data including an API set that the requester is authorized to call, a parameter set consisting of parameters that the requester is authorized to pass in for the API set, and a privacy category set corresponding to the parameter set;
    comparing the several system logs with the permission data to obtain a first comparison result, and comparing the parsed data with the permission data to obtain a second comparison result; and
    assessing, based at least on the first comparison result and the second comparison result, a risk of private data leakage from the requester calling APIs.
  2. The method according to claim 1, wherein obtaining the several system logs and several network traffic records generated when the requester requests to call the private data of the target object stored in the service platform comprises:
    obtaining multiple system logs and multiple network traffic records generated by the requester calling APIs provided by the service platform; and
    filtering the multiple system logs and multiple network traffic records based on multiple preset privacy categories to obtain the several system logs and several network traffic records.
  3. The method according to claim 2, wherein filtering the multiple system logs and multiple network traffic records to obtain the several system logs and several network traffic records comprises:
    matching the multiple system logs using the multiple privacy categories, and taking the successfully matched system logs as the several system logs; and
    selecting the several network traffic records from the multiple network traffic records using filter items preset based on the multiple privacy categories, the filter items taking at least one of the following forms: a user-defined function (UDF), a key field, and a regular expression.
  4. The method according to claim 1, wherein the network traffic record further includes the request message, and the parsed data further includes several second target APIs obtained by parsing the request message and second parameters input for the several second target APIs.
  5. The method according to claim 4, wherein parsing the several network traffic records to obtain parsed data comprises:
    parsing the several second target APIs from the several network traffic records using API parsing rules preset based on multiple APIs, the API parsing rules being defined in at least one of the following forms: a user-defined function (UDF), a key field, and a regular expression; and
    parsing the second parameters from the several network traffic records using parameter parsing rules preset based on multiple parameters, the parameter parsing rules being defined in at least one of the following forms: a user-defined function (UDF), a key field, and a regular expression.
  6. The method according to claim 1, wherein parsing the several network traffic records to obtain parsed data comprises:
    parsing the several network traffic records to obtain the API output data, the API output data including multiple fields;
    determining several third privacy categories corresponding to several privacy fields among the multiple fields; and
    taking the several third privacy categories as the several second privacy categories; or,
    verifying the several third privacy categories based on field values of the several privacy fields, and including the third privacy categories that pass verification in the several second privacy categories.
  7. The method according to claim 6, wherein determining the several third privacy categories corresponding to the several privacy fields among the multiple fields comprises:
    determining, based on a pre-trained natural language processing model, the several third privacy categories corresponding to the several privacy fields among the multiple fields; or,
    determining, based on multiple preset regular-expression matching rules, the several third privacy categories corresponding to the several privacy fields among the multiple fields.
  8. The method according to claim 6, wherein the several privacy fields include an arbitrary first field corresponding to a first category among the several third privacy categories; and wherein verifying the several third privacy categories based on the field contents of the several privacy fields comprises:
    matching the first field against a plurality of pre-stored legal field values corresponding to the first category, and determining, if the match succeeds, that the first category passes the verification; or,
    classifying the first field by using a pre-trained classification model for the first category, and determining, if the classification result indicates that the first field belongs to the first category, that the first category passes the verification.
  9. The method according to claim 1, wherein comparing the several system logs with the permission data to obtain the first comparison result comprises:
    determining whether the several first target APIs belong to the API set, to obtain a first determination result that is included in the first comparison result;
    determining whether the first parameters belong to the parameter set, to obtain a second determination result that is included in the first comparison result; and
    determining whether the several first privacy categories belong to the privacy category set, to obtain a third determination result that is included in the first comparison result;
    and wherein comparing the parsed data with the permission data to obtain the second comparison result comprises:
    determining whether the several second privacy categories belong to the privacy category set, to obtain a fourth determination result that is included in the second comparison result.
  10. The method according to claim 4, wherein comparing the parsed data with the permission data to obtain the second comparison result comprises:
    determining whether the several second privacy categories belong to the privacy category set, to obtain a fourth determination result that is included in the second comparison result;
    determining whether the several second target APIs belong to the API set, to obtain a fifth determination result that is included in the second comparison result; and
    determining whether the second parameters belong to the parameter set, to obtain a sixth determination result that is included in the second comparison result.
  11. The method according to claim 1, wherein evaluating, based on at least the first comparison result and the second comparison result, the private data leakage risk of the requester calling the API comprises:
    jointly inputting the first comparison result and the second comparison result into a pre-trained first risk assessment model to obtain a first prediction result indicating the private data leakage risk.
  12. The method according to claim 1, wherein evaluating, based on at least the first comparison result and the second comparison result, the private data leakage risk of the requester calling the API comprises:
    determining, according to the several system logs and the several network traffic records, indicator values of monitoring indicators, the monitoring indicators being preset for the API calling behavior of requesters;
    comparing pre-acquired historical indicator values of the requester with the indicator values to obtain a third comparison result; and
    evaluating, based on the first comparison result, the second comparison result, and the third comparison result, the private data leakage risk of the requester calling the API.
  13. The method according to claim 12, wherein the monitoring indicators include one or more of the following: the number of request messages sent by the requester to the service platform per unit time, the number of target objects corresponding to the private data that the requester requests to call per unit time, and the number of privacy categories corresponding to the private data that the requester requests to call per unit time.
  14. The method according to claim 12, wherein evaluating, based on the first comparison result, the second comparison result, and the third comparison result, the private data leakage risk of the requester calling the API comprises:
    determining, in combination with preset evaluation rules and according to the first comparison result, the second comparison result, and the third comparison result, whether private data leakage has occurred; or,
    jointly inputting the first comparison result, the second comparison result, and the third comparison result into a pre-trained second risk assessment model to obtain a second prediction result indicating the private data leakage risk.
  15. A risk assessment apparatus for private data leakage, comprising:
    a first obtaining unit, configured to obtain several system logs and several network traffic records generated when a requester requests to call private data of a target object stored in a service platform, wherein each system log is generated based on a request message for calling an API sent by the requester to the service platform, and includes several first target APIs determined according to the request message, first parameters input for the several first target APIs, and several first privacy categories corresponding to the first parameters; and each network traffic record includes at least a response message returned by the service platform for the request message;
    a parsing unit, configured to parse the several network traffic records to obtain parsed data, the parsed data including at least several second privacy categories corresponding to API output data;
    a second obtaining unit, configured to obtain, from the service platform, permission data of the requester for calling APIs, the permission data including an API set that the requester is authorized to call, a parameter set composed of the parameters that the requester is authorized to pass in for the API set, and a privacy category set corresponding to the parameter set;
    a comparison unit, configured to compare the several system logs with the permission data to obtain a first comparison result, and to compare the parsed data with the permission data to obtain a second comparison result; and
    an evaluation unit, configured to evaluate, based on at least the first comparison result and the second comparison result, the private data leakage risk of the requester calling the API.
  16. The apparatus according to claim 15, wherein the first obtaining unit specifically comprises:
    an obtaining subunit, configured to obtain a plurality of system logs and a plurality of network traffic records generated by the requester calling APIs provided by the service platform; and
    a filtering subunit, configured to filter the plurality of system logs and the plurality of network traffic records based on a plurality of preset privacy categories, to obtain the several system logs and the several network traffic records.
  17. The apparatus according to claim 16, wherein the filtering subunit is specifically configured to:
    match the plurality of system logs by using the plurality of privacy categories, and take the successfully matched system logs as the several system logs; and
    screen the several network traffic records out of the plurality of network traffic records by using filter items preset based on the plurality of privacy categories, the filter items taking at least one of the following forms: a custom UDF (user-defined function), a key field, and a regular expression.
  18. The apparatus according to claim 15, wherein the network traffic record further includes the request message, and the parsed data further includes several second target APIs obtained by parsing the request message and second parameters input for the several second target APIs.
  19. The apparatus according to claim 18, wherein the parsing unit is further configured to:
    parse the several second target APIs from the several network traffic records by using API parsing rules preset based on a plurality of APIs, the API parsing rules being defined in at least one of the following forms: a custom UDF (user-defined function), a key field, and a regular expression; and
    parse the several second parameters from the several network traffic records by using parameter parsing rules preset based on a plurality of parameters, the parameter parsing rules being defined in at least one of the following forms: a custom UDF (user-defined function), a key field, and a regular expression.
  20. The apparatus according to claim 15, wherein the parsing unit specifically comprises:
    a parsing subunit, configured to parse the several network traffic records to obtain the API output data, the API output data comprising a plurality of fields; and
    a determining subunit, configured to determine several third privacy categories corresponding to several privacy fields among the plurality of fields;
    and the parsing unit further comprises: a classifying subunit, configured to take the several third privacy categories as the several second privacy categories; or a verification subunit, configured to verify the several third privacy categories based on the field values of the several privacy fields, and to classify the third privacy categories that pass the verification into the several second privacy categories.
  21. The apparatus according to claim 20, wherein the determining subunit is specifically configured to:
    determine, based on a pre-trained natural language processing model, the several third privacy categories corresponding to the several privacy fields among the plurality of fields; or,
    determine, based on a plurality of preset regular-expression matching rules, the several third privacy categories corresponding to the several privacy fields among the plurality of fields.
  22. The apparatus according to claim 20, wherein the several privacy fields include an arbitrary first field corresponding to a first category among the several third privacy categories; and wherein the verification subunit is specifically configured to:
    match the first field against a plurality of pre-stored legal field values corresponding to the first category, and determine, if the match succeeds, that the first category passes the verification; or,
    classify the first field by using a pre-trained classification model for the first category, and determine, if the classification result indicates that the first field belongs to the first category, that the first category passes the verification.
  23. The apparatus according to claim 15, wherein the comparison unit is specifically configured to:
    determine whether the several first target APIs belong to the API set, to obtain a first determination result that is included in the first comparison result;
    determine whether the first parameters belong to the parameter set, to obtain a second determination result that is included in the first comparison result;
    determine whether the several first privacy categories belong to the privacy category set, to obtain a third determination result that is included in the first comparison result; and
    determine whether the several second privacy categories belong to the privacy category set, to obtain a fourth determination result that is included in the second comparison result.
  24. The apparatus according to claim 18, wherein the comparison unit is further configured to:
    determine whether the several second privacy categories belong to the privacy category set, to obtain a fourth determination result that is included in the second comparison result;
    determine whether the several second target APIs belong to the API set, to obtain a fifth determination result that is included in the second comparison result; and
    determine whether the second parameters belong to the parameter set, to obtain a sixth determination result that is included in the second comparison result.
  25. The apparatus according to claim 15, wherein the evaluation unit is specifically configured to:
    jointly input the first comparison result and the second comparison result into a pre-trained first risk assessment model to obtain a first prediction result indicating the private data leakage risk.
  26. The apparatus according to claim 15, wherein the evaluation unit specifically comprises:
    a processing subunit, configured to determine, according to the several system logs and the several network traffic records, indicator values of monitoring indicators, the monitoring indicators being preset for the API calling behavior of requesters;
    a comparison subunit, configured to compare pre-acquired historical indicator values of the requester with the indicator values to obtain a third comparison result; and
    an evaluation subunit, configured to evaluate, based on the first comparison result, the second comparison result, and the third comparison result, the private data leakage risk of the requester calling the API.
  27. The apparatus according to claim 26, wherein the monitoring indicators include one or more of the following: the number of request messages sent by the requester to the service platform per unit time, the number of target objects corresponding to the private data that the requester requests to call per unit time, and the number of privacy categories corresponding to the private data that the requester requests to call per unit time.
  28. The apparatus according to claim 26, wherein the evaluation subunit is specifically configured to:
    determine, in combination with preset evaluation rules and according to the first comparison result, the second comparison result, and the third comparison result, whether private data leakage has occurred; or,
    jointly input the first comparison result, the second comparison result, and the third comparison result into a pre-trained second risk assessment model to obtain a second prediction result indicating the private data leakage risk.
  29. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed in a computer, the computer is caused to execute the method according to any one of claims 1-14.
  30. A computing device, comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method according to any one of claims 1-14 is implemented.
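Claims 5 and 19 list three forms a parsing rule can take: a custom UDF, a key field, or a regular expression. They do not say how the rules are applied. The Python sketch below shows one possible reading and is not the applicant's implementation. The rule table, the field names `userId` and `phone=`, and the assumption that a key-field rule reads a JSON body are all hypothetical.

```python
import json
import re
from typing import Callable, Dict, Optional, Union

# A parsing rule takes one of the three forms named in the claims (shapes assumed):
#   str        -> key field: read this key from a JSON record body
#   re.Pattern -> regular expression: take its first capture group
#   callable   -> custom UDF: any function mapping the raw record to a value or None
Rule = Union[str, re.Pattern, Callable[[str], Optional[str]]]


def apply_rule(rule: Rule, record: str) -> Optional[str]:
    """Extract one value from a raw network traffic record with a single rule."""
    if isinstance(rule, re.Pattern):
        match = rule.search(record)
        return match.group(1) if match else None
    if isinstance(rule, str):
        try:
            body = json.loads(record)
        except json.JSONDecodeError:
            return None  # not a JSON body, so a key-field rule cannot apply
        value = body.get(rule) if isinstance(body, dict) else None
        return None if value is None else str(value)
    return rule(record)  # custom UDF


def parse_with_rules(record: str, rules: Dict[str, Rule]) -> Dict[str, str]:
    """Apply every preset rule and keep only the values that were found."""
    parsed: Dict[str, str] = {}
    for name, rule in rules.items():
        value = apply_rule(rule, record)
        if value is not None:
            parsed[name] = value
    return parsed


# Hypothetical rule table for parsing second parameters.
PARAM_RULES: Dict[str, Rule] = {
    "user_id": "userId",
    "phone": re.compile(r"phone=(\d{11})"),
    "body_format": lambda rec: "json" if rec.lstrip().startswith("{") else None,
}

record = '{"userId": "u42", "note": "phone=13800000000"}'
print(parse_with_rules(record, PARAM_RULES))
# → {'user_id': 'u42', 'phone': '13800000000', 'body_format': 'json'}
```

The same helper could also apply the API parsing rules of claim 19. You would only need to pass in a rule table keyed by API name.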
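Claims 6 and 20 begin by parsing each response message into API output data made up of a plurality of fields. The claims do not fix a wire format. Assuming JSON bodies, a minimal Python sketch can flatten the output into dotted field paths, so that each leaf value can later be assigned a privacy category. The path convention is chosen here only for illustration.

```python
import json
from typing import Any, Dict


def flatten_fields(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON into {dotted.field.path: leaf value}."""
    fields: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            fields.update(flatten_fields(value, path))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            fields.update(flatten_fields(item, f"{prefix}[{index}]"))
    else:
        fields[prefix] = obj  # leaf value: one output field
    return fields


def parse_api_output(response_body: str) -> Dict[str, Any]:
    """Parse the response message of a traffic record into its output fields."""
    return flatten_fields(json.loads(response_body))


print(parse_api_output('{"data": {"name": "Li", "phones": ["138", "139"]}, "code": 0}'))
# → {'data.name': 'Li', 'data.phones[0]': '138', 'data.phones[1]': '139', 'code': 0}
```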
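Claims 7 and 21 allow the third privacy categories to be determined in one of two ways: with a pre-trained NLP model, or with preset regular-expression rules. The sketch below covers only the regular-expression alternative and matches on the last segment of each field path. The category names and patterns are made-up examples, and a real rule set would need much broader coverage.

```python
import re
from typing import Dict, Iterable

# Hypothetical preset regular matching rules: privacy category -> pattern on the field name.
CATEGORY_RULES = {
    "phone_number": re.compile(r"^(phone|mobile|tel)", re.IGNORECASE),
    "id_number": re.compile(r"^(id_?card|id_?no|cert_?no)$", re.IGNORECASE),
    "name": re.compile(r"^(name|real_?name|user_?name)$", re.IGNORECASE),
    "email": re.compile(r"^e?mail", re.IGNORECASE),
}


def leaf_name(field_path: str) -> str:
    """'data.phones[0]' -> 'phones': drop list indices, keep the last path segment."""
    return re.sub(r"\[\d+\]", "", field_path).split(".")[-1]


def third_privacy_categories(field_paths: Iterable[str]) -> Dict[str, str]:
    """Map each privacy field to the first category whose rule matches its name."""
    result: Dict[str, str] = {}
    for path in field_paths:
        leaf = leaf_name(path)
        for category, pattern in CATEGORY_RULES.items():
            if pattern.search(leaf):
                result[path] = category
                break  # first matching rule wins; fields with no match are not privacy fields
    return result


print(third_privacy_categories(["data.name", "data.phones[0]", "code", "data.idCard"]))
# → {'data.name': 'name', 'data.phones[0]': 'phone_number', 'data.idCard': 'id_number'}
```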
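Claims 8 and 22 verify each candidate category against its field value. They do this either by matching pre-stored legal values or by using a per-category classifier. In this sketch the legal-value table is hypothetical, and the classifier is modeled as any callable that returns a category label. The legal-value check is tried first whenever a table exists for the category. That ordering is a choice made here: the claims present the two methods only as alternatives.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical pre-stored legal field values per privacy category (lower-cased).
LEGAL_VALUES: Dict[str, Set[str]] = {
    "gender": {"male", "female", "m", "f"},
    "blood_type": {"a", "b", "ab", "o"},
}

# Assumed classifier interface: maps a field value to a predicted category label.
Classifier = Callable[[str], str]


def passes_verification(category: str, value: object,
                        classifier: Optional[Classifier] = None) -> bool:
    """Check one (first field, first category) pair against the field value."""
    text = str(value).strip().lower()
    if category in LEGAL_VALUES:
        return text in LEGAL_VALUES[category]      # legal-value matching
    if classifier is not None:
        return classifier(text) == category         # pre-trained classifier
    return False                                    # nothing to verify with


def second_privacy_categories(candidates: Dict[str, str], values: Dict[str, object],
                              classifier: Optional[Classifier] = None) -> Dict[str, str]:
    """Keep only the field -> third-category pairs that pass verification."""
    return {
        field: category
        for field, category in candidates.items()
        if passes_verification(category, values.get(field), classifier)
    }
```

For instance, a field labeled `gender` whose value is `"Female"` passes. A field labeled `blood_type` whose value is `"Z"` is dropped.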
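Claims 9 and 10 reduce both comparisons to set-membership tests against the permission data. Their apparatus counterparts, claims 23 and 24, do the same. The following is a minimal Python sketch. It assumes that APIs, parameters, and privacy categories are all plain strings, and that each determination result records the items that fell outside the permitted set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Union

Judgment = Dict[str, Union[bool, List[str]]]


@dataclass
class PermissionData:
    """Permission data of one requester, as obtained from the service platform."""
    api_set: Set[str]
    parameter_set: Set[str]
    privacy_category_set: Set[str]


def belongs_to(observed: Iterable[str], allowed: Set[str]) -> Judgment:
    """One determination result: do all observed items belong to the allowed set?"""
    violations = sorted(set(observed) - allowed)
    return {"ok": not violations, "violations": violations}


def first_comparison(apis: Iterable[str], params: Iterable[str],
                     categories: Iterable[str], perm: PermissionData) -> Dict[str, Judgment]:
    """First to third determination results, computed from the system logs."""
    return {
        "first": belongs_to(apis, perm.api_set),
        "second": belongs_to(params, perm.parameter_set),
        "third": belongs_to(categories, perm.privacy_category_set),
    }


def second_comparison(output_categories: Iterable[str], perm: PermissionData,
                      apis: Optional[Iterable[str]] = None,
                      params: Optional[Iterable[str]] = None) -> Dict[str, Judgment]:
    """Fourth determination result, plus the fifth and sixth when the request
    message was also parsed from the traffic records (claim 10)."""
    result = {"fourth": belongs_to(output_categories, perm.privacy_category_set)}
    if apis is not None:
        result["fifth"] = belongs_to(apis, perm.api_set)
    if params is not None:
        result["sixth"] = belongs_to(params, perm.parameter_set)
    return result
```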
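Claim 11, like claim 25, feeds both comparison results into a pre-trained first risk assessment model but does not specify the model. As an assumption, the sketch below encodes each determination result as two features and uses a logistic score as a stand-in for the model. The weights and bias are illustrative placeholders, not trained values.

```python
import math
from typing import List, Mapping, Sequence

# Assumed order of determination results in the feature vector.
JUDGMENTS = ("first", "second", "third", "fourth", "fifth", "sixth")


def encode(first: Mapping[str, Mapping], second: Mapping[str, Mapping]) -> List[float]:
    """Two features per determination result: failed (0/1) and violation count.
    Each result is assumed to look like {"ok": bool, "violations": [...]};
    a missing result is treated as passed."""
    merged = {**first, **second}
    features: List[float] = []
    for name in JUDGMENTS:
        result = merged.get(name, {"ok": True, "violations": []})
        features.append(0.0 if result["ok"] else 1.0)
        features.append(float(len(result["violations"])))
    return features


def predict_risk(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score in (0, 1), standing in for the trained risk model."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative parameters only; real ones would be learned from labelled cases.
WEIGHTS = [1.0] * (2 * len(JUDGMENTS))
BIAS = -3.0
```

A logistic score is used here because it is the simplest model that maps the features to a bounded risk value. Any trained classifier could take its place.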
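Claims 12 and 13, and claims 26 and 27 for the apparatus, compute monitoring indicators per unit time and compare them with the requester's historical values. Several choices below are hypothetical: the record shape, the fixed time window, and the "more than three times the historical value" threshold. The claims only say that the values are compared.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CallRecord:
    """One request reconstructed from the logs and traffic records (assumed shape)."""
    timestamp: int                  # seconds
    target_object: str              # whose private data was requested
    privacy_categories: List[str]   # categories of the private data requested


def monitoring_indicators(records: Iterable[CallRecord], start: int,
                          unit_seconds: int) -> Dict[str, float]:
    """The three indicators of claim 13 for the window [start, start + unit_seconds)."""
    window = [r for r in records if start <= r.timestamp < start + unit_seconds]
    return {
        "request_count": float(len(window)),
        "target_object_count": float(len({r.target_object for r in window})),
        "privacy_category_count": float(
            len({c for r in window for c in r.privacy_categories})),
    }


def third_comparison(current: Dict[str, float], historical: Dict[str, float],
                     tolerance: float = 3.0) -> Dict[str, bool]:
    """True marks an indicator above `tolerance` times its historical value.
    The historical value is floored at 1 so that a zero history does not flag
    every single request."""
    return {name: value > tolerance * max(historical.get(name, 0.0), 1.0)
            for name, value in current.items()}
```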
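The rule-based branch of claims 14 and 28 does not specify the evaluation rules. Purely as an illustration, the function below applies two made-up rules. Its inputs are pass/fail flags from the first and second comparison results and anomaly flags from the third.

```python
from typing import Mapping


def leak_by_rules(first_ok: Mapping[str, bool], second_ok: Mapping[str, bool],
                  anomalies: Mapping[str, bool], min_anomalies: int = 2) -> bool:
    """Hypothetical preset evaluation rules:
    1. any failed determination in the first or second comparison result -> leak;
    2. otherwise, at least `min_anomalies` abnormal monitoring indicators in the
       third comparison result -> leak."""
    if not all(first_ok.values()) or not all(second_ok.values()):
        return True
    return sum(anomalies.values()) >= min_anomalies
```

Rule 1 reflects the idea that calling outside one's permissions is a strong leakage signal on its own. Rule 2 treats unusual volume as a weaker signal that needs corroboration.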
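The first obtaining unit of claim 15, together with claim 18, implies two record types. Below is a sketch of them as Python dataclasses, with a hypothetical helper that derives a system log from a request message. The request shape and the parameter-to-category table are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass
class SystemLog:
    """One system log, generated from one API-calling request message."""
    requester: str
    target_apis: List[str]           # first target APIs
    parameters: Dict[str, str]       # first parameters input to those APIs
    privacy_categories: List[str]    # first privacy categories of those parameters


@dataclass
class TrafficRecord:
    """One network traffic record; the response message is always present."""
    response: str
    request: str = ""                # request message, if also captured (claim 18)


# Hypothetical platform-side table: parameter name -> privacy category.
PARAM_CATEGORY = {"user_id": "user_identifier", "phone": "phone_number"}


def build_system_log(requester: str, request: Mapping[str, object]) -> SystemLog:
    """Generate a system log from a request message of an assumed shape
    {"apis": [...], "params": {...}}."""
    params = {str(k): str(v) for k, v in dict(request.get("params", {})).items()}
    categories = sorted({PARAM_CATEGORY[k] for k in params if k in PARAM_CATEGORY})
    return SystemLog(requester, list(request.get("apis", [])), params, categories)
```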
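Claims 16 and 17 narrow the raw logs and traffic down to privacy-related records before any comparison takes place. The sketch below rests on three stated assumptions. System logs are dicts that carry their privacy categories. Traffic records are raw strings. A key-field filter item is simplified to a substring test.

```python
import re
from typing import Callable, Iterable, List, Mapping, Sequence, Set


def filter_system_logs(logs: Iterable[Mapping], categories: Set[str]) -> List[Mapping]:
    """Keep system logs whose recorded privacy categories match any preset category.
    Each log is assumed to carry a "privacy_categories" list."""
    return [log for log in logs
            if categories & set(log.get("privacy_categories", []))]


def filter_traffic_records(records: Iterable[str], key_fields: Sequence[str],
                           patterns: Sequence[re.Pattern],
                           udfs: Sequence[Callable[[str], bool]] = ()) -> List[str]:
    """Keep raw traffic records hit by at least one filter item: a key field
    (simplified here to a substring test), a regular expression, or a custom UDF."""
    return [record for record in records
            if any(field in record for field in key_fields)
            or any(pattern.search(record) for pattern in patterns)
            or any(udf(record) for udf in udfs)]


# Hypothetical filter items derived from the preset privacy categories.
KEY_FIELDS = ['"idCard"', '"realName"']
PATTERNS = [re.compile(r"\b1\d{10}\b")]  # 11-digit, mobile-number-like strings
```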
PCT/CN2020/105106 2019-11-19 2020-07-28 Method and apparatus for evaluating risk of leakage of private data WO2021098274A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911131676.3A CN110851872B (en) 2019-11-19 2019-11-19 Risk assessment method and device for private data leakage
CN201911131676.3 2019-11-19

Publications (1)

Publication Number Publication Date
WO2021098274A1 true WO2021098274A1 (en) 2021-05-27

Family

ID=69602179

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105106 WO2021098274A1 (en) 2019-11-19 2020-07-28 Method and apparatus for evaluating risk of leakage of private data

Country Status (3)

Country Link
CN (1) CN110851872B (en)
TW (1) TWI734466B (en)
WO (1) WO2021098274A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154132A (en) * 2022-02-10 2022-03-08 北京华科软科技有限公司 Data sharing method based on service system
CN114301844A (en) * 2021-12-30 2022-04-08 天翼物联科技有限公司 Internet of things capability open platform flow control method, system and related components thereof
CN116170331A (en) * 2023-04-23 2023-05-26 远江盛邦(北京)网络安全科技股份有限公司 API asset management method, device, electronic equipment and storage medium

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN110851872B (en) * 2019-11-19 2021-02-23 支付宝(杭州)信息技术有限公司 Risk assessment method and device for private data leakage
CN112163222A (en) * 2020-10-10 2021-01-01 哈尔滨工业大学(深圳) Malicious software detection method and device
CN113360916A (en) * 2021-06-18 2021-09-07 奇安信科技集团股份有限公司 Risk detection method, device, equipment and medium for application programming interface
CN115296933B (en) * 2022-10-08 2022-12-23 国家工业信息安全发展研究中心 Industrial production data risk level assessment method and system

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103716313A (en) * 2013-12-24 2014-04-09 中国科学院信息工程研究所 User privacy information protection method and user privacy information protection system
CN106845236A (en) * 2017-01-18 2017-06-13 东南大学 A kind of application program various dimensions privacy leakage detection method and system for iOS platforms
CN109145603A (en) * 2018-07-09 2019-01-04 四川大学 A kind of Android privacy leakage behavioral value methods and techniques based on information flow
CN109716345A (en) * 2016-04-29 2019-05-03 普威达有限公司 Computer implemented privacy engineering system and method
CN110199508A (en) * 2016-12-16 2019-09-03 亚马逊技术有限公司 Sensitive data is distributed across the secure data of content distributing network
CN110851872A (en) * 2019-11-19 2020-02-28 支付宝(杭州)信息技术有限公司 Risk assessment method and device for private data leakage

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
TWI355168B (en) * 2007-12-07 2011-12-21 Univ Nat Chiao Tung Application classification method in network traff
US9552478B2 (en) * 2010-05-18 2017-01-24 AO Kaspersky Lab Team security for portable information devices
CN104346566A (en) * 2013-07-31 2015-02-11 腾讯科技(深圳)有限公司 Method, device, terminal, server and system for detecting privacy authority risks
CN103533546B (en) * 2013-10-29 2017-03-22 无锡赛思汇智科技有限公司 Implicit user verification and privacy protection method based on multi-dimensional behavior characteristics
CN103761472B (en) * 2014-02-21 2017-05-24 北京奇虎科技有限公司 Application program accessing method and device based on intelligent terminal
TWI596498B (en) * 2016-11-02 2017-08-21 FedMR-based botnet reconnaissance method
CN109753808B (en) * 2018-11-19 2020-09-11 中国科学院信息工程研究所 Privacy leakage risk assessment method and device
CN109598146B (en) * 2018-12-07 2023-02-17 百度在线网络技术(北京)有限公司 Privacy risk assessment method and device
CN110334537B (en) * 2019-05-31 2023-01-13 华为技术有限公司 Information processing method and device and server

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN114301844A (en) * 2021-12-30 2022-04-08 天翼物联科技有限公司 Internet of things capability open platform flow control method, system and related components thereof
CN114301844B (en) * 2021-12-30 2024-04-19 天翼物联科技有限公司 Flow control method and system for Internet of things capability open platform and related components thereof
CN114154132A (en) * 2022-02-10 2022-03-08 北京华科软科技有限公司 Data sharing method based on service system
CN114154132B (en) * 2022-02-10 2022-05-20 北京华科软科技有限公司 Data sharing method based on service system
CN116170331A (en) * 2023-04-23 2023-05-26 远江盛邦(北京)网络安全科技股份有限公司 API asset management method, device, electronic equipment and storage medium
CN116170331B (en) * 2023-04-23 2023-08-04 远江盛邦(北京)网络安全科技股份有限公司 API asset management method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110851872B (en) 2021-02-23
CN110851872A (en) 2020-02-28
TWI734466B (en) 2021-07-21
TW202121329A (en) 2021-06-01

Legal Events

Code: 121
Title: Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20890858
Country of ref document: EP
Kind code of ref document: A1

Code: NENP
Title: Non-entry into the national phase
Ref country code: DE

Code: 122
Title: Ep: PCT application non-entry in European phase
Ref document number: 20890858
Country of ref document: EP
Kind code of ref document: A1