CN112182592B - Method and device for determining and processing safety of rule set - Google Patents

Publication number
CN112182592B
Authority
CN
China
Prior art keywords
sub
rule set
hit
probability
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011374651.9A
Other languages
Chinese (zh)
Other versions
CN112182592A (en)
Inventor
张文彬
李翰林
李漓春
殷山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ant Zhixin Hangzhou Information Technology Co ltd
Original Assignee
Ant Zhixin Hangzhou Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ant Zhixin Hangzhou Information Technology Co ltd filed Critical Ant Zhixin Hangzhou Information Technology Co ltd
Priority to CN202011374651.9A priority Critical patent/CN112182592B/en
Publication of CN112182592A publication Critical patent/CN112182592A/en
Application granted granted Critical
Publication of CN112182592B publication Critical patent/CN112182592B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/034 Test or assess a computer or a system

Abstract

The specification provides a method and a device for determining and processing safety of a rule set. Based on the method, a rule set is obtained; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables; dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different; calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit; calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit; and comparing the hit guess probability and the miss guess probability with preset thresholds respectively to determine the safety of the rule set. The risk that data leakage occurs due to the fact that a data provider runs an unsafe rule model is reduced.

Description

Method and device for determining and processing safety of rule set
Technical Field
The specification belongs to the technical field of internet, and particularly relates to a method, a device and a server for determining the security of a rule set.
Background
In some data processing scenarios, the model generator is often separate from the data provider.
Usually, the data provider can respond to the request of the model generator, and run the rule model provided by the model generator by using the data resource owned by the own party to obtain a corresponding processing result; and feeding back the processing result to the model generator. Therefore, the model generator can obtain a corresponding processing result on the premise of not contacting the data resources owned by the data provider; and can perform specific data processing according to the processing result.
However, if the rule model itself is not secure, the data provider may leak the data resources owned by the data provider during the execution of the rule model.
Therefore, a method for determining the security of the rule model more efficiently and accurately is needed.
Disclosure of Invention
The specification provides a method, a device and a server for determining the security of a rule set, so that the security of a rule model can be determined efficiently and accurately, and the risk of data leakage caused by the fact that a data provider runs an unsafe rule model is reduced.
The embodiment of the specification provides a method for determining the safety of a rule set, which comprises the following steps: acquiring a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables; dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different; calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit; calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit; and comparing the hit guess probability and the miss guess probability with preset thresholds respectively to determine the safety of the rule set.
An embodiment of the present specification provides a device for determining security of a rule set, including: the acquisition module is used for acquiring a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables; the conversion module is used for dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different; the first calculation module is used for calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit; the second calculation module is used for calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit; and the determining module is used for comparing the hit guess probability and the miss guess probability with a preset threshold respectively to determine the safety of the rule set.
An embodiment of the present specification provides a method for processing a rule set, including: receiving a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables; dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different; calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit; calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit; refusing to deploy the rule set if at least one of the hit guess probability and the miss guess probability exceeds a preset threshold.
The embodiments provided in this specification may convert a rule set into sub-rule sets corresponding to each variable, and further calculate a hit guess probability and a miss guess probability corresponding to each variable based on the expressions in each sub-rule set. In this way, the risk of leaking the actual values of the variables is fully evaluated both for the case in which the rule set is hit and for the case in which it is not hit. Therefore, the risk degree of the rule model can be determined efficiently and accurately, and the risk of data leakage caused by the data provider running an unsafe rule model is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present specification, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments described in the present specification, and other drawings can be obtained from them by those skilled in the art without inventive labor.
FIG. 1 is a schematic diagram of an embodiment of a structural component of a system of a method for determining security of a rule set provided in an embodiment of the present specification;
FIG. 2 is a diagram illustrating an embodiment of a system for determining the security of a rule set according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram illustrating a method for determining the security of a rule set provided in one embodiment of the present description;
FIG. 4 is a schematic diagram of a device for determining the security of a rule set provided by one embodiment of the present description;
FIG. 5 is a schematic structural diagram of an electronic device provided in one embodiment of the present specification;
FIG. 6 is a schematic diagram of a method for processing a rule set provided in one embodiment of the present description;
FIG. 7 is a schematic diagram of a client provided in one embodiment of the present description.
Detailed Description
In order to make the technical solutions in the present specification better understood, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present specification shall fall within the protection scope of the present specification.
Some embodiments of the present disclosure provide a method for determining security of a rule model, which may be applied to a system including a first server, a second server, and a third server.
In particular, reference may be made to fig. 1. The first server may specifically include a server disposed on the model generator side. The second server may specifically include a server disposed on the data provider side. The third server may specifically include a third-party-side server that is responsible for detecting the security of the rule model. The third party may be understood as a service provider trusted by the model generator and the data provider and responsible for detecting the security of the rule model.
In particular, in order to perform corresponding data processing (for example, determining credit risk of a user) by using data resources owned by a data provider, the first server may configure and construct a rule model including only one rule set. The rule set may specifically include a plurality of rules. Each rule may include at least one expression, and multiple expressions may be connected by a logical operator (e.g., "logical and," "logical or").
The first server sends the rule model to the second server and, at the same time, sends the rule model to the third server for detection together with a detection request about the security of the rule model. The third server has the authority to disassemble and read the specific rules contained in the rule model.
The third server may receive and respond to the detection request, obtain the data value distributions of the variables used for detecting the rule model, and detect whether the rule model has a security risk according to the data value distributions of those variables. The data value distributions of the variables may be provided by the second server, or may be generated by the third server itself.
When the security of the rule model is specifically detected, the third server may first perform a certain conversion on the rule set to form a plurality of sub-rule sets. Each sub-rule set may include a variable. Therefore, the method converts the very complex logic expression of the rule set into the disjunctive normal form formed by combining a plurality of different expressions. In this manner, the security of each variable may be judged based on the set of sub-rules.
In the case that it is determined that the rule model is low in security risk, the third server may send a security prompt to the second server. After receiving the security prompt message, the second server can normally use the owned data resource to run the rule model to obtain a corresponding processing result; and feeding back the processing result to the first server. The first server may complete corresponding data processing (e.g., credit risk rating of the user, etc.) according to the processing result.
In the case that it is determined that the rule model has a greater security risk, the third server may generate and send risk prompt information to the second server. The second server can receive the risk prompt information and refuse to execute the rule model with the greater security risk, so that the data resource owned by the data provider can be effectively prevented from being leaked.
In this embodiment, the first server, the second server, and the third server may specifically include a server that is applied to a data processing system side and can implement functions such as data transmission and data processing. Specifically, the first server, the second server, and the third server may be electronic devices having data operation, storage functions, and network interaction functions, respectively. Alternatively, the first server, the second server, and the third server may also be software programs that run in the electronic device and provide support for data processing, storage, and network interaction, respectively. In this embodiment, the number of servers included in the first server, the second server, and the third server is not particularly limited. The first server, the second server, and the third server may be specifically one server, or may be several servers, or a server cluster formed by several servers.
In some embodiments, since some rule models have security risks, when a data provider runs such rule models with owned data resources to obtain corresponding processing results, the data provider still has the risk of data leakage, which threatens data security of the data provider.
For example, when generating the rule model, the model generator intentionally configures rule set n in the rule model as "the user's monthly income = 5000 yuan". Suppose the data provider then directly uses its own data resources to query the information data of the user L to be detected (for example, the monthly income of user L is 5000 yuan), inputs the information data of user L into the rule model, obtains the processing result for user L (for example, that rule set n is hit), and feeds that result back to the model generator. In this case, although the data provider does not directly disclose to the model generator that the monthly income of user L is 5000 yuan, the model generator can accurately guess from the processing result that the monthly income of user L is 5000 yuan. That is, the data resources of the data provider have been leaked.
Therefore, in order to avoid leaking its own data resources when running the rule model and to protect its data security, the data provider can entrust a third party to test the security of the rule model before running the rule model with its own data resources. Only when the third party determines that the rule model has no security risk does the data provider use its own data resources to run the rule model.
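The leakage described above can be illustrated with a minimal sketch. It is not part of the original specification; the function name `candidates_after_hit` and the assumed range of possible incomes are hypothetical and chosen only for illustration. The sketch lists the values that remain possible once the model generator learns that a rule was hit.

```python
def candidates_after_hit(possible_values, rule):
    """Return the values that are still possible once the rule is known to be hit."""
    return [value for value in possible_values if rule(value)]


# Hypothetical assumption: monthly incomes are multiples of 1000 yuan up to 20000 yuan.
possible_incomes = range(0, 20001, 1000)

# Rule set n from the example: "monthly income = 5000 yuan".
leaked = candidates_after_hit(possible_incomes, lambda income: income == 5000)

# A looser rule leaves many candidates, so a hit reveals far less.
loose = candidates_after_hit(possible_incomes, lambda income: income > 1000)

print(leaked)      # a single candidate remains
print(len(loose))  # many candidates remain
```

With the equality rule only one candidate survives, which is why a hit result alone discloses the exact value.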
In some embodiments, in specific implementation, the first server disposed on the model generator side sends the rule model to the second server, and also sends a detection request carrying the rule model to the third server to request the third server to perform security detection on the rule model.
The third server may obtain the rule model to be detected from the received detection request. According to a data processing protocol established in advance between the third server and the first server, the third server has the authority to disassemble the rule model and acquire data such as the specific rules in the rule model.
In some embodiments, while the first server sends the rule model to the third server, some information related to the rule model, such as identification information of a rule set contained in the rule model, identification information of variables in the rule set, occurrence times of various attributes in the rule model, and the like, which is allowed to be disclosed to the third server, may also be sent to the third server as basic information of the rule model to assist the third server in performing security detection of the rule model. Accordingly, the third server can obtain the basic information of the rule model through the first server.
The identification information of the rule set may specifically be a name of the rule set, or may be a number of the rule set. The identification information of one rule set corresponds to one rule set. Variables may be components of rules included in a rule set. Further, variables may be used to represent attributes. In particular, the data object may have a plurality of variables, each of which may be used to represent a property corresponding to a usage scenario.
The third server may also obtain a data distribution for detecting actual values of variables of the rule model. The data distribution of the variable may specifically include a distribution ratio corresponding to an actual value in the sample data.
In specific implementation, the third server may obtain sample data provided by the second server for detecting the rule model, and calculate the data distribution of the actual values of the corresponding variables from the sample data. Alternatively, the third server may directly acquire the data distribution of the actual values of the variables disclosed by the second server. The third server may also use a data distribution of the actual values of the variables that it generates itself, for example through data fitting, for the rule model to be detected. The specific manner in which the third server obtains the data distribution of the actual values of the variables is not limited in this specification.
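As a minimal sketch of the first option, the distribution ratio of each actual value can be computed from sample data by counting. The specification does not prescribe an implementation; the function name `value_distribution` and the sample incomes below are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def value_distribution(samples):
    """Map each observed value to its share of the sample data."""
    counts = Counter(samples)
    total = len(samples)
    return {value: Fraction(count, total) for value, count in counts.items()}


# Hypothetical sample of monthly incomes (yuan) provided for detection.
sample_incomes = [3000, 5000, 5000, 8000]
distribution = value_distribution(sample_incomes)

for value, share in sorted(distribution.items()):
    print(value, share)
```

Exact fractions are used here only so that the shares sum to exactly 1; floating-point ratios would serve equally well in practice.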
Please refer to fig. 2. The embodiment of the specification provides a method for determining the safety of a rule model, and the method can be particularly applied to a system comprising a client and a server.
The client may be an electronic device with certain computing capability, and the electronic device may be run with a software program. The electronic device may have a memory such that the electronic device may have certain data storage capabilities. Of course, a client may also refer to a software application running in an electronic device.
While the user uses the client, a large amount of usage data is generated. The usage data may include account data of the user, usage behavior data of the user, and the like. The data can cover the usage habits, consumption habits, travel habits, eating habits and the like of the user. These data are often considered to be the privacy of the user. In some embodiments, users may want to enjoy more personalized services while their privacy remains protected. For example, the user may want the software to push news according to the user's preferences, recommend food based on the user's eating habits, recommend movies based on the user's viewing preferences, and so on. Thus, the user's privacy needs to be protected from disclosure, while the software application may still require the user's usage data as a basis for providing further services.
In some implementations, the server can provide the rule model to the client, so that the client checks the usage data stored on the client against the rule set in the rule model and feeds the hit result back to the server. Therefore, the server can analyze the usage preferences of the user according to the hit result fed back by the client, and further provide personalized services.
However, in order to avoid privacy disclosure of the usage data of the user caused by the rule model, after receiving the rule model, the client may analyze the rule model to obtain sub-rule sets corresponding to each variable. Thus, the hit guess probability and the miss guess probability corresponding to each variable can be calculated. The client can be preset with a preset threshold, and the hit guess probability and the miss guess probability are respectively compared with the preset threshold to judge the safety of the rule model. The rule model may be run when the rule model is deemed to have a higher security. Under the condition that the rule model is considered to have larger security risk, the rule model can be refused to run, and therefore privacy of the user is protected.
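The comparison against the preset threshold can be sketched as follows. This is an illustrative sketch only: the function name `is_rule_set_safe`, the threshold value 0.3 and the example probabilities are assumptions, not values given in the specification. A rule set is treated as unsafe as soon as any variable's hit guess probability or miss guess probability exceeds its threshold.

```python
def is_rule_set_safe(guess_probabilities, hit_threshold=0.3, miss_threshold=0.3):
    """guess_probabilities maps a variable name to (hit guess probability, miss guess probability)."""
    for hit_p, miss_p in guess_probabilities.values():
        if hit_p > hit_threshold or miss_p > miss_threshold:
            return False
    return True


# Hypothetical per-variable results; "age" has a high hit guess probability.
probabilities = {"monthly_income": (0.05, 0.02), "age": (0.5, 0.01)}

decision = "run" if is_rule_set_safe(probabilities) else "refuse"
print(decision)
```

Separate thresholds are kept for the hit and miss cases because the text compares the two probabilities with preset thresholds respectively; a single shared threshold is the special case where both arguments are equal.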
Referring to fig. 3, the present specification provides a method for determining the security of a rule set. Wherein the method can be applied to a server. In particular implementations, the method may include the following.
Step S201: acquiring a rule set; wherein the rule set includes a plurality of expressions formed by part or all of a plurality of variables.
In some embodiments, the server may be specifically understood as a server disposed on the side of the model detector. The model detector may be specifically understood as a server which is independent of the data provider and the model generator and is trusted by both the data provider and the model generator and responsible for detecting the security of the rule model.
In some embodiments, in the case that the model generator allows disclosure of the rule model to the data provider, the method may also be applied to a server disposed on the data provider side. That is, the data provider may use this method on its own server to detect the security of the rule model.
In some embodiments, in the case that the model generator needs to perform self-checking on the generated rule model, the method may also be applied to a first server disposed on the model generator side. That is, the model generator may also detect the security of the generated rule model by the first server using this method. Specifically, the first server sends the generated rule model to the second server only when the first server determines that the rule model is not at a security risk through detection.
The embodiments of the present description will be specifically described mainly by taking an example in which the method is applied to the third server. For the case of application to the first server and the second server, the following embodiment applied to the third server may be referred to.
In some embodiments, the rule model may specifically include a set of rules. The rule set may further include one or more rules. In some embodiments, the above-mentioned rules are used to detect whether certain attribute characteristics of the data object satisfy a certain preset data value range. The above rules may specifically take the form of expressions. The expression may include: variables, operators, and data thresholds.
The variable may be understood as parameter data characterizing a certain attribute of the data object. For example, the variables may specifically represent monthly income, default rate, height, occupation, and the like. The data threshold may be specifically understood as an upper limit value and/or a lower limit value of a data value set for a certain variable in a rule, e.g., 1000 yuan, 15 times, 5%, etc. The operator may be understood as a symbol in a rule that defines the decision relationship between an attribute and a data threshold, for example > (greater than), < (less than), ≥ (greater than or equal to), and the like. Of course, the variables, operators, and data thresholds listed above are only illustrative.
Specifically, for example, in rule 1 "the user's monthly income > 1000 yuan", the attribute is "monthly income", the operator is ">", and the data threshold is "1000 yuan". If a user's monthly income is 2000 yuan, which is greater than 1000 yuan, the user is understood to hit rule 1. If a user's monthly income is 500 yuan, which is less than 1000 yuan, the user is understood not to hit rule 1.
In some embodiments, the rule set may include only one rule. For example, rule set 1 may contain only one rule, rule 1. If a user hits rule 1, it can be understood that the user hits rule set 1. If a user does not hit rule 1, then it can be understood that the user does not hit rule set 1.
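An expression made up of a variable, an operator and a data threshold can be modelled with a small data structure. The following is a minimal sketch under assumptions: the class name `Expression`, the field names and the record layout are hypothetical and are not defined by the specification.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators, keyed by their textual symbol.
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass(frozen=True)
class Expression:
    variable: str     # attribute being tested, e.g. "monthly_income"
    op: str           # one of the keys of OPERATORS
    threshold: float  # data threshold the attribute is compared with

    def is_hit(self, record):
        """Return True when the record's value satisfies the expression."""
        return OPERATORS[self.op](record[self.variable], self.threshold)


# Rule 1 from the text: "monthly income > 1000 yuan".
rule_1 = Expression("monthly_income", ">", 1000)

print(rule_1.is_hit({"monthly_income": 2000}))  # this user hits rule 1
print(rule_1.is_hit({"monthly_income": 500}))   # this user does not hit rule 1
```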
In some embodiments, the rule set may also include a plurality of different rules. The different rules can be connected together through rule connecting words to form a complex expression. The rule connecting words may specifically include a logical AND ("&" or "and"), a logical OR ("|" or "or"), and the like.
For example, rule set 2 is "the user's number of defaults > 5, or the user's default rate > 0.5". Here rule 2 ("number of defaults > 5") and rule 3 ("default rate > 0.5") are connected by the rule connecting word "or" to form rule set 2. If a user's data satisfy at least one of rule 2 and rule 3, rule set 2 is understood to be hit. If a user's data satisfy neither rule 2 nor rule 3, rule set 2 is understood not to be hit.
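The hit logic of rule set 2 can be sketched directly, reading "number of defaults > 5" as rule 2 and "default rate > 0.5" as rule 3. The function name `hits_rule_set_2` and the record field names are hypothetical and serve only as an illustration.

```python
def hits_rule_set_2(record):
    """Rule set 2 is hit when rule 2 or rule 3 (or both) is satisfied."""
    rule_2 = record["default_count"] > 5
    rule_3 = record["default_rate"] > 0.5
    return rule_2 or rule_3


# Hypothetical users: the first satisfies only rule 2, the second satisfies neither rule.
user_a = {"default_count": 7, "default_rate": 0.2}
user_b = {"default_count": 1, "default_rate": 0.1}

print(hits_rule_set_2(user_a))
print(hits_rule_set_2(user_b))
```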
In some embodiments, a rule set may include a number of different variables. Therefore, the rules in the rule set can be constructed from different dimensions, and hit results from different angles are obtained. Therefore, the information of the data object can be known more comprehensively. In particular, each rule may include one or more variables. For example, a rule may include multiple expressions formed from a single variable, or a rule may include more complex expressions formed from multiple variables. Further, each expression may include all variables included in the rule set, or, of course, may include only some variables in the expression.
In some implementations, the model generator and the data provider can be separate. In this case, the model generator may transmit the rule model described above to the data provider. The data provider can utilize the data resources owned by the data provider, such as a database containing information data of a large number of data objects, to run the rule model to obtain the corresponding processing result; and feeding the processing result back to the model generator so that the model generator can obtain and utilize the processing result to complete corresponding data processing. This also reduces the risk of data resources owned by the data provider being compromised.
Step S202: dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein the variables included in different sub-rule sets are different.
In some embodiments, the rule set may be divided into a plurality of sub-rule sets respectively corresponding to the variables, which may include: analyzing the rules included in the rule set into expressions respectively including designated operators; the expression including the designated operator is divided into a plurality of sub-rule sets according to the variables involved.
Specifically, the rule set can be split and converted, so that it is transformed from a more complex expression into a clearer normal form. In particular, a rule set may be converted into a plurality of rules connected by a specified rule connector. Further, each rule may include only one kind of rule connector. For example, inside each expression that includes the designated operator, the designated operator is a logical AND, and the different expressions are connected to each other through a logical OR. Specifically, for example, take a rule set $RS = (X_1 \le a_1 \,\|\, X_2 \le a_2) \,\&\, (X_1 > a_1' \,\|\, X_3 > a_3)$. Here $(X_1 \le a_1 \,\|\, X_2 \le a_2)$ as a whole can be taken as one rule, which includes two expressions, and $(X_1 > a_1' \,\|\, X_3 > a_3)$ can be taken as another rule, which also includes two expressions. After conversion, $RS = (a_1' < X_1 \le a_1) \,\|\, (X_1 \le a_1 \,\&\, X_3 > a_3) \,\|\, (X_1 > a_1' \,\&\, X_2 \le a_2) \,\|\, (X_2 \le a_2 \,\&\, X_3 > a_3)$. Here $X_1$, $X_2$ and $X_3$ may each represent a variable, and $a_1'$, $a_1$, $a_2$ and $a_3$ may each represent a data threshold. At this time, $(a_1' < X_1 \le a_1)$ can be regarded as a sub-rule $RS_1$, $(X_1 \le a_1 \,\&\, X_3 > a_3)$ as a sub-rule $RS_2$, $(X_1 > a_1' \,\&\, X_2 \le a_2)$ as a sub-rule $RS_3$, and $(X_2 \le a_2 \,\&\, X_3 > a_3)$ as a sub-rule $RS_4$. This gives $RS = RS_1 \,\|\, RS_2 \,\|\, RS_3 \,\|\, RS_4$. Thus, any rule set can be converted to the normal form $RS = RS_1 \,\|\, \dots \,\|\, RS_i$, where $i$ is a positive integer.
Further, each of the sub-rule sets includes only expressions that relate to the same variable; the rule set is converted into a plurality of sub-rule sets connected through logical OR. In particular, for $RS = RS_1 \,\|\, \dots \,\|\, RS_i$, each $RS_i$ can be decomposed into $RS_i = RS_{i1} \,\&\, \dots \,\&\, RS_{ij}$, where $j$ is a positive integer. Each sub-rule set $RS_{ij}$ can correspond to a variable $X_j$. Further, the security risk level of each variable may be determined by analyzing the security risk of each sub-rule set. In particular, all expressions that relate to one variable may be aggregated into one sub-rule set, such that each variable corresponds to one sub-rule set and that sub-rule set includes only expressions with that variable. Of course, it is also possible that each sub-rule set includes only expressions relating to the same variable while one variable corresponds to one or more sub-rule sets.
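The conversion in the example amounts to distributing the logical AND over the logical OR. A minimal sketch is given below, assuming the rule set is supplied as a list of OR-connected rules, each rule being a list of expressions written as plain strings, with the second threshold on X1 written as a1'. The function name `to_disjunctive_form` is hypothetical, and the sketch does not merge or simplify expressions the way the text merges the two X1 conditions into a single interval.

```python
from itertools import product


def to_disjunctive_form(rules):
    """Distribute AND over OR.

    rules: list of rules joined by AND, each rule a list of expressions joined by OR.
    Returns a list of sub-rules joined by OR, each a list of expressions joined by AND.
    """
    return [list(combination) for combination in product(*rules)]


# RS = (X1<=a1 || X2<=a2) & (X1>a1' || X3>a3)
rule_set = [["X1<=a1", "X2<=a2"], ["X1>a1'", "X3>a3"]]
sub_rules = to_disjunctive_form(rule_set)

for sub_rule in sub_rules:
    print(" & ".join(sub_rule))
```

Each printed line corresponds to one of the four sub-rules of the example; the first one, X1<=a1 & X1>a1', is the interval condition on X1.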
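Grouping the expressions of one AND-connected sub-rule by the variable they involve can be sketched as follows. The tuple layout (variable, operator, threshold), the function name `split_by_variable` and the numeric thresholds are hypothetical choices made for illustration only.

```python
from collections import defaultdict


def split_by_variable(sub_rule):
    """Group the expressions of one sub-rule so that each group involves a single variable."""
    groups = defaultdict(list)
    for variable, op, threshold in sub_rule:
        groups[variable].append((variable, op, threshold))
    return dict(groups)


# Hypothetical sub-rule: X1 > 1 & X1 <= 5 & X3 > 2
sub_rule = [("X1", ">", 1), ("X1", "<=", 5), ("X3", ">", 2)]
sub_rule_sets = split_by_variable(sub_rule)

for variable, expressions in sub_rule_sets.items():
    print(variable, expressions)
```

Each resulting group contains only expressions on one variable, so the leakage risk of that variable can be assessed from its group alone.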
Step S203: and calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit.
Step S204: and calculating the miss guessing probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit.
In some implementations, a sub-rule set being hit may mean that a certain attribute of the data object falls within the data value range of a rule in the sub-rule set. For example, the data object may correspond to a particular user, and the attribute may represent the user's income. Suppose the user's income is 5000 yuan/month and the sub-rule set includes the rules (0 < monthly pay & monthly pay < 5000 yuan) and (monthly pay ≤ 7000 yuan). The user's income of 5000 yuan/month falls within the data value range (monthly pay ≤ 7000 yuan), so the sub-rule set is hit.
In some embodiments, the actual value corresponding to the variable may be the actual value of the attribute of the data object in the data source. Such actual values are typically privacy-related, so they need to be protected to avoid privacy disclosure. However, in real life, some usage scenarios require a service provider to decide whether to serve a user on the basis of the user's historical behavior data or basic information. In that case, the service provider can act as a model user and obtain the distribution of hit rules by invoking the rule model, and that distribution can serve as a reference for providing services to the user.
In some embodiments, the hit guess probability may be used to represent the probability that, when a sub-rule set is hit, the actual value corresponding to the variable contained in the sub-rule set is guessed. The probability that the actual value is guessed differs depending on how the rules in the sub-rule set are set. For example, a rule in the sub-rule set may be working years = 5; if the rule is hit, it directly reveals that the corresponding user's actual working years value is 5, which is trivially easy to guess. Alternatively, the rule may be 1 < working years < 30, in which case the probability of guessing that the user's actual working years value is 5 is much lower.
Thus, the guess probability of the actual value can be adjusted by setting different data value ranges for the rule. Specifically, the fewer values a rule's data value range contains, the greater the probability that the corresponding actual value can be guessed when the rule is hit. For example, a rule may be 21 < age ≤ 23, whose data value range contains only 22 and 23; when the rule is hit, the probability of the actual value being guessed is as high as 50%, which is close to directly revealing the actual data of the data source. Conversely, in some cases, the more values a rule's data value range contains, the greater the probability that the corresponding actual value can be guessed when the rule is not hit. For example, a rule may be number of properties > 1; when the rule is not hit, the actual value can only be 0 or 1, so the guess probability of the actual value is again 50%. In some embodiments, the guess probability directly calculated in the hit case described above may be used as the hit guess probability, and the guess probability directly calculated in the miss case may be used as the miss guess probability.
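The two 50% figures above can be reproduced with a short sketch. It assumes, purely for illustration, that the variable takes integer values from a known finite domain and that every value consistent with the outcome is equally likely; the function name `guess_probability` and the domains are hypothetical.

```python
def guess_probability(domain, rule, hit=True):
    """Probability of guessing the actual value given the hit/miss outcome.

    Assumes a uniform distribution over the values in `domain` that are
    consistent with the observed outcome (rule hit or rule not hit).
    """
    candidates = [value for value in domain if rule(value) == hit]
    if not candidates:
        raise ValueError("no value in the domain is consistent with this outcome")
    return 1 / len(candidates)

ages = range(0, 121)            # assumed age domain
property_counts = range(0, 11)  # assumed domain for the number of properties

# Rule 21 < age <= 23 is hit: only 22 and 23 remain.
hit_p = guess_probability(ages, lambda age: 21 < age <= 23, hit=True)   # 0.5
# Rule "number of properties > 1" is not hit: only 0 and 1 remain.
miss_p = guess_probability(property_counts, lambda n: n > 1, hit=False)  # 0.5
```

A narrow range makes the hit case risky, while a wide range makes the miss case risky, which is why both outcomes are evaluated.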
It can be understood that the guessing probability of the actual value being guessed is used as an index, and the safety risk degree of the rule set can be comprehensively evaluated. Namely, risks are respectively checked from two dimensions of rule hit and rule miss, actual data values of the data source can be well protected, and risks that the actual values of the data are leaked are reduced.
In some embodiments, the rule set may be converted to a binary tree structure using structural features of the binary tree structure. And further, structural splitting is carried out on the rule model of the binary tree structure, and a plurality of nodes are obtained. And then, calculating the plurality of nodes one by adopting a recursive algorithm so as to quickly and efficiently calculate the guessing probability of the attribute under the condition of hit of the rule model and the guessing probability of the attribute under the condition of miss of the rule model. Specifically, each sub rule set may be converted into a binary tree structure, and the corresponding guess probability in the case of hit and the guess probability in the case of miss of the sub rule set may be calculated.
In some embodiments, the way the hit guess probability or the miss guess probability is calculated is not limited to the above description. Both may also be computed as conditional probabilities; that is, the probability that the rule set is hit or not hit, or the probability that a particular rule is hit or not hit, is computed first, and the corresponding guess probability is then calculated under the hit or miss condition to obtain the hit guess probability or the miss guess probability. In some embodiments, the hit guess probability may be calculated in the same or a similar manner as the miss guess probability; in other embodiments, the two may be calculated in different manners. For example, the foregoing embodiments may be used to calculate the hit guess probability while conditional probabilities are used to calculate the miss guess probability, or the other way around.
In some embodiments, in the step of calculating a hit-guess probability that the actual value corresponding to the variable included in the sub-ruleset can be guessed in the case that the sub-ruleset is hit, the method includes: calculating the conditional probability of the sub-rule set being hit under the condition that the rule set is hit; obtaining a first guess probability that the actual value corresponding to the variable in the sub rule set can be guessed under the condition that the sub rule set is hit according to the variable expression included in the sub rule set; the hit guess probability is derived using the conditional probability that the sub-ruleset was hit and the first guess probability.
Specifically, the probability that the rule set is hit may be calculated first, and then the conditional probability that the sub-rule set is hit. The starting point is the probability formula
[formula image 001 not reproduced]
For example, for the aforementioned RS = RS_1 || RS_2 || RS_3 || RS_4, the following can be obtained:
[formula image 002 not reproduced]
where i, j and k are positive integers from 1 to 4, used in the expansion to express the relationship compactly. The specific conversion process can be understood by those skilled in the art from the above description.
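The formula images are not available in this text. Given that the expansion is indexed by i, j and k over 1 to 4 and concerns the probability of an OR of four sub-rules, one reading consistent with the description is the standard inclusion-exclusion expansion; this is an assumption, not a transcription of the original figures:

```latex
\begin{aligned}
P(RS) &= P(RS_1 \lor RS_2 \lor RS_3 \lor RS_4) \\
      &= \sum_{i} P(RS_i)
       - \sum_{i<j} P(RS_i \land RS_j)
       + \sum_{i<j<k} P(RS_i \land RS_j \land RS_k)
       - P(RS_1 \land RS_2 \land RS_3 \land RS_4)
\end{aligned}
```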
Since each converted RS_i contains only logical AND operations, rules on the same variable can be merged. That is, a combination of different rules involving the same variable can be decomposed into a product of probabilities for the different variables, treating the variables as independent of one another. For example: P(RS_1 & RS_2) = P((a_1' < X_1 ≤ a_1) & (X_1 ≤ a_1 & X_3 > a_3)) = P((a_1' < X_1 ≤ a_1) & X_3 > a_3) = P(a_1' < X_1 ≤ a_1) × P(X_3 > a_3); or P(RS_2 & RS_3) = P((X_1 ≤ a_1 & X_3 > a_3) & (X_1 > a_1' & X_2 ≤ a_2)) = P((a_1' < X_1 ≤ a_1) & X_3 > a_3 & X_2 ≤ a_2) = P(a_1' < X_1 ≤ a_1) × P(X_3 > a_3) × P(X_2 ≤ a_2). In the same manner, the hit probability P(RS) of the rule set can be calculated.
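The merging of same-variable rules and the resulting hit probability can be sketched in code. The sketch below rests on stated assumptions: the variables are independent, each has a known discrete distribution, and P(RS) is obtained by inclusion-exclusion over the sub-rules. The function names, the thresholds (a_1' = 2, a_1 = 5, a_2 = 3, a_3 = 4) and the uniform distributions are hypothetical.

```python
from itertools import combinations
from math import prod

def conjunction_probability(sub_rules, dist):
    """P(all given sub-rules hold), merging constraints on the same variable."""
    allowed = {}
    for rule in sub_rules:
        for var, predicate in rule.items():
            values = {v for v in dist[var] if predicate(v)}
            allowed[var] = allowed[var] & values if var in allowed else values
    # Independence assumption: multiply the per-variable probabilities.
    return prod(sum(dist[var][v] for v in values) for var, values in allowed.items())

def hit_probability(sub_rules, dist):
    """P(RS_1 || ... || RS_n) by inclusion-exclusion over all sub-rule subsets."""
    total = 0.0
    for size in range(1, len(sub_rules) + 1):
        sign = 1 if size % 2 else -1
        for combo in combinations(sub_rules, size):
            total += sign * conjunction_probability(combo, dist)
    return total

uniform = {v: 0.1 for v in range(10)}  # assumed distribution: uniform on 0..9
dist = {"X1": uniform, "X2": uniform, "X3": uniform}
A1_LOW, A1, A2, A3 = 2, 5, 3, 4        # assumed thresholds a_1', a_1, a_2, a_3
sub_rules = [
    {"X1": lambda x: A1_LOW < x <= A1},                       # RS_1
    {"X1": lambda x: x <= A1, "X3": lambda x: x > A3},        # RS_2
    {"X1": lambda x: x > A1_LOW, "X2": lambda x: x <= A2},    # RS_3
    {"X2": lambda x: x <= A2, "X3": lambda x: x > A3},        # RS_4
]
p_rs = hit_probability(sub_rules, dist)
```

Inclusion-exclusion enumerates every subset of sub-rules, so this approach is only practical when the number of sub-rules is small.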
Further, in the event that the rule set is hit, the conditional probability that the sub-rule set RS_i is hit may be:
[formula image 003 not reproduced]
Since each RS_i involves the variables X_j, the conditional probability of each sub-rule set under the hit condition is:
[formula image 004 not reproduced]
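As a hedged reconstruction of the first conditional probability (the formula image itself is unavailable, so this is an assumption about its form): a hit on any RS_i implies a hit on RS = RS_1 || … || RS_i, so the event "RS_i and RS" coincides with the event "RS_i", which gives

```latex
P(RS_i \mid RS) = \frac{P(RS_i \land RS)}{P(RS)} = \frac{P(RS_i)}{P(RS)}
```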
According to the actual value distribution of the variable, the first guess probability
[formula image 005 not reproduced]
can be calculated over the actual data that satisfies the variable expressions in the sub-rule set. In particular, each sub-rule set may include at least one expression corresponding to a variable. The variable may take a plurality of actual values, so the actual values that satisfy the expressions can be obtained, and the guess probability of each such value can then be calculated according to their distribution. In some embodiments, the way of calculating the guess probability may be set according to actual needs. For example, the guess probability of each actual value may be calculated independently, and a designated value, the average, or the maximum of all the guess probabilities used as the first guess probability. Alternatively, the probability that the actual value is guessed may be calculated as a whole and used as the first guess probability. Thus, the hit guess probability of the rule set RS can be expressed as
[formula image 006 not reproduced]
In some embodiments, in the case that the sub rule set is not hit, the step of calculating the miss guess probability that the actual value corresponding to the variable included in the sub rule set can be guessed includes: calculating the conditional probability of the sub-rule set missing under the condition that the rule set is not hit; obtaining a second guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is not hit according to the variable expression included in the sub-rule set; using the conditional probability of the sub-ruleset miss and the second guess probability to derive the miss-guess probability.
Specifically, the probability that the rule set is not hit may be calculated first, and then the conditional probability that the sub-rule set is not hit. From the above, the probability that the rule set is not hit is P(!RS) = 1 - P(RS). Under the condition that the rule set is not hit, the conditional probability that a sub-rule set RS_i is not hit is P(!RS_i | !RS) = 1, and the conditional probability of a sub-rule set miss may be expressed as:
[formula image 007 not reproduced]
According to the actual value distribution of the variable, the second guess probability
[formula image 008 not reproduced]
can be calculated over the actual data that does not satisfy the variable expressions in the sub-rule set. In particular, each sub-rule set may include at least one expression corresponding to a variable. The variable may take a plurality of actual values, so the actual values that do not satisfy the expressions can be obtained, and the guess probability of each such value can then be calculated according to their distribution. In some embodiments, the way of calculating the guess probability may be set according to actual needs. For example, the guess probability of each actual value may be calculated independently, and a designated value, the average, or the maximum of all the guess probabilities used as the second guess probability. Alternatively, the probability that the actual value is guessed may be calculated as a whole and used as the second guess probability. Thus, the miss guess probability of the rule set RS can be expressed as
[formula image 009 not reproduced]
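The first and second guess probabilities described above can be illustrated with one helper. This is a sketch under assumptions: the variable has a known discrete distribution, the per-value guess probability is the value's share of the probability mass consistent with the outcome, and the final hit or miss guess probability is formed by multiplying with the conditional probability. The text only says the two quantities are "used" together, so the product, the function names and the sample distribution are all hypothetical.

```python
def guess_probability(value_dist, expression, hit=True, how="max"):
    """Guess probability over the values consistent with a hit or a miss.

    `value_dist` maps each actual value to its probability; `how` selects the
    maximum or the average of the per-value guess probabilities.
    """
    consistent = {v: p for v, p in value_dist.items() if expression(v) == hit}
    mass = sum(consistent.values())
    if mass == 0:
        raise ValueError("no actual value is consistent with this outcome")
    per_value = [p / mass for p in consistent.values()]
    return max(per_value) if how == "max" else sum(per_value) / len(per_value)

def combine(conditional_probability, guess):
    # Assumption: the combined probability is a simple product.
    return conditional_probability * guess

working_years = {4: 0.1, 5: 0.4, 6: 0.3, 7: 0.2}  # assumed distribution
in_range = lambda years: 4 < years <= 6            # hypothetical expression

first_guess = guess_probability(working_years, in_range, hit=True)    # 0.4 / 0.7
second_guess = guess_probability(working_years, in_range, hit=False)  # 0.2 / 0.3
```

Choosing the maximum gives a worst-case view of leakage, while the average reflects a typical value; which is appropriate depends on the protection goal.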
In some embodiments, the step of determining a first guess probability that the actual value of the variable in the subset of rules can be guessed may include: calculating a guess probability that the actual values of the variables corresponding to the sub-rule set can be guessed in case that at least one expression included in the sub-rule set is hit; and taking the maximum value or the average value in the guessing probabilities as the first guessing probability.
In this embodiment, the sub-rule set corresponding to one variable may include a plurality of expressions. The expressions may differ from one another, that is, the data threshold ranges that constitute them are different. In this manner, the probability that the actual value of the variable can be guessed when one or more expressions are hit can be calculated. For example, for the sub-rule set on age, one expression may be 22 < age < 60 and another may be 10 < age < 25; the actual values satisfying both expressions can only be 23 and 24, and between these two values the guess probability is 50%. One may also choose to calculate the guess probability of the actual value when each expression is hit on its own, or select more expressions and calculate the guess probability when all of them are hit. The guess probability of the variable corresponding to the sub-rule set may thus be calculated in a variety of ways.
The maximum of the calculated guess probabilities may be used as the first guess probability. Alternatively, the average of the calculated guess probabilities may be used as the first guess probability, or a selection rule may be specified to select one of the calculated guess probabilities as the first guess probability.
Similarly, in some embodiments, the step of obtaining a second guess probability that the actual value of the variable in the sub-ruleset can be guessed in case of a miss in the sub-ruleset may include: calculating a guess probability that the actual values of the variables corresponding to the sub-rule set can be guessed in case that at least one expression included in the sub-rule set is not hit; and taking the maximum value or the average value in the guessing probabilities as the second guessing probability.
The guess probability that the actual value of the variable can be guessed in the case of a miss can be calculated based on the distribution of the actual values of the variable that do not fall within the expressions in the sub-rule set. For example, for the expression {sex = woman}, even if the expression is not hit, the guess probability of the actual value being guessed is 50%. The guess probability of the actual value when multiple expressions are not hit can also be calculated.
In this embodiment, the maximum of the calculated guess probabilities may be used as the second guess probability. Alternatively, the average of the calculated guess probabilities may be used as the second guess probability, or a selection rule may be specified to select one of the calculated guess probabilities as the second guess probability.
In some embodiments, calculating a guess probability that the actual value of the variable corresponding to the sub-rule set can be guessed in case at least one expression included in the sub-rule set is hit may include: arranging and combining expressions included in the sub-rule sets to obtain a plurality of combination modes; and respectively calculating the corresponding guess probability under the condition that each combination mode is hit.
Multiple expressions in the sub-rule set can be permuted and combined to obtain a plurality of combinations, and the guess probability of the actual value of the variable is calculated for each combination. In this way, the situations in which the actual value of the variable may be guessed can be analyzed comprehensively, and privacy can be well protected. For example, if there are 3 expressions in a sub-rule set,
[formula image 010 not reproduced]
can be used to enumerate all possible combinations, and the guess probability corresponding to each combination is then calculated.
In these embodiments, the probabilities of guessing the actual values of the variables in all cases may be calculated, and the maximum value of the probabilities may be used as the first guess probability. Since the first guess probability is used to calculate the hit-guess probability, which is used as a control indicator, the amount of information that can be passed to the model user is controlled. Thus, in the embodiment, the first guess probability is determined by comprehensively evaluating the guess probability of the actual value of the variable, so that the first guess probability can accurately represent the possible leakage risk of the actual value. And furthermore, a better basis is provided for reasonably controlling the information quantity.
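Enumerating the combinations and taking the maximum can be sketched as follows. The sketch assumes integer-valued ages, a uniform distribution over the values consistent with each combination, and that "combinations" means every non-empty subset of the expressions; the third expression, the domain and the function name are hypothetical additions for illustration.

```python
from itertools import combinations

def max_guess_probability(domain, expressions):
    """Guess probability for every non-empty combination of expressions being hit.

    Returns the maximum together with the per-combination results. Combinations
    that no value in `domain` can satisfy are skipped, since they cannot be hit.
    """
    results = {}
    for size in range(1, len(expressions) + 1):
        for combo in combinations(expressions.items(), size):
            names = tuple(name for name, _ in combo)
            candidates = [v for v in domain if all(pred(v) for _, pred in combo)]
            if candidates:
                results[names] = 1 / len(candidates)
    return max(results.values()), results

expressions = {
    "22 < age < 60": lambda age: 22 < age < 60,
    "10 < age < 25": lambda age: 10 < age < 25,
    "age < 40": lambda age: age < 40,  # hypothetical third expression
}
first_guess, per_combination = max_guess_probability(range(0, 121), expressions)
```

With three expressions there are seven non-empty combinations; here the largest guess probability, 0.5, comes from the combinations that narrow the age down to 23 or 24.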
It will be appreciated that in some embodiments, calculating a guess probability that the actual value of the variable corresponding to the sub-ruleset can be guessed if at least one expression included in the sub-ruleset is not hit may include: arranging and combining expressions included in the sub-rule sets to obtain a plurality of combination modes; and respectively calculating the corresponding guess probability under the condition that each combination mode is not hit. Specific implementation manners may be compared with the foregoing implementation manners, and details are not repeated.
In some embodiments, where it is determined that the rule set is at a higher security risk, the method may further comprise: generating risk prompt information; and the risk prompt information is used for prompting a data provider to refuse to run the rule set.
In this embodiment, the server may send risk notification information to the data provider when it finds that the rule set has a greater security risk. In this manner, the data provider, after receiving the risk hint information, can further decide whether to execute the rule set. In general, risk alert information may be used to alert the data provider to deny execution of the rule set, thus protecting the data provider's actual data.
Step S206: comparing the hit guess probability and the miss guess probability with preset thresholds respectively to determine the security of the rule set.
In some embodiments, the actual data of the data source must be effectively protected while the model user still obtains a sufficient basis for its decisions. The amount of information that the rule set extracts from the data source and feeds back to the model user therefore needs to reach an appropriate level: enough for the model user to meet its needs, yet within a range the data source can control. The rule set may thus be designed to provide the minimum amount of information that satisfies the model user. The embodiments of this specification comprehensively evaluate the risk that actual data in the data source is leaked by using the hit guess probability and the miss guess probability as evaluation indexes, and determine the security by comparing them with a preset threshold. When the preset threshold is relatively small, the amount of information the model user can obtain through the rule set is relatively small; when it is relatively large, the amount of information is relatively large. If the model user obtains too little information, it may be difficult for it to make decisions, and the quality of service people receive in real life may suffer. If it obtains too much, people's private information may be leaked.
In some embodiments, the preset threshold may be set as an empirical value. Specifically, a large amount of sample data can be used for calculation, and the value or value range of the preset threshold determined through statistical analysis of the results. The preset threshold may also be set using relative entropy, that is, Kullback-Leibler (K-L) divergence. The K-L divergence calculation can then be combined with the hit guess probability and the miss guess probability of the actual value in this application, with the two probabilities used as indexes for controlling the amount of information. This achieves a better balance between meeting the needs of the model user and protecting the actual data of the data source.
In some embodiments, preset thresholds may be set separately for the hit guess probability and the miss guess probability, and the two thresholds may be the same or different. Alternatively, a single preset threshold may be shared by both.
In some embodiments, a plurality of preset thresholds may be set, each preset threshold representing a respective risk level. Thus, the larger the value of the preset threshold is, the higher the corresponding risk level is, and the larger the risk that the actual data of the data source may be leaked is. The corresponding risk levels may be set separately or as a whole, corresponding to the hit guess probability and the miss guess probability.
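The mapping from a guess probability to a risk level under several preset thresholds can be sketched as below. The threshold values, the function name `risk_level`, and the convention that the level equals the number of thresholds strictly exceeded are assumptions for illustration only.

```python
from bisect import bisect_left

def risk_level(guess_probability, thresholds):
    """Number of preset thresholds the guess probability strictly exceeds.

    Level 0 is the lowest risk; a higher level means a larger threshold was
    exceeded and therefore a greater risk of leaking actual data.
    """
    return bisect_left(sorted(thresholds), guess_probability)

HIT_THRESHOLDS = [0.1, 0.3, 0.5]   # assumed thresholds for the hit guess probability
MISS_THRESHOLDS = [0.2, 0.4]       # assumed thresholds for the miss guess probability

hit_level = risk_level(0.35, HIT_THRESHOLDS)    # exceeds 0.1 and 0.3
miss_level = risk_level(0.15, MISS_THRESHOLDS)  # exceeds none
overall_level = max(hit_level, miss_level)      # one possible combined level
```

Keeping separate threshold lists lets the hit and miss cases be graded independently, while taking the maximum is one way to report a single overall level.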
The embodiments provided in this specification may convert a rule set into sub-rule sets corresponding to each variable, and further calculate a hit guess probability and a miss guess probability corresponding to each variable based on an expression condition in each sub-rule set. It is achieved that the risk of leakage of the actual values of the variables is fully evaluated from the case of a hit and a miss of the rule set, respectively. Therefore, the risk degree of the rule model can be determined efficiently and accurately, and the risk of data leakage caused by the fact that the data provider runs the unsafe rule model is reduced.
Please refer to fig. 4. The present specification further provides a device for determining security of a rule set, which may include: the device comprises an acquisition module, a conversion module, a first calculation module, a second calculation module and a determination module.
The acquisition module is used for acquiring a rule set; wherein the rule set includes a plurality of expressions formed by part or all of a plurality of variables.
The conversion module is used for dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein the variables included in different sub-rule sets are different.
The first calculation module is used for calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit.
And the second calculation module is used for calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit.
And the determining module is used for comparing the hit guess probability and the miss guess probability with preset threshold values respectively to determine the safety of the rule set.
The functions and effects achieved by the device for determining the security of the rule set provided in this embodiment are explained in comparison with the method for determining the security of the rule model provided in the foregoing embodiment, and are not described again here.
The present specification embodiments provide a computer storage medium storing computer program instructions that, when executed by a processor, implement: acquiring a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables; dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different; calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit; calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit; and comparing the hit guess probability and the miss guess probability with preset thresholds respectively to determine the safety of the rule set.
In this embodiment, the memory may exist at multiple levels. In a digital system, anything capable of storing binary data may serve as memory; in an integrated circuit, a circuit with a storage function but no physical form of its own, such as a RAM or a FIFO, is also called memory; at the system level, a storage device with a physical form, such as a memory module or a TF card, is also called memory.
The functions and effects realized by the computer program in this embodiment can be explained with reference to other embodiments, and are not described in detail.
Please refer to FIG. 5. An electronic device includes a network communication unit, a memory, and a processor. These components are connected through internal cables so that they can exchange data with one another.
The network communication unit may be specifically configured to acquire a rule model and the data value distribution of variables; wherein the rule model comprises a rule set comprising a plurality of rules formed by expressions, the plurality of rules being connected by rule connectors.
The processor may be specifically configured to divide the rule set into a plurality of sub-rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different; calculating the hit guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is hit; calculating the miss guess probability that the actual values corresponding to the variables contained in the sub rule set can be guessed under the condition that the sub rule set is not hit; and comparing the hit guess probability and the miss guess probability with preset thresholds respectively to determine the safety of the rule set.
The memory may be specifically configured to store a corresponding instruction program.
In this embodiment, the network communication unit may be a virtual port bound to different communication protocols so that different data can be sent or received. For example, it may include a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. The network communication unit may also be a physical communication interface or communication chip, for example a wireless mobile network communication chip such as GSM or CDMA, a Wi-Fi chip, or a Bluetooth chip.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The description is not intended to be limiting.
In this embodiment, the memory may exist at multiple levels. In a digital system, anything capable of storing binary data may serve as memory; in an integrated circuit, a circuit with a storage function but no physical form of its own, such as a RAM or a FIFO, is also called memory; at the system level, a storage device with a physical form, such as a memory module or a TF card, is also called memory.
Please refer to fig. 6. The embodiment of the specification further provides a rule set processing method. The processing method can be run on a client. The method may include the following steps.
Step S301: receiving a rule set; wherein the rule set includes a plurality of expressions formed by part or all of a plurality of variables.
Step S303: dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein the variables included in different sub-rule sets are different.
Step S305: calculating the hit guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed in the case that the sub-rule set is hit.
Step S307: calculating the miss guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed in the case that the sub-rule set is not hit.
Step S309: refusing to deploy the rule set if at least one of the hit guess probability and the miss guess probability exceeds a preset threshold.
In this embodiment, the client may check the received rule set to avoid the greater risk of revealing the user's personal privacy that executing the rule set could bring. Preset thresholds may be set for the hit guess probability and the miss guess probability respectively, and the rule set is considered to pose a greater risk of leaking the user's personal privacy data when either probability exceeds its preset threshold. In that case, the client can refuse to deploy the rule set, so that the user's privacy is well protected.
In some embodiments, the client may deploy the rule set in the event that both the hit-guess probability and the miss-guess probability are less than respective preset thresholds.
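The client-side decision in step S309 and the paragraph above can be sketched as follows. The function name, the returned tuple, and the treatment of a probability exactly equal to its threshold (deployed here, because the text refuses only when a threshold is exceeded) are assumptions.

```python
def deployment_decision(hit_guess, miss_guess, hit_threshold, miss_threshold):
    """Return ("refuse", reasons) if either guess probability exceeds its
    threshold, otherwise ("deploy", [])."""
    reasons = []
    if hit_guess > hit_threshold:
        reasons.append("hit guess probability exceeds its preset threshold")
    if miss_guess > miss_threshold:
        reasons.append("miss guess probability exceeds its preset threshold")
    return ("refuse", reasons) if reasons else ("deploy", [])

# Hypothetical values: the hit case is too revealing, the miss case is fine.
decision, reasons = deployment_decision(
    hit_guess=0.5, miss_guess=0.05, hit_threshold=0.2, miss_threshold=0.2
)
```

Returning the reasons alongside the decision makes it straightforward to produce the kind of risk prompt information mentioned in other embodiments.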
In this embodiment, the technical solutions mainly adopted may be explained by referring to other embodiments in this specification, and are not described in detail.
Please refer to fig. 7. The embodiment of the specification provides a client. The client may include a receiving module, a converting module, a first computing module, a second computing module, and an executing module.
The receiving module may receive a rule set; wherein the rule set includes a plurality of expressions formed by part or all of a plurality of variables.
The conversion module may divide the rule set into a plurality of sub-rule sets respectively corresponding to the variables; wherein the variables included in different sub-rule sets are different.
The first calculation module may calculate a hit-guess probability that the actual values corresponding to the variables included in the sub-ruleset can be guessed in case the sub-ruleset is hit.
The second calculation module may calculate a miss-guess probability that the actual values corresponding to the variables included in the sub-ruleset can be guessed if the sub-ruleset is not hit.
The execution module may refuse to deploy the rule set if at least one of the hit guess probability and the miss guess probability exceeds a preset threshold.
In this embodiment, the functions and effects realized by the relevant modules can be explained in comparison with other embodiments, and are not described in detail.
An embodiment of this specification further provides a computer storage medium. The computer storage medium may store computer program instructions that, when executed by a processor, implement: receiving a rule set, wherein the rule set comprises a plurality of expressions formed by part or all of a plurality of variables; dividing the rule set into a plurality of sub-rule sets respectively corresponding to the variables, wherein the variables included in different sub-rule sets are different; calculating a hit guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is hit; calculating a miss guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is not hit; and refusing to deploy the rule set if at least one of the hit guess probability and the miss guess probability exceeds a preset threshold.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a cache, a Hard Disk Drive (HDD), or a memory card. The storage medium may be used to store computer program instructions.
In this embodiment, the functions and effects realized by the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not described again here.
An embodiment of this specification provides an electronic device. The electronic device may include a network communication unit, a memory, and a processor. These components are connected by internal cables so that they can exchange data with one another.
The network communication unit may be specifically used for receiving a rule model and the data value distribution of variables; wherein the rule model comprises a rule set comprising a plurality of rules formed by expressions, the plurality of rules being connected by rule connecting words.
The processor may be specifically configured to divide the rule set into a plurality of sub-rule sets respectively corresponding to the variables, wherein the variables included in different sub-rule sets are different; calculate a hit guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is hit; calculate a miss guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is not hit; and refuse to deploy the rule set if at least one of the hit guess probability and the miss guess probability exceeds a preset threshold.
The memory may be specifically configured to store the corresponding program instructions.
In this embodiment, the network communication unit may be a virtual port that is bound to different communication protocols so that different data can be transmitted or received. For example, the network communication unit may include a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. In addition, the network communication unit may be a physical communication interface or communication chip. For example, it may be a wireless mobile network communication chip, such as a GSM or CDMA chip; it may also be a Wi-Fi chip; it may also be a Bluetooth chip.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. This specification is not limited in this respect.
In this embodiment, the term memory covers several levels. In a digital system, anything that can store binary data may be a memory; in an integrated circuit, a circuit that has a storage function but no separate physical form, such as a RAM or a FIFO, is also called a memory; at the system level, a storage device in physical form, such as a memory module or a TF card, is also called a memory.
It is understood that the embodiments described in the present specification can be combined with each other in a matching manner. After reading the embodiments in the present specification, a person skilled in the art may combine the technical solutions disclosed in the embodiments without any creative effort.
It is to be understood that the various embodiments are described herein primarily in a progressive manner, with each embodiment focusing on introducing features that differ from the others so that the various embodiments may be construed in contrast.
It should be noted that, the units, devices, modules, and the like described in the foregoing embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product having a certain function. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units is merely a logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
As can be seen from the above, the device for determining the security of the rule model provided in the embodiments of the present specification converts the rule model into the binary tree structure, and then can efficiently and accurately determine the security of the rule model by using the structural characteristics of the binary tree, thereby reducing the risk of data leakage caused by running an unsafe rule model on a data provider.
Although the present specification provides method steps as described in the embodiments or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order of execution. When an actual apparatus or client product executes the steps, they may be executed sequentially or in parallel (e.g., in a parallel-processor or multithreaded processing environment, or even in a distributed data processing environment) in accordance with the embodiments or the methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names and do not denote any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be implemented by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing the various functions may also be regarded as structures within the hardware component. The means for performing the functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments in the present specification.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the specification has been described with respect to the embodiments, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that fall within the spirit and scope of the specification, and it is intended that the appended claims include such variations and modifications as fall within the spirit and scope of the specification.

Claims (18)

1. A method of determining security of a rule set, comprising:
acquiring a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables;
dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different;
calculating a hit guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is hit, which comprises: calculating the conditional probability that the sub-rule set is hit under the condition that the rule set is hit; obtaining, according to the variable expressions included in the sub-rule set, a first guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is hit; and deriving the hit guess probability using the conditional probability that the sub-rule set is hit and the first guess probability;
calculating a miss guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is not hit, which comprises: calculating the conditional probability that the sub-rule set is not hit under the condition that the rule set is not hit; obtaining, according to the variable expressions included in the sub-rule set, a second guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is not hit; and deriving the miss guess probability using the conditional probability that the sub-rule set is not hit and the second guess probability;
and comparing the hit guess probability and the miss guess probability with preset thresholds respectively to determine the safety of the rule set.
2. The method of claim 1, wherein dividing the rule set into a plurality of sub-rule sets respectively corresponding to the variables comprises:
parsing the rules included in the rule set into expressions that each include a designated operator;
dividing the expressions including the designated operator into a plurality of sub-rule sets according to the variables involved.
3. The method of claim 2, wherein the expressions included in the rule set are connected by logical OR, and the designated operator includes logical AND.
4. The method of claim 1, wherein each of the sub-rule sets comprises only expressions involving the same variable, and the rule set is converted into a logical OR connection between the plurality of sub-rule sets.
5. The method of claim 1, wherein obtaining the first guess probability that the actual values of the variables in the sub-rule set can be guessed under the condition that the sub-rule set is hit comprises:
calculating guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is hit;
and taking the maximum value or the average value of the guess probabilities as the first guess probability.
6. The method of claim 5, wherein calculating the guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is hit comprises:
permuting and combining the expressions included in the sub-rule set to obtain a plurality of combinations;
and calculating, for each combination, the corresponding guess probability under the condition that the combination is hit.
7. The method of claim 1, wherein obtaining the second guess probability that the actual values of the variables in the sub-rule set can be guessed under the condition that the sub-rule set is not hit comprises:
calculating guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is not hit;
and taking the maximum value or the average value of the guess probabilities as the second guess probability.
8. The method of claim 7, wherein calculating the guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is not hit comprises:
permuting and combining the expressions included in the sub-rule set to obtain a plurality of combinations;
and calculating, for each combination, the corresponding guess probability under the condition that the combination is not hit.
9. The method of claim 1, further comprising, where it is determined that the rule set poses a security risk:
generating risk prompt information, wherein the risk prompt information is used for prompting a data provider to refuse to run the rule set.
10. A device for determining security of a rule set, comprising:
the acquisition module is used for acquiring a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables;
the conversion module is used for dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different;
the first calculation module is used for calculating a hit guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is hit, which comprises: calculating the conditional probability that the sub-rule set is hit under the condition that the rule set is hit; obtaining, according to the variable expressions included in the sub-rule set, a first guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is hit; and deriving the hit guess probability using the conditional probability that the sub-rule set is hit and the first guess probability;
the second calculation module is used for calculating a miss guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is not hit, which comprises: calculating the conditional probability that the sub-rule set is not hit under the condition that the rule set is not hit; obtaining, according to the variable expressions included in the sub-rule set, a second guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is not hit; and deriving the miss guess probability using the conditional probability that the sub-rule set is not hit and the second guess probability;
and the determining module is used for comparing the hit guess probability and the miss guess probability with a preset threshold respectively to determine the safety of the rule set.
11. A method of processing a rule set, comprising:
receiving a rule set; wherein the rule set includes a plurality of expressions formed by some or all of a plurality of variables;
dividing the rule set into a plurality of sub rule sets respectively corresponding to the variables; wherein, the variables included in different sub-rule sets are different;
calculating a hit guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is hit, which comprises: calculating the conditional probability that the sub-rule set is hit under the condition that the rule set is hit; obtaining, according to the variable expressions included in the sub-rule set, a first guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is hit; and deriving the hit guess probability using the conditional probability that the sub-rule set is hit and the first guess probability;
calculating a miss guess probability that the actual values corresponding to the variables contained in the sub-rule set can be guessed under the condition that the sub-rule set is not hit, which comprises: calculating the conditional probability that the sub-rule set is not hit under the condition that the rule set is not hit; obtaining, according to the variable expressions included in the sub-rule set, a second guess probability that the actual values corresponding to the variables in the sub-rule set can be guessed under the condition that the sub-rule set is not hit; and deriving the miss guess probability using the conditional probability that the sub-rule set is not hit and the second guess probability;
refusing to deploy the rule set if at least one of the hit guess probability and the miss guess probability exceeds a preset threshold.
12. The method of claim 11, wherein dividing the rule set into a plurality of sub-rule sets respectively corresponding to the variables comprises:
parsing the rules included in the rule set into expressions that each include a designated operator;
dividing the expressions including the designated operator into a plurality of sub-rule sets according to the variables involved.
13. The method of claim 12, wherein the expressions included in the rule set are connected by logical OR, and the designated operator includes logical AND.
14. The method of claim 11, wherein each of the sub-rule sets comprises only expressions involving the same variable, and the rule set is converted into a logical OR connection between the plurality of sub-rule sets.
15. The method of claim 11, wherein obtaining the first guess probability that the actual values of the variables in the sub-rule set can be guessed under the condition that the sub-rule set is hit comprises:
calculating guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is hit;
and taking the maximum value or the average value of the guess probabilities as the first guess probability.
16. The method of claim 15, wherein calculating the guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is hit comprises:
permuting and combining the expressions included in the sub-rule set to obtain a plurality of combinations;
and calculating, for each combination, the corresponding guess probability under the condition that the combination is hit.
17. The method of claim 11, wherein obtaining the second guess probability that the actual values of the variables in the sub-rule set can be guessed under the condition that the sub-rule set is not hit comprises:
calculating guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is not hit;
and taking the maximum value or the average value of the guess probabilities as the second guess probability.
18. The method of claim 17, wherein calculating the guess probabilities that the actual values of the variables corresponding to the sub-rule set can be guessed under the condition that at least one expression included in the sub-rule set is not hit comprises:
arranging and combining expressions included in the sub-rule sets to obtain a plurality of combination modes;
and respectively calculating the corresponding guess probability under the condition that each combination mode is not hit.
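
The calculations recited in claims 5 to 8, and the final combining step of claims 1 and 11, can be sketched in Python under several assumptions that the claims themselves do not fix: the variable has a known discrete value distribution; a combination counts as hit (or not hit) when every expression in it is hit (or not hit); the guesser picks the most probable remaining value; and the hit or miss guess probability is the product of the conditional probability and the first or second guess probability. All function names and the example distribution are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence

Predicate = Callable[[int], bool]  # one expression over a single variable


def best_guess(dist: Dict[int, float], keep: Predicate) -> Optional[float]:
    """Chance of guessing the value when it is known to satisfy `keep`.

    Returns None when no value with positive probability satisfies `keep`.
    """
    masses = [p for v, p in dist.items() if keep(v) and p > 0]
    if not masses:
        return None
    return max(masses) / sum(masses)


def guess_probability(dist: Dict[int, float], exprs: Sequence[Predicate],
                      hit: bool, reduce: str = "max") -> float:
    """First (hit=True) or second (hit=False) guess probability of a sub-rule set."""
    probs = []
    for size in range(1, len(exprs) + 1):
        for combo in combinations(exprs, size):
            # Assumption: every expression in the combination is hit (or missed).
            p = best_guess(dist, lambda v, c=combo: all(e(v) == hit for e in c))
            if p is not None:  # skip combinations that can never occur
                probs.append(p)
    if not probs:
        return 0.0
    return max(probs) if reduce == "max" else sum(probs) / len(probs)


def combine(conditional: float, guess: float) -> float:
    """Assumed combination rule: conditional probability times guess probability."""
    return conditional * guess


# Hypothetical variable, uniform over 1..10, with two expressions.
uniform = {v: 0.1 for v in range(1, 11)}
exprs = [lambda v: v > 7, lambda v: v < 9]
first = guess_probability(uniform, exprs, hit=True)    # both hit leaves only the value 8
second = guess_probability(uniform, exprs, hit=False)  # "v < 9" missed leaves 9 and 10
```

Taking the maximum over combinations gives the more conservative estimate of leakage, while the average is the more permissive of the two options that claims 5 and 7 allow.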
CN202011374651.9A 2020-12-01 2020-12-01 Method and device for determining and processing safety of rule set Active CN112182592B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011374651.9A CN112182592B (en) 2020-12-01 2020-12-01 Method and device for determining and processing safety of rule set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011374651.9A CN112182592B (en) 2020-12-01 2020-12-01 Method and device for determining and processing safety of rule set

Publications (2)

Publication Number Publication Date
CN112182592A (en) 2021-01-05
CN112182592B (en) 2021-02-23

Family

ID=73918276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011374651.9A Active CN112182592B (en) 2020-12-01 2020-12-01 Method and device for determining and processing safety of rule set

Country Status (1)

Country Link
CN (1) CN112182592B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934004A (en) * 2019-03-14 2019-06-25 中国科学技术大学 The method of privacy is protected in a kind of machine learning service system
CN110609929B (en) * 2019-09-03 2021-07-23 前海飞算科技(深圳)有限公司 Data processing method and device, storage medium and electronic equipment
CN110991905B (en) * 2019-12-05 2022-05-13 支付宝(杭州)信息技术有限公司 Risk model training method and device

Also Published As

Publication number Publication date
CN112182592A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN109862018B (en) Anti-crawler method and system based on user access behavior
Schaberreiter et al. A quantitative evaluation of trust in the quality of cyber threat intelligence sources
US8397304B2 (en) Privacy management of data
US20130013807A1 (en) Systems and methods for conducting more reliable assessments with connectivity statistics
CN110851872B (en) Risk assessment method and device for private data leakage
CN105262760A (en) Method and device for preventing action of maliciously visiting login/register interface
WO2011038491A1 (en) Systems and methods for social graph data analytics to determine connectivity within a community
KR101591910B1 (en) Apparatus and method for evaluating security risks in cloud computing and method of recommendation about cloud service provider using result of evaluation of security risks
US9558348B1 (en) Ranking software applications by combining reputation and code similarity
EP3570242A1 (en) Method and system for quantifying quality of customer experience (cx) of an application
CN111324370A (en) Method and device for carrying out risk processing on to-be-on-line small program
CN112085588B (en) Method and device for determining safety of rule model and data processing method
CN111581258A (en) Safety data analysis method, device, system, equipment and storage medium
CN110599278B (en) Method, apparatus, and computer storage medium for aggregating device identifiers
CN112182592B (en) Method and device for determining and processing safety of rule set
CN112598496A (en) Wind control blacklist setting method and device, terminal equipment and readable storage medium
CN112085590B (en) Method and device for determining safety of rule model and server
CN110351345B (en) Method and device for processing service request
CN112257098B (en) Method and device for determining safety of rule model
Hatcher et al. Machine learning-based mobile threat monitoring and detection
CN112085589B (en) Method and device for determining safety of rule model and server
EP3908954B1 (en) Systems and methods for secure and privacy preserving device classification
CN112668842A (en) Vehicle insurance claim settlement risk factor evaluation method and device, electronic equipment and medium
CN111241277A (en) Sparse graph-based user identity identification method and device
CN113034123B (en) Abnormal resource transfer identification method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant