CN116361488A

CN116361488A - Method and device for mining risk object based on knowledge graph

Info

Publication number: CN116361488A
Application number: CN202310452028.8A
Authority: CN
Inventors: 朱仲书
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-04-23
Filing date: 2023-04-23
Publication date: 2023-06-30

Abstract

Embodiments of this specification relate to a method and a device for mining risk objects based on a knowledge graph. The method comprises: loading pre-formed knowledge graph data, in which some users and/or some transactions are marked as risk objects; performing rule mining on the knowledge graph with each of M knowledge graph rule mining algorithms to obtain M rule sets; and merging the M rule sets to obtain a set of N risk rules, each of which can be used to derive knowledge points related to risk objects. Knowledge reasoning is then performed on the original knowledge graph using these rules to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects. Corresponding labeling data are determined from the cross-validation results between the assumed knowledge points and each risk rule, a target model is trained with the labeling data, and the trained target model is used for mining risk objects.

Description

Method and device for mining risk object based on knowledge graph

Technical Field

One or more embodiments of the present disclosure relate to the field of knowledge graphs, and in particular, to a method and apparatus for mining risk objects based on knowledge graphs.

Background

In recent years, knowledge graphs have been widely applied in services such as search recommendation and financial management and control. In particular, when an electronic payment platform performs transaction risk assessment, the information of a single user or a single transaction is often insufficient to assess transaction risk accurately; risk users and risk transactions on the platform are therefore often discovered with the help of the associated information covered by a knowledge graph. Although conventional graph deep learning models perform well, they lack intuitive interpretability, so specific risk analysis rules are still needed as an aid in risk analysis scenarios. Currently, risk rules are mostly summarized from expert experience, which is inefficient, and a number of risk rule mining methods have therefore been proposed that attempt to mine usable risk rules automatically from existing data. However, existing rule mining methods generally suffer from low accuracy and simple rule formats, among other problems, and cannot mine risk objects well.

Disclosure of Invention

One or more embodiments of the present disclosure describe a method and apparatus for mining risk objects based on a knowledge graph, which aim to improve the accuracy and coverage of rule mining so that potential risk objects in electronic payment can be found more accurately and efficiently.

In a first aspect, a method for mining a risk object based on a knowledge graph is provided, including:

acquiring a preformed knowledge graph which comprises a plurality of knowledge points related to a user and a transaction; part of users and/or part of transactions in the knowledge graph are marked as risk objects;

performing rule mining on the knowledge graph respectively by using a plurality of knowledge graph rule mining algorithms with the obtained risk object as a target to obtain N risk rules, wherein any one risk rule is used for deducing knowledge points related to the risk object;

respectively using the N risk rules to perform reasoning on the knowledge graph to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects;

for any first assumed knowledge point, determining corresponding annotation data, wherein the annotation data comprises first annotation data indicating whether the N risk rules can infer the first assumed knowledge point and second annotation data indicating whether the first assumed knowledge point accords with the knowledge graph;

training a target model by utilizing a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points; the trained target model is used for mining risk objects.

In one possible implementation, the target model is a probabilistic graphical model; training a target model by using a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points includes:

determining factor values of the first assumed knowledge points corresponding to a plurality of preset factors based on the first labeling data and the second labeling data corresponding to the arbitrary first assumed knowledge points, wherein the preset factors are used for reflecting the association of the assumed knowledge points and each risk rule;

determining joint probability distribution of the plurality of pieces of annotation data based on the weight parameter and the factor value of each of the plurality of assumed knowledge points;

and adjusting the weight parameters with the goal of maximizing the joint probability distribution, to obtain the optimized parameters of the probabilistic graphical model.

In one possible embodiment, the plurality of preset factors includes at least two of:

a first class factor indicating whether the risk rule can infer the first hypothesized knowledge point;

a second class factor indicating whether the first annotation data is consistent with the second annotation data;

and a third class factor indicating whether the inference results of any two risk rules are consistent for the first assumed knowledge point.

In one possible implementation, determining the joint probability distribution of the plurality of pieces of annotation data includes:

and for any first assumed knowledge point, carrying out inner product calculation on a first vector formed by a factor value corresponding to the first assumed knowledge point and a weight vector formed by a weight parameter, and carrying out normalization summation on inner product results corresponding to all the assumed knowledge points to obtain the joint probability distribution.

In a possible implementation manner, after obtaining the optimized parameters of the probabilistic graphical model, the method further includes:

for a target knowledge point formed by a target user or target transaction to be analyzed, determining corresponding first labeling data, and determining a first group of factor values and a second group of factor values corresponding to the preset factors according to the first labeling data and two label values corresponding to true and false respectively;

based on the optimization parameters, respectively determining a first probability corresponding to the first group of factor values and a second probability corresponding to the second group of factor values;

and determining whether the target user or the target transaction is a risk object according to the label value corresponding to the larger one of the first probability and the second probability.

In a possible implementation manner, the first labeling data includes N elements, when the ith risk rule can infer the first assumed knowledge point, the ith element corresponding to the first labeling data is 1, otherwise, the ith element is 0.

In one possible implementation, the target model is a classification model; training a target model by using a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points, including:

and training to obtain a classification model by taking first labeling data in the plurality of labeling data as sample characteristic data and taking second labeling data as sample label data.

In one possible embodiment, the classification model includes at least one of the following: a logistic regression model, a neural network, and a gradient boosting decision tree (GBDT).

In one possible embodiment, after training to obtain the classification model, the method further comprises:

for any second assumed knowledge point in the plurality of assumed knowledge points, classifying the second assumed knowledge point by using the classification model, and calculating a corresponding confidence level;

sending any second assumed knowledge point whose confidence level is smaller than a preset first threshold to a manual review platform for manual re-checking, and determining corrected second annotation data according to the re-check result;

and retraining the classification model by taking first labeling data in the plurality of labeling data as sample characteristic data and taking corrected second labeling data as sample label data.

for a target knowledge point formed by a target user or target transaction to be analyzed, determining corresponding first annotation data as a sample feature to be tested;

and inputting the sample feature to be tested into the classification model, and determining whether the target user or the target transaction is a risk object according to the classification result output by the classification model.

In one possible implementation, after determining the corresponding annotation data for any first assumed knowledge point, the method further includes:

for any third assumed knowledge point of the plurality of assumed knowledge points, if the number of risk rules that can infer it is less than a preset second threshold, removing it from the set of assumed knowledge points.

In one possible implementation, the plurality of knowledge graph rule mining algorithms include several of the following: the Path Ranking Algorithm (PRA), AMIE (Association Rule Mining under Incomplete Evidence), and the Subgraph Feature Extraction method (SFE).

In a second aspect, a data mining method based on a knowledge graph is provided, including:

acquiring a preformed knowledge graph which comprises a plurality of knowledge points related to a business object; a part of business objects in the knowledge graph are set as target objects;

performing rule mining on the knowledge graph with each of a plurality of knowledge graph rule mining algorithms to obtain N target rules, wherein any target rule is used for deducing knowledge points related to a target object;

respectively using the N target rules to perform reasoning on the knowledge graph to obtain a plurality of assumed knowledge points that are assumed to relate to the target object;

for any first assumed knowledge point, determining corresponding annotation data, wherein the annotation data comprises first annotation data indicating whether the N target rules can infer the first assumed knowledge point and second annotation data indicating whether the first assumed knowledge point accords with the knowledge graph;

training a target model by utilizing a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points; the trained target model is used for mining the target object.

In a third aspect, an apparatus for mining a risk object based on a knowledge graph is provided, including:

an acquisition unit configured to acquire a pre-formed knowledge graph including a plurality of knowledge points related to a user and a transaction; part of users and/or part of transactions in the knowledge graph are marked as risk objects;

a mining unit configured to perform rule mining on the knowledge graph with each of a plurality of knowledge graph rule mining algorithms, with the risk objects as targets, to obtain N risk rules, wherein any one risk rule is used for deducing knowledge points related to the risk objects;

a reasoning unit configured to respectively use the N risk rules to perform reasoning on the knowledge graph to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects;

a determining unit configured to determine, for any first assumed knowledge point, corresponding labeling data, wherein the labeling data comprise first labeling data indicating whether the N risk rules can infer the first assumed knowledge point and second labeling data indicating whether the first assumed knowledge point accords with the knowledge graph;

a training unit configured to train a target model by using a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points; the trained target model is used for mining risk objects.

In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.

In a fifth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.

According to the method and the device for mining the risk object based on the knowledge graph, which are provided by the embodiment of the specification, the method has no constraint on the form of the rule, and meanwhile, a plurality of rule mining algorithms are supported, so that the rule with a richer form can be provided, the effect of cross verification can be realized, and the accuracy of mining the risk object is effectively improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments disclosed in the present specification, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only examples of the embodiments disclosed in the present specification, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1A illustrates an implementation scenario diagram for mining risk objects based on knowledge-graph according to one embodiment;

FIG. 1B illustrates a framework diagram of a method of mining risk objects based on knowledge-graph according to one embodiment;

FIG. 2 illustrates a flowchart of a method of mining risk objects based on knowledge-graph according to one embodiment;

FIG. 3 illustrates a flow diagram of a method of training a probabilistic graphical model, according to one embodiment;

FIG. 4 shows a schematic block diagram of an apparatus for mining risk objects based on a knowledge graph according to one embodiment.

Detailed Description

The following describes the scheme provided in the present specification with reference to the drawings.

FIG. 1A illustrates an implementation scenario for mining risk objects based on a knowledge graph according to one embodiment. In a transaction risk analysis scenario, the underlying knowledge graph may contain knowledge points related to users and transactions, which may be embodied as nodes, node attributes, and relationships between nodes in the knowledge graph. In the knowledge graph, some users and/or some transactions are already marked as risk objects. In the example of FIG. 1A, known risk users are represented by black circle nodes. In order to mine more risk objects based on such a knowledge graph, as shown in FIG. 1A, rule mining is first performed on the knowledge graph to obtain some risk rules.

A rule in a knowledge graph is an abstract logical expression, for example: TransferTo(A, B) & Pay(A, x) => Receive(B, x), where A, B, and x are variables of the rule. The rule means: when A initiates a transfer to B and A pays an amount x, it is inferred that B receives the amount x. Rule mining is the process of extracting generalizable rules from knowledge graph data through various algorithmic strategies. For a knowledge graph containing risk objects, rule mining that takes the risk objects as the target yields a number of risk rules, i.e., rules related to risk objects. For example, Transfer-typeA(A, B) & Label(A, "y") => Label(B, "y") may be regarded as a risk rule: if A initiates a transfer of typeA to B and the risk label of A is y (indicating whether A is a fraudulent user), it is inferred that the risk label of B is also y.

Rule reasoning is then performed on the original knowledge graph using the obtained risk rules, to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects.

Rule reasoning is the process of deriving new knowledge from existing knowledge using one or more rules. When reasoning is performed in the knowledge graph using the mined risk rules, some assumed knowledge points can be obtained, including some assumed risk objects. For example, using the rule Transfer-typeA(A, B) & Label(A, "y") => Label(B, "y") in combination with the existing knowledge Transfer-typeA(Anna, Bob) and Label(Anna, "1"), the new knowledge Label(Bob, "1") can be inferred. That is, when Anna initiates a transfer of typeA to Bob and the risk label of Anna is 1 (indicating a fraudulent user), the risk rule above infers that Bob is also a fraudulent user (a risk object).
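The reasoning step in this example can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the storage of facts as (predicate, subject, object) tuples and the function name apply_risk_rule are hypothetical and are not taken from the specification.

```python
# Facts are stored as (predicate, subject, object) tuples; this layout is an
# illustrative assumption, not a format given in the specification.
facts = {
    ("Transfer-typeA", "Anna", "Bob"),
    ("Label", "Anna", "1"),
}


def apply_risk_rule(facts):
    """Apply Transfer-typeA(A, B) & Label(A, y) => Label(B, y) once.

    Returns only the inferred facts that are not already present.
    """
    inferred = set()
    for pred, a, b in facts:
        if pred != "Transfer-typeA":
            continue
        for pred2, a2, y in facts:
            # The second body atom must bind the same variable A.
            if pred2 == "Label" and a2 == a:
                inferred.add(("Label", b, y))
    return inferred - facts


new_facts = apply_risk_rule(facts)
print(new_facts)
```

Here the only new fact produced is the label for Bob, which corresponds to the assumed knowledge point Label(Bob, "1") in the example.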

Then, based on the obtained assumed knowledge points and the mined risk rules, a model based on the risk rules is formed for determining the risk objects.

In the related art, rule mining methods mainly include PRA (Path Ranking Algorithm), TensorLog, and Neural LP. For each relation, PRA searches for paths between entities under the closed-world assumption, generalizes them into path rules, and learns rule weights through linear regression. The closed-world assumption holds that all facts included in the knowledge graph are true and all facts not included in it are false. Because the graph data are incomplete, this assumption may cause facts that are actually true to be treated as false. The method therefore has the following problems: 1. Only simple rules in path form can be learned, and more complex rules cannot be supported, whereas risk rules are likely to have more complex forms. 2. The linear regression classifier is trained under the closed-world assumption, so the model's effect depends heavily on how samples are drawn, and the resulting rule weights vary widely and are unstable.

TensorLog and Neural LP both use neural networks to fit path rules and extract the path rules and their weights from the trained network parameters. Their problems are as follows: 1. Both methods support only path rules. 2. Both methods use neural network models, which have high overall complexity and low efficiency and have difficulty processing large-scale data, including knowledge graphs that describe transaction and payment relationships.

Therefore, each of the existing rule mining methods provides only a single algorithm and suffers from few rules, a single rule form, and low accuracy, making it difficult to obtain accurate and comprehensive results when mining risk transactions and risk users in a large-scale payment/transaction relationship knowledge graph. On this basis, in one or more embodiments of the present disclosure, a plurality of rule mining algorithms are adopted in the rule mining stage; in the modeling stage, the assumed risk objects obtained from different risk rules are cross-validated, and a rule-based model is constructed from the cross-validation results, which effectively improves the accuracy of mining transaction risk objects.

FIG. 1B illustrates a framework diagram of a method of mining risk objects based on a knowledge graph according to one embodiment. First, pre-formed knowledge graph data are loaded, in which some users and/or some transactions are marked as risk objects. Rule mining is then performed on the knowledge graph with each of M knowledge graph rule mining algorithms to obtain M rule sets, and the M rule sets are merged to obtain a set of N risk rules, each of which can be used to derive knowledge points related to risk objects. Knowledge reasoning is performed on the original knowledge graph using these rules to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects. After corresponding labeling data are determined from the cross-validation results between the assumed knowledge points and each risk rule, a target model is trained with the labeling data, and the trained target model is used for mining risk objects. The target model may be a probabilistic graphical model or a classification model, such as a logistic regression model, a neural network, or a gradient boosting decision tree (GBDT).

The following describes specific implementation steps of the rule mining method in conjunction with specific embodiments. Fig. 2 illustrates a flow diagram of a method of mining risk objects based on knowledge-graph, the subject of execution of which may be any platform or server or cluster of devices with computing, processing capabilities, etc., in accordance with one embodiment.

In step 201, a pre-formed knowledge graph K is obtained, which contains a plurality of knowledge points related to users and transactions; some of the users and/or some of the transactions in the knowledge graph are marked as risk objects.

In daily electronic payments and transactions, there are risk users and risk transactions in addition to normal users and transactions. A risk user may be a user whose payment account is used fraudulently by someone other than the account holder, e.g., a user whose account has been stolen, or a fraudulent user who intentionally conducts illegal transactions (e.g., malicious cash-out, fraud, illegal funds transfer, etc.). A risk transaction may be a transaction made by a risk user or a transaction whose attributes are judged to be suspected of illegality, for example, a transfer to other users after a user's account is stolen, a purchase of goods made after a Taobao account is stolen, or a transfer from a normal user to a suspicious user.

Knowledge points are formed by collecting user-related attribute information, such as login device, usual place of residence, and recent transactions, and by collecting transaction-related attribute information, such as transaction account, transaction amount, payment method, transaction time, and transaction place. The knowledge points containing user-related and transaction-related information are pre-formed into a knowledge graph, and some of the users and/or some of the transactions that carry risk are marked as risk objects in the knowledge graph.

In one embodiment, the knowledge points may be in the form of triples (s, p, o), where s represents the subject, p represents the predicate, and o represents the object. The triple (s, p, o) indicates that relationship p holds between entity s and entity o. For example, the triple (Anna, transferTo, Bob) indicates that Anna initiates a transfer to Bob.

In another embodiment, the knowledge points may be in the form "relationship(entity 1, entity 2)", which indicates that the relationship holds between entity 1 and entity 2. For example, TransferTo(Anna, Bob) indicates that Anna initiates a transfer to Bob.

In yet another embodiment, the knowledge points may also be represented as attributes of nodes or relationships in the knowledge graph. For example, Label(Anna, "1").
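The forms of knowledge point described above can be related to one another with a small data structure. The following Python sketch is illustrative only: the Triple class and the to_relation_form helper are hypothetical names, and storing an attribute such as a risk label as a triple is an assumption made here for convenience.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A knowledge point in (s, p, o) form: subject, predicate, object."""
    s: str
    p: str
    o: str


def to_relation_form(t: Triple) -> str:
    """Render a triple in the 'relationship(entity 1, entity 2)' form."""
    return f"{t.p}({t.s}, {t.o})"


transfer = Triple("Anna", "transferTo", "Bob")
label = Triple("Anna", "Label", "1")  # an attribute stored as a triple (assumption)

print(to_relation_form(transfer))
print(to_relation_form(label))
```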

In step 202, a plurality of knowledge graph rule mining algorithms are used to respectively rule mine the knowledge graph K with the risk object as a target, so as to obtain N risk rules, wherein any one risk rule is used for deducing knowledge points related to the risk object.

In one embodiment, M knowledge graph rule mining algorithms Algo_1, Algo_2, …, Algo_M are each used to perform rule mining on the knowledge graph K, yielding M rule sets Rule_1, Rule_2, …, Rule_M; the M rule sets are merged to obtain N risk rules, wherein any one risk rule is used to deduce knowledge points related to risk objects. Optionally, after the M rule sets are merged, repeated rules may be removed to obtain the N risk rules.
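The merge-and-deduplicate step can be sketched as follows. The function name merge_rule_sets and the sample rule strings are hypothetical; rules are compared as plain strings here, whereas a practical system would probably need to normalize variable names before comparing, which this sketch does not attempt.

```python
def merge_rule_sets(rule_sets):
    """Merge M rule sets into one list, dropping repeated rules.

    Rules are compared as plain strings; this is a simplifying assumption.
    """
    merged = []
    seen = set()
    for rules in rule_sets:
        for rule in rules:
            if rule not in seen:
                seen.add(rule)
                merged.append(rule)
    return merged


# Hypothetical outputs of M = 2 mining algorithms.
rule_set_1 = ['Transfer-typeA(A, B) & Label(A, "y") => Label(B, "y")']
rule_set_2 = [
    'Transfer-typeA(A, B) & Label(A, "y") => Label(B, "y")',
    'TransferTo(A, B) & Pay(A, x) => Receive(B, x)',
]

risk_rules = merge_rule_sets([rule_set_1, rule_set_2])
print(len(risk_rules))
```

The order of first appearance is preserved so that rule indices stay stable when the rules are later used as columns of the first annotation data.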

In a more specific embodiment, the knowledge-graph rule mining algorithm used may include: PRA (Path Ranking Algorithm, path ordering algorithm), AMIE (Association Rule Mining under Incomplete Evidence, association rule mining of incomplete knowledge base), SFE (Subgraph Feature Extractor, subgraph feature extraction method), and any other possible knowledge-graph rule mining algorithm, without limitation.

In step 203, reasoning is performed on the knowledge graph K using each of the N risk rules, to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects.

In one embodiment, the N risk rules and the knowledge graph K are input into a knowledge reasoning system, which performs knowledge reasoning on the knowledge graph K according to each rule to obtain a plurality of assumed knowledge points that are assumed to relate to risk objects. The knowledge reasoning system uses a set of rules to infer new knowledge from existing knowledge.

For example, using the rule TransferTo(A, B) & Pay(A, 100) => Receive(B, 100) in combination with the knowledge TransferTo(Anna, Bob) and Pay(Anna, 100), the new knowledge Receive(Bob, 100) can be inferred. That is, when Anna initiates a transfer to Bob and Anna pays 100, the rule infers that Bob receives 100.

In step 204, for any first assumed knowledge point, corresponding labeling data is determined, where the labeling data includes first labeling data Λ indicating whether the N risk rules can infer the first assumed knowledge point, and second labeling data Y indicating whether the first assumed knowledge point matches the knowledge graph.

In one specific example, the first annotation data and the second annotation data may take the form of records as follows. If the jth rule in the N risk rules can infer the ith knowledge point in the plurality of assumed knowledge points, marking the data of the ith row and the jth column in the first marked data Λ as 1, otherwise marking the data as 0; if the ith knowledge point in the plurality of assumed knowledge points is true in the knowledge graph K, that is, the ith knowledge point appears in the knowledge graph K, the data of the ith row of the second labeling data Y is marked as 1, and if the ith knowledge point is false or does not appear in the knowledge graph K, the data of the ith row of the second labeling data Y is marked as 0.

For example, in one specific example, the N risk rules are the rules {r1, r2}, and the plurality of assumed knowledge points consists of 3 knowledge points {k1, k2, k3}. Rule r1 can infer knowledge points k1 and k3 but not k2; rule r2 can infer knowledge points k1 and k2 but not k3. Knowledge points k2 and k3 appear in the knowledge graph K, and knowledge point k1 does not. In this example, the first annotation data Λ is shown in Table 1 and the second annotation data Y is shown in Table 2.

Table 1: first annotation data Λ in one particular example

	r1	r2
k1	1	1
k2	0	1
k3	1	0

Table 2: second annotation data Y in one specific example

	Y
k1	0
k2	1
k3	1

It should be noted that the first row and the first column in table 1 and the first row and the first column in table 2 are only schematically added for convenience of illustration and presentation. In practical applications, the first labeling data Λ in this example may be a matrix of 3 rows and 2 columns, and the second labeling data Y may be a matrix of 3 rows and 1 column (column vector).
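As an illustration, the two annotation structures for the r1/r2 example can be built as follows. The variable names (inferred_by, in_graph, Lambda) and the use of nested Python lists for the matrices are assumptions of this sketch rather than details given in the specification.

```python
rules = ["r1", "r2"]
points = ["k1", "k2", "k3"]

# Which knowledge points each rule can infer (taken from the example).
inferred_by = {"r1": {"k1", "k3"}, "r2": {"k1", "k2"}}
# Which assumed knowledge points already appear in the knowledge graph K.
in_graph = {"k2", "k3"}

# First annotation data: Lambda[i][j] = 1 if rule j infers knowledge point i.
Lambda = [[1 if k in inferred_by[r] else 0 for r in rules] for k in points]
# Second annotation data: Y[i] = 1 if knowledge point i appears in K.
Y = [1 if k in in_graph else 0 for k in points]

print(Lambda)
print(Y)
```

The result is a 3-row, 2-column structure for Λ and a 3-element column for Y, matching the shapes stated above.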

It should be further understood that the above illustrates only one specific recording form of the first annotation data and the second annotation data. The annotation data may also take other recording forms, for example, using other symbols, marks, or values to record how the assumed knowledge points match other information, or recording the annotation data as a numerical string; this is not limited herein.

In one embodiment, for any third assumed knowledge point of the plurality of assumed knowledge points, if the number of risk rules that can infer it is less than a preset second threshold, it is removed from the set of assumed knowledge points. Specifically, for a third assumed knowledge point ks, if the number of 1s in the row of the first labeling data Λ corresponding to ks is smaller than the second threshold, ks is removed from the set of assumed knowledge points. Assumed knowledge points with lower credibility are thereby filtered out, reducing misjudgments in the subsequent risk object mining process.
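A minimal sketch of this filtering step is given below, reusing the r1/r2 example data. The function name filter_points and the threshold value of 2 are hypothetical choices for illustration.

```python
def filter_points(points, Lambda, Y, second_threshold):
    """Keep only knowledge points inferred by at least `second_threshold` rules."""
    kept = [
        (k, row, y)
        for k, row, y in zip(points, Lambda, Y)
        if sum(row) >= second_threshold  # number of 1s in the row of Lambda
    ]
    if not kept:
        return [], [], []
    ks, rows, ys = zip(*kept)
    return list(ks), list(rows), list(ys)


points = ["k1", "k2", "k3"]
Lambda = [[1, 1], [0, 1], [1, 0]]
Y = [0, 1, 1]

# With a hypothetical threshold of 2, only k1 (inferred by both rules) is kept.
kept_points, kept_Lambda, kept_Y = filter_points(points, Lambda, Y, 2)
print(kept_points)
```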

In step 205, training a target model by using the plurality of labeling data corresponding to the plurality of assumed knowledge points; the trained target model is used for mining risk objects.

The target model may fall into at least two major classes: probabilistic graphical models and classification models.

When the target model is a probabilistic graphical model, its training method is shown in FIG. 3.

In step 301, factor values of the first assumed knowledge points corresponding to a plurality of preset factors for reflecting association of the assumed knowledge points with each risk rule are determined based on the first annotation data Λ and the second annotation data Y corresponding to the arbitrary first assumed knowledge points.

In one embodiment, the plurality of preset factors includes: a first class factor indicating whether the risk rule can infer the first hypothesized knowledge point; a second class factor indicating whether the first annotation data is consistent with the second annotation data; and a third class factor indicating whether the inference results of any two risk rules are consistent for the first assumed knowledge point.

Specifically, the first class of factor is the label propensity factor $\phi^{\mathrm{Lab}}_{i,j}$. Its value is determined by whether knowledge point i can be inferred by rule j in the labeling matrix, namely

$$\phi^{\mathrm{Lab}}_{i,j}(\Lambda, Y) = \mathbf{1}\{\Lambda_{i,j} = 1\}$$

The second class of factor is the accuracy factor $\phi^{\mathrm{Acc}}_{i,j}$. Its value is determined by whether the truth value of knowledge point i inferred by rule j in the labeling matrix agrees with the label $y_i$ in the i-th row of the second labeling data Y, namely

$$\phi^{\mathrm{Acc}}_{i,j}(\Lambda, Y) = \mathbf{1}\{\Lambda_{i,j} = y_i\}$$

The third class of factor is the correlation factor $\phi^{\mathrm{Corr}}_{i,j,k}$. Its value is determined by whether the reasoning results of any two different rules j and k on the same knowledge point i are the same, namely

$$\phi^{\mathrm{Corr}}_{i,j,k}(\Lambda, Y) = \mathbf{1}\{\Lambda_{i,j} = \Lambda_{i,k}\}, \quad j \neq k$$

In the above-described factor expression, the operator 1{ } indicates that the value is 1 when the determination expression in the bracket is true, and the value is 0 when the determination expression in the bracket is false.

When there are N risk rules, for the i-th knowledge point of the plurality of assumed knowledge points, the first class of factor may take N values, the second class of factor may take N values, and the third class of factor may take C values, where the maximum value of C is N(N − 1)/2, i.e., the number of pairwise combinations of rules.

In other embodiments, other or additional factors may also be defined along the same lines, so long as they reflect the association of the assumed knowledge points with the respective risk rules. For example, a fourth class of factor may be defined that represents the number of risk rules able to infer a given assumed knowledge point, and so on.
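The three factor classes described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `factor_vector` is hypothetical, entries of Λ are assumed to be 0 or 1, and rule pairs are assumed to be enumerated with j ≤ k (self-pairs included), because that enumeration reproduces the 7-dimensional vectors of the worked example for rules r1 and r2. The labels passed in (0, 1, 1) are inferred from the factor values of that example.

```python
from itertools import combinations_with_replacement


def factor_vector(lam_row, y):
    """Return the factor values of one assumed knowledge point.

    lam_row: 0/1 entries, one per risk rule (one row of Lambda).
    y:       0/1 label of the same knowledge point (one row of Y).
    """
    n = len(lam_row)
    # First class: can rule j infer the knowledge point?
    lab = [1 if lam_row[j] == 1 else 0 for j in range(n)]
    # Second class: does rule j's result agree with the label y?
    acc = [1 if lam_row[j] == y else 0 for j in range(n)]
    # Third class: do rules j and k give the same result?
    # ASSUMPTION: pairs are enumerated with j <= k, self-pairs included.
    corr = [1 if lam_row[j] == lam_row[k] else 0
            for j, k in combinations_with_replacement(range(n), 2)]
    return lab + acc + corr


# Knowledge points k1, k2, k3 with rules r1 and r2.
k1 = factor_vector((1, 1), 0)
k2 = factor_vector((0, 1), 1)
k3 = factor_vector((1, 0), 1)
```

With N = 2 rules, each call returns 2N + C = 7 values under the stated pair enumeration.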

In step 302, a joint probability distribution of the plurality of labeled data is determined based on the weight parameter ω and the factor values of each of the plurality of hypothetical knowledge points.

In one embodiment, for any first assumed knowledge point, an inner product is computed between a first vector formed by the factor values of that knowledge point and a weight vector formed by the weight parameters; the inner product results of all assumed knowledge points are then summed and normalized to obtain the joint probability distribution.

In a more specific embodiment, for the i-th assumed knowledge point of the plurality of assumed knowledge points, its corresponding factor values form a first vector $\phi_i(\Lambda, y_i)$ of dimension $2N + C$, where $y_i$ is the label in row $i$ of the second annotation data Y. The joint probability distribution $p_\omega(\Lambda, Y)$ is then as shown in formula (1):

$$p_\omega(\Lambda, Y) = Z_\omega^{-1} \exp\left(\sum_{i=1}^{m} \omega^{T} \phi_i(\Lambda, y_i)\right) \tag{1}$$

where $Z_\omega$ is the constant used to normalize the calculation result, $\exp(x)$ denotes $e^x$, and $m$ is the total number of assumed knowledge points. It will be appreciated that $\omega$ and $\phi_i(\Lambda, y_i)$ have the same dimension.
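The joint probability described above can be evaluated numerically for small N. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, Λ and Y are assumed binary, rule pairs are enumerated with j ≤ k, and the normalization constant is assumed to sum over every binary configuration of (Λ, Y). Because the exponent is a sum of per-point terms, that constant factorizes into one per-point sum raised to the power m.

```python
import math
from itertools import combinations_with_replacement, product


def factor_vector(lam_row, y):
    # Factor values (first, second, third class) of one knowledge point.
    n = len(lam_row)
    lab = [int(v == 1) for v in lam_row]
    acc = [int(v == y) for v in lam_row]
    corr = [int(lam_row[j] == lam_row[k])
            for j, k in combinations_with_replacement(range(n), 2)]
    return lab + acc + corr


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_joint(weights, lam, labels):
    """log p_w(Lambda, Y) = sum_i w . phi_i(Lambda, y_i) - log Z_w."""
    n = len(lam[0])
    # ASSUMPTION: Z_w sums over all binary (Lambda, Y); it then equals
    # (per-point sum) ** m, so only the per-point sum is enumerated.
    log_z_point = math.log(sum(
        math.exp(dot(weights, factor_vector(row, y)))
        for row in product((0, 1), repeat=n) for y in (0, 1)))
    score = sum(dot(weights, factor_vector(row, y))
                for row, y in zip(lam, labels))
    return score - len(lam) * log_z_point
```

With all weights set to zero every configuration is equally likely, so three knowledge points and two rules give a probability of 1/8³ = 1/512.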

For example, continuing the example of step 204 above: for knowledge point k1, its first-class factors with respect to rules r1 and r2 take the values 1 and 1, its second-class factors with respect to rules r1 and r2 take the values 0 and 0, and its third-class factors with respect to rules r1 and r2 take the values 1, 1 and 1. Combining all values of the three classes of factors gives the first vector $\phi_1(\Lambda, y_1) = (1,1,0,0,1,1,1)$.

For knowledge point k2, its first-class factors with respect to rules r1 and r2 take the values 0 and 1, its second-class factors with respect to rules r1 and r2 take the values 0 and 1, and its third-class factors with respect to rules r1 and r2 take the values 1, 0 and 1. Combining all values of the three classes of factors gives the first vector $\phi_2(\Lambda, y_2) = (0,1,0,1,1,0,1)$.

For knowledge point k3, its first-class factors with respect to rules r1 and r2 take the values 1 and 0, its second-class factors with respect to rules r1 and r2 take the values 1 and 0, and its third-class factors with respect to rules r1 and r2 take the values 1, 0 and 1. Combining all values of the three classes of factors gives the first vector $\phi_3(\Lambda, y_3) = (1,0,1,0,1,0,1)$.

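At this time, the joint probability distribution $p_\omega(\Lambda, Y)$ is as shown in formula (2):

$$p_\omega(\Lambda, Y) = Z_\omega^{-1} \exp\left(\omega^{T}\phi_1(\Lambda, y_1) + \omega^{T}\phi_2(\Lambda, y_2) + \omega^{T}\phi_3(\Lambda, y_3)\right) \tag{2}$$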

In step 303, the weight parameter ω is adjusted with the goal of maximizing the joint probability distribution, so as to obtain the optimization parameters of the probabilistic graphical model.

In one embodiment, the optimized weight parameter $\hat{\omega}$ is obtained by maximizing the log marginal likelihood function corresponding to the joint probability distribution.

The probabilistic graphical model is thus obtained through training.
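One way to carry out this kind of weight fitting is plain gradient ascent. The sketch below is an assumption-laden illustration rather than the method of the specification: it treats both Λ and Y as observed and maximizes log p_ω(Λ, Y) directly (the marginal formulation in the text may differ), assumes binary Λ and Y with rule pairs enumerated as j ≤ k, and uses hypothetical names and hyperparameters (`fit`, `steps`, `lr`).

```python
import math
from itertools import combinations_with_replacement, product


def factor_vector(lam_row, y):
    n = len(lam_row)
    lab = [int(v == 1) for v in lam_row]
    acc = [int(v == y) for v in lam_row]
    corr = [int(lam_row[j] == lam_row[k])
            for j, k in combinations_with_replacement(range(n), 2)]
    return lab + acc + corr


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def all_configs(n):
    # Factor vectors of every binary (Lambda row, y) configuration.
    return [factor_vector(row, y)
            for row in product((0, 1), repeat=n) for y in (0, 1)]


def log_likelihood(w, lam, labels):
    log_z = math.log(sum(math.exp(dot(w, f))
                         for f in all_configs(len(lam[0]))))
    score = sum(dot(w, factor_vector(r, y)) for r, y in zip(lam, labels))
    return score - len(lam) * log_z


def fit(lam, labels, steps=200, lr=0.1):
    configs = all_configs(len(lam[0]))
    observed = [factor_vector(r, y) for r, y in zip(lam, labels)]
    dim, m = len(configs[0]), len(observed)
    w = [0.0] * dim
    for _ in range(steps):
        unnorm = [math.exp(dot(w, f)) for f in configs]
        z = sum(unnorm)
        # Expected factor values under the current weights.
        expected = [sum(u * f[d] for u, f in zip(unnorm, configs)) / z
                    for d in range(dim)]
        # Gradient: observed factor totals minus m times the expectation.
        grad = [sum(f[d] for f in observed) - m * expected[d]
                for d in range(dim)]
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w


lam = [(1, 1), (0, 1), (1, 0)]
labels = [0, 1, 1]
w_hat = fit(lam, labels)
```

The enumeration of all configurations grows as 2^(N+1), so this form is only practical for a handful of rules; larger rule sets would need sampling or another approximation.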

In other embodiments, other types of preset factors may be defined, and then a corresponding joint probability distribution model is constructed, and the model is trained using the first annotation data Λ and the second annotation data Y.

In some embodiments, after obtaining the optimization parameters of the probabilistic graphical model at step 303, the method further comprises the step of mining risk objects using the trained probabilistic graphical model:

for a target knowledge point formed by a target user or target transaction to be analyzed, determining corresponding first labeling data, and determining a first group of factor values and a second group of factor values corresponding to the preset factors according to the first labeling data and two label values corresponding to true and false respectively; based on the optimization parameters, respectively determining a first probability corresponding to the first group of factor values and a second probability corresponding to the second group of factor values; and determining whether the target user or the target transaction is a risk object according to the label value corresponding to the larger one of the first probability and the second probability.

Specifically, for a target knowledge point formed by a target user or a target transaction to be analyzed, the N risk rules are first used to perform inference annotation on the target knowledge point, obtaining the corresponding first annotation data Λ. Then, taking y as 1 and 0 respectively, a first group of factor values $\phi(\Lambda, 1)$ and a second group of factor values $\phi(\Lambda, 0)$ are obtained. Based on the optimized weight parameter $\hat{\omega}$, a first probability $p_{\hat{\omega}}(\Lambda, 1)$ and a second probability $p_{\hat{\omega}}(\Lambda, 0)$ are then calculated respectively, and whether the target user or the target transaction is a risk object is determined according to the label value corresponding to the larger of the first probability and the second probability.
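The decision rule above can be sketched as follows. Since both probabilities share the same normalization constant, comparing the two inner products is enough to find the larger one. The function name `is_risk_object` and the example weights are hypothetical, Λ is assumed binary, rule pairs are enumerated with j ≤ k, and a tie is resolved as "not a risk object", which the text does not specify.

```python
from itertools import combinations_with_replacement


def factor_vector(lam_row, y):
    n = len(lam_row)
    lab = [int(v == 1) for v in lam_row]
    acc = [int(v == y) for v in lam_row]
    corr = [int(lam_row[j] == lam_row[k])
            for j, k in combinations_with_replacement(range(n), 2)]
    return lab + acc + corr


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_risk_object(weights, lam_row):
    """Compare the scores for y = 1 and y = 0 for one target knowledge point."""
    score_true = dot(weights, factor_vector(lam_row, 1))
    score_false = dot(weights, factor_vector(lam_row, 0))
    # The normalization constant is common to both probabilities, so the
    # larger score corresponds to the larger probability.
    return score_true > score_false


# Hypothetical optimized weights for N = 2 rules (2N + C = 7 entries):
# rule r1 is treated as more accurate than rule r2.
example_weights = [0.2, 0.1, 1.0, 0.5, 0.0, 0.0, 0.0]
```

With these weights, a target inferred only by r1 is flagged, while one inferred only by r2 is not, because the accuracy weight of r1 dominates.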

When the target model in step 205 is a classification model, the training method for the classification model includes:

training a classification model by taking the first annotation data Λ in the plurality of pieces of annotation data as sample feature data and the second annotation data Y as sample label data, wherein the types of classification model include at least: a logistic regression model, a neural network, and a gradient boosting decision tree (GBDT).
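As a minimal sketch of this training setup, the following code fits a logistic regression by batch gradient ascent, with rows of Λ as features and entries of Y as labels. The toy data, function names, and hyperparameters (`epochs`, `lr`) are hypothetical; a neural network or GBDT could be substituted for the same feature and label layout.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, epochs=500, lr=0.5):
    """Fit weights and bias by gradient ascent on the mean log-likelihood."""
    n = len(features[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            err = y - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi + lr * g / len(features) for wi, g in zip(w, grad_w)]
        b += lr * grad_b / len(features)
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Toy annotation data: each row of Lambda says which of two rules fired.
lam = [(1, 1), (1, 0), (0, 1), (0, 0)]
labels = [1, 1, 0, 0]
model = train_logistic(lam, labels)
```

For a new target knowledge point, `predict_proba(model, lam_row) > 0.5` would then serve as the classification result on its first annotation data.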

Optionally, in one embodiment, any second assumed knowledge point among the plurality of assumed knowledge points is classified using the preliminarily trained classification model, and a corresponding confidence is calculated. Any second assumed knowledge point whose confidence is smaller than a preset first threshold is sent to a manual review platform for manual rechecking, and corrected second annotation data is determined according to the recheck result. The classification model is then retrained by taking the first annotation data in the plurality of pieces of annotation data as sample feature data and the corrected second annotation data as sample label data.
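The review loop above can be sketched as two small helpers. The text does not define how confidence is computed; here it is assumed to be max(p, 1 − p) for a predicted positive-class probability p. The function names and the shape of the review result (a mapping from sample index to corrected label) are likewise hypothetical.

```python
def select_for_review(probabilities, threshold):
    """Return indices of samples whose confidence is below the threshold.

    ASSUMPTION: confidence is max(p, 1 - p) for a binary classifier.
    """
    return [i for i, p in enumerate(probabilities)
            if max(p, 1.0 - p) < threshold]


def apply_review(labels, corrections):
    """Return a copy of the labels with reviewed entries overwritten."""
    corrected = list(labels)
    for index, new_label in corrections.items():
        corrected[index] = new_label
    return corrected


probs = [0.95, 0.55, 0.10, 0.48]
to_review = select_for_review(probs, threshold=0.7)
# Suppose the reviewers change only the label of sample 1.
corrected_labels = apply_review([1, 1, 0, 0], {1: 0})
```

The corrected labels would then replace the second annotation data when the classification model is retrained on the same features.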

After training the classification model, the method further comprises the step of mining the risk object using the trained classification model:

determining first annotation data corresponding to a target knowledge point formed by a target user or target transaction to be analyzed, and determining whether the target user or target transaction is a risk object according to the classification result of the classification model on the first annotation data.

In summary, the method for mining risk objects based on a knowledge graph provided by the embodiments of this specification places no constraint on the form of the risk rules, supports multiple rule mining algorithms, and can therefore provide risk rules in richer forms. Moreover, when calculating rule confidence, whether an assumed knowledge point appears in the original graph is used as one item of annotation data and is cross-validated against the other annotation data, without relying on the closed-world assumption. A model built on risk rules in this way can mine risk objects more effectively and accurately.

Based on the above concept, the approach can be extended from the risk analysis scenario to a knowledge-graph-based data mining method for other scenarios. Specifically, the method may comprise the following steps. A pre-formed knowledge graph is acquired, which comprises a number of knowledge points related to business objects; some of the business objects in the knowledge graph are set as target objects. A business object may be a user, a transaction, a commodity or another object, and a target object may be a business object of interest to be analyzed. Next, a plurality of knowledge graph rule mining algorithms are used to perform rule mining on the knowledge graph respectively, obtaining N target rules, where any target rule is used to infer knowledge points related to the target objects. The N target rules are then used to perform inference on the knowledge graph respectively, obtaining a plurality of assumed knowledge points that are assumed to relate to target objects. For any first assumed knowledge point, corresponding annotation data is determined, including first annotation data indicating whether each of the N target rules can infer the first assumed knowledge point and second annotation data indicating whether the first assumed knowledge point accords with the knowledge graph. After a plurality of pieces of annotation data corresponding to the plurality of assumed knowledge points are obtained, a target model is trained using them; the trained target model is used for mining target objects.
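The annotation step of this generalized method can be illustrated with a toy graph. In the sketch below, the knowledge graph is assumed to be a set of (head, relation, tail) triples and each target rule is assumed to be a function returning the triples it infers; the two rules, the entity names, and `build_annotation_data` are all hypothetical and stand in for rules produced by a real rule mining algorithm.

```python
def rule_shared_device(graph):
    # Hypothetical rule: sharing a device with a risk user implies risk.
    risky = {h for h, r, t in graph if r == "is" and t == "risk"}
    uses = [(h, t) for h, r, t in graph if r == "uses_device"]
    return {(u, "is", "risk") for u, d in uses for v, e in uses
            if d == e and u != v and v in risky}


def rule_transfers_to_risk(graph):
    # Hypothetical rule: transferring to a risk user implies risk.
    risky = {h for h, r, t in graph if r == "is" and t == "risk"}
    return {(h, "is", "risk") for h, r, t in graph
            if r == "transfers_to" and t in risky}


def build_annotation_data(graph, rules):
    """Return assumed knowledge points, first and second annotation data."""
    inferred = [rule(graph) for rule in rules]
    hypotheses = sorted(set().union(*inferred))
    # First annotation data: which rules infer each assumed knowledge point.
    lam = [[1 if h in result else 0 for result in inferred]
           for h in hypotheses]
    # Second annotation data: does the point appear in the original graph?
    labels = [1 if h in graph else 0 for h in hypotheses]
    return hypotheses, lam, labels


graph = {
    ("bob", "is", "risk"), ("carol", "is", "risk"),
    ("alice", "uses_device", "d1"), ("bob", "uses_device", "d1"),
    ("carol", "transfers_to", "bob"), ("dave", "transfers_to", "carol"),
}
hypotheses, lam, labels = build_annotation_data(
    graph, [rule_shared_device, rule_transfers_to_risk])
```

Here the rules propose alice, carol and dave as target objects; only carol is already marked in the graph, so only her second annotation value is 1.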

According to an embodiment of another aspect, there is also provided an apparatus for mining a risk object based on a knowledge-graph. Fig. 4 shows a schematic block diagram of an apparatus according to an embodiment, which may be deployed in any device, platform or cluster of devices with computing, processing capabilities. As shown in fig. 4, the apparatus 400 includes:

an obtaining unit 401 configured to obtain a pre-formed knowledge graph, including a number of knowledge points related to the user and the transaction; part of users and/or part of transactions in the knowledge graph are marked as risk objects;

a mining unit 402 configured to perform rule mining on the knowledge graph with the risk objects as targets, using each of a plurality of knowledge graph rule mining algorithms, to obtain N risk rules, wherein any one risk rule is used for inferring knowledge points related to the risk objects;

an inference unit 403 configured to perform inference on the knowledge graph by using the N risk rules, to obtain a plurality of assumed knowledge points assumed to relate to a risk object;

a determining unit 404, configured to determine, for an arbitrary first assumed knowledge point, corresponding labeling data, where the labeling data includes first labeling data indicating whether the N risk rules can infer the first assumed knowledge point, and second labeling data indicating whether the first assumed knowledge point matches the knowledge graph;

a training unit 405 configured to train a target model using a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points; the trained target model is used for mining risk objects.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in any of the above embodiments.

According to an embodiment of yet another aspect, there is also provided a computing device including a memory and a processor, wherein the memory has executable code stored therein, and the processor, when executing the executable code, implements the method described in any of the above embodiments.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing embodiments describe the objects, technical solutions and advantages of the invention in further detail. It should be understood that they are merely specific embodiments of the invention and are not intended to limit its scope; any modifications, equivalents, improvements, etc. made within the spirit and principles of the invention shall be included within the scope of the invention.

Claims

1. A method of mining risk objects based on knowledge-graph, comprising:

2. The method of claim 1, wherein the target model is a probabilistic graphical model; training a target model by using a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points, including:

3. The method of claim 2, wherein the plurality of preset factors includes at least two of:

4. The method of claim 2, wherein determining the joint probability distribution of the plurality of tagged data comprises:

5. The method of claim 2, wherein after deriving the optimization parameters of the probabilistic graphical model, the method further comprises:

6. The method of claim 1, wherein the first annotation data comprises N elements, and when an ith risk rule can infer the first assumed knowledge point, the ith element corresponding to the first annotation data is 1, and otherwise is 0.

7. The method of claim 1, wherein the target model is a classification model; training a target model by using a plurality of pieces of labeling data corresponding to the plurality of assumed knowledge points, including:

8. The method of claim 7, wherein the classification model comprises at least: a logistic regression model, a neural network, and a gradient boosting decision tree (GBDT).

9. The method of claim 7, wherein after training the classification model, the method further comprises:

10. The method of claim 7, wherein after training the classification model, the method further comprises:

11. The method of claim 1, wherein, after determining the corresponding annotation data for any first hypothesized knowledge point, the method further comprises:

12. The method of claim 1, wherein the plurality of knowledge-graph rule mining algorithms comprises a plurality of: the path ranking algorithm (PRA), the association rule mining under incomplete evidence algorithm (AMIE), and the subgraph feature extraction method (SFE).

13. A data mining method based on a knowledge graph, comprising:

14. An apparatus for mining risk objects based on knowledge-graph, comprising:

15. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-13.

16. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-13.