WO2020168851A1 - Behavior recognition - Google Patents

Behavior recognition

Info

Publication number
WO2020168851A1
Authority
WO
WIPO (PCT)
Prior art keywords
confidence
user
graph model
message value
recharge
Prior art date
Application number
PCT/CN2020/071002
Other languages
English (en)
French (fr)
Inventor
张振华
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Publication of WO2020168851A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00: Payment architectures, schemes or protocols
    • G06Q20/22: Payment schemes or models
    • G06Q20/28: Pre-payment schemes, e.g. "pay before"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00: Payment architectures, schemes or protocols
    • G06Q20/38: Payment protocols; Details thereof
    • G06Q20/40: Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists

Definitions

  • the present disclosure relates to the field of Internet technology, and in particular to a behavior recognition method, a behavior recognition device, an electronic device, and a computer-readable storage medium.
  • the purpose of the present disclosure is to provide a behavior recognition method and device, electronic equipment, and storage medium, so as to overcome at least to a certain extent the problem that fraudulent behavior cannot be accurately recognized due to limitations and defects of related technologies.
  • a behavior recognition method, which includes: constructing a graph model based on order data corresponding to a user's historical behavior, and determining the initial confidence of each node included in the graph model; obtaining community features and comprehensive features for the user from the order data; determining the message value of the graph model from the community features, the comprehensive features, the initial confidence, and the message update rule; determining the graph model according to the message value; and calculating, through the graph model, the confidence of the order data corresponding to the current behavior, so as to determine the recognition result of the current behavior according to the confidence.
  • building a graph model based on order data corresponding to a user's historical behavior includes: acquiring order data corresponding to the historical behavior, the order data including a user number and a recharge number associated with the user; and constructing the graph model based on the association relationship between the user number and the recharge number.
  • constructing the graph model includes: grouping according to the user number, constructing a co-occurrence matrix of the user number and the recharge number, and constructing the graph model according to the co-occurrence matrix; or combining the user number and the recharge number as an index to construct a co-occurrence array, and constructing the graph model according to the co-occurrence array.
  • determining the initial confidence of each node included in the graph model includes: constructing a confidence data set of multiple users according to the confidence score data and historical reference numbers of the multiple users; training a confidence prediction function on the confidence data set to obtain a trained confidence prediction function; and making a prediction for each user with the trained confidence prediction function to determine the initial confidence of each node corresponding to each user.
  • obtaining community features and comprehensive features for the user from the order data includes: constructing the graph model based on the order data; processing the graph model with a clique percolation algorithm to obtain a clique set; and labeling the user's confidence data set according to the clique set and the historical reference numbers to obtain the community feature.
  • obtaining community features and comprehensive features for the user from the order data includes: obtaining data on at least one dimensional feature of the user, and clustering the data of the at least one dimensional feature to obtain the comprehensive feature.
  • the at least one dimensional characteristic includes one or more of age habit characteristics, real-time consumption characteristics, geographic characteristics, and consumer business characteristics, and the comprehensive characteristics include risk level characteristics.
  • determining the message value of the graph model according to the community feature, the comprehensive feature, the initial confidence, and the message update rule includes: generating weight training data according to the community feature and the comprehensive feature; training on the weight training data and the confidence data set separately to obtain multiple weight coefficients; and inputting the multiple weight coefficients and the comprehensive feature into the message update rule to obtain the initial message value of the graph model.
  • the method further includes: updating the co-occurrence matrix, and updating the initial message value according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message value of the graph model.
  • updating the initial message value according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message value of the graph model includes: calculating, from the initial message value, the confidence that the recharge number meets a preset condition; calculating the confidence loss over all recharge numbers that meet the preset condition; and optimizing the initial message value by minimizing the confidence loss to obtain the target message value.
  • determining the graph model according to the message value includes: generating the graph model from the co-occurrence matrix and the target message value.
  • the method further includes: raising an alert for order data whose confidence is greater than a preset value.
  • a behavior recognition device including: a confidence calculation module, used to construct a graph model based on the order data corresponding to the user's historical behavior and to determine the initial confidence of each node included in the graph model; a feature extraction module, used to obtain community features and comprehensive features for the user from the order data; and an identification control module, used to determine the message value of the graph model according to the community features, the comprehensive features, the initial confidence, and the message update rule, to determine the graph model according to the message value, and to calculate the confidence of the order data corresponding to the current behavior through the graph model so as to determine the recognition result of the current behavior according to the confidence.
  • an electronic device including: a processor; and a memory configured to store executable instructions of the processor; wherein the processor is configured to execute the behavior recognition method described in any one of the foregoing by executing the executable instructions.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the behavior recognition method described in any one of the above is implemented.
  • the community characteristics and comprehensive characteristics for the user are obtained through the order data corresponding to the historical behavior.
  • this increases the number of feature description dimensions, avoids the error caused by relying on a single feature, and improves accuracy.
  • the message value of the graph model is determined through the community feature, comprehensive feature, initial confidence, and message update rules, and then the graph model is constructed based on the message value.
  • the obtained graph model is used to calculate the confidence of the order data corresponding to the current behavior, so that user behavior can be identified quickly and accurately based on the confidence, and the risk of fraud can be avoided in time.
  • Fig. 1 schematically shows a schematic diagram of a behavior recognition method in an exemplary embodiment of the present disclosure
  • Fig. 2 schematically shows a schematic diagram of building a graph model in an exemplary embodiment of the present disclosure
  • FIG. 3 schematically shows a schematic diagram of determining the initial confidence of a node in an exemplary embodiment of the present disclosure
  • FIG. 4 schematically shows a schematic diagram of calculating an initial message value in an exemplary embodiment of the present disclosure
  • FIG. 5 schematically shows a schematic diagram of optimizing the initial message value in an exemplary embodiment of the present disclosure
  • Fig. 6 schematically shows a block diagram of a behavior recognition device in an exemplary embodiment of the present disclosure
  • Fig. 7 schematically shows a block diagram of an electronic device in an exemplary embodiment of the present disclosure
  • Fig. 8 schematically shows a program product in an exemplary embodiment of the present disclosure.
  • the behavior recognition method provided in the present disclosure can be applied to electronic equipment, which can be a terminal such as a mobile phone, a tablet computer, or a desktop computer, or a server side, such as a single server or a server cluster.
  • This example embodiment first provides a behavior recognition method, which can be applied to various anti-fraud scenarios, such as recharge anti-fraud, cash out fraud, or telecommunication fraud.
  • the behavior recognition method will be described in detail below with reference to FIG. 1.
  • In step S110, a graph model is constructed based on the order data corresponding to the user's historical behavior, and the initial confidence of each node included in the graph model is determined.
  • the user may be a user registered on at least one platform.
  • the historical behavior can be the historical recharge behavior of the user on all platforms.
  • Order data refers to all historical order data corresponding to the user. For example, it can be the recharge order data corresponding to the recharge request.
  • the order data can include the user number corresponding to the user.
  • the user number can be associated with the user account, for example the mobile phone number bound to the user ID (identity identifier). The order data may also include the recharge number contained in the order request, and the recharge number may be the mobile phone number used to recharge the user account.
  • the user number and the recharge number can be the same or different, and there is no special restriction here.
  • the user account may also be an account used by the user to log in to the target application, or other identifiers that can uniquely identify the user.
  • the embodiment of the present disclosure does not limit the user account, and the target application may be an application that provides a recharge service for the user.
  • the order data includes the recharge number included in the order request, which can be the mobile phone number used to recharge the user account, or other accounts used to recharge the user account, such as game accounts.
  • the embodiment of the present disclosure does not limit the recharge number.
  • The process of constructing a graph model based on the order data corresponding to the user's historical behavior is shown in FIG. 2, and may include step S210 and step S220:
  • In step S210, order data corresponding to the historical behavior is acquired, and the order data includes a user number and a recharge number associated with the user.
  • each order data can correspond to a recharge behavior of the user.
  • the acquired order data may include user ID, user binding number, user recharge number, recharge amount for the recharge number, and so on. In addition, it can also include the order number and the time when the order was generated.
  • In step S220, the graph model is constructed based on the association relationship between the user number and the recharge number.
  • the association relationship between the user number and the recharge number can be described by the number of times a user number is recharged for a certain recharge number.
  • the graph model may be, for example, an MRF (Markov Random Field) model, a conditional random field model, and so on.
  • the MRF model is an MRF graph, for example, a directed graph.
  • the MRF model treats the image as a grid of random variables, each of which depends explicitly on its neighbors, that is, on random variables other than itself (the Markov property).
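  • step S220 can be carried out in either of two ways, step S221 or step S222.
  • Table 1. Co-occurrence matrix of user numbers and recharge numbers (each entry is a number of orders):

                      Recharge number 1   Recharge number 2   Recharge number 3   ...
      User number 1           2                   1                   0           ...
      User number 2           0                   0                   3           ...
      User number 3           0                   0                   2           ...
      ...                    ...                 ...                 ...          ...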
  • In step S221, grouping is performed according to the user number, a co-occurrence matrix of the user number and the recharge number is constructed, and the graph model is constructed according to the co-occurrence matrix.
  • Grouping by user number means grouping all historical order data according to the user number bound to the user. Since a user can bind at least one user number, each user number forms its own group.
  • the co-occurrence matrix refers to a matrix composed of the number of times that multiple user numbers and multiple recharge numbers appear together. The co-occurrence matrix of user numbers and recharge numbers is shown in Table 1.
  • the element Eij in the co-occurrence matrix E represents the number of orders that occurred between the user number i and the recharge number j.
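  • As a minimal sketch of the co-occurrence matrix described above, the following Python counts orders per (user number, recharge number) pair. The order list, the placeholder numbers, and the function name `build_cooccurrence_matrix` are illustrative assumptions and are not taken from the disclosure; the toy data is chosen so that the result reproduces the values of Table 1.

```python
# Hypothetical historical orders as (user_number, recharge_number) pairs.
orders = [
    ("user_1", "recharge_1"), ("user_1", "recharge_1"), ("user_1", "recharge_2"),
    ("user_2", "recharge_3"), ("user_2", "recharge_3"), ("user_2", "recharge_3"),
    ("user_3", "recharge_3"), ("user_3", "recharge_3"),
]


def build_cooccurrence_matrix(orders):
    """Return (users, recharges, E), where E[i][j] is the number of orders
    between users[i] and recharges[j]."""
    users = sorted({user for user, _ in orders})
    recharges = sorted({recharge for _, recharge in orders})
    row = {user: i for i, user in enumerate(users)}
    col = {recharge: j for j, recharge in enumerate(recharges)}
    E = [[0] * len(recharges) for _ in users]
    for user, recharge in orders:
        E[row[user]][col[recharge]] += 1
    return users, recharges, E


users, recharges, E = build_cooccurrence_matrix(orders)
for user, counts in zip(users, E):
    print(user, counts)
```

  A dense matrix like this stores a zero for every pair that never occurred, which is why a sparse co-occurrence array can be preferable for large data sets.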
  • the graph model can be constructed by the following code:
  • node1 = nodes[0]
  • node2 = nodes[1]
  • alternatively, a co-occurrence array can be used to construct the graph model.
  • In step S222, the user number and the recharge number are combined as an index to construct a co-occurrence array, and the graph model is constructed according to the co-occurrence array.
  • the order data corresponding to the historical behavior can be acquired first, and the order data includes the user number and the recharge number associated with the user.
  • a co-occurrence array A describing the number of times that multiple user numbers and multiple recharge numbers appear together is established. For example, referring to Table 2, the user number and recharge number can be combined as an index to construct a co-occurrence array. It should be noted that, in order to save storage space, only user numbers that actually have recharge behavior and combinations of recharge numbers are stored in the co-occurrence array.
  • the graph model can be constructed by the following code:
  • node1 = nodes[0]
  • node2 = nodes[1]
  • the edge weight in the graph model can be temporarily set as the number of co-occurrences of the user number and the recharge number.
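  • The sparse co-occurrence array and the resulting weighted graph can be sketched as follows, using only the Python standard library. The data, the function names, and the decision to skip orders whose user number equals the recharge number (which would otherwise create a self-loop) are assumptions made for illustration; the disclosure does not specify them.

```python
from collections import Counter

# Hypothetical historical orders as (user_number, recharge_number) pairs.
orders = [
    ("u1", "r1"), ("u1", "r1"), ("u1", "r2"),
    ("u2", "r3"), ("u2", "r3"), ("u2", "r3"),
    ("u3", "r3"), ("u3", "r3"),
]


def build_cooccurrence_array(orders):
    """Sparse counts keyed by (user_number, recharge_number).
    Pairs that never occurred are simply not stored."""
    return dict(Counter(orders))


def build_graph(cooccurrence):
    """Undirected weighted adjacency map: graph[a][b] is the co-occurrence count."""
    graph = {}
    for nodes, count in cooccurrence.items():
        node1 = nodes[0]
        node2 = nodes[1]
        if node1 == node2:
            continue  # assumption: a number recharging itself adds no edge
        graph.setdefault(node1, {})
        graph.setdefault(node2, {})
        graph[node1][node2] = graph[node1].get(node2, 0) + count
        graph[node2][node1] = graph[node2].get(node1, 0) + count
    return graph


A = build_cooccurrence_array(orders)
graph = build_graph(A)
print(graph["r3"])
```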
  • after the graph model is constructed, the network structure diagram can be visualized, for example by calling the networkx interface.
  • if the number of nodes in the graph model is too large, drawing and rendering will take too long.
  • in that case, other visualization tools suited to large graphs are needed to visualize the graph model.
  • alternatively, the network structure diagram may be left unvisualized; this can be set according to actual needs.
  • the relationship between the mobile phone number reserved for the bank card and the recharged mobile phone number can also be used to build the graph; or the graph can be built from the relationship network of the user's mobile phone number, the mobile phone number reserved for the bank card, and the recharged mobile phone number, according to actual needs.
  • the initial confidence of each node included in the graph model can be determined.
  • the nodes included in the graph model include but are not limited to multiple user numbers and multiple recharge numbers.
  • Confidence refers to the degree to which a specific individual believes in the authenticity of a specific proposition, that is, the probability.
  • the initial confidence of each node refers to the initial probability of each node being involved in the crime.
  • The process of determining the initial confidence of each node is shown in FIG. 3, and may include step S310 to step S330:
  • In step S310, the confidence data set of the multiple users is constructed according to the confidence score data and historical reference numbers of the multiple users.
  • the confidence score data of each of the multiple users includes but is not limited to those shown in Table 3.
  • the historical reference number can be all hacked numbers stored in the hacked case database, and the hacked number can be the mobile phone number involved in a historical fraud case.
  • the confidence score data set for each user shown in Table 3 can be constructed.
  • the historical reference numbers can also be a subset of the gang-related numbers stored in the gang-related case database.
  • the gang-related numbers in the gang-related case database can be all known gang-related numbers, and the subset can be the gang-related numbers of a certain region; this is not limited in the embodiment of the present disclosure.
  • In step S320, a confidence prediction function is trained based on the confidence data set to obtain a trained confidence prediction function.
  • a confidence prediction function can be trained through the data in the confidence data set.
  • the confidence prediction function may be a classifier function, and softmax may be used to train it to optimize the performance of the confidence prediction function, thereby obtaining a trained confidence prediction function.
  • the parameter obtained is W_b.
  • In step S330, a prediction is made for each user with the trained confidence prediction function, and the initial confidence of each node corresponding to each user is determined.
  • each historical order record of each user can be input into the trained confidence prediction function, and the initial confidence of the user number and of the corresponding recharge number in that record can then be determined by the confidence prediction function with parameter W_b.
  • a more accurate confidence prediction function can be trained based on the confidence data set, and then an accurate initial confidence can be obtained.
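  • The following is a minimal sketch of a softmax classifier used as a confidence prediction function, trained by plain batch gradient descent. The single input feature, the toy samples, the learning rate, and the function names are all hypothetical, since the contents of Table 3 are not reproduced here; only the use of softmax and the parameter name W_b come from the text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W_b, features):
    """Class probabilities for one sample; a bias term is appended to the features."""
    x = list(features) + [1.0]
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W_b])


def train_softmax(samples, labels, n_classes, lr=1.0, epochs=500):
    """Fit W_b (one weight row per class) by batch gradient descent on cross-entropy."""
    n_features = len(samples[0]) + 1
    W_b = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for features, label in zip(samples, labels):
            x = list(features) + [1.0]
            probs = predict_proba(W_b, features)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == label else 0.0)
                for j in range(n_features):
                    grad[c][j] += err * x[j]
        for c in range(n_classes):
            for j in range(n_features):
                W_b[c][j] -= lr * grad[c][j] / len(samples)
    return W_b


# Hypothetical feature: share of a user's past orders that touched a reference number.
samples = [[0.0], [0.1], [0.9], [1.0]]
labels = [0, 0, 1, 1]  # 1 = involved, 0 = not involved (illustrative labels)
W_b = train_softmax(samples, labels, n_classes=2)
initial_confidence = predict_proba(W_b, [0.95])[1]
print(round(initial_confidence, 3))
```

  The probability of the "involved" class is used here as the node's initial confidence; a production system would use the real confidence score data and a held-out validation set.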
  • In step S120, the community features and comprehensive features for the user are obtained from the order data.
  • the community feature is used to describe the social relationship of the user.
  • to obtain the community feature, the following steps can be used. The first step is to construct the MRF graph model; the construction steps are the same as those shown in FIG. 2 and are not repeated here.
  • the second step is to process the graph model with the clique percolation algorithm to obtain the clique set, where the clique set describes the communities to which the nodes belong, and the nodes in one clique set belong to the same community.
  • the clique percolation algorithm works as follows. For an MRF graph, if there is a complete subgraph (a subgraph with an edge between any two of its nodes) that has k nodes, this complete subgraph is called a k-clique. Furthermore, if two k-cliques share k-1 nodes, the two cliques are said to be "adjacent". A maximal set of k-cliques linked to one another through a chain of adjacent cliques is called a community. A complete subgraph includes at least two nodes.
  • the clique set obtained by the clique percolation algorithm may be, for example: [('mobile phone number 1', 'mobile phone number 2', 'mobile phone number 3'), ('mobile phone number 3', 'mobile phone number 4', 'mobile phone number 5'), ...].
  • the number of nodes k in each clique can be adjusted manually; k determines how many nodes a complete subgraph must contain for adjacent complete subgraphs to be merged into a community.
  • a clique can include 3 or 5 nodes and so on.
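  • The k-clique and adjacency definitions above can be sketched in a few lines of Python. This brute-force version enumerates every k-node combination, so it is only suitable for small graphs; the example edges and the function names are illustrative assumptions.

```python
from itertools import combinations


def k_cliques(adjacency, k):
    """All k-node complete subgraphs of an undirected graph (brute force)."""
    return [
        frozenset(group)
        for group in combinations(sorted(adjacency), k)
        if all(b in adjacency[a] for a, b in combinations(group, 2))
    ]


def clique_percolation(adjacency, k):
    """Merge k-cliques that share k-1 nodes; return communities as sorted node lists."""
    cliques = k_cliques(adjacency, k)
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # "adjacent" k-cliques
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(sorted(nodes) for nodes in communities.values())


# Hypothetical graph: two triangles sharing an edge, a bridge, and a third triangle.
edges = [("m1", "m2"), ("m1", "m3"), ("m2", "m3"), ("m2", "m4"), ("m3", "m4"),
         ("m4", "m5"), ("m5", "m6"), ("m5", "m7"), ("m6", "m7")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

communities = clique_percolation(adjacency, k=3)
print(communities)
```

  With k=3, the bridge edge between m4 and m5 belongs to no triangle, so it does not join the two communities.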
  • the third step is to label the user's confidence data set according to the group set and historical reference number to obtain the community characteristics.
  • the historical reference number refers to the gang-related number in the gang-related case database
  • the user's confidence data set refers to the user's multiple confidence score data shown in Table 3.
  • Labeling the user's confidence data set refers to adding tags to the data in each confidence data set.
  • for example, the ratio of hacked numbers in the group where the user number is located can be labeled; if the user is not in any group, this ratio is set to -1. In addition, the ratio chargeMobileCntInCliqueRatio can be labeled, that is, the number of the user's recharge numbers that are in the user's group divided by the number of the user's recharge numbers; if the user is not in any group, this ratio is also set to -1.
  • the user-specific community characteristics can be obtained, and the community characteristics can be further added to the confidence data set shown in Table 3 to update the confidence data set.
  • the community features can also include features such as community size and community density, and which community features to use can be determined according to actual scenarios.
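  • A sketch of how the two ratio labels could be computed for one user is given below. Only the name chargeMobileCntInCliqueRatio and the -1 convention come from the text; the key `hackedRatioInClique`, the function name, the sample data, and the choice to use the first community that contains the user are assumptions.

```python
def community_features(user_number, recharge_numbers, communities, hacked_numbers):
    """Community features for one user.

    Assumes recharge_numbers is non-empty and uses the first community that
    contains the user number; both ratios are -1 when no community does.
    """
    group = next((set(c) for c in communities if user_number in c), None)
    if group is None:
        return {"hackedRatioInClique": -1, "chargeMobileCntInCliqueRatio": -1}
    hacked_ratio = len(group & set(hacked_numbers)) / len(group)
    charge_ratio = len(group & set(recharge_numbers)) / len(set(recharge_numbers))
    return {
        "hackedRatioInClique": hacked_ratio,
        "chargeMobileCntInCliqueRatio": charge_ratio,
    }


# Hypothetical communities and reference numbers.
communities = [["m1", "m2", "m3", "m4"], ["m5", "m6", "m7"]]
hacked_numbers = {"m3", "m9"}

in_group = community_features("m1", ["m2", "m3", "m8", "m9"], communities, hacked_numbers)
outside = community_features("m8", ["m1"], communities, hacked_numbers)
print(in_group, outside)
```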
  • the comprehensive feature is used to comprehensively describe at least one dimensional feature, and the at least one dimensional feature includes but is not limited to one or more of age features, real-time consumption features, geographic features, and consumer business features.
  • the following description takes the age habit feature and the real-time consumption feature as examples of the at least one dimensional feature.
  • the comprehensive characteristic may be, for example, a risk level characteristic.
  • the risk level feature can be obtained by clustering data of at least one dimensional feature.
  • the k-means clustering algorithm can be used, or any other suitable clustering algorithm such as K-MEDOIDS, CLARANS, DBSCAN, OPTICS, or DENCLUE. Older people pay by mobile phone less frequently, and young and middle-aged people are active at different times of day, so the risk level features can be obtained by rating payment risk on the basis of such observations.
  • for the age habit feature, if the user's age, number of recharge orders, and recharge amount are clustered, the cluster center is [age group center, recharge order number segment center, recharge amount segment center], and the resulting risk level feature is the risk level, for people of different ages within a day, when the ratio of the number of recharge orders is m and the total recharge transaction amount is n. It should be noted that the larger or smaller the central value of the age group the user is in, the more orders, and the higher the recharge amount, the higher the risk level.
  • if the user's age and active time are clustered, the cluster center is [age group center, active time center], and the resulting risk level feature is the risk level of transactions that people of different age groups make within a day at the times when they are active.
  • the real-time consumption feature describes how frequently the user makes recharge purchases. Small-amount, high-frequency behavior can be used as the basis for risk inference, and a risk level can be assigned accordingly. If the number of the user's recharge requests in the past hour and the user's total recharge amount in the past hour are clustered, the cluster center is [recharge request number segment center, recharge amount segment center], and the risk level feature is the risk level of the recharge orders issued by the user within one hour of the current order. The greater the number of recharge requests and the greater the recharge amount at the center of the user's cluster, the higher the risk level.
  • the comprehensive characteristics can be added to the confidence data set shown in Table 3 to update the confidence data set.
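  • As an illustration of deriving a risk level feature by clustering, the sketch below runs a small k-means over [recharge requests in the past hour, total recharge amount in the past hour]. The data points, the fixed initial centers, and the rule that ranks clusters by their centers are assumptions for this example; real features would normally be scaled first so that the amount does not dominate the distance.

```python
def kmeans(points, centers, iterations=20):
    """Plain k-means with caller-supplied initial centers (deterministic)."""
    centers = [list(c) for c in centers]
    assignment = []
    for _ in range(iterations):
        # Assign each point to the nearest center by squared Euclidean distance.
        assignment = [
            min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        # Move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Hypothetical users: [recharge requests in past hour, total amount in past hour].
points = [[1, 30], [2, 50], [1, 20], [20, 900], [25, 1000], [22, 950]]
centers, assignment = kmeans(points, centers=[[1, 30], [20, 900]])

# Rank clusters by center: more requests and larger amounts mean a higher level.
order = sorted(range(len(centers)), key=lambda c: centers[c])
level_of_cluster = {cluster: level for level, cluster in enumerate(order)}
risk_levels = [level_of_cluster[a] for a in assignment]
print(risk_levels)
```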
  • user behaviors can also be described by geographic features and consumer business preference.
  • by using community features and comprehensive features, it is possible to increase the number of dimensions describing the user's order data, and thus to describe the user's recharge behavior comprehensively from multiple dimensions. In this way, the deviation caused when a single feature describes the user's recharge behavior can be avoided, thereby improving accuracy.
  • In step S130, the message value of the graph model is determined according to the community feature, the comprehensive feature, the initial confidence, and the message update rule; the graph model is determined according to the message value; and the confidence of the order data corresponding to the current behavior is calculated through the graph model, so as to determine the recognition result of the current behavior according to the confidence.
  • the message is mainly used to describe the mutual influence between local nodes in the MRF graph model.
  • setting the message means setting the rule by which local nodes influence one another, that is, the message update rule; its main purpose is to determine the formula by which confidence is passed and updated between local nodes in the graph model.
  • the message value used to determine the graph model in this step is the target message value obtained by optimizing the initial message value, where the initial message value refers to the initial edge weight of the graph model, and the target message value refers to the target edge weight of the graph model, which may be, for example, an optimized or trained edge weight.
  • the message update rule can be as shown in formula (1), for example:
  • X is a comprehensive feature extracted from the order data associated with the user's historical behavior, and the feature weight W_k is determined by training with the logistic algorithm.
  • V_iu represents the user's mobile phone number, and V_jc represents the recharged mobile phone number.
  • W_{k,V_iu,V_jc} is the weight value from node V_iu to node V_jc in state k.
  • X_{k,V_iu,V_jc} is the feature value from node V_iu to node V_jc in state k.
  • ⁇_{iu,jc}(V_iu, V_jc) is the edge weight from node V_iu to node V_jc.
  • since the message value refers to the edge weight of the graph model, determining the graph model based on the message value means setting the edge weights of the graph model constructed from the order data according to the message value, which makes the information in the graph model richer.
  • the process of determining the initial message value is shown in FIG. 4, which may include steps S410 to S430.
  • In step S410, weight training data is generated according to the community feature and the comprehensive feature.
  • the community features and comprehensive features can be sorted to generate weight training data, which is represented by X.
  • the weight training data obtained by sorting may include the following features:
    • X1: for the group closest to the user number, the shortest distance from its center to the user number.
    • X2: the ratio of hacked numbers in the group where the user number is located. If the user is not in any group, X2 is set to -1.
    • X3: the number of the user's recharge numbers that are in the user's group divided by the number of the user's recharged mobile phone numbers. If the user is not in any group, X3 is set to -1.
    • X4: within a day, for people of different ages, the risk level when the ratio of the number of recharge orders is m and the total recharge transaction amount is n.
    • X5: within a day, for people of different ages, the risk level of transactions that occur when they are active.
    • X6: the risk level of the recharge orders issued by the user within one hour of the current order.
    • X7: the risk level of the recharge orders issued by the user within 5 minutes of the current order.
  • In step S420, training is performed separately on the weight training data paired with each label of the confidence data set to obtain multiple weight coefficients.
  • the confidence data set may be as shown in Table 3.
  • the confidence of the user's recharge transaction distribution in the confidence data set shown in Table 3 may include three labels: label1, the proportion of orders in the user's historical order data in which the recharge number equals the bound number; label2, the proportion of orders in which the recharge number is not equal to the bound number but the recharge number is not a hacked number (it is not necessarily a secure mobile phone number); and label3, the proportion of orders in which the recharge number is not equal to the bound number and the recharge number is a hacked number.
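  • The three proportions can be computed directly from a user's order history, as in the following sketch. The function name, the input layout (a list of recharge numbers, one per order), and the sample values are illustrative assumptions.

```python
def transaction_labels(recharge_numbers, bound_number, hacked_numbers):
    """Return (label1, label2, label3) for one user's historical orders.

    label1: share of orders whose recharge number equals the bound number.
    label2: share whose recharge number differs and is not a hacked number.
    label3: share whose recharge number differs and is a hacked number.
    """
    total = len(recharge_numbers)
    same = sum(1 for r in recharge_numbers if r == bound_number)
    hacked = sum(1 for r in recharge_numbers
                 if r != bound_number and r in hacked_numbers)
    other = total - same - hacked
    return same / total, other / total, hacked / total


# Hypothetical history: four orders placed by a user whose bound number is "b".
label1, label2, label3 = transaction_labels(["b", "b", "x", "h"], "b", {"h"})
print(label1, label2, label3)
```

  By construction the three labels sum to 1 for any user with at least one order.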
  • the multiple weight coefficients refer to the weight coefficients for label1, label2, and label3.
  • the initial message value for label1, label2, and label3 can be obtained.
  • Different machine learning algorithms can be used to train the weight training data and the confidence of different user recharge transaction distributions to obtain multiple weight coefficients.
  • (X, label1) and the Lasso regression algorithm can be used to train a regression model, giving the weight coefficient W_1 of the trained regression model.
  • (X, label2) and a support vector machine algorithm with a tanh kernel are used to train a support vector machine model, giving the weight coefficient W_2 of the trained support vector machine model.
  • similarly, for (X, label3), the weight coefficient W_3 of the trained support vector machine model is obtained.
  • machine learning models such as ridge logistic regression models, logistic regression models, and support vector machine models under different kernel functions can also be trained to obtain weight coefficients, which are not specifically limited in this example.
  • In step S430, the multiple weight coefficients and the comprehensive feature are input into the message update rule to obtain the initial message value of the graph model.
  • On the basis of step S420, the weight coefficients for label1, label2, and label3 and the corresponding comprehensive feature X are substituted into the above formula (1), so that the initial message value of the graph model, that is, the initial edge weight, can be obtained.
  • the initial message value may be updated to obtain the target message value.
  • β c refers to the confidence level of the recharge number node, for example, it can be [0.5, 0.5].
  • W b , W are the values obtained by using multiple random sampling and cross-validation, W b refers to the coefficient of the trained confidence prediction function for calculating the initial confidence, and W refers to the weight parameter.
  • the target message value refers to the edge weight of a relatively stable and better-performing graph model.
  • the initial message value can be optimized and updated to obtain the target message value according to the initial message value.
  • the co-occurrence matrix used to describe the number of times that multiple user numbers and multiple recharge numbers co-occur in step S221 can be updated.
  • the co-occurrence frequency of each node in the co-occurrence matrix can be updated to the message value, that is, the message value is updated from the co-occurrence count to the co-occurrence frequency.
  • the pgmpy Python package can be used to construct the graph model, and the edge weight in the graph model is the co-occurrence frequency.
  • the confidence that the recharge number meets a preset condition is calculated according to the initial message value.
  • the preset condition refers to the condition to be determined ultimately; for example, it may be a fraud-involvement condition
  • the recharge number meeting the preset condition means that the recharge number is fraud-involved.
  • the belief propagation (confidence propagation) algorithm can be used to determine the confidence that the recharge number is fraud-involved.
  • the belief propagation algorithm lets nodes pass information to one another to update the current labeling state of the entire MRF. After multiple iterations, the confidence of all nodes no longer changes; the labeling of each node is then said to be the optimal labeling, and the MRF has reached a state of convergence.
  • the edge weight of each of the multiple edges of the graph model can be determined.
  • for each node, the confidence I C of the node can be determined from the edge weights of all edges connected to that node.
  • the weight of the edge between node 1 and recharge number 1 is a
  • the weight of the edge between node 2 and recharge number 1 is b
  • the weight of the edge between node 3 and recharge number 1 is c
  • the weight of the edge between node 4 and recharge number 1 is d
  • a+b+c+d is equal to 1.
  • for recharge number 1, the confidence is the product of the edge weights of these four edges, namely a*b*c*d.
  • In step S520, the confidence loss of all recharge numbers that meet the preset condition is calculated.
  • the confidence loss function is shown in formula (2):
  • t is the index of a given fraud-involved number in the fraud case library
  • T is the total number of fraud-involved numbers in the fraud case library
  • R ct is the true fraud label (set to 1 if the number is fraud-involved, otherwise set to 0)
  • I ct is the predicted confidence that the recharge number is fraud-involved.
  • the above formula (2) can be used to calculate the confidence loss of all recharge numbers included in the historical order data.
  • In step S530, the initial message value is optimized by minimizing the confidence loss to obtain the target message value.
  • the confidence loss can be minimized.
  • it can be as shown in formula (3):
  • the initial message value can be calculated based on the user's historical confidence, which provides relatively comprehensive prior information about the user and makes the calculated initial message value more accurate and comprehensive. Further, the optimal parameters are determined by minimizing the confidence loss, and the target message value is determined according to the optimal parameters, so that the target message value of the graph model, that is, the edge weight of the graph model, is more accurate.
  • the graph model can be determined by the target message value obtained after optimization, so that the performance of the graph model is more stable.
  • the target message value can be recalculated according to the optimal parameters (β c , W b , W), and the co-occurrence matrix can be updated according to the target message value: the co-occurrence frequency of each node in the co-occurrence matrix is updated to the target message value, which is the edge weight in the generated graph model. Further, from the co-occurrence matrix E whose entries have been updated to the target message value, the graph model is constructed using the pgmpy Python package, and then the confidence level I C of the order data corresponding to the current behavior is calculated according to the graph model.
  • the current behavior refers to the current recharge behavior
  • the order data refers to the recharge order data included in the current recharge behavior, and there may be at least one order data corresponding to the current recharge request.
  • Calculating the confidence may be determining the confidence of the recharge number included in the order data according to the graph model. Calculating the confidence level of the recharge number based on the graph model with stable performance can make the calculated confidence level more accurate, thereby more accurately identifying the user recharge behavior, and performing recharge anti-fraud in a timely and efficient manner.
  • the method further includes: alerting the order data whose confidence is greater than a preset value.
  • the preset value can be set according to actual accuracy requirements, for example, it can be set to 0.7 or 0.8 and so on. For example, if the confidence level of the recharge number included in the order data 1 is 0.9 calculated in step S130, the order data 1 can be filtered out. While filtering out the order data 1, an early warning can be carried out.
  • the way of early warning includes, for example, generating a prompt message, and the prompt message may include an order number describing order data 1 or other information.
  • the prompt information can be sent to the supervisory department, and after receiving the prompt information indicating an early warning, the supervisory department can check and review the order data again.
  • the behavior recognition method provided in the exemplary embodiment of the present disclosure, first, obtains community features and comprehensive features for users from the order data corresponding to historical behaviors, which increases the feature description dimensions, avoids errors caused by a single feature, and improves accuracy; second, determines the message value of the graph model from the community features, comprehensive features, initial confidence, and message update rule, and then constructs the graph model based on the message value, which yields an accurate message value and an accurate graph model; third, uses the obtained graph model to calculate the confidence of the order data corresponding to the current behavior, so that user behavior can be identified quickly and accurately based on the confidence, and fraud risks can be avoided in time.
  • the behavior recognition apparatus 600 may include:
  • the confidence calculation module 601 is used to construct a graph model based on the order data corresponding to the user's historical behavior, and determine the initial confidence of each node included in the graph model;
  • the feature extraction module 602 is configured to obtain community features and comprehensive features for the user through the order data;
  • the identification control module 603 is configured to determine the message value of the graph model according to the community feature, the comprehensive feature, the initial confidence level, and the message update rule, determine the graph model according to the message value, and calculate, through the graph model, the confidence of the order data corresponding to the current behavior, so as to determine the recognition result of the current behavior according to the confidence.
  • modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • an electronic device capable of implementing the above method is also provided.
  • the electronic device 700 according to this embodiment of the present invention will be described below with reference to FIG. 7.
  • the electronic device 700 shown in FIG. 7 is only an example, and should not bring any limitation to the function and application scope of the embodiment of the present invention.
  • the electronic device 700 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 700 may include, but are not limited to: the aforementioned at least one processing unit 710, the aforementioned at least one storage unit 720, and a bus 730 connecting different system components (including the storage unit 720 and the processing unit 710).
  • the storage unit stores program code, and the program code can be executed by the processing unit 710, so that the processing unit 710 executes the steps of the various exemplary implementations described in the "Exemplary Method" section of this specification.
  • the processing unit 710 may perform the steps shown in FIG. 1.
  • the storage unit 720 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 7201 and/or a cache storage unit 7202, and may further include a read-only storage unit (ROM) 7203.
  • the storage unit 720 may also include a program/utility tool 7204 having a set (at least one) program module 7205.
  • program module 7205 includes but is not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
  • the bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of multiple bus structures.
  • the display unit 740 may be a display with a display function to display the processing result obtained by the processing unit 710 executing the method in this exemplary embodiment through the display.
  • the display includes, but is not limited to, a liquid crystal display or other displays.
  • the electronic device 700 can also communicate with one or more external devices 900 (such as keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable users to interact with the electronic device 700, and/or with any device (such as a router, modem, etc.) that enables the electronic device 700 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 750.
  • the electronic device 700 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 760. As shown in the figure, the network adapter 760 communicates with other modules of the electronic device 700 through the bus 730.
  • an electronic device including: a processor; and a memory for storing executable instructions of the processor;
  • the processor is configured as:
  • the processor is further configured to:
  • order data corresponding to the historical behavior, where the order data includes a user number and a recharge number associated with the user;
  • the graph model is constructed.
  • the processor is further configured to:
  • the user number and the recharge number are combined as an index to construct a co-occurrence array, and the graph model is constructed based on the co-occurrence array.
  • the processor is further configured to:
  • the at least one dimensional characteristic includes one or more of age habit characteristics, real-time consumption characteristics, geographic characteristics, and consumer business characteristics, and the comprehensive characteristic includes risk level characteristics.
  • the processor is further configured to:
  • the multiple weight coefficients and the comprehensive feature are input into the message update rule to obtain the initial message value of the graph model.
  • the processor is further configured to:
  • the co-occurrence matrix is updated, and the initial message value is updated according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message value of the graph model.
  • the processor is further configured to:
  • the initial message value is optimized by minimizing the confidence loss to obtain the target message value.
  • the processor is further configured to:
  • the graph model is generated.
  • the processor is further configured to:
  • a computer-readable storage medium on which is stored a program product capable of implementing the above method in this specification.
  • various aspects of the present invention may also be implemented in the form of a program product, which includes program code, and when the program product runs on a terminal device, the program code is used to make the The terminal device executes the steps according to various exemplary embodiments of the present invention described in the above "Exemplary Method" section of this specification.
  • a program product 800 for implementing the above method according to an embodiment of the present invention is described. It can adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of the present invention is not limited thereto.
  • the readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of the present invention can be written in any combination of one or more programming languages.
  • the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, via the Internet using an Internet service provider).
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, it realizes:
  • order data corresponding to the historical behavior, where the order data includes a user number and a recharge number associated with the user;
  • the graph model is constructed.
  • the user number and the recharge number are combined as an index to construct a co-occurrence array, and the graph model is constructed based on the co-occurrence array.
  • the at least one dimensional characteristic includes one or more of age habit characteristics, real-time consumption characteristics, geographic characteristics, and consumer business characteristics, and the comprehensive characteristic includes risk level characteristics.
  • the multiple weight coefficients and the comprehensive feature are input into the message update rule to obtain the initial message value of the graph model.
  • the co-occurrence matrix is updated, and the initial message value is updated according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message value of the graph model.
  • the initial message value is optimized by minimizing the confidence loss to obtain the target message value.
  • the graph model is generated.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

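A behavior recognition method and apparatus, and an electronic device. The method includes: constructing a graph model based on order data corresponding to a user's historical behavior, and determining an initial confidence for each node included in the graph model (S110); obtaining community features and comprehensive features for the user from the order data (S120); and determining the message value of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determining the graph model according to the message value, and calculating, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result of the current behavior according to the confidence (S130).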

Description

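Behavior Recognition
This application claims priority to Chinese patent application No. 201910120241.2, filed on February 18, 2019 and entitled "Behavior recognition method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of Internet technology, and in particular to a behavior recognition method, a behavior recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
Because Internet finance is characterized by large transaction volumes and imperfect monitoring, it easily becomes the first choice of fraudsters for fake orders and cash-out. Mobile phone recharge payments involve small single amounts and large daily transaction volumes, so fraud occurring in them is more likely to be missed by the monitoring system.
In common mobile phone recharge anti-fraud schemes, whether the user's current recharge request is fraudulent is generally judged from real-time information such as the user's account information and behavioral characteristics.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
The purpose of the present disclosure is to provide a behavior recognition method and apparatus, an electronic device, and a storage medium, thereby overcoming, at least to some extent, the problem that fraud cannot be accurately identified owing to the limitations and defects of the related art.
Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.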
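According to one aspect of the present disclosure, a behavior recognition method is provided, including: constructing a graph model based on order data corresponding to a user's historical behavior, and determining an initial confidence for each node included in the graph model; obtaining community features and comprehensive features for the user from the order data; and determining the message value of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determining the graph model according to the message value, and calculating, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result of the current behavior according to the confidence.
In an exemplary embodiment of the present disclosure, constructing the graph model based on the order data corresponding to the user's historical behavior includes: acquiring the order data corresponding to the historical behavior, the order data including a user number and a recharge number associated with the user; and constructing the graph model based on the association between the user number and the recharge number.
In an exemplary embodiment of the present disclosure, constructing the graph model based on the association between the user number and the recharge number includes: grouping by user number, constructing a co-occurrence matrix of the user numbers and the recharge numbers, and constructing the graph model from the co-occurrence matrix; or using the user number and the recharge number jointly as an index to construct a co-occurrence array, and constructing the graph model from the co-occurrence array.
In an exemplary embodiment of the present disclosure, determining the initial confidence of each node included in the graph model includes: constructing a confidence data set for multiple users from the confidence score data of the multiple users and historical reference numbers; training a confidence prediction function on the confidence data set to obtain a trained confidence prediction function; and predicting for each user with the trained confidence prediction function to determine the initial confidence of each node corresponding to each user.
In an exemplary embodiment of the present disclosure, obtaining the community features and comprehensive features for the user from the order data includes: constructing the graph model based on the order data; processing the graph model with a percolation algorithm to obtain a clique set; and labeling the user's confidence data set according to the clique set and the historical reference numbers to obtain the community features.
In an exemplary embodiment of the present disclosure, obtaining the community features and comprehensive features for the user from the order data includes: acquiring data of at least one dimension feature for the user, and clustering the data of the at least one dimension feature to obtain the comprehensive features.
In an exemplary embodiment of the present disclosure, the at least one dimension feature includes one or more of an age-habit feature, a real-time consumption feature, a geographic feature, and a consumption-business feature, and the comprehensive feature includes a risk level feature.
In an exemplary embodiment of the present disclosure, determining the message value of the graph model from the community features, the comprehensive features, the initial confidence, and the message update rule includes: generating weight training data from the community features and the comprehensive features; training on the weight training data and the confidence data set separately to obtain multiple weight coefficients; and inputting the multiple weight coefficients and the comprehensive features into the message update rule to obtain the initial message value of the graph model.
In an exemplary embodiment of the present disclosure, the method further includes: updating the co-occurrence matrix, and updating the initial message value according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message value of the graph model.
In an exemplary embodiment of the present disclosure, updating the initial message value according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message value of the graph model includes: calculating, from the initial message value, the confidence that the recharge number satisfies a preset condition; calculating the confidence loss of all recharge numbers that satisfy the preset condition; and optimizing the initial message value by minimizing the confidence loss to obtain the target message value.
In an exemplary embodiment of the present disclosure, determining the graph model according to the message value includes: generating the graph model from the co-occurrence matrix and the target message value.
In an exemplary embodiment of the present disclosure, the method further includes: issuing an alert for order data whose confidence is greater than a preset value.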
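According to one aspect of the present disclosure, a behavior recognition apparatus is provided, including: a confidence calculation module, configured to construct a graph model based on order data corresponding to a user's historical behavior, and to determine an initial confidence for each node included in the graph model; a feature extraction module, configured to obtain community features and comprehensive features for the user from the order data; and a recognition control module, configured to determine the message value of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determine the graph model according to the message value, and calculate, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result of the current behavior according to the confidence.
According to one aspect of the present disclosure, an electronic device is provided, including: a processor; and
a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the behavior recognition methods described above by executing the executable instructions.
According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements any one of the behavior recognition methods described above.
In the behavior recognition method, behavior recognition apparatus, electronic device, and computer-readable storage medium provided in the exemplary embodiments of the present disclosure, first, community features and comprehensive features for the user are obtained from the order data corresponding to historical behavior, which increases the feature description dimensions, avoids errors caused by a single feature, and improves accuracy; second, the message value of the graph model is determined from the community features, comprehensive features, initial confidence, and message update rule, and the graph model is then constructed from the message value, which yields an accurate message value and an accurate graph model; third, the confidence of the order data corresponding to the current behavior is calculated through the obtained graph model, so that user behavior can be recognized quickly and accurately according to the confidence, and fraud risks can be avoided in time.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.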
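Brief Description of the Drawings
The drawings here are incorporated into and constitute a part of this specification, show embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 schematically shows a behavior recognition method in an exemplary embodiment of the present disclosure;
FIG. 2 schematically shows the construction of a graph model in an exemplary embodiment of the present disclosure;
FIG. 3 schematically shows the determination of the initial confidence of nodes in an exemplary embodiment of the present disclosure;
FIG. 4 schematically shows the calculation of the initial message value in an exemplary embodiment of the present disclosure;
FIG. 5 schematically shows the optimization of the initial message value in an exemplary embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a behavior recognition apparatus in an exemplary embodiment of the present disclosure;
FIG. 7 schematically shows a block diagram of an electronic device in an exemplary embodiment of the present disclosure;
FIG. 8 schematically shows a program product in an exemplary embodiment of the present disclosure.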
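Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure can be practiced with one or more of the specific details omitted, or with other methods, components, devices, steps, and so on. In other cases, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The behavior recognition method provided by the present disclosure can be applied to an electronic device. The electronic device may be a terminal such as a mobile phone, tablet computer, or desktop computer, or may be a server, such as a single server or a server cluster.
This example embodiment first provides a behavior recognition method, which can be applied to various anti-fraud scenarios, such as recharge anti-fraud, cash-out fraud, or telecom fraud. The behavior recognition method is described in detail below with reference to FIG. 1.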
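In step S110, a graph model is constructed based on order data corresponding to the user's historical behavior, and the initial confidence of each node included in the graph model is determined.
In this exemplary embodiment, the user may be a user registered on at least one platform. The historical behavior may be the user's historical recharge behavior on all platforms. The order data refers to all historical order data corresponding to the user, for example the recharge order data corresponding to recharge requests. The order data may include the user number corresponding to the user; the user number may be the mobile phone number bound to the user account, that is, the user ID (Identity). The order data may also include the recharge number contained in the order request; the recharge number may be the mobile phone number that the user account recharges. The user number and the recharge number may be the same or different, which is not specifically limited here.
The user account may also be the account with which the user logs in to a target application, or any other identifier that uniquely determines the user; the embodiments of the present disclosure do not limit the user account. The target application may be an application that provides recharge services to the user. The recharge number contained in the order request may be a mobile phone number that the user account recharges, or another account that the user account recharges, such as a game account; the embodiments of the present disclosure do not limit the recharge number.
The process of constructing the graph model based on the order data corresponding to the user's historical behavior is shown in FIG. 2 and may include step S210 and step S220:
In step S210, the order data corresponding to the historical behavior is acquired, the order data including a user number and a recharge number associated with the user.
In this step, each piece of order data corresponds to one recharge action by the user. The acquired order data may include the user ID, the user's bound number, the user's recharge number, the amount recharged to the recharge number, and so on. It may additionally include the order number, the order creation time, and so on.
In step S220, the graph model is constructed based on the association between the user number and the recharge number.
In this step, the association between a user number and a recharge number can be described by the number of times a given user number has recharged a given recharge number. The graph model may be, for example, an MRF (Markov Random Field) model, or a conditional random field model, and so on. The MRF model is an MRF graph, which may be, for example, a directed graph. To facilitate the analysis of causal relationships, the MRF model treats the image as a grid of random variables, in which each variable has an explicit dependence on a neighborhood made up of random variables other than itself (the Markov property). In one possible implementation, the process in step S220 can be carried out in either of two ways, step S221 or step S222: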
                 Recharge number 1   Recharge number 2   Recharge number 3   ……
User number 1    2                   1                   0                   ……
User number 2    0                   0                   3                   ……
User number 3    0                   0                   2                   ……
……               ……                  ……                  ……                  ……
Table 1
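In step S221, the orders are grouped by user number, a co-occurrence matrix of the user numbers and the recharge numbers is constructed, and the graph model is constructed from the co-occurrence matrix. Grouping by user number means dividing all historical order data into groups according to the user number bound to the user; since a user can bind at least one user number, each user number forms one group. The co-occurrence matrix is the matrix formed by the number of times each user number and each recharge number occur together. An example co-occurrence matrix of user numbers and recharge numbers is shown in Table 1.
Here, the element Eij of the co-occurrence matrix E is the number of orders that occurred between user number i and recharge number j. Eij=0 means that there is no historical recharge order between numbers i and j.
After the co-occurrence matrix is obtained, networkx can be used to create an empty graph model and populate it. For example, the graph model can be constructed with the following code: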
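A minimal sketch of how the co-occurrence matrix E described above could be built from raw orders, using only the standard library. The function name `build_cooccurrence_matrix` and the toy order list are assumptions for illustration and do not appear in the source; the toy data is chosen so that the result matches Table 1.

```python
from collections import Counter

def build_cooccurrence_matrix(orders):
    """Build E where E[i][j] counts orders from user number i to recharge number j.

    `orders` is an iterable of (user_number, recharge_number) pairs,
    one pair per historical recharge order (hypothetical input format).
    """
    counts = Counter(orders)
    users = sorted({u for u, _ in counts})
    recharges = sorted({r for _, r in counts})
    # A missing pair means no historical order, so the entry is 0.
    matrix = [[counts.get((u, r), 0) for r in recharges] for u in users]
    return users, recharges, matrix

# Toy orders (illustrative only).
orders = [
    ("user1", "recharge1"), ("user1", "recharge1"), ("user1", "recharge2"),
    ("user2", "recharge3"), ("user2", "recharge3"), ("user2", "recharge3"),
    ("user3", "recharge3"), ("user3", "recharge3"),
]
users, recharges, E = build_cooccurrence_matrix(orders)
```

The dense matrix stores a zero for every pair that never occurred, which is why the text later switches to a sparse co-occurrence array when the number of phone numbers is large.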
import networkx as nx

def createGraph(A, V):
    G = nx.Graph()  # create an empty graph
    G.add_nodes_from(V)  # add the nodes in V
    # A is indexed by "userNumber_rechargeNumber"; column 'cnt' holds the count
    for key, cnt in zip(A.index, A['cnt']):
        node1, node2 = key.split('_', 1)
        G.add_edge(node1, node2, weight=cnt)  # one weighted edge per pair
    return G
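When there are many user numbers and recharge numbers, the storage required by the co-occurrence matrix becomes too large and may cause a memory overflow, so a co-occurrence array can be used to construct the graph model instead. In step S222, the user number and the recharge number are used jointly as an index to construct a co-occurrence array, and the graph model is constructed from the co-occurrence array. In this step, as in step S221, the order data corresponding to the historical behavior is first acquired, the order data including a user number and a recharge number associated with the user. Next, a co-occurrence array A is built to describe the number of times each user number and each recharge number occur together. For example, as shown in Table 2, the user number and the recharge number can be joined to form the index of the co-occurrence array. Note that, to save storage space, the co-occurrence array stores only those combinations of user number and recharge number for which a recharge actually took place.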
                                   Count
User number 1_Recharge number 1    2
User number 1_Recharge number 2    1
User number 2_Recharge number 3    2
User number 3_Recharge number 3    2
……                                 ……
Table 2
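The co-occurrence array of step S222 can be sketched as a sparse mapping keyed by the joined index. This is an illustrative sketch using only the standard library; `build_cooccurrence_array`, `split_key`, and the sample orders are hypothetical names and data, and the sketch assumes the separator never appears inside a user number.

```python
from collections import Counter

def build_cooccurrence_array(orders, sep="_"):
    """Map "user<sep>recharge" -> number of orders; pairs with no orders are not stored."""
    return dict(Counter(f"{user}{sep}{recharge}" for user, recharge in orders))

def split_key(key, sep="_"):
    """Recover (user_number, recharge_number) from a joined index key."""
    user, recharge = key.split(sep, 1)
    return user, recharge

# Toy orders (illustrative only).
orders = [("u1", "r1"), ("u1", "r1"), ("u1", "r2"), ("u2", "r3")]
A = build_cooccurrence_array(orders)
```

Only three keys are stored here, whereas a dense 2 x 3 matrix would hold six entries; the saving grows with the number of phone numbers.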
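After the co-occurrence array A is obtained, networkx can be used to create an empty graph model and populate it. For example, the graph model can be constructed with the following code: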
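import networkx as nx

def createGraph(A, V):
    G = nx.Graph()  # create an empty graph
    G.add_nodes_from(V)  # add the nodes in V
    # A is indexed by "userNumber_rechargeNumber"; column 'cnt' holds the count
    for key, cnt in zip(A.index, A['cnt']):
        node1, node2 = key.split('_', 1)
        G.add_edge(node1, node2, weight=cnt)  # one weighted edge per pair
    return G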
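On the basis of FIG. 2, the edge weights in the graph model can temporarily be set to the number of co-occurrences of the user number and the recharge number. After the graph model is built with the method of step S221 or step S222, the network structure can be visualized, for example by calling the networkx interface. When the graph model contains too many nodes, drawing and rendering take too long; other visualization tools, such as the large-scale visualization tool pm, are then needed to visualize the graph model. The network structure need not be visualized at all; this can be decided according to actual requirements.
It should be added that the graph can also be built from the relationship between the mobile phone number reserved with a bank card and the recharge mobile phone number, or from the network of relationships among the user's mobile phone number, the mobile phone number reserved with a bank card, and the recharge mobile phone number, depending on the user's actual requirements and the application scenario.
After the graph model is constructed, the initial confidence of each node included in the graph model can be determined. The nodes in the graph model include, but are not limited to, multiple user numbers and multiple recharge numbers. An initial confidence can be determined for each node. Confidence is the degree to which a particular individual believes a particular proposition to be true, that is, a probability. The initial confidence of each node is the initial probability that the node is fraud-involved.
In this example, the process of determining the initial confidence of each node is shown in FIG. 3 and may include steps S310 to S330: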
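In step S310, a confidence data set for multiple users is constructed from the confidence score data of the multiple users and historical reference numbers.
In this step, the confidence score data of each of the multiple users includes, but is not limited to, what is shown in Table 3. The historical reference numbers may be all the fraud-involved numbers stored in a fraud case library; a fraud-involved number may be a mobile phone number involved in a historical fraud case. From the confidence score data of the multiple users and all the fraud-involved numbers stored in the fraud case library, the confidence data set for each user shown in Table 3 can be constructed.
The historical reference numbers may also be only part of the fraud-involved numbers stored in the fraud case library. The fraud-involved numbers in the library may be all known fraud-involved numbers worldwide, while the partial set may be the fraud-involved numbers within a certain region; the embodiments of the present disclosure do not limit this.
Figure PCTCN2020071002-appb-000001
Table 3
In step S320, a confidence prediction function is trained on the confidence data set to obtain a trained confidence prediction function.
On the basis of the confidence data set built in step S310, a confidence prediction function can be trained with the data in the confidence data set. In one possible implementation, the confidence prediction function may be a classifier function and may be trained with softmax until its performance is optimal, giving the trained confidence prediction function. The parameter obtained for the trained confidence prediction function is W b.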
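As a rough illustration of how a softmax classifier with trained parameters W b could turn a feature vector into an initial confidence, consider the sketch below. The weight values, the two-class layout (row 0 = not fraud-involved, row 1 = fraud-involved), and the function names are assumptions made for this example; the source does not specify the feature layout or the training procedure.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_confidence(W_b, features):
    """Return class probabilities; W_b holds one weight row per class."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in W_b]
    return softmax(scores)

# Hypothetical trained weights: row 0 = not fraud-involved, row 1 = fraud-involved.
W_b = [[0.0, 0.0], [1.0, -1.0]]
probs = predict_confidence(W_b, [2.0, 1.0])
initial_confidence = probs[1]  # probability assigned to the fraud-involved class
```

In this sketch the second probability plays the role of the node's initial confidence; in practice W b would come from fitting the classifier to the confidence data set.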
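In step S330, a prediction is made for each user with the trained confidence prediction function to determine the initial confidence of each node corresponding to each user.
On the basis of step S320, each piece of historical order data of each user can be input into the trained confidence prediction function, and the confidence prediction function with parameter W b then determines the initial confidence of the user number and the recharge number in the historical order data. With the method of this exemplary embodiment, a fairly accurate confidence prediction function can be trained from the confidence data set, which in turn gives an accurate initial confidence.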
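In step S120, community features and comprehensive features for the user are obtained from the order data.
In this exemplary embodiment, the community features describe the user's social relationships. The community features can be generated with the following steps. First, an MRF graph model is constructed; the construction steps are the same as those shown in FIG. 2 and are not repeated here. Second, the graph model is processed with the percolation algorithm to obtain a clique set, where the clique set describes the communities to which the nodes belong: nodes in the same clique set belong to the same community.
The basic idea of the clique percolation algorithm is as follows. For an MRF graph, if it contains a complete subgraph (one in which an edge exists between every pair of nodes) with k nodes, this complete subgraph is called a k-clique. If two k-cliques share k-1 nodes, the two cliques are said to be "adjacent". A maximal set formed by a chain of mutually adjacent cliques is called a community. A complete subgraph contains at least two nodes, with an edge between every pair of nodes.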
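The clique percolation idea described above can be sketched in plain Python for small graphs. This brute-force version is an illustrative assumption, not the implementation used in the source; `clique_percolation` and the toy edge list are hypothetical. For real workloads, networkx provides `k_clique_communities` in `networkx.algorithms.community`.

```python
from itertools import combinations

def clique_percolation(edges, k):
    """Return communities: unions of k-cliques chained by sharing k-1 nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Enumerate all k-cliques by brute force (suitable for toy graphs only).
    cliques = [
        frozenset(c) for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]
    # Union-find: merge cliques that are "adjacent" (share k-1 nodes).
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())

# Two triangles sharing an edge, plus a separate triangle joined by a single edge.
edges = [("m1", "m2"), ("m1", "m3"), ("m2", "m3"), ("m2", "m4"), ("m3", "m4"),
         ("m4", "m5"), ("m5", "m6"), ("m5", "m7"), ("m6", "m7")]
communities = clique_percolation(edges, k=3)
```

With k=3 the triangles {m1, m2, m3} and {m2, m3, m4} share two nodes and merge into one community, while the bridge edge m4-m5 is not part of any triangle and therefore does not merge the two communities.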
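In this exemplary embodiment, the clique set obtained with the clique percolation algorithm may be, for example: [('mobile number 1', 'mobile number 2', 'mobile number 3'), ('mobile number 3', 'mobile number 4', 'mobile number 5'), …]. Note that the number of nodes k in each clique can be adjusted manually; k determines how many nodes a complete subgraph must have for adjacent subgraphs to be treated as one community. For example, a clique may contain 3 or 5 nodes.
Third, the user's confidence data set is labeled according to the clique set and the historical reference numbers to obtain the community features. The historical reference numbers are the fraud-involved numbers in the fraud case library, and the user's confidence data set is the user's various confidence score data shown in Table 3. Labeling the user's confidence data set means adding labels to the data in each confidence data set. One label is minDistToClosetsClique, the shortest distance from the center of the clique closest to the user number to the user number, measured as the number of edges on the graph. Another label is fraudMobileCntInCliqueRatio, the ratio of fraud-involved numbers in the clique containing the user number; if the user is not in any clique, this ratio is set to -1. A further label is chargeMobileCntInCliqueRatio, the number of recharge numbers that lie in the user's clique divided by the number of recharge numbers; if the user is not in any clique, this ratio is likewise set to -1. Once these labels are added, the community features for the user are obtained, and the community features can be added to the confidence data set shown in Table 3 to update it. The community features may also include features such as community size and community density; which community features to use can be determined according to the actual scenario.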
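Two of the community features named above can be sketched as follows. The function name, argument layout, and sample data are hypothetical; the shortest-distance feature minDistToClosetsClique is omitted because the source does not define how a clique's center is chosen. Returning -1 for the recharge ratio when a user has no recharge numbers is an extra assumption of this sketch.

```python
def community_features(user, recharge_numbers, communities, fraud_numbers):
    """Return (fraudMobileCntInCliqueRatio, chargeMobileCntInCliqueRatio) for one user."""
    member_sets = [set(c) for c in communities if user in c]
    if not member_sets:
        # The source sets both ratios to -1 when the user is in no clique.
        return -1.0, -1.0
    community = set().union(*member_sets)
    fraud_ratio = len(community & set(fraud_numbers)) / len(community)
    recharge_set = set(recharge_numbers)
    # Assumption: a user with no recharge numbers also gets -1 for this ratio.
    charge_ratio = (len(recharge_set & community) / len(recharge_set)
                    if recharge_set else -1.0)
    return fraud_ratio, charge_ratio

communities = [["m1", "m2", "m3", "m4"], ["m5", "m6", "m7"]]
fraud_numbers = {"m4", "m7"}
in_clique = community_features("m1", ["m2", "m9"], communities, fraud_numbers)
not_in_clique = community_features("m9", ["m1"], communities, fraud_numbers)
```

Here user m1 sits in a four-member community with one fraud-involved number, and one of its two recharge numbers lies inside that community.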
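In addition, comprehensive features for the user can be obtained. A comprehensive feature here summarizes at least one dimension feature; the at least one dimension feature includes, but is not limited to, one or more of an age feature, a real-time consumption feature, a geographic feature, and a consumption-business feature. In the mobile phone recharge fraud scenario of this exemplary embodiment, in order to account more fully for factors such as the user's personal habits and age, the description takes the age-habit feature and the real-time consumption feature as the at least one dimension feature.
To make the prediction more accurate, the age-habit feature and the real-time consumption feature can both be converted into a single comprehensive feature, which may be, for example, a risk level feature. In one possible implementation, the risk level feature is obtained by clustering the data of the at least one dimension feature. The clustering may use the kmeans algorithm, or any other suitable clustering algorithm such as K-MEDOIDS, CLARANS, DBSCAN, OPTICS, or DENCLUE. Because older people use mobile payment less frequently, and young and middle-aged people are active at different times of day, payment risk can be rated on the basis of such knowledge to obtain the risk level feature.
For example, if clustering is performed on user age, the number of the user's orders in one day, and the user's total recharge amount in one day, the cluster center is [age-range center, recharge-order-count center, recharge-amount center], and the resulting risk level feature is the risk level, for people of different age ranges, of placing m recharge orders with a total recharge amount of n within one day. Note that the larger or smaller the age-range center of the user's cluster, the more orders and the larger the recharge amount at the cluster center, the higher the risk level.
If clustering is performed on user age and the number of the user's orders in each hour of a 24-hour day, the cluster center is [age-range center, active-period center], and the resulting risk level feature is the risk level of transactions made while active by people of different age ranges within one day. For the user's cluster center, the larger or smaller the age-range center and the closer the recharge-period center is to rest hours, the higher the risk level.
The real-time consumption feature describes how often and how many times the user recharges. Small-amount, high-frequency behavior can be used as the basis for inferring danger and assigning a risk level. If clustering is performed on the number of the user's recharge requests in the past hour and the user's total recharge amount in the past hour, the cluster center is [recharge-request-count center, recharge-amount center], and the resulting risk level feature is the danger level of the recharge orders issued by the user within the hour preceding the current order. The larger the recharge request count and the larger the recharge amount at the user's cluster center, the higher the risk level.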
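Note that, after the user's comprehensive features are obtained, they can be added to the confidence data set shown in Table 3 to update it.
In addition, in other anti-fraud scenarios such as business cash-out, the user's behavior can also be described with features such as geographic features and preferences for types of consumption business. In this exemplary embodiment, the community features and comprehensive features increase the number of dimensions describing the user's order data, so that the user's recharge behavior is described comprehensively from multiple dimensions. This avoids the bias that arises when a single feature is used to describe the user's recharge behavior, and thus improves accuracy. Moreover, the network of relationships among the user's recharge numbers is used to perform personalized anti-fraud recognition according to the social and consumption characteristics of different users.
In step S130, the message value of the graph model is determined from the community features, the comprehensive features, the initial confidence, and the message update rule; the graph model is determined according to the message value; and the confidence of the order data corresponding to the current behavior is calculated through the graph model, so as to determine the recognition result of the current behavior according to the confidence.
In this exemplary embodiment, a message mainly describes the mutual influence between local nodes in the MRF graph model. Setting the message means setting the rule by which local nodes influence one another, that is, the message update rule, which mainly determines the formula for transferring and updating confidence between local nodes in the graph model. The message value used to determine the graph model in this step is the target message value obtained by optimizing the initial message value, where the initial message value is the initial edge weight of the graph model and the target message value is the target edge weight of the graph model, for example the optimized or trained edge weight. The message update rule may be as shown in formula (1):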
Figure PCTCN2020071002-appb-000002
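Here, X is the comprehensive feature extracted from the order data associated with the user's historical behavior, and the feature weight W_k is determined by training with the logistic algorithm. P is the maximum value in the range of k, and P=3. k=1 denotes the state in which the recharge mobile number is the account mobile number; k=2 denotes the state in which the recharge mobile number is not the account mobile number and it is unknown whether the recharge mobile number is fraud-involved; k=3 denotes the state in which the recharge mobile number is not the account mobile number and the recharge mobile number is fraud-involved. V_iu denotes the user mobile number and V_jc denotes the recharge mobile number. W_kViuVjc is the weight value from node V_iu to node V_jc in state k. X_kViuVjc is the feature value from node V_iu to node V_jc in state k. ψ_iu,jc(V_iu, V_jc) is the edge weight from node V_iu to node V_jc.
The message value is the edge weight of the graph model. Determining the graph model according to the message value means using the message value to set the edge weights of the graph model constructed from the order data, so that the graph model carries richer information.
When the message value of the graph model is determined from the community features, the comprehensive features, the initial confidence, and the message update rule, the process of determining the initial message value is shown in FIG. 4 and may include steps S410 to S430.
In step S410, weight training data is generated from the community features and the comprehensive features.
In this step, the community features and the comprehensive features can be collated to generate the weight training data, denoted X. The collated weight training data may include the following features: X1, the shortest distance from the center of the clique closest to the user number to the user number. X2, the ratio of fraud-involved numbers in the clique containing the user number; if the user is not in any clique, X2 is set to -1. X3, the number of the user's recharge numbers that lie in the user's clique divided by the number of mobile numbers the user has recharged; if the user is not in any clique, X3 is set to -1. X4, the risk level, for people of different age ranges, of placing m recharge orders with a total recharge amount of n within one day. X5, the risk level of transactions made while active by people of different age ranges within one day. X6, the danger level of the recharge orders issued by the user within one hour counted from the time of the current order. X7, the danger level of the recharge orders issued by the user within 5 minutes counted from the time of the current order.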
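In step S420, the weight training data and the confidence data set are trained separately to obtain multiple weight coefficients.
The confidence data set may be as shown in Table 3. The confidence of the user's recharge transaction distribution in the trained confidence data set of Table 3 may include: label1, the proportion of orders in the user's historical order data whose recharge number equals the bound number; label2, the proportion of orders whose recharge number differs from the bound number but is not fraud-involved (a number that is not fraud-involved is not necessarily a safe mobile number); and label3, the proportion of orders whose recharge number differs from the bound number and is a fraud-involved number.
The multiple weight coefficients are the weight coefficients for label1, label2, and label3 respectively; substituting each weight coefficient into formula (1) above gives the initial message values for label1, label2, and label3 respectively. Different machine learning algorithms can be used to train on the weight training data and the confidence of the different recharge transaction distributions to obtain the multiple weight coefficients. For example, (X, label1) and the Lasso regression algorithm can be used to train a regression model, giving the weight coefficient W_1 of the trained regression model. (X, label2) and a support vector machine algorithm with a tanh kernel can be used to train a support vector machine model, giving the weight coefficient W_2 of the trained support vector machine model. (X, label3) and a support vector machine algorithm with a linear kernel can be used to train a support vector machine model, giving the weight coefficient W_3 of the trained support vector machine model. In addition, machine learning models such as ridge logistic regression models, logistic regression models, and support vector machine models with other kernel functions can also be trained to obtain the weight coefficients; this example does not specifically limit this.
In step S430, the multiple weight coefficients and the comprehensive features are input into the message update rule to obtain the initial message value of the graph model.
On the basis of step S420, the weight coefficients for label1, label2, and label3 and the corresponding comprehensive feature X are each substituted into formula (1) above, which gives the initial message value of the graph model, that is, the initial edge weight.
After the initial message value is obtained from a set of parameters (β_c, W_b, W), the initial message value can be updated to obtain the target message value. In this set of parameters, β_c is the confidence of the recharge number node, for example [0.5, 0.5]. W_b and W are values obtained through repeated random sampling and cross-validation; W_b is the coefficient of the trained confidence prediction function used to calculate the initial confidence, and W is the weight parameter. The target message value is the edge weight of a graph model that is relatively stable and performs well.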
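After the initial message value is obtained, it can be optimized and updated to make the calculation more accurate, so that the target message value is derived from the initial message value. When updating the initial message value, the co-occurrence matrix of step S221, which describes the number of times each user number and each recharge number occur together, can be updated. In one possible implementation, the co-occurrence frequency of each node in the co-occurrence matrix is updated to the message value, that is, the message value is updated from the co-occurrence count to the co-occurrence frequency. After the co-occurrence matrix E is updated, the pgmpy Python package can be used to construct the graph model, with the co-occurrence frequency as the edge weight in the graph model.
The specific process of optimizing the initial message value to obtain the target message value of the graph model is shown in FIG. 5 and includes steps S510 to S530:
In step S510, the confidence that the recharge number satisfies a preset condition is calculated from the initial message value. The preset condition is the condition to be judged in the end; for example, it may be a fraud-involvement condition, in which case a recharge number satisfying the preset condition means that the recharge number is fraud-involved. The belief propagation (confidence propagation) algorithm can be used to determine the confidence that the recharge number is fraud-involved. The belief propagation algorithm lets nodes pass information to one another to update the current labeling state of the entire MRF. After multiple iterations, the confidence of all nodes no longer changes; the labeling of each node is then said to be the optimal labeling, and the MRF has reached a state of convergence.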
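To make the message-passing idea concrete, the sketch below runs standard sum-product belief propagation on a tiny pairwise MRF with two states per node (index 0 = not fraud-involved, index 1 = fraud-involved). It is the textbook algorithm, not necessarily the exact update of formula (1); the function name, the shared symmetric edge potential, and the sample priors are assumptions for illustration.

```python
def belief_propagation(priors, edges, potential, iterations=10):
    """Sum-product belief propagation on a pairwise MRF.

    priors: {node: [p(state0), p(state1), ...]}; edges: list of (u, v);
    potential[x_i][x_j]: symmetric compatibility shared by all edges (assumption).
    """
    neighbors = {n: [] for n in priors}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    states = range(len(potential))
    # messages[(i, j)][x_j]: what node i currently tells node j about j's state.
    messages = {(i, j): [1.0 for _ in states] for i in neighbors for j in neighbors[i]}
    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            msg = []
            for xj in states:
                total = 0.0
                for xi in states:
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += priors[i][xi] * potential[xi][xj] * incoming
                msg.append(total)
            norm = sum(msg)
            updated[(i, j)] = [m / norm for m in msg]
        messages = updated
    beliefs = {}
    for n in priors:
        belief = list(priors[n])
        for k in neighbors[n]:
            belief = [belief[x] * messages[(k, n)][x] for x in states]
        norm = sum(belief)
        beliefs[n] = [b / norm for b in belief]
    return beliefs

# A suspicious user number "u" linked to a recharge number "c" with a neutral prior.
priors = {"u": [0.2, 0.8], "c": [0.5, 0.5]}
potential = [[0.9, 0.1], [0.1, 0.9]]  # neighbors tend to share the same state
beliefs = belief_propagation(priors, [("u", "c")], potential)
```

The recharge number starts with a neutral prior but ends with a belief of 0.74 of being fraud-involved, because its only neighbor is likely fraud-involved and the potential favors agreement.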
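After belief propagation, the edge weight of each of the edges of the graph model can be determined. For the recharge number or user number represented by each node, the confidence I_C of the node can be determined from the edge weights of all edges connected to the node. For example, suppose recharge number 1 has 4 nodes connected to it, where the edge weight between node 1 and recharge number 1 is a, the edge weight between node 2 and recharge number 1 is b, the edge weight between node 3 and recharge number 1 is c, and the edge weight between node 4 and recharge number 1 is d, with a+b+c+d equal to 1. Then the confidence of recharge number 1 is the product of the edge weights of these four edges, that is, a*b*c*d. The belief propagation algorithm reduces the amount of computation and thereby improves efficiency.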
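The worked example above (confidence as the product of the weights of the incident edges) can be checked with a few lines. The weights below are made-up values that sum to 1, and `node_confidence` is a hypothetical helper name.

```python
import math

def node_confidence(edge_weights):
    """Confidence of a node as the product of the weights of its incident edges."""
    return math.prod(edge_weights)

# Hypothetical edge weights a, b, c, d for recharge number 1 (they sum to 1).
a, b, c, d = 0.4, 0.3, 0.2, 0.1
confidence = node_confidence([a, b, c, d])
```

Because every weight is below 1, the product shrinks quickly as the number of incident edges grows, so confidences computed this way are best compared between nodes of similar degree.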
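In step S520, the confidence loss over all recharge numbers that satisfy the preset condition is computed. The confidence loss function is given by formula (2):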
Figure PCTCN2020071002-appb-000003
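Here t is the index of a blacklisted number in the blacklist library, T is the total number of blacklisted numbers in the blacklist library, R_ct is the true blacklisted label (set to 1 if the number is blacklisted, otherwise 0), and I_ct is the predicted confidence that the recharge number is blacklisted. Formula (2) can be used to compute the confidence loss of all recharge numbers contained in the historical order data.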
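In step S530, the initial message values are optimized by minimizing the confidence loss to obtain the target message values.
In this step, the confidence loss may be minimized; in one possible implementation, this may be as shown in formula (3):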
Figure PCTCN2020071002-appb-000004
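That is, three steps may be iterated repeatedly: updating the message values, computing from the message values the confidence that each recharge number is blacklisted, and computing the confidence loss for blacklisted recharge numbers. The iteration stops when the confidence loss reaches its minimum, which gives the optimal parameters (β_c, W_b, W), where β_c is the confidence of the recharge-number nodes, W_b is the coefficient of the trained confidence prediction function used to compute the initial confidence, and W is the weight parameter. In this way, the optimal parameters are obtained by minimizing the confidence loss, which makes the computed confidence of the recharge numbers more accurate. In addition, when there are many parameters to optimize, a greedy algorithm may be used for the iteration to speed up processing. When the confidence loss is at its minimum, the initial message values have been optimized into the target message values.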
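One way to read the greedy iteration is as a coordinate-wise search: each parameter in turn is moved to whichever candidate value lowers the loss, and the loop stops when no single change helps. The sketch below is written under that assumption. The loss is passed in as a callable because formula (2) is not reproduced here, and the scalar parameters `a` and `b`, the candidate grids, and the name `greedy_minimize` are hypothetical stand-ins for (β_c, W_b, W).

```python
def greedy_minimize(loss, start, grids, max_rounds=20):
    """Greedy coordinate search over named parameters.

    loss:   callable taking a dict of parameters and returning a float
    start:  initial parameter dict
    grids:  {name: iterable of candidate values for that parameter}
    """
    best = dict(start)
    best_loss = loss(best)
    for _ in range(max_rounds):
        improved = False
        for name, candidates in grids.items():
            for value in candidates:
                trial = dict(best, **{name: value})
                trial_loss = loss(trial)
                # Keep a change only if it strictly lowers the loss.
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
        if not improved:  # no single-parameter change helps any more
            break
    return best, best_loss


# Toy loss standing in for the confidence loss; its minimum is at a=0.3, b=0.7.
def toy_loss(params):
    return (params["a"] - 0.3) ** 2 + (params["b"] - 0.7) ** 2


grid = [i / 10 for i in range(11)]
print(greedy_minimize(toy_loss, {"a": 0.0, "b": 0.0}, {"a": grid, "b": grid}))
```

A greedy search of this kind evaluates far fewer combinations than an exhaustive grid, at the cost of possibly stopping in a local minimum.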
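Determining the initial message values from the determined community features and comprehensive features, the initial confidence, and the message update rule allows the initial message values to be computed from the users' historical confidence. This provides relatively comprehensive prior information about the users and makes the computed initial message values more accurate and more complete. Further, determining the optimal parameters by minimizing the confidence loss, and then determining the target message values from the optimal parameters, makes the resulting target message values of the graph model, that is, its edge weights, more accurate.
In this exemplary embodiment, the graph model may be determined from the optimized target message values, which makes the performance of the graph model more stable. In one possible implementation, the target message values may be recomputed from the optimal parameters (β_c, W_b, W) and the co-occurrence matrix updated accordingly: the co-occurrence frequency of each node in the co-occurrence matrix is updated to the target message value, which gives the edge weights of the graph model. Further, for the co-occurrence matrix E whose co-occurrence frequencies have been updated to the target message values, the graph model is built with the Python package pgmpy, and the confidence I_C of the order data corresponding to the current behavior is then computed from the graph model. The current behavior is the current recharge behavior, and the order data is the recharge order data included in the current recharge behavior; a current recharge request may correspond to one or more pieces of order data. Computing the confidence may consist of determining, from the graph model, the confidence of the recharge numbers contained in the order data. Computing the confidence of recharge numbers from a graph model with stable performance makes the computed confidence more accurate, so that user recharge behavior is identified more precisely and recharge anti-fraud is carried out promptly and efficiently.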
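It should be added that, because a user's social relationships and confidence score may change as transactions occur, steps S110 to S130 above may be executed once every preset period to update the user information in real time and redetermine the graph model, which ensures the accuracy of the graph model.
In addition, in this exemplary embodiment, the method further includes: issuing an early warning for order data whose confidence is greater than a preset value. The preset value may be set according to the required precision, for example 0.7 or 0.8. For instance, if the confidence of the recharge number included in order data 1, as computed in step S130, is 0.9, order data 1 may be filtered out. At the same time as order data 1 is filtered out, an early warning may be issued. Ways of issuing the warning include, for example, generating a prompt message, which may include the order number or other information describing order data 1. Further, the prompt message may be sent to a supervisory department which, after receiving the prompt message indicating the warning, may check and review the order data again. Issuing early warnings for order data whose confidence is greater than the preset value makes it possible to identify recharge fraud automatically and improves the accuracy and efficiency of anti-fraud monitoring.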
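The early-warning step reduces to a threshold filter over the computed confidences. In the sketch below, the function name `flag_orders`, the dictionary of order identifiers to confidences, and the wording of the prompt message are illustrative assumptions; delivery of the message to a supervisory department is not modeled.

```python
def flag_orders(order_confidences, threshold=0.8):
    """Return a prompt message for every order whose confidence exceeds threshold."""
    alerts = []
    for order_id, confidence in order_confidences.items():
        if confidence > threshold:  # strictly greater than the preset value
            alerts.append(
                f"ALERT order {order_id}: confidence {confidence:.2f} "
                f"exceeds preset value {threshold:.2f}"
            )
    return alerts


for message in flag_orders({"order-1": 0.9, "order-2": 0.4}, threshold=0.8):
    print(message)
```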
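Common mobile recharge anti-fraud schemes generally judge whether a user's current recharge request is fraudulent from real-time information such as the user's account information and behavioral characteristics. The input data is therefore single-feature data, and a graph model built from such data cannot accurately and comprehensively identify whether user behavior is fraudulent. Moreover, when a model built only from historical data is used to recognize recharge behavior, the historical data is not accurate, so user behavior cannot be identified accurately and fraud risk cannot be avoided in time.
By contrast, the behavior recognition method provided in the exemplary embodiments of the present disclosure offers the following. First, community features and comprehensive features for the user are obtained from the order data corresponding to historical behavior, which adds dimensions of feature description, avoids the errors caused by a single feature, and improves accuracy. Second, the message values of the graph model are determined from the community features, the comprehensive features, the initial confidence, and the message update rule, and the graph model is then built from the message values, so that accurate message values are obtained and an accurate graph model can be built. Third, the confidence of the order data corresponding to the current behavior is computed with the resulting graph model, so that user behavior can be recognized quickly and accurately from the confidence and fraud risk can be avoided in time.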
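The present disclosure further provides a behavior recognition apparatus. Referring to FIG. 6, the behavior recognition apparatus 600 may include:
a confidence calculation module 601, configured to build a graph model based on order data corresponding to a user's historical behavior, and to determine the initial confidence of each node contained in the graph model;
a feature extraction module 602, configured to obtain community features and comprehensive features for the user from the order data;
a recognition control module 603, configured to determine message values of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, to determine the graph model according to the message values, and to compute, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result for the current behavior according to the confidence.
It should be noted that the details of each module of the behavior recognition apparatus have already been described in detail in the corresponding behavior recognition method and are therefore not repeated here.
It should be noted that, although several modules or units of a device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by multiple modules or units.
In addition, although the steps of the method in the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.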
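In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is further provided.
Those skilled in the art will understand that aspects of the present invention may be implemented as a system, method, or program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and the like), or an embodiment combining hardware and software aspects, which may be referred to herein collectively as a "circuit", "module", or "system".
An electronic device 700 according to this embodiment of the present invention is described below with reference to FIG. 7. The electronic device 700 shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 7, the electronic device 700 takes the form of a general-purpose computing device. The components of the electronic device 700 may include, but are not limited to: the at least one processing unit 710, the at least one storage unit 720, and a bus 730 connecting the different system components (including the storage unit 720 and the processing unit 710).
The storage unit stores program code that can be executed by the processing unit 710, so that the processing unit 710 performs the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification. For example, the processing unit 710 may perform the steps shown in FIG. 1.
The storage unit 720 may include readable media in the form of volatile storage units, such as a random access memory (RAM) unit 7201 and/or a cache storage unit 7202, and may further include a read-only memory (ROM) unit 7203.
The storage unit 720 may also include a program/utility 7204 having a set of (at least one) program modules 7205. Such program modules 7205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 730 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The display unit 740 may be a display with a display function, used to present the processing results obtained when the processing unit 710 executes the method of this exemplary embodiment. The display includes, but is not limited to, a liquid crystal display or another type of display.
The electronic device 700 may also communicate with one or more external devices 900 (for example, a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 700, and/or with any device (for example, a router or a modem) that enables the electronic device 700 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 750. The electronic device 700 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 760. As shown in the figure, the network adapter 760 communicates with the other modules of the electronic device 700 through the bus 730. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.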
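In an exemplary embodiment of the present disclosure, an electronic device is further provided, the electronic device including: a processor; and a memory for storing instructions executable by the processor;
wherein the processor is configured to:
build a graph model based on order data corresponding to a user's historical behavior, and determine the initial confidence of each node contained in the graph model;
obtain community features and comprehensive features for the user from the order data;
determine message values of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determine the graph model according to the message values, and compute, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result for the current behavior according to the confidence.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
acquire the order data corresponding to the historical behavior, the order data including a user number and a recharge number associated with the user;
build the graph model based on the association between the user number and the recharge number.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
group by the user number, build a co-occurrence matrix of the user number and the recharge number, and build the graph model from the co-occurrence matrix; or
use the user number and the recharge number jointly as an index to build a co-occurrence array, and build the graph model from the co-occurrence array.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
build a confidence data set for multiple users from the confidence score data of the multiple users and historical reference numbers;
train a confidence prediction function on the confidence data set to obtain a trained confidence prediction function;
make a prediction for each user with the trained confidence prediction function, and determine the initial confidence of each node corresponding to each user.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
build the graph model based on the order data;
process the graph model with a percolation algorithm to obtain a set of cliques;
label the users' confidence data set according to the set of cliques and the historical reference numbers to obtain the community features.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
acquire data of at least one dimensional feature for the user, and cluster the data of the at least one dimensional feature to obtain the comprehensive features.
In an exemplary embodiment of the present disclosure, the at least one dimensional feature includes one or more of an age habit feature, a real-time consumption feature, a geographic feature, and a consumption business feature, and the comprehensive features include a risk level feature.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
generate weight training data from the community features and the comprehensive features;
train separately on the weight training data and the confidence data set to obtain multiple weight coefficients;
input the multiple weight coefficients and the comprehensive features into the message update rule to obtain the initial message values of the graph model.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
update the co-occurrence matrix, and update the initial message values according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message values of the graph model.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
compute, from the initial message values, the confidence that the recharge number satisfies a preset condition;
compute the confidence loss over all recharge numbers that satisfy the preset condition;
optimize the initial message values by minimizing the confidence loss to obtain the target message values.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
generate the graph model from the co-occurrence matrix and the target message values.
In an exemplary embodiment of the present disclosure, the processor is further configured to:
issue an early warning for order data whose confidence is greater than a preset value.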
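In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a program product capable of implementing the above method of this specification is stored. In some possible implementations, aspects of the present invention may also be implemented in the form of a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.
Referring to FIG. 8, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. However, the program product of the present invention is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).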
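In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a computer program is stored; when executed by a processor, the computer program implements:
building a graph model based on order data corresponding to a user's historical behavior, and determining the initial confidence of each node contained in the graph model;
obtaining community features and comprehensive features for the user from the order data;
determining message values of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determining the graph model according to the message values, and computing, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result for the current behavior according to the confidence.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
acquiring the order data corresponding to the historical behavior, the order data including a user number and a recharge number associated with the user;
building the graph model based on the association between the user number and the recharge number.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
grouping by the user number, building a co-occurrence matrix of the user number and the recharge number, and building the graph model from the co-occurrence matrix; or
using the user number and the recharge number jointly as an index to build a co-occurrence array, and building the graph model from the co-occurrence array.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
building a confidence data set for multiple users from the confidence score data of the multiple users and historical reference numbers;
training a confidence prediction function on the confidence data set to obtain a trained confidence prediction function;
making a prediction for each user with the trained confidence prediction function, and determining the initial confidence of each node corresponding to each user.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
building the graph model based on the order data;
processing the graph model with a percolation algorithm to obtain a set of cliques;
labeling the users' confidence data set according to the set of cliques and the historical reference numbers to obtain the community features.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
acquiring data of at least one dimensional feature for the user, and clustering the data of the at least one dimensional feature to obtain the comprehensive features.
In an exemplary embodiment of the present disclosure, the at least one dimensional feature includes one or more of an age habit feature, a real-time consumption feature, a geographic feature, and a consumption business feature, and the comprehensive features include a risk level feature.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
generating weight training data from the community features and the comprehensive features;
training separately on the weight training data and the confidence data set to obtain multiple weight coefficients;
inputting the multiple weight coefficients and the comprehensive features into the message update rule to obtain the initial message values of the graph model.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
updating the co-occurrence matrix, and updating the initial message values according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message values of the graph model.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
computing, from the initial message values, the confidence that the recharge number satisfies a preset condition;
computing the confidence loss over all recharge numbers that satisfy the preset condition;
optimizing the initial message values by minimizing the confidence loss to obtain the target message values.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
generating the graph model from the co-occurrence matrix and the target message values.
In an exemplary embodiment of the present disclosure, the computer program, when executed by a processor, implements:
issuing an early warning for order data whose confidence is greater than a preset value.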
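In addition, the above drawings are only schematic illustrations of the processing included in the method according to the exemplary embodiments of the present invention and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the temporal order of these processes. It is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.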

Claims (26)

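  1. A behavior recognition method, comprising:
    building a graph model based on order data corresponding to a user's historical behavior, and determining the initial confidence of each node contained in the graph model;
    obtaining community features and comprehensive features for the user from the order data;
    determining message values of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determining the graph model according to the message values, and computing, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result for the current behavior according to the confidence.
  2. The behavior recognition method according to claim 1, wherein building a graph model based on order data corresponding to a user's historical behavior comprises:
    acquiring the order data corresponding to the historical behavior, the order data including a user number and a recharge number associated with the user;
    building the graph model based on the association between the user number and the recharge number.
  3. The behavior recognition method according to claim 2, wherein building the graph model based on the association between the user number and the recharge number comprises:
    grouping by the user number, building a co-occurrence matrix of the user number and the recharge number, and building the graph model from the co-occurrence matrix; or
    using the user number and the recharge number jointly as an index to build a co-occurrence array, and building the graph model from the co-occurrence array.
  4. The behavior recognition method according to claim 1, wherein determining the initial confidence of each node contained in the graph model comprises:
    building a confidence data set for multiple users from the confidence score data of the multiple users and historical reference numbers;
    training a confidence prediction function on the confidence data set to obtain a trained confidence prediction function;
    making a prediction for each user with the trained confidence prediction function, and determining the initial confidence of each node corresponding to each user.
  5. The behavior recognition method according to claim 4, wherein obtaining community features and comprehensive features for the user from the order data comprises:
    processing the graph model with a percolation algorithm to obtain a set of cliques;
    labeling the users' confidence data set according to the set of cliques and the historical reference numbers to obtain the community features.
  6. The behavior recognition method according to claim 1, wherein obtaining community features and comprehensive features for the user from the order data comprises:
    acquiring data of at least one dimensional feature for the user, and clustering the data of the at least one dimensional feature to obtain the comprehensive features.
  7. The behavior recognition method according to claim 6, wherein the at least one dimensional feature includes one or more of an age habit feature, a real-time consumption feature, a geographic feature, and a consumption business feature, and the comprehensive features include a risk level feature.
  8. The behavior recognition method according to claim 3, wherein determining the message values of the graph model from the community features, the comprehensive features, the initial confidence, and the message update rule comprises:
    generating weight training data from the community features and the comprehensive features;
    training separately on the weight training data and a confidence data set to obtain multiple weight coefficients;
    inputting the multiple weight coefficients and the comprehensive features into the message update rule to obtain the initial message values of the graph model.
  9. The behavior recognition method according to claim 8, the method further comprising:
    updating the co-occurrence matrix, and updating the initial message values according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message values of the graph model.
  10. The behavior recognition method according to claim 9, wherein updating the initial message values according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message values of the graph model comprises:
    computing, from the initial message values, the confidence that the recharge number satisfies a preset condition;
    computing the confidence loss over all recharge numbers that satisfy the preset condition;
    optimizing the initial message values by minimizing the confidence loss to obtain the target message values.
  11. The behavior recognition method according to claim 9, wherein determining the graph model according to the message values comprises:
    generating the graph model from the co-occurrence matrix and the target message values.
  12. The behavior recognition method according to claim 1, the method further comprising:
    issuing an early warning for order data whose confidence is greater than a preset value.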
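  13. A behavior recognition apparatus, characterized by comprising:
    a confidence calculation module, configured to build a graph model based on order data corresponding to a user's historical behavior, and to determine the initial confidence of each node contained in the graph model;
    a feature extraction module, configured to obtain community features and comprehensive features for the user from the order data;
    a recognition control module, configured to determine message values of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, to determine the graph model according to the message values, and to compute, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result for the current behavior according to the confidence.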
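  14. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to:
    build a graph model based on order data corresponding to a user's historical behavior, and determine the initial confidence of each node contained in the graph model;
    obtain community features and comprehensive features for the user from the order data;
    determine message values of the graph model from the community features, the comprehensive features, the initial confidence, and a message update rule, determine the graph model according to the message values, and compute, through the graph model, the confidence of the order data corresponding to a current behavior, so as to determine a recognition result for the current behavior according to the confidence.
  15. The electronic device according to claim 14, wherein the processor is further configured to:
    acquire the order data corresponding to the historical behavior, the order data including a user number and a recharge number associated with the user;
    build the graph model based on the association between the user number and the recharge number.
  16. The electronic device according to claim 15, wherein the processor is further configured to:
    group by the user number, build a co-occurrence matrix of the user number and the recharge number, and build the graph model from the co-occurrence matrix; or
    use the user number and the recharge number jointly as an index to build a co-occurrence array, and build the graph model from the co-occurrence array.
  17. The electronic device according to claim 14, wherein the processor is further configured to:
    build a confidence data set for multiple users from the confidence score data of the multiple users and historical reference numbers;
    train a confidence prediction function on the confidence data set to obtain a trained confidence prediction function;
    make a prediction for each user with the trained confidence prediction function, and determine the initial confidence of each node corresponding to each user.
  18. The electronic device according to claim 17, wherein the processor is further configured to:
    build the graph model based on the order data;
    process the graph model with a percolation algorithm to obtain a set of cliques;
    label the users' confidence data set according to the set of cliques and the historical reference numbers to obtain the community features.
  19. The electronic device according to claim 14, wherein the processor is further configured to:
    acquire data of at least one dimensional feature for the user, and cluster the data of the at least one dimensional feature to obtain the comprehensive features.
  20. The electronic device according to claim 19, wherein the at least one dimensional feature includes one or more of an age habit feature, a real-time consumption feature, a geographic feature, and a consumption business feature, and the comprehensive features include a risk level feature.
  21. The electronic device according to claim 16, wherein the processor is further configured to:
    generate weight training data from the community features and the comprehensive features;
    train separately on the weight training data and a confidence data set to obtain multiple weight coefficients;
    input the multiple weight coefficients and the comprehensive features into the message update rule to obtain the initial message values of the graph model.
  22. The electronic device according to claim 21, wherein the processor is further configured to:
    update the co-occurrence matrix, and update the initial message values according to the co-occurrence frequency of each node in the co-occurrence matrix to obtain the target message values of the graph model.
  23. The electronic device according to claim 22, wherein the processor is further configured to:
    compute, from the initial message values, the confidence that the recharge number satisfies a preset condition;
    compute the confidence loss over all recharge numbers that satisfy the preset condition;
    optimize the initial message values by minimizing the confidence loss to obtain the target message values.
  24. The electronic device according to claim 22, wherein the processor is further configured to:
    generate the graph model from the co-occurrence matrix and the target message values.
  25. The electronic device according to claim 14, wherein the processor is further configured to:
    issue an early warning for order data whose confidence is greater than a preset value.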
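  26. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the behavior recognition method according to any one of claims 1 to 12.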
PCT/CN2020/071002 2019-02-18 2020-01-08 行为识别 WO2020168851A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910120241.2A CN109886699A (zh) 2019-02-18 2019-02-18 行为识别方法及装置、电子设备、存储介质
CN201910120241.2 2019-02-18

Publications (1)

Publication Number Publication Date
WO2020168851A1 true WO2020168851A1 (zh) 2020-08-27

Family

ID=66928333

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071002 WO2020168851A1 (zh) 2019-02-18 2020-01-08 行为识别

Country Status (2)

Country Link
CN (1) CN109886699A (zh)
WO (1) WO2020168851A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886699A (zh) * 2019-02-18 2019-06-14 北京三快在线科技有限公司 行为识别方法及装置、电子设备、存储介质
CN110909765B (zh) * 2019-10-24 2023-06-20 中电海康集团有限公司 一种面向轨迹大数据的行人行为模式分类方法
CN110992169B (zh) * 2019-11-29 2023-06-09 深圳乐信软件技术有限公司 一种风险评估方法、装置、服务器及存储介质
CN111311408B (zh) * 2020-02-10 2021-08-03 支付宝(杭州)信息技术有限公司 电子交易属性识别方法及装置
CN111325350B (zh) * 2020-02-19 2023-09-29 第四范式(北京)技术有限公司 可疑组织发现系统和方法
CN111861698B (zh) * 2020-07-02 2021-07-16 北京睿知图远科技有限公司 一种基于贷款多头数据的贷前审批预警方法及系统
CN114418593A (zh) * 2021-12-23 2022-04-29 中国电信股份有限公司 非法行为识别方法、装置、电子设备及可读介质
CN116226527B (zh) * 2023-03-03 2024-06-07 中浙信科技咨询有限公司 通过居民大数据实现行为预测的数字化社区治理方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161526A1 (en) * 2008-12-19 2010-06-24 The Mitre Corporation Ranking With Learned Rules
CN107451703A (zh) * 2017-08-31 2017-12-08 杭州师范大学 一种基于因子图模型的社交网络多任务预测方法
CN108322473A (zh) * 2018-02-12 2018-07-24 北京京东金融科技控股有限公司 用户行为分析方法与装置
CN108520343A (zh) * 2018-03-26 2018-09-11 平安科技(深圳)有限公司 风险模型训练方法、风险识别方法、装置、设备及介质
CN109886699A (zh) * 2019-02-18 2019-06-14 北京三快在线科技有限公司 行为识别方法及装置、电子设备、存储介质

Also Published As

Publication number Publication date
CN109886699A (zh) 2019-06-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20759504

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20759504

Country of ref document: EP

Kind code of ref document: A1