WO2020052168A1 - Anti-fraud model generation and application method, apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2020052168A1
WO2020052168A1 PCT/CN2018/124819 CN2018124819W WO2020052168A1 WO 2020052168 A1 WO2020052168 A1 WO 2020052168A1 CN 2018124819 W CN2018124819 W CN 2018124819W WO 2020052168 A1 WO2020052168 A1 WO 2020052168A1
Authority
WO
WIPO (PCT)
Prior art keywords
social network
training
network graph
objective function
data set
Application number
PCT/CN2018/124819
Other languages
English (en)
French (fr)
Inventor
侯明远 (Hou Mingyuan)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2020052168A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Definitions

  • The present application relates to the field of artificial intelligence technology, and in particular to a method, an apparatus, a device, and a storage medium for generating and applying an anti-fraud model.
  • The traditional approach to social network analysis counts how frequently a user exhibits particular behavioral features.
  • When the frequency of one of a user's behavioral features exceeds the normal range, a potential fraud group may be discovered. For example, if an insured person files several claims within a period and this number is significantly higher than normal, the insured person may be involved in fraud.
  • Likewise, if the claim frequency associated with an ID card number is significantly higher than the normal level, that ID card number may be involved in fraud.
  • The traditional approach to social network analysis has the following two shortcomings.
  • First, it requires user behavioral features to be defined manually: when a user's behavior deviates from the normal range, the user may be suspected of fraud, and such behaviors are then defined as characteristic variables of suspected fraud.
  • These characteristic variables are generally summarized by business experts or modelers from their own work experience. For example, one may count how many claims each insured person in the case database filed within a preset period and set a normal number of claims for that period, or count how many times the same ID card number appears in different cases and set a normal number of occurrences for that ID card number.
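The frequency-counting baseline described above can be sketched as follows. This is a minimal illustration, not part of the application: the record layout, the field name `id_card`, and the threshold `normal_max` are hypothetical, and the claims are assumed to be already restricted to the preset period.

```python
from collections import Counter


def flag_by_frequency(claims, key, normal_max):
    """Flag entities whose claim count exceeds a manually chosen normal maximum.

    claims: list of dicts, one per claim case (hypothetical schema).
    key: the field to count, e.g. "id_card" (hypothetical field name).
    normal_max: the manually defined "normal number of times".
    """
    counts = Counter(claim[key] for claim in claims)
    return {entity: n for entity, n in counts.items() if n > normal_max}


claims = [
    {"case_id": "C1", "id_card": "ID-A"},
    {"case_id": "C2", "id_card": "ID-A"},
    {"case_id": "C3", "id_card": "ID-A"},
    {"case_id": "C4", "id_card": "ID-B"},
]
print(flag_by_frequency(claims, "id_card", normal_max=2))  # {'ID-A': 3}
```

The threshold has to be chosen by hand for every feature, which is exactly the manual-definition shortcoming the text points out.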
  • A social network is a network topology whose nodes are individuals or communities.
  • A local structure consists of one node together with the nodes connected to it; the global structure consists of the local structures of all nodes.
  • Second, the traditional approach merely counts the frequency of user behavioral features and does not apply graph theory. It therefore ignores the local and global structure of the social network and cannot uncover the valuable information hidden in it.
  • The embodiments of the present application provide a method, apparatus, device, and computer-readable storage medium for generating and applying an anti-fraud model. They aim to solve the problem that the traditional social-network approach to identifying fraud in insurance claim cases depends on manually defined user behavioral features, and to take the local and global structure of the social network graph into account so as to uncover more potentially valuable information.
  • In a first aspect, an embodiment of the present application provides a method for generating an anti-fraud model, including: obtaining a historical data set from an insurance claims database, where the historical data set is the collection of all case data within a preset time range in the insurance claims database and includes a training data set and a test data set; generating a training target social network graph from the training data set; obtaining the objective function of the SDNE algorithm according to the training target social network graph and using it as a first objective function; constructing a second objective function according to the first objective function and preset constraint conditions; obtaining the optimal hyperparameters of the second objective function and adding them to the second objective function as known quantities to generate an optimal objective function; and training the optimal objective function with the training target social network graph to generate the anti-fraud model.
  • In a second aspect, an embodiment of the present application further provides a method for applying an anti-fraud model, including: obtaining a data set to be detected from the insurance claims database and generating a detection target social network graph from it, where the data set to be detected is any one or more data sets in the insurance claims database; and using the anti-fraud model of the first aspect to map the nodes of the detection target social network graph into a high-dimensional vector space, so that a user can analyze from the mapping whether a node involves fraudulent behavior. Each node of the detection target social network graph has a unique corresponding vector in the high-dimensional vector space, and the stronger the association between nodes, the closer their vectors are in that space.
  • An embodiment of the present application further provides an apparatus including units for performing the methods of the first and second aspects.
  • an embodiment of the present application further provides a computer device.
  • the computer device includes a memory and a processor.
  • The memory stores a computer program, and the processor implements the methods of the first and second aspects when executing the computer program.
  • an embodiment of the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program includes program instructions, and the program instructions may be executed by a processor.
  • FIG. 1 is a schematic flowchart of a method for generating an anti-fraud model according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a sub-flow of the method for generating an anti-fraud model according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for applying an anti-fraud model according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for applying an anti-fraud model according to another embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an apparatus according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of the optimal function generation unit of an apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of another apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of another apparatus according to another embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for generating an anti-fraud model according to an embodiment of the present application.
  • The method for generating the anti-fraud model is applied to anti-fraud scenarios in insurance claims.
  • The Structural Deep Network Embedding (SDNE) algorithm is introduced into the analysis of the insurance claim social network structure graph.
  • Because the SDNE algorithm has multiple layers of non-linear functions, it can capture highly non-linear network structure and exploit both the local and the global structure of the social network structure graph. Its objective function is improved accordingly, based on the characteristics of the insurance claims social network structure graph, to generate the anti-fraud model.
  • the method for generating the anti-fraud model may include steps S110 to S160.
  • In step S110, a historical data set is obtained; it is the data of all cases in the insurance claims database within a preset time range.
  • The historical data set is randomly divided into a training data set and a test data set according to a preset ratio; that is, all cases within the preset time range in the insurance claims database are randomly split by that ratio.
  • For example, the historical data set is randomly divided at a ratio of 7:3, so that the training data set contains 70% of the cases in the historical data set and the test data set contains the remaining 30%.
  • the preset ratio can be customized according to user requirements.
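The random 7:3 split can be sketched in a few lines. This is a minimal sketch, assuming cases are identified by IDs; the function name `split_cases` and the fixed seed are illustrative choices, with the seed making the split reproducible.

```python
import random


def split_cases(case_ids, train_ratio=0.7, seed=42):
    """Randomly split case IDs into a training set and a test set by a preset ratio."""
    ids = list(case_ids)
    rng = random.Random(seed)  # private generator with a fixed seed
    rng.shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return ids[:n_train], ids[n_train:]


all_ids = [f"C{i}" for i in range(10)]
train_ids, test_ids = split_cases(all_ids)
print(len(train_ids), len(test_ids))  # 7 3
```

Changing `train_ratio` corresponds to customizing the preset ratio.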
  • the step of generating a training target social network graph according to the training data set specifically includes the following steps A and B:
  • A social network structure graph is composed of nodes and edges: nodes represent objects, and edges represent the connections between pairs of objects.
  • A graph can be regarded as an abstract network of "nodes" that are connected to one another by "edges"; an edge indicates an association between the two nodes it joins.
  • The training data set is the case-related data of 70% of all claim cases within the preset time period in the insurance claims database. The claim cases may be auto insurance claims, critical illness claims, and so on; auto insurance claims are used as the example here.
  • the relevant data of a car insurance claim mainly includes the case number, the person involved, the vehicle involved, the relevant documents and other data.
  • the persons involved mainly include the insured, repair shop personnel, insurance company personnel, and related traffic police.
  • All relevant data of a case can be built into a social network structure graph of that auto insurance claim case. Since the training data set contains multiple claim cases, a training original social network graph is constructed from it.
  • The social network structure of each auto insurance claim case is a local structure of the training original social network graph, and the training original social network graph itself is the global structure; it is constructed mainly from the relevant data of all claim cases in the training data set.
  • Each edge in the edge set E represents an ordered pair of nodes (s, t), where s is the source node and t is the target node.
  • The nodes in the training original social network graph are heterogeneous, so a pair of nodes may have a directional or affiliation relationship; for example, the insured vehicle belongs to the insured person.
  • Because its nodes are heterogeneous, the training original social network graph is a directed graph.
  • In other words, the training original social network graph is built from heterogeneous nodes such as the persons involved, the vehicles involved, and related documents, and is therefore directed. It must be processed to generate a training target social network graph built only from homogeneous nodes.
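Step A, building the directed, heterogeneous training original social network graph, might look like the following minimal sketch. The record fields (`case_id`, `insured`, `vehicles`, `others`, `documents`) and the chosen edge directions are hypothetical; the application does not specify a data schema. Nodes are typed `(type, id)` tuples so that persons, vehicles, documents, and cases stay distinct, and each edge is an ordered pair `(s, t)`.

```python
def build_original_graph(cases):
    """Build a directed, heterogeneous graph from auto insurance claim records."""
    nodes, edges = set(), set()
    for case in cases:
        case_node = ("case", case["case_id"])
        insured = ("person", case["insured"])
        nodes.update({case_node, insured})
        edges.add((insured, case_node))          # insured person is involved in the case
        for plate in case.get("vehicles", []):
            vehicle = ("vehicle", plate)
            nodes.add(vehicle)
            edges.add((vehicle, insured))        # vehicle belongs to the insured person
            edges.add((vehicle, case_node))
        for person in case.get("others", []):    # e.g. repair shop staff, adjusters, police
            other = ("person", person)
            nodes.add(other)
            edges.add((other, case_node))
        for doc in case.get("documents", []):
            document = ("document", doc)
            nodes.add(document)
            edges.add((document, case_node))
    return nodes, edges


cases = [
    {"case_id": "C1", "insured": "P1", "vehicles": ["V1"], "others": ["R1"], "documents": ["D1"]},
    {"case_id": "C2", "insured": "P2", "others": ["R1"]},
]
nodes, edges = build_original_graph(cases)
print(len(nodes), len(edges))  # 7 7
```

Repair shop employee `R1` is shared by both cases, which is the kind of cross-case link the global structure is meant to expose.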
  • Step B specifically includes: filtering the nodes of the training original social network graph to keep homogeneous nodes and discard heterogeneous ones; obtaining key nodes related to the homogeneous nodes; and adding the key nodes to the homogeneous nodes to generate the training target social network graph.
  • For example, the nodes related to the case number can be selected and other irrelevant nodes cleaned out; the related nodes involved in the same case are connected through the case number, and key nodes related to these homogeneous nodes are then obtained.
  • Key nodes are other personal data requiring special attention, such as information on members of a known criminal gang. The key nodes are added to the homogeneous nodes to generate the training target social network graph. Because the nodes in the resulting graph are homogeneous, a pair of nodes connected by an edge has no directional or affiliation relationship.
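Step B can be sketched as follows, assuming (for illustration only) that the homogeneous nodes are persons, that two persons are linked whenever they appear in the same case, and that key nodes are supplied as a set of person IDs. The function name `build_target_graph` and the record fields are hypothetical. The result is an undirected graph stored as an adjacency dictionary.

```python
from itertools import combinations


def build_target_graph(cases, key_persons=frozenset()):
    """Keep only person nodes and link persons who share a case number.

    Vehicle and document fields are ignored, which filters out the
    heterogeneous nodes. Key persons (e.g. known gang members) are always
    added as nodes; they gain edges only if they co-occur in a case.
    """
    graph = {p: set() for p in key_persons}
    for case in cases:
        persons = {case["insured"], *case.get("others", [])}
        for p in persons:
            graph.setdefault(p, set())
        # The case number links every pair of persons involved in it.
        for a, b in combinations(sorted(persons), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


cases = [
    {"case_id": "C1", "insured": "P1", "vehicles": ["V1"], "others": ["R1"]},
    {"case_id": "C2", "insured": "P2", "others": ["R1"]},
]
graph = build_target_graph(cases, key_persons={"G1"})
print(sorted(graph["R1"]))  # ['P1', 'P2']
```

Because every edge is stored in both directions, the graph carries no source/target distinction, matching the statement that homogeneous node pairs have no directional relationship.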
  • In step S130, the objective function of the SDNE algorithm depends on the node attributes of the training target social network graph: the objective function for a graph that contains only the nodes involved in the case (or other information requiring special attention) differs from the objective function for a graph whose nodes are of a different kind.
  • The objective function of the SDNE algorithm is therefore obtained according to the training target social network graph and used as the first objective function; that is, the first objective function changes when the node attributes of the training target social network graph change. Because SDNE has multiple layers of non-linear functions, it can capture highly non-linear network structure and exploit both the local and global structure of the social network graph.
  • The SDNE algorithm combines unsupervised and supervised learning.
  • Unsupervised algorithms include deep belief networks, autoencoders, and deep Boltzmann machines.
  • The local structure of the social network graph can be captured with a semi-supervised algorithm, and the global structure with an unsupervised algorithm.
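The text does not spell out the SDNE objective. For reference, the published SDNE formulation (Wang, Cui and Zhu, "Structural Deep Network Embedding", KDD 2016) combines a second-order (global) reconstruction loss from a deep autoencoder with a first-order (local) term that uses the observed edges as supervision. The symbols below follow that paper, not this application: x_i is row i of the adjacency matrix S = (s_ij), x̂_i is its autoencoder reconstruction, y_i is the embedding of node i, W^(k) and Ŵ^(k) are encoder and decoder weights, and ⊙ is the element-wise product.

```latex
\mathcal{L}_{2nd} = \sum_{i=1}^{n} \left\| (\hat{x}_i - x_i) \odot b_i \right\|_2^2,
\qquad
b_{ij} = \begin{cases} 1, & s_{ij} = 0 \\ \beta > 1, & s_{ij} > 0 \end{cases}

\mathcal{L}_{1st} = \sum_{i,j=1}^{n} s_{ij} \left\| y_i - y_j \right\|_2^2

\mathcal{L}_{reg} = \frac{1}{2} \sum_{k=1}^{K} \left( \left\| W^{(k)} \right\|_F^2 + \left\| \hat{W}^{(k)} \right\|_F^2 \right)

\mathcal{L}_{mix} = \mathcal{L}_{2nd} + \alpha \, \mathcal{L}_{1st} + \nu \, \mathcal{L}_{reg}
```

Here α, β, and ν are hyperparameters that cannot be learned from the loss itself, which is the kind of quantity the later hyperparameter search is meant to fix.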
  • the step of constructing a second objective function according to the first objective function and a preset constraint condition specifically includes the following steps C and D:
  • Step C: Obtain a preset constraint condition.
  • Step D: Add the preset constraint condition to the first objective function as a known quantity of the first objective function to construct the second objective function.
  • The preset constraint conditions can be set in advance by a user according to the characteristics of the historical data set in the insurance claims database, for example a preset normal number of claims an insured person files within one year, or a normal number of claims involving an insured vehicle within one year. Past cases involving fraud can also be analyzed and summarized to obtain characteristic information of fraud cases. The characteristics of the historical data set and of past fraud cases can be added to the objective function as preset constraint conditions, that is, as known quantities of the objective function, to obtain a new objective function. The new objective function thus combines the SDNE algorithm with the characteristics of the historical data set.
  • For example, characteristic information of members of a known insurance fraud group can be added as a preset constraint condition to the objective function of the SDNE algorithm, that is, the first objective function. Adding the group's member information to the objective function as a preset constraint links seemingly unrelated cases in which different members of the same group appear, so that these cases become associated within a potentially fraudulent social network structure. The preset constraint conditions obtained in this way are added to the first objective function to construct the second objective function. Users can customize the preset constraint conditions for their specific business scenario, including adding new constraints and deleting or modifying existing ones.
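The application does not state how a constraint becomes a term of the objective function. One possible reading, shown here purely as a hypothetical sketch, treats known fraud-group membership as must-link pairs and adds a weighted penalty on the squared distance between their embeddings, so that minimizing the second objective function pulls those members together. The function name `constrained_objective`, the weight `lam`, and the embeddings are illustrative assumptions.

```python
def constrained_objective(base_loss, embeddings, linked_pairs, lam=1.0):
    """Hypothetical second objective: first objective plus a constraint penalty.

    base_loss: value of the first objective function.
    embeddings: {node: vector}.
    linked_pairs: node pairs known in advance to belong to the same fraud group.
    lam: weight of the constraint term (an assumed hyperparameter).
    """
    penalty = 0.0
    for i, j in linked_pairs:
        # Squared Euclidean distance between the two members' embeddings.
        penalty += sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
    return base_loss + lam * penalty


emb = {"P1": [0.0, 0.0], "P7": [3.0, 4.0]}
print(constrained_objective(10.0, emb, [("P1", "P7")], lam=0.5))  # 22.5
```

Adding, deleting, or modifying constraints then amounts to editing `linked_pairs` or `lam`.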
  • Hyperparameters are quantities in the objective function that cannot be solved for during training.
  • The optimal set of hyperparameters can only be found by testing and validating on a data set, after which the optimal hyperparameters are added to the objective function as known quantities.
  • The objective function may have tens or even hundreds of hyperparameters, each with several candidate values. In this embodiment, therefore, the hyperparameter combinations are cross-validated multiple times on the test target social network graph to find the optimal hyperparameters of the second objective function, which are then substituted into it as known quantities to generate the optimal objective function.
  • In step S160, the training target social network graph is obtained and used to train the optimal objective function: the unknown quantities of the optimal objective function are solved for and substituted back into it to generate the anti-fraud model. The training target social network graph serves as the training sample for the anti-fraud model, and its node information is input into the optimal objective function as adjacency matrix data to train the model.
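The adjacency-matrix input and an SDNE-style loss can be sketched in plain Python. This minimal sketch only evaluates a loss of the form used in the published SDNE formulation: a reconstruction term in which errors on observed edges are weighted by `beta`, plus `alpha` times a term that penalizes the squared distance between embeddings of linked nodes (weight regularization omitted). It does not train anything, which would require an autodiff framework. The node names and the `alpha` and `beta` values are illustrative assumptions.

```python
def adjacency_matrix(nodes, edges):
    """Undirected 0/1 adjacency matrix; row i is the input vector x_i of node i."""
    index = {n: k for k, n in enumerate(nodes)}
    s = [[0.0] * len(nodes) for _ in nodes]
    for a, b in edges:
        s[index[a]][index[b]] = 1.0
        s[index[b]][index[a]] = 1.0
    return s


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sdne_loss(s, x_hat, y, alpha=1.0, beta=5.0):
    """Weighted reconstruction loss plus alpha times the first-order proximity loss."""
    n = len(s)
    l2nd = 0.0
    for i in range(n):
        for j in range(n):
            b = beta if s[i][j] > 0 else 1.0  # errors on observed edges cost more
            l2nd += ((x_hat[i][j] - s[i][j]) * b) ** 2
    # Linked nodes whose embeddings lie far apart are penalized.
    l1st = sum(s[i][j] * sq_dist(y[i], y[j]) for i in range(n) for j in range(n))
    return l2nd + alpha * l1st


nodes = ["P1", "P2", "P3"]
s = adjacency_matrix(nodes, [("P1", "P2")])
y = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]  # toy 2-D embeddings
print(sdne_loss(s, s, y))  # 2.0 (perfect reconstruction, only the edge term remains)
```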
  • In summary, a historical data set comprising a training data set and a test data set is obtained from the insurance claims database, and a training target social network graph is generated from the training data set.
  • The objective function corresponding to the SDNE algorithm is then obtained according to the node attributes of the training target social network graph and used as the first objective function. Because past fraud cases in the insurance claims database can be analyzed and summarized to obtain characteristic information of fraud cases, this information is added to the objective function of the SDNE algorithm as a preset constraint condition to generate a new objective function.
  • That is, the second objective function is constructed from the first objective function and the preset constraint conditions, and the optimal hyperparameters of the second objective function are then obtained to generate the optimal objective function.
  • The training target social network graph is used as the training sample for training the anti-fraud model.
  • The optimal objective function is trained with the training target social network graph to obtain the anti-fraud model, which has high accuracy and credibility and can be used to counter gang fraud. This addresses the prior-art problem that analyzing social network graphs to identify fraud in insurance claims requires manually defined user behavioral features. By exploiting information such as the local and global structure of the social network graph, the approach uncovers more hidden information and latent behavior patterns, improving the accuracy of fraud identification and recovering losses for the company.
  • FIG. 2 is a schematic diagram of a sub-flow of a method for generating an anti-fraud model according to an embodiment of the present application.
  • Step S150, obtaining the optimal hyperparameters of the second objective function, includes the following steps S151-S153.
  • S151: Obtain the hyperparameters of the second objective function and the values each hyperparameter can take.
  • S152: Generate a test target social network graph from the test data set. The test data set is the case-related data of 30% of all claim cases within the preset time period in the insurance claims database.
  • Step S152 specifically includes steps S1521 and S1522: step S1521 constructs a test original social network graph from the test data set, and step S1522 processes the test original social network graph to generate the test target social network graph. Since constructing the test original social network graph and generating the test target social network graph are similar to constructing the training original social network graph and generating the training target social network graph in step S120, the details are not repeated here.
  • S153 Cross-validate the second objective function according to the hyperparameters and the test target social network graph to obtain an optimal hyperparameter.
  • The hyperparameters are combined according to their candidate values, and each combination is cross-validated in turn with the second objective function on the test target social network graph to obtain the optimal hyperparameters. For example, if the second objective function has two hyperparameters [a, b] and each has two values, [a1, a2] and [b1, b2], there are 4 hyperparameter combinations: [a1, b1], [a1, b2], [a2, b1], and [a2, b2].
  • The optimal hyperparameters can then be added to the second objective function as known quantities to generate the optimal objective function.
  • The test target social network graph is the test sample used to cross-validate the second objective function and obtain the optimal hyperparameters; its node information is input into the second objective function in the form of an adjacency matrix to obtain the optimal objective function.
  • Because hyperparameters are unknown quantities that the objective function of the algorithm cannot solve for, and there are multiple hyperparameters each with multiple values, the test data set (that is, the test target social network graph) must be used together with the multiple hyperparameter combinations to cross-validate the objective function. The resulting optimal hyperparameters are substituted into the objective function as known quantities to obtain the optimal objective function of the algorithm.
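The combination-and-cross-validation procedure of S151-S153 resembles an exhaustive grid search. The following is a minimal sketch under stated assumptions: the `score` callable stands in for training and evaluating the second objective function on one fold of the test target social network graph, and the fold structure, hyperparameter names, and toy scoring rule are all hypothetical.

```python
from itertools import product


def grid_search(grid, folds, score):
    """Try every hyperparameter combination and keep the best mean fold score.

    grid: {name: [candidate values]}, e.g. {"a": [a1, a2], "b": [b1, b2]}.
    folds: list of (train_part, validation_part) splits.
    score: callable(params, train_part, validation_part) -> float, higher is better.
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        mean = sum(score(params, train, valid) for train, valid in folds) / len(folds)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score


def toy_score(params, train, valid):
    # Hypothetical score that peaks at alpha=1.0, beta=5.0.
    return -((params["alpha"] - 1.0) ** 2) - (params["beta"] - 5.0) ** 2


grid = {"alpha": [0.1, 1.0], "beta": [2.0, 5.0]}  # 2 x 2 = 4 combinations
folds = [("fold-train", "fold-valid")] * 3
best, _ = grid_search(grid, folds, toy_score)
print(best)  # {'alpha': 1.0, 'beta': 5.0}
```

With tens or hundreds of hyperparameters the number of combinations grows multiplicatively, so exhaustive search is only practical for small grids.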
  • FIG. 3 is a schematic flowchart of a method for applying an anti-fraud model according to an embodiment of the present application.
  • This method is applied to the scenario of preventing gang fraud in insurance claims. As shown in FIG. 3, the method may include steps S210-S220.
  • Step S210: Obtain a data set to be detected from the insurance claims database and generate a detection target social network graph, where the data set to be detected is any one or more data sets in the insurance claims database.
  • Generating the detection target social network graph in step S210 is similar to generating the training target social network graph in step S120 of the above embodiment.
  • The main difference is that the data set to be detected can be the data of any one case or of multiple cases in the insurance claims database; that is, the data of all cases in the database can serve as the data set to be detected. A detection target social network graph is constructed from the obtained data set, and the anti-fraud model generated with the SDNE algorithm is then applied to it to identify every case in the insurance claims database that may involve fraud, thereby countering gang fraud.
  • The construction process of the detection target social network graph is similar to that of the training target social network graph in step S120 and is not repeated here.
  • In step S220, every node in the detection target social network graph is mapped one-to-one into a high-dimensional vector space, where each node has a unique corresponding vector.
  • The stronger the association between nodes, the closer their vectors lie in the high-dimensional vector space; the weaker the association, the farther apart they are. Strongly associated nodes indicate a strong mutual relationship in the insurance claims.
  • The anti-fraud model maps the nodes of the detection target social network graph one-to-one into the high-dimensional vector space, where highly associated nodes are close to each other.
  • If a node is known to be a fraudster, nodes strongly associated with it are more likely to be involved in gang fraud; that is, persons or vehicles clustered near that node are more likely to be involved in gang fraud. Users can therefore analyze whether some nodes involve fraud by observing where those nodes are mapped in the high-dimensional vector space. For example, users can focus on the neighborhood of the nodes of known fraudsters and observe and analyze the strongly associated nodes there. This addresses the problem that identifying fraud in insurance claims with traditional social network analysis requires manually defined user behavioral features, greatly improving the accuracy of fraud identification and recovering losses for insurance companies.
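Inspecting the neighborhood of a known fraudster can be sketched as a nearest-neighbor query over the node vectors. This is a minimal sketch: Euclidean distance is an assumed choice of metric, and the node names and two-dimensional vectors are toy values standing in for the model's high-dimensional embeddings. `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_nodes(embeddings, anchor, k=3):
    """Return the k nodes whose vectors lie closest to the anchor node's vector."""
    origin = embeddings[anchor]
    others = [name for name in embeddings if name != anchor]
    others.sort(key=lambda name: math.dist(origin, embeddings[name]))
    return others[:k]


embeddings = {
    "F1": [0.0, 0.0],  # known fraudster
    "P2": [0.1, 0.2],
    "V9": [0.3, 0.1],
    "P5": [4.0, 4.0],
    "P6": [5.0, 1.0],
}
print(nearest_nodes(embeddings, "F1", k=2))  # ['P2', 'V9']
```

The returned nodes (here a person and a vehicle) are the candidates a user would examine first.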
  • In this embodiment, the detection target social network graph is generated from the data set to be detected obtained from the insurance claims database, and the anti-fraud model generated in steps S110-S160 maps each node of the detection target social network graph into a high-dimensional vector space, so that a user can analyze whether a node involves fraudulent behavior according to where it is mapped. Each node of the detection target social network graph has a unique vector in the high-dimensional vector space; highly associated nodes are close to each other, and weakly associated nodes are far apart.
  • The embodiments of the present application thus address the prior-art problem that analyzing social network graphs to identify fraud in insurance claims requires manually defined user behavioral features. Instead of simply counting the frequency of surface-level user behavioral features, they use information such as the local and global structure of the social network graph, and applying the anti-fraud model improves the accuracy of fraud identification and recovers losses for the company.
  • FIG. 4 is a schematic flowchart of an application method of an anti-fraud model according to another embodiment of the present application.
  • This method is applied to the scenario of preventing gang fraud in insurance claims.
  • As shown in FIG. 4, the method may include steps S310-S330.
  • Steps S310-S320 are similar to steps S210-S220 in the foregoing embodiment, and details are not described herein again.
  • the following describes step S330 added in this embodiment in detail.
  • S330: Compute on the vectors in the high-dimensional vector space with a vector operation algorithm to obtain the degree of association between any vectors in the space.
  • The vector operation algorithm includes regression, classification, and clustering algorithms.
  • A vector operation algorithm can compute between any vectors in the high-dimensional vector space to obtain their degree of association, which uncovers more latent information and behavior patterns and also makes it possible to establish clear patterns and quantitative indicators that accurately characterize user behavior.
  • The clear patterns include the characteristic variables of fraud.
  • These characteristic variables include both observable and unobservable variables, and their influence can be quantified.
  • The vector operation algorithm is used to determine whether the influence of these characteristic variables on behavior is linear or non-linear, and complex vector operations are performed on the vectors in the high-dimensional vector space to quantify that influence.
  • In this embodiment, the detection target social network graph is generated from the data set to be detected obtained from the insurance claims database, the anti-fraud model generated in steps S110-S160 maps the nodes of the detection target social network graph one by one into the high-dimensional vector space, and the vectors are then computed with the vector operation algorithm to obtain the degree of association between any vectors in the space.
  • Because the data of the data set to be detected are represented as nodes in the detection target social network graph, complex mathematical operations cannot be performed on the nodes directly. Once the nodes are mapped into the high-dimensional vector space, each obtains a unique corresponding vector, and complex vector operations can be performed on the vector coordinates to obtain the degree of association between any vectors. This provides quantitative indicators for business decisions, supports more scientific decision-making, deepens business staff's understanding of user behavior, and helps uncover more hidden information and latent behavior patterns.
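As one concrete instance of a vector operation that yields a degree of association, the sketch below computes pairwise cosine similarity between node vectors. Cosine similarity is a swapped-in choice for illustration; the application names regression, classification, and clustering algorithms without fixing a particular measure. The function names, node names, and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def association_degrees(embeddings):
    """Degree of association for every unordered pair of nodes."""
    names = sorted(embeddings)
    return {
        (a, b): cosine(embeddings[a], embeddings[b])
        for a in names
        for b in names
        if a < b
    }


degrees = association_degrees({"P1": [1.0, 0.0], "P2": [1.0, 1.0], "P3": [0.0, 1.0]})
print(round(degrees[("P1", "P2")], 4))  # 0.7071
```

The resulting pairwise scores are the kind of quantitative indicator that a downstream regression, classification, or clustering step could consume.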
  • FIG. 5 is a schematic block diagram of an apparatus 300 according to an embodiment of the present application.
  • The apparatus 300 corresponds to the method for generating an anti-fraud model shown in FIG. 1.
  • The apparatus 300 includes units for executing the above method for generating an anti-fraud model.
  • the apparatus 300 may be configured in a terminal such as a desktop computer, a tablet computer, or a laptop computer.
  • the device 300 includes a data acquisition unit 301, a first graph construction unit 302, a function acquisition unit 303, a function construction unit 304, an optimal function generation unit 305, and a model generation unit 306.
  • the data acquisition unit 301 is configured to obtain a historical data set from an insurance claims database. The historical data set is the set of all case data within a preset time range in the insurance claims database, and it includes a training data set and a test data set.
  • the first graph construction unit 302 is configured to generate a training target social network graph according to the training data set. Specifically, the first graph construction unit 302 includes a first graph construction subunit 3021 and a first data processing unit 3022.
  • the first data processing unit 3022 is configured to perform data processing on the training original social network graph to generate the training target social network graph. The data processing includes filtering the nodes of the training original social network graph to obtain homogeneous nodes and filtering out heterogeneous nodes. It also includes obtaining key nodes related to the homogeneous nodes and adding them to the homogeneous nodes to generate the training target social network graph. That graph is represented as g=(v,e), where v is its node set and e is its edge set.
  • the function acquisition unit 303 is configured to obtain the objective function of the SDNE algorithm according to the training target social network graph and take that objective function as the first objective function.
  • the function construction unit 304 is configured to construct a second objective function according to the first objective function and a preset constraint condition. Specifically, the function construction unit 304 includes a condition acquisition unit 3041 and a function construction subunit 3042.
  • the condition acquisition unit 3041 is configured to obtain the preset constraint condition.
  • the function construction subunit 3042 is configured to add the preset constraint condition as a known quantity of the first objective function to the first objective function to construct a second objective function.
  • the optimal function generation unit 305 is configured to obtain the optimal hyperparameters of the second objective function. It adds them to the second objective function as known quantities to generate the optimal objective function.
  • the model generation unit 306 is configured to train the optimal objective function using the training target social network graph to generate the anti-fraud model.
  • FIG. 6 is a schematic block diagram of the optimal function generation unit 305 of a device according to an embodiment of the present application.
  • the optimal function generation unit 305 includes a hyperparameter acquisition unit 3051, a second graph construction unit 3052, and a function verification unit 3053.
  • the hyperparameter acquisition unit 3051 is configured to obtain the hyperparameters of the second objective function. Specifically, it obtains these hyperparameters together with the values each one can take.
  • the second graph construction unit 3052 is configured to generate a test target social network graph according to the test data set.
  • the test data set is the case-related data for 30% of all claim cases within the preset time in the insurance claims database.
  • like the first graph construction unit 302, the second graph construction unit 3052 is further configured to construct a test original social network graph according to the test data set. It then performs data processing on that graph to generate the test target social network graph.
  • the function verification unit 3053 is configured to cross-validate the second objective function according to the hyperparameters and the test target social network graph to obtain the optimal hyperparameters.
  • specifically, the function verification unit 3053 combines the hyperparameters obtained by the hyperparameter acquisition unit 3051 with their candidate values. It uses each combination in turn to cross-validate the second objective function against the test target social network graph, and so obtains the optimal hyperparameters.
  • the number of hyperparameters of the second objective function is not limited and may range from dozens to hundreds.
  • FIG. 7 is a schematic block diagram of another apparatus 400 according to an embodiment of the present application.
  • the other device 400 corresponds to the application method of the anti-fraud model shown in FIG. 3.
  • the other device 400 includes a unit for executing the application method of the anti-fraud model, and the other device 400 may be configured in a terminal such as a desktop computer, a tablet computer, or a laptop computer.
  • the other device 400 includes a third graph construction unit 401 and a node mapping unit 402.
  • the third graph construction unit 401 is configured to obtain a data set to be detected from an insurance claims database to generate a target social network graph for detection.
  • the data set to be detected is any one or more data sets to be detected in the insurance claims database.
  • the third graph construction unit 401 is similar to the first graph construction unit 302 in the above embodiment.
  • the third graph construction unit 401 specifically includes a third graph construction subunit 4011 and a third data processing unit 4012.
  • the third graph construction subunit 4011 obtains the data set to be detected from the insurance claims database. This may be the data of any single case or of multiple data sets to be detected, so the data of every case in the insurance claims database can serve as the data set to be detected.
  • a target social network graph for detection is constructed from this data set. The anti-fraud model based on the SDNE algorithm is then used to test that graph, which identifies all cases in the insurance claims database that may involve fraud.
  • the application process and corresponding functions of the third graph construction unit 401 are similar to those of the first graph construction unit 302 and are not repeated here.
  • the node mapping unit 402 is configured to use the anti-fraud model generated in steps S110-S160 of the above embodiment to map the nodes in the target social network graph for detection into a high-dimensional vector space. The user can then analyze how each node is mapped in that space to determine whether the node involves fraud.
  • every node of the target social network graph for detection has a unique corresponding vector. The more strongly two nodes are associated, the closer their corresponding vectors are in the high-dimensional vector space.
  • FIG. 8 is a schematic block diagram of another apparatus 500 according to another embodiment of the present application.
  • the other device 500 provided by another embodiment of the present application adds a vector operation unit 503 to the above embodiment. The device 500 therefore includes a fourth graph construction unit 501, a node mapping unit 502, and a vector operation unit 503.
  • the fourth graph construction unit 501 is similar to the third graph construction unit 401 in the foregoing embodiment.
  • the fourth graph construction unit 501 specifically includes a fourth graph construction subunit 5011 and a fourth data processing unit 5012.
  • the application processes and corresponding functions of the fourth graph construction unit 501 and the node mapping unit 502 are similar to those of the third graph construction unit 401 and the node mapping unit 402. They are not repeated here.
  • the vector operation unit 503 is configured to perform computations on the vectors in the high-dimensional vector space according to a vector operation algorithm. This yields the degree of association between any vectors in the space.
  • the vector operation algorithm includes regression algorithms, classification algorithms, and clustering algorithms.
  • the above apparatus can be implemented in the form of a computer program, which can be run on a computer device as shown in FIG. 9.
  • the computer device 600 may be a terminal or a server.
  • the terminal may be an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, and a personal digital assistant.
  • the server can be an independent server or a server cluster consisting of multiple servers.
  • the computer device 600 includes a processor 602, a memory, and a network interface 605 connected through a system bus 601.
  • the memory may include a non-volatile storage medium 603 and an internal memory 604.
  • the non-volatile storage medium 603 can store an operating system 6031 and a computer program 6032.
  • the computer program 6032 includes program instructions.
  • when executed, the program instructions cause the processor 602 to perform a method for generating and applying an anti-fraud model.
  • the processor 602 is used to provide computing and control capabilities to support the operation of the entire computer device 600.
  • the internal memory 604 provides an environment for running the computer program 6032 in the non-volatile storage medium 603.
  • when executed by the processor 602, the computer program 6032 causes the processor 602 to perform a method for generating and applying an anti-fraud model.
  • the network interface 605 is configured to perform network communication with other devices.
  • the structure shown in FIG. 9 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 600 to which the solution of the present application is applied.
  • the specific computer device 600 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 602 is configured to run a computer program 6032 stored in a memory, so as to implement the generation method and the application method of the anti-fraud model described above.
  • the processor 602 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • the computer program includes program instructions and may be stored in a storage medium, which is a computer-readable storage medium.
  • the program instructions are executed by at least one processor in the computer system to implement the process steps of the embodiment of the method.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program that includes program instructions. When executed by a processor, these instructions perform the generation method and application method of the anti-fraud model described above.
  • the storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.

Landscapes

  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

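A method for generating an anti-fraud model, a method for applying it, and a corresponding apparatus, device, and storage medium. The generation method includes the following steps:
(S110) obtaining a historical data set from an insurance claims database, the historical data set including a training data set and a test data set;
(S120) generating a training target social network graph from the training data set;
(S130) obtaining the objective function of the SDNE algorithm according to the training target social network graph and taking it as a first objective function;
(S140) constructing a second objective function from the first objective function and a preset constraint condition;
(S150) obtaining the optimal hyperparameters of the second objective function and adding them to the second objective function as known quantities to generate an optimal objective function; and
(S160) training the optimal objective function with the training target social network graph to generate the anti-fraud model.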

Description

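Method, apparatus, device, and storage medium for generating and applying an anti-fraud model
This application claims priority to Chinese Patent Application No. 201811051842.4, filed with the China Patent Office on September 10, 2018 and entitled "Method, apparatus, device, and storage medium for generating and applying an anti-fraud model". The entire contents of that application are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular to a method, apparatus, device, and storage medium for generating and applying an anti-fraud model.
Background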
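In insurance-claim anti-fraud scenarios, the traditional approach to social network analysis counts how often users exhibit particular behavioral features. When the count for a given feature exceeds the normal range, the approach flags a potential fraud gang. For example, an insured person who reports many incidents within a period, clearly more than the normal level, may be committing insurance fraud. Similarly, an ID card number that appears repeatedly across different cases, at a frequency clearly above normal, may be associated with insurance fraud.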
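For contrast with the graph-based method introduced later, the frequency-counting baseline described above can be sketched in a few lines. This is an illustrative sketch only. The record fields (`case_id`, `insured`, `id_card`) and the cut-off `normal_max` are hypothetical, and choosing that cut-off by hand is the kind of manual feature definition criticized in the next paragraph.

```python
from collections import Counter

def flag_frequent(claims, key, normal_max):
    """Traditional baseline: flag values of `key` whose claim count exceeds normal_max.

    `key` is a hypothetical field name (e.g. insured person or ID card number);
    `normal_max` is a hand-set "normal frequency" threshold.
    """
    counts = Counter(claim[key] for claim in claims)
    return {value: n for value, n in counts.items() if n > normal_max}

claims = [  # hypothetical claim records
    {"case_id": "C1", "insured": "P1", "id_card": "X1"},
    {"case_id": "C2", "insured": "P1", "id_card": "X1"},
    {"case_id": "C3", "insured": "P1", "id_card": "X2"},
    {"case_id": "C4", "insured": "P2", "id_card": "X1"},
]
print(flag_frequent(claims, "insured", normal_max=2))  # {'P1': 3}
print(flag_frequent(claims, "id_card", normal_max=2))  # {'X1': 3}
```

Each check considers one feature in isolation. It does not surface that P2's case shares ID card X1 with P1's cases, which is the kind of structural link the graph-based method is intended to exploit.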
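However, the traditional approach to social network analysis has two drawbacks.

First, it requires user behavioral features to be defined manually. Someone must decide which behaviors, once they fall outside the normal range, indicate possible insurance fraud, and then define those behaviors as fraud-suspect feature variables. These variables are generally summarized and supplied by business experts or modelers from their own working experience. For example, they may count how many incidents each insured person in the case database reported within a preset time and set a normal count for that period. Alternatively, they may count how often the same ID card number appears in different cases and set a normal count for it. As a result, how well prior-art social network analysis identifies fraud depends on these manually defined user behavioral features.

Second, the traditional approach ignores the local and global structure of the social network, which is a point-based network topology of individuals or communities. In graph theory, the local structure consists of a node and the nodes connected to it, and the global structure consists of the local structures of all nodes. Because the traditional approach only counts behavioral-feature frequencies and does not use graph-theoretic methods, it overlooks both structures and cannot uncover the valuable information hidden in the social network.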
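Summary
Embodiments of the present application provide a method, apparatus, device, and computer-readable storage medium for generating and applying an anti-fraud model. They aim to solve the problem that identifying fraud in insurance claim cases through traditional social network analysis depends on manually defined user behavioral features. They also take into account the local and global structure of the social network graph, so that more potentially valuable information can be uncovered.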
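In a first aspect, an embodiment of the present application provides a method for generating an anti-fraud model. The method includes:
obtaining a historical data set from an insurance claims database, where the historical data set is the set of all case data within a preset time range in the insurance claims database and includes a training data set and a test data set;
generating a training target social network graph according to the training data set;
obtaining an objective function of the SDNE algorithm according to the training target social network graph, and taking that objective function as a first objective function;
constructing a second objective function according to the first objective function and a preset constraint condition;
obtaining optimal hyperparameters of the second objective function, and adding them to the second objective function as known quantities of that function to generate an optimal objective function; and
training the optimal objective function using the training target social network graph to generate the anti-fraud model.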
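In a second aspect, an embodiment of the present application further provides a method for applying an anti-fraud model. The method includes:
obtaining a data set to be detected from an insurance claims database to generate a target social network graph for detection, where the data set to be detected is any one or more data sets to be detected in the insurance claims database; and
using the anti-fraud model according to the first aspect to map the nodes in the target social network graph for detection into a high-dimensional vector space, so that a user can analyze how a node is mapped in that space to determine whether the node involves fraud.
Every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly two nodes are associated, the closer their corresponding vectors are in that space.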
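In a third aspect, an embodiment of the present application further provides an apparatus that includes units for performing the methods of the first and second aspects.
In a fourth aspect, an embodiment of the present application further provides a computer device that includes a memory and a processor. The memory stores a computer program, and the processor implements the methods of the first and second aspects when executing it.
In a fifth aspect, an embodiment of the present application further provides a computer-readable storage medium that stores a computer program. The computer program includes program instructions that, when executed by a processor, implement the methods of the first and second aspects.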
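Brief Description of the Drawings
To describe the technical solutions of the embodiments more clearly, the drawings used in describing them are briefly introduced below. The drawings show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for generating an anti-fraud model according to an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of the method for generating an anti-fraud model according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for applying an anti-fraud model according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a method for applying an anti-fraud model according to another embodiment of the present application;
FIG. 5 is a schematic block diagram of an apparatus according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of the optimal function generation unit of an apparatus according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of another apparatus according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of another apparatus according to another embodiment of the present application; and
FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present application.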
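Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present application.
As used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements, and/or components. They do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.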
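Referring to FIG. 1, which is a schematic flowchart of the method for generating an anti-fraud model according to an embodiment of the present application. The method is applied to anti-gang-fraud scenarios in insurance claims.

To analyze the insurance-claim social network graph, this embodiment introduces the Structural Deep Network Embedding (SDNE) algorithm. SDNE has multiple layers of non-linear functions, so it can capture highly non-linear network structure and exploit both the local and the global structure of the social network graph. The objective function of SDNE is adapted to the characteristics of the insurance-claim social network graph to generate an anti-fraud model, and this model is used to counter gang fraud.

As shown in the figure, the generation method may include steps S110 to S160.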
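S110. Obtain a historical data set from the insurance claims database. The historical data set is the set of all case data within a preset time range in the insurance claims database, and it includes a training data set and a test data set.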
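Specifically, the historical data set contains the data for all cases within the preset time in the insurance claims database. It is randomly divided into a training data set and a test data set according to a preset ratio; in other words, all cases within the preset time range are randomly divided according to that ratio. In this embodiment, the ratio is 70:30: the training data set contains 70% of the cases in the historical data set and the test data set contains 30%. In some feasible embodiments, the preset ratio can be customized according to user requirements.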
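The following is a minimal sketch of the 70:30 random split. It assumes the split is done at the case level, so that all data for one case ends up on the same side. The function name `split_cases` and the fixed `seed` are illustrative assumptions, not part of the source.

```python
import random

def split_cases(case_ids, train_ratio=0.7, seed=0):
    """Randomly split case IDs into training and test sets by a preset ratio.

    Hypothetical helper: splitting by case keeps every record of a case together.
    """
    ids = list(case_ids)
    rng = random.Random(seed)          # fixed seed only for reproducibility
    rng.shuffle(ids)
    cut = round(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

train_ids, test_ids = split_cases([f"C{i}" for i in range(10)])
print(len(train_ids), len(test_ids))  # 7 3
```

Splitting by case ID rather than by individual record keeps a case's persons, vehicles, and documents from leaking across the training and test graphs.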
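S120. Generate a training target social network graph according to the training data set.
Specifically, this step includes the following step A and step B:
Step A: Construct a training original social network graph according to the training data set. The graph is represented as a directed graph G=(V,E), where V is its node set and E is its edge set. Each node in V represents one data item in the training data set. Each edge in E represents an ordered pair of nodes (s,t), where s is the source node and t is the target node.
Step B: Perform data processing on the training original social network graph to generate the training target social network graph. The data processing has two parts. First, the nodes of the training original social network graph are filtered to obtain homogeneous nodes, and heterogeneous nodes are filtered out. Second, key nodes related to the homogeneous nodes are obtained and added to the homogeneous nodes. The resulting training target social network graph is represented as a graph g=(v,e), where v is its node set and e is its edge set.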
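In graph theory, a social network graph consists of nodes and edges. Nodes represent objects, and an edge represents a connection between two objects. A graph is generally viewed as an abstract network of nodes connected to one another by edges, where an edge indicates that the two nodes are related.

The training data set is the case-related data for 70% of all claim cases within the preset time in the insurance claims database. The claim cases may be, for example, auto insurance claims or critical illness claims. For an auto insurance claim, the related data mainly include the case number, the persons involved, the vehicles involved, and related documents. The persons involved mainly include the insured, repair shop staff, insurance company staff, and the traffic police concerned. Using the case number, all data related to one auto insurance claim can be built into a social network graph for that case.

Because the training data set contains many claim cases, each single-case graph is a local structure of the training original social network graph, and the training original social network graph as a whole is the global structure. That graph is built mainly from the data related to all claim cases in the training data set. If all of those cases are auto insurance claims, the data mainly include case numbers, insured persons, repair shop staff, insurance company staff, traffic police, vehicles involved, and related documents, and the graph contains a node for each of these items.

The original social network graph is represented as a directed graph G=(V,E), where V is its node set and E is its edge set. Each node in V corresponds to one data item in the training data set, so every case number, every person involved, every related document, and so on has a corresponding node. Each edge in E represents an ordered pair of nodes (s,t), where s is the source node and t is the target node.

Because the nodes of the training original social network graph are heterogeneous, any pair of them may stand in a directional or ownership relation; for example, an insured vehicle belongs to its insured person. The graph therefore has heterogeneous nodes and is directed. However, the mathematics of the SDNE algorithm requires the graph it processes to have homogeneous nodes and to be undirected. The training original social network graph therefore has to be processed into a training target social network graph built only from homogeneous nodes.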
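The directed heterogeneous graph G=(V,E) of step A might be assembled as below. This is a hypothetical sketch. The record layout, the typing of nodes as `(type, value)` pairs, and the edge directions (case to involved party, vehicle to its insured owner) are assumptions chosen to mirror the example in the text. Plain Python sets stand in for a graph library.

```python
def build_raw_graph(cases):
    """Build a directed, heterogeneous graph G=(V, E) from claim-case records.

    Nodes are (type, value) pairs; edges are ordered (source, target) pairs.
    The record fields used here are hypothetical.
    """
    V, E = set(), set()
    for case in cases:
        case_node = ("case", case["case_id"])
        V.add(case_node)
        for person in case["persons"]:
            person_node = ("person", person)
            V.add(person_node)
            E.add((case_node, person_node))      # case -> person involved
        vehicle_node = ("vehicle", case["vehicle"])
        owner_node = ("person", case["insured"])
        V.update({vehicle_node, owner_node})
        E.add((case_node, vehicle_node))         # case -> vehicle involved
        E.add((vehicle_node, owner_node))        # vehicle -> insured ("belongs to")
    return V, E

cases = [
    {"case_id": "C1", "insured": "P1", "persons": ["P1", "R1"], "vehicle": "V1"},
    {"case_id": "C2", "insured": "P2", "persons": ["P2", "R1"], "vehicle": "V2"},
]
V, E = build_raw_graph(cases)
print(len(V), len(E))  # 7 8
```

Because edges are ordered pairs, `(vehicle, owner)` and `(owner, vehicle)` are different edges. This directedness, together with the mixed node types, is why step B converts the graph before SDNE can use it.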
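Step B specifically includes filtering the nodes of the training original social network graph to obtain homogeneous nodes and filtering out heterogeneous nodes. It also includes obtaining key nodes related to the homogeneous nodes and adding them to the homogeneous nodes to generate the training target social network graph.

Specifically, nodes associated with a case number, such as the persons involved, can be selected, and other irrelevant nodes are cleaned out. The associated homogeneous nodes are then connected to one another through the case number. Next, key nodes related to these homogeneous nodes are obtained. A key node holds the data of another person who needs special attention, for example a member of a known criminal gang. The key nodes are added to the homogeneous nodes to generate the training target social network graph.

Because all nodes in the resulting graph are homogeneous, a pair of nodes connected by an edge has no directional or ownership relation, so the graph is undirected. It is represented as a graph g=(v,e), where v is its node set and e is its edge set. The training target social network graph may therefore contain only nodes for persons involved (or other persons requiring special attention), or only nodes for vehicles or objects involved.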
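Step B's projection onto homogeneous person nodes can be sketched as follows, under stated assumptions:
- persons are linked whenever they share a case number;
- key persons (for example known gang members) come from a hypothetical external list of `(key_person, related_person)` pairs;
- the result is exported as the symmetric adjacency matrix that S160 feeds to the model.

None of these names come from the source.

```python
from itertools import combinations

def project_persons(cases, key_links=()):
    """Project claim cases onto an undirected, person-only graph g=(v, e).

    Persons who appear in the same case are linked through that case number.
    key_links: hypothetical (key_person, related_person) pairs from a watch list.
    """
    v, e = set(), set()
    for case in cases:
        persons = sorted(set(case["persons"]))   # keep only homogeneous person nodes
        v.update(persons)
        for a, b in combinations(persons, 2):
            e.add((a, b))                        # sorted pair = one undirected edge
    for key_person, related in key_links:
        if related in v:                         # add a key node only if it touches the graph
            v.add(key_person)
            e.add(tuple(sorted((key_person, related))))
    return v, e

def adjacency_matrix(v, e):
    """Symmetric 0/1 adjacency matrix, rows ordered by sorted node name."""
    nodes = sorted(v)
    index = {node: i for i, node in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for a, b in e:
        A[index[a]][index[b]] = A[index[b]][index[a]] = 1
    return nodes, A

cases = [{"case_id": "C1", "persons": ["P1", "R1"]},
         {"case_id": "C2", "persons": ["P2", "R1"]}]
v, e = project_persons(cases, key_links=[("K1", "P2")])
nodes, A = adjacency_matrix(v, e)
print(nodes)  # ['K1', 'P1', 'P2', 'R1']
print(A)      # [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
```

Storing each undirected edge once as a sorted pair, and writing both `A[i][j]` and `A[j][i]`, is what makes the resulting graph undirected, as SDNE requires.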
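S130. Obtain the objective function of the SDNE algorithm according to the training target social network graph, and take that objective function as the first objective function.
Specifically, the SDNE objective function depends on the node attributes of the training target social network graph. The objective function for a graph containing only nodes for persons involved (or other persons requiring special attention) differs from the one for a graph containing only nodes such as vehicles involved. The first objective function is therefore obtained from the training target social network graph, and it changes when the node attributes of that graph change.

Because SDNE has multiple layers of non-linear functions, it can capture highly non-linear network structure and exploit both the local and the global structure of the social network graph. The SDNE algorithm includes unsupervised and supervised algorithms. The unsupervised algorithms include deep belief networks, autoencoder neural networks, and deep Boltzmann machines. In one embodiment, the local structure of the social network graph may be captured with a semi-supervised algorithm and the global structure with an unsupervised algorithm.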
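The source does not write out the objective function. For reference, the objective in the original SDNE paper (Wang, Cui and Zhu, KDD 2016) has three terms:

L_mix = L_2nd + alpha * L_1st + nu * L_reg

- L_2nd = ||(X_hat - X) * B||_F^2 reconstructs each node's adjacency row, with extra weight beta on observed edges. This term captures global structure.
- L_1st = sum_ij s_ij * ||y_i - y_j||^2 pulls the embeddings of linked nodes together. This term captures local structure.
- L_reg is an L2 penalty on the weights.

The NumPy sketch below evaluates that standard loss. The toy one-layer encoder and decoder and the default values of alpha, beta, and nu are assumptions, and the patent's own first objective function may differ.

```python
import numpy as np

def sdne_loss(S, X_hat, Y, weights, alpha=1e-2, beta=5.0, nu=1e-4):
    """Standard SDNE objective L_mix = L_2nd + alpha*L_1st + nu*L_reg (sketch).

    S      : (n, n) symmetric adjacency matrix; row i is the autoencoder input x_i
    X_hat  : (n, n) autoencoder reconstruction of S
    Y      : (n, d) embeddings taken from the encoder's middle layer
    weights: list of encoder/decoder weight matrices to regularize
    """
    B = np.where(S > 0, beta, 1.0)                # heavier penalty on observed edges
    l_2nd = np.sum(((X_hat - S) * B) ** 2)        # second-order (global) proximity
    L = np.diag(S.sum(axis=1)) - S                # graph Laplacian D - S
    l_1st = 2.0 * np.trace(Y.T @ L @ Y)           # = sum_ij s_ij ||y_i - y_j||^2
    l_reg = 0.5 * sum(np.sum(W ** 2) for W in weights)
    return l_2nd + alpha * l_1st + nu * l_reg

# Toy forward pass on a 4-node undirected graph (one hidden layer, random weights).
S = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(4, 2))
W_dec = rng.normal(scale=0.1, size=(2, 4))
Y = np.tanh(S @ W_enc)                        # 2-dimensional embeddings y_i
X_hat = 1.0 / (1.0 + np.exp(-(Y @ W_dec)))    # sigmoid reconstruction of each row
print(sdne_loss(S, X_hat, Y, [W_enc, W_dec]))
```

In this form, alpha, beta, and nu are the kind of tunable quantities that step S150 treats as hyperparameters.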
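S140. Construct a second objective function according to the first objective function and a preset constraint condition.
Specifically, this step includes the following step C and step D:
Step C: Obtain the preset constraint condition.
Step D: Add the preset constraint condition to the first objective function as a known quantity of that function to construct the second objective function.
The user can set the preset constraint condition in advance according to the characteristics of the historical data set in the insurance claims database. Examples include a preset normal number of incidents per year for an insured person or for an insured vehicle. Past fraud cases can also be analyzed and summarized to derive characteristic information of fraud cases. Both the data-set characteristics and the fraud-case characteristics can be added to the objective function as preset constraint conditions, producing a new objective function. Because each constraint enters as a known quantity, the new objective function combines the SDNE algorithm with the characteristics of the historical data set.

For example, suppose the members of a certain insurance-fraud gang are known to be scattered across different cases that have no apparent connection. Their characteristic information can be added as a preset constraint condition to the objective function of the SDNE algorithm, which is the first objective function. This links the seemingly unrelated cases, so that different members of the same gang become associated with one another in the social network graph.

In this way, the obtained preset constraint condition is added to the first objective function to construct the second objective function. Users can customize the preset constraint conditions for their specific business scenario by adding new ones, deleting existing ones, or modifying them.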
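One hypothetical way to add a known constraint to the first objective function, such as "these persons belong to the same fraud gang", is an extra must-link penalty that pulls the listed nodes' embeddings together. The source does not specify the form of the constraint term. The squared-distance penalty, the weight `gamma`, and the function names below are assumptions for illustration.

```python
import numpy as np

def constraint_penalty(Y, must_link, gamma=1.0):
    """Hypothetical constraint term: gamma * sum of ||y_i - y_j||^2 over must-link pairs."""
    return gamma * sum(float(np.sum((Y[i] - Y[j]) ** 2)) for i, j in must_link)

def second_objective(first_objective_value, Y, must_link, gamma=1.0):
    """Second objective = first objective + a constraint term built from known quantities."""
    return first_objective_value + constraint_penalty(Y, must_link, gamma)

Y = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])   # toy 2-D embeddings
must_link = [(0, 1)]   # nodes 0 and 1: hypothetical members of the same known gang
print(constraint_penalty(Y, must_link))  # 25.0
```

Minimizing the combined objective makes it costly for known gang members to sit far apart in the embedding space. This is one concrete reading of "linking seemingly unrelated cases", not necessarily the patent's.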
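S150. Obtain the optimal hyperparameters of the second objective function, and add them to the second objective function as known quantities of that function to generate the optimal objective function.
Specifically, hyperparameters are unknowns in the objective function that cannot be solved for. The optimal set can only be found by testing and validation on a data set, and that set is then added to the objective function as known quantities. The objective function may have dozens or even hundreds of hyperparameters, and each may take multiple values. In this embodiment, therefore, multiple sets of hyperparameters are cross-tested against the test target social network graph several times to find the optimal hyperparameters of the second objective function. These are then substituted into the second objective function as known quantities to generate the optimal objective function.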
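S160. Train the optimal objective function using the training target social network graph to generate the anti-fraud model.
Specifically, the training target social network graph is obtained and used to train the optimal objective function. The unknowns of the optimal objective function are solved for, and the solved values are substituted back into the function to generate the anti-fraud model. The training target social network graph serves as the training sample for the anti-fraud model, and its node information is fed into the optimal objective function as an adjacency matrix.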
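In the above embodiment, the steps are as follows:
1. A historical data set, including a training data set and a test data set, is obtained from the insurance claims database.
2. A training target social network graph is generated from the training data set.
3. The SDNE objective function matching that graph's node attributes is obtained and taken as the first objective function.
4. Past fraud cases in the insurance claims database can be analyzed to derive characteristic information of fraud cases. Adding that information to the SDNE objective function as preset constraint conditions produces the second objective function.
5. The optimal hyperparameters of the second objective function are obtained to generate the optimal objective function.
6. The optimal objective function is trained with the training target social network graph as training samples to obtain the anti-fraud model.

The resulting model has high accuracy and credibility and can be used to counter gang fraud. This addresses the prior-art problem that analyzing social network graphs to identify fraud in insurance claim cases relies on manually defined user behavioral features. The method uses information such as the local and global structure of the social network graph to uncover more hidden information and latent behavior patterns. This improves the accuracy of fraud identification and reduces losses for the company.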
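In one embodiment, referring to FIG. 2, which is a schematic sub-flowchart of the method for generating an anti-fraud model according to an embodiment of the present application, obtaining the optimal hyperparameters of the second objective function in step S150 specifically includes the following steps S151 to S153.
S151. Obtain the hyperparameters of the second objective function.
Specifically, obtain the hyperparameters of the second objective function and the values each hyperparameter can take.
S152. Generate a test target social network graph according to the test data set.
Specifically, the test data set is the case-related data for 30% of all claim cases within the preset time in the insurance claims database. Step S152 includes two sub-steps. In step S1521, a test original social network graph is constructed according to the test data set. In step S1522, data processing is performed on that graph to generate the test target social network graph. These graphs are built in the same way as the training original and training target social network graphs in step S120, so the details are not repeated here.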
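S153. Cross-validate the second objective function according to the hyperparameters and the test target social network graph to obtain the optimal hyperparameters.
Specifically, the obtained hyperparameters of the second objective function are combined with their candidate values. Each combination is used in turn to cross-validate the second objective function against the test target social network graph, and the best combination gives the optimal hyperparameters.

For example, suppose the second objective function has two hyperparameters [a, b], each with two candidate values, [a1, a2] and [b1, b2]. There are then four combinations: [a1, b1], [a1, b2], [a2, b1], and [a2, b2]. Each combination is used to cross-validate the second objective function against the test target social network graph, which yields one optimal combination. That combination is then added to the second objective function as known quantities to generate the optimal objective function.

The number of hyperparameters of the second objective function is not limited here; there may be dozens or hundreds. The test target social network graph is the test sample used in this cross-validation, and its node information is fed into the second objective function as an adjacency matrix to obtain the optimal objective function.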
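The exhaustive combination search in S153 corresponds to an ordinary grid search. In this sketch, `evaluate` is a hypothetical callable that would score the second objective function with the given hyperparameters on the test target social network graph, where a higher score is better. A toy scoring function stands in for it here, and the two-by-two grid mirrors the [a, b] example in the text.

```python
from itertools import product

def grid_search(grid, evaluate):
    """Try every combination of candidate hyperparameter values and return the best.

    grid     : dict mapping hyperparameter name -> list of candidate values
    evaluate : hypothetical callable(params) -> validation score (higher is better)
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"a": [1, 2], "b": [10, 20]}   # two hyperparameters, two values each -> 4 combinations

def toy_score(params):
    # Hypothetical stand-in for validating on the test graph; peaks at a=2, b=10.
    return -(params["a"] - 2) ** 2 - (params["b"] - 10) ** 2

print(grid_search(grid, toy_score))  # ({'a': 2, 'b': 10}, 0)
```

The number of combinations is the product of the candidate counts. A full grid therefore becomes expensive quickly when there are dozens of hyperparameters.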
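In the above embodiment, hyperparameters are unknowns that cannot be solved for within the algorithm's objective function, and there are several of them, each with several candidate values. The test data set, in the form of the test target social network graph, is therefore used with multiple sets of hyperparameters to cross-validate the objective function and obtain the optimal hyperparameters. Substituting these into the objective function as known quantities yields the algorithm's optimal objective function.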
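Referring to FIG. 3, which is a schematic flowchart of a method for applying an anti-fraud model according to an embodiment of the present application. The method is applied to anti-gang-fraud scenarios in insurance claims. As shown in the figure, the method may include steps S210 to S220.
S210. Obtain a data set to be detected from the insurance claims database to generate a target social network graph for detection. The data set to be detected is any one or more data sets to be detected in the insurance claims database.

Specifically, generating the target social network graph for detection in step S210 is similar to generating the training target social network graph in step S120 of the above embodiment. The main difference is that the data set to be detected may be the data of any single case or of multiple cases in the insurance claims database, so the data of every case can serve as the data set to be detected. A target social network graph for detection is constructed from this data set. The anti-fraud model generated with the SDNE algorithm is then used to test that graph, which identifies all cases in the insurance claims database that may involve fraud and counters gang fraud. The graph is constructed in the same way as the training target social network graph in step S120, so the details are not repeated here.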
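S220. Using the anti-fraud model generated in steps S110 to S160 of the above embodiment, map the nodes in the target social network graph for detection into a high-dimensional vector space. The user can then analyze how each node is mapped in that space to determine whether the node involves fraud. Every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly two nodes are associated, the closer their vectors are.

Specifically, the anti-fraud model maps every node in the target social network graph for detection one-to-one into the high-dimensional vector space, and each node has a unique corresponding vector. Strongly associated nodes have close vectors, and weakly associated nodes have distant vectors. A strong association between nodes indicates a strong mutual relationship in insurance claim cases.

If a node belongs to a known fraudster, nodes strongly associated with it are more likely to be involved in gang fraud. In other words, the persons or vehicles clustered around that node are more likely to be involved. Users can therefore examine how certain nodes are mapped in the high-dimensional vector space to judge whether those nodes involve fraud. For example, a user can focus on the neighborhood of a known fraudster's node and examine the nodes strongly associated with it.

This addresses the problem that identifying fraud in insurance claim cases with traditional social network analysis depends on manually defined user behavioral features. It improves the accuracy of fraud identification and reduces losses for the insurance company. Moreover, this embodiment builds all claim case data into a social network graph and uses the SDNE algorithm to map its nodes one-to-one into the vector space. The node information of the target social network graph for detection is fed into the anti-fraud model as an adjacency matrix. This effectively takes the local and global structure of the social network into account and uncovers more potentially valuable information.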
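Once each node has an embedding vector, "focusing on the neighborhood of a known fraudster" can be made concrete as a nearest-neighbor ranking. The sketch below uses cosine similarity as the closeness measure. That choice is an assumption, because the source only says associated nodes are "closer". The node names and 2-D vectors are made up for illustration; real embeddings would be higher-dimensional.

```python
import numpy as np

def nearest_to(embeddings, names, anchor, k=3):
    """Rank nodes by cosine similarity to an anchor node (e.g. a known fraudster).

    Assumes every embedding vector is non-zero.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[names.index(anchor)]
    order = np.argsort(-sims)                     # most similar first
    ranked = [(names[i], float(sims[i])) for i in order if names[i] != anchor]
    return ranked[:k]

names = ["K1", "P1", "P2", "R1"]                 # K1: hypothetical known fraudster
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.7, 0.7]])
print(nearest_to(Y, names, "K1", k=2))           # P2 ranks first, R1 second
```

An analyst would then review the top-ranked nodes around the known fraudster, which matches the manual inspection step described above.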
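In the above embodiment, a data set to be detected is obtained from the insurance claims database to generate a target social network graph for detection. The anti-fraud model generated in steps S110 to S160 maps the nodes of that graph into a high-dimensional vector space, so that the user can analyze each node's mapping to determine whether it involves fraud. Every node has a unique corresponding vector, and more strongly associated nodes have closer vectors.

Because strongly associated nodes lie close together and weakly associated nodes lie far apart in the high-dimensional vector space, attention can be focused on clusters of nodes. Analyzing the relationships between those nodes uncovers more latent information and behavior patterns. For example, if a cluster contains the node of a known habitual fraudster, nearby nodes are more likely to involve fraud.

This embodiment addresses the prior-art problem that analyzing social network graphs to identify fraud in insurance claim cases relies on manually defined user behavioral features. It uses information such as the local and global structure of the social network graph, instead of simply counting the frequency of users' surface-level behavioral features. The application method of the anti-fraud model can therefore improve the accuracy of fraud identification and reduce losses for the company.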
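Referring to FIG. 4, which is a schematic flowchart of a method for applying an anti-fraud model according to another embodiment of the present application. The method is applied to anti-gang-fraud scenarios in insurance claims. As shown in the figure, the method may include steps S310 to S330. Steps S310 to S320 are similar to steps S210 to S220 of the above embodiment and are not repeated here; the added step S330 is described in detail below.
S330. Perform computations on the vectors in the high-dimensional vector space according to a vector operation algorithm to obtain the degree of association between any vectors in the space. The vector operation algorithm includes regression algorithms, classification algorithms, and clustering algorithms.

Specifically, the vector operation algorithm can compute the degree of association between any vectors in the high-dimensional vector space. This makes it possible to uncover more latent information and behavior patterns, and to establish clear patterns and quantitative indicators that accurately characterize users' behavioral features. A clear pattern includes the feature variables that characterize fraudulent behavior. These feature variables include both observable and unobservable ones, and their influence can be quantified. For example, the vector operation algorithm can determine whether a feature variable's influence on behavior is linear or non-linear, and complex vector operations on any vectors in the space can quantify that influence.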
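Of the regression, classification, and clustering options listed, clustering is the easiest to illustrate without labeled data. The sketch below runs plain k-means (Lloyd's algorithm) on node embeddings so that tightly grouped nodes share a cluster label. The choice of k-means, the value of `k`, and the toy points are assumptions rather than the method specified in the source.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on node embeddings X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # k distinct starting points
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                         # assign to nearest center
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return labels, centers

# Two well-separated toy groups of embeddings.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(X, k=2)
print(labels)
```

Cluster numbering depends on the random initialization, so only the grouping is meaningful. A cluster that contains a known fraudster's node would be the natural place to look first.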
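In the above embodiment, a data set to be detected is obtained from the insurance claims database to generate a target social network graph for detection. The anti-fraud model generated in steps S110 to S160 maps the nodes of that graph one-to-one into a high-dimensional vector space. The vectors in that space are then processed according to a vector operation algorithm to obtain the degree of association between any vectors.

In the target social network graph for detection, the data of the data set to be detected appear as nodes, and complex mathematical operations cannot be performed on nodes directly. Once each node is mapped into the high-dimensional vector space and obtains its own unique vector, complex vector operations can be performed on any vector coordinates in that space to obtain the degree of association between any vectors. This provides quantitative indicators for business decision-making and supports more scientific decisions. It also deepens business staff's understanding of user behavioral features and uncovers more hidden information and latent behavior patterns.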
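Referring to FIG. 5, which is a schematic block diagram of an apparatus 300 according to an embodiment of the present application. As shown in FIG. 5, the apparatus 300 corresponds to the method for generating an anti-fraud model shown in FIG. 1. It includes units for performing that method and may be configured in a terminal such as a desktop computer, a tablet computer, or a laptop. Specifically, referring to FIG. 5, the apparatus 300 includes a data acquisition unit 301, a first graph construction unit 302, a function acquisition unit 303, a function construction unit 304, an optimal function generation unit 305, and a model generation unit 306.
The data acquisition unit 301 is configured to obtain a historical data set from the insurance claims database. The historical data set is the set of all case data within a preset time range in the insurance claims database, and it includes a training data set and a test data set.
The first graph construction unit 302 is configured to generate a training target social network graph according to the training data set. Specifically, the first graph construction unit 302 includes a first graph construction subunit 3021 and a first data processing unit 3022.
The first graph construction subunit 3021 is configured to construct a training original social network graph according to the training data set. The graph is represented as a directed graph G=(V,E), where V is its node set and E is its edge set. Each node in V represents one data item in the training data set, and each edge in E represents an ordered pair of nodes (s,t), where s is the source node and t is the target node.
The first data processing unit 3022 is configured to perform data processing on the training original social network graph to generate the training target social network graph. The processing filters the nodes of the original graph to obtain homogeneous nodes and filters out heterogeneous nodes. It then obtains key nodes related to the homogeneous nodes and adds them to the homogeneous nodes. The resulting graph is represented as g=(v,e), where v is its node set and e is its edge set.
The function acquisition unit 303 is configured to obtain the objective function of the SDNE algorithm according to the training target social network graph and take that objective function as the first objective function.
The function construction unit 304 is configured to construct a second objective function according to the first objective function and a preset constraint condition. Specifically, the function construction unit 304 includes a condition acquisition unit 3041 and a function construction subunit 3042.
The condition acquisition unit 3041 is configured to obtain the preset constraint condition.
The function construction subunit 3042 is configured to add the preset constraint condition to the first objective function as a known quantity of that function to construct the second objective function.
The optimal function generation unit 305 is configured to obtain the optimal hyperparameters of the second objective function and add them to that function as known quantities to generate the optimal objective function.
The model generation unit 306 is configured to train the optimal objective function using the training target social network graph to generate the anti-fraud model.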
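In one embodiment, as shown in FIG. 6, which is a schematic block diagram of the optimal function generation unit 305 of an apparatus according to an embodiment of the present application, the optimal function generation unit 305 includes a hyperparameter acquisition unit 3051, a second graph construction unit 3052, and a function verification unit 3053.
The hyperparameter acquisition unit 3051 is configured to obtain the hyperparameters of the second objective function. Specifically, it obtains these hyperparameters and the values each one can take.
The second graph construction unit 3052 is configured to generate a test target social network graph according to the test data set.
Specifically, the test data set is the case-related data for 30% of all claim cases within the preset time in the insurance claims database. Like the first graph construction unit 302 in the above embodiment, the second graph construction unit 3052 is further configured to construct a test original social network graph according to the test data set. It then performs data processing on that graph to generate the test target social network graph.
The function verification unit 3053 is configured to cross-validate the second objective function according to the hyperparameters and the test target social network graph to obtain the optimal hyperparameters.
Specifically, the function verification unit 3053 combines the hyperparameters obtained by the hyperparameter acquisition unit 3051 with their candidate values. It uses each combination in turn to cross-validate the second objective function against the test target social network graph, and so obtains the optimal hyperparameters. The number of hyperparameters of the second objective function is not limited here; there may be dozens or hundreds.
It should be noted that a person skilled in the art can clearly understand that the specific implementation and effects of the apparatus 300 and its units are described in the corresponding parts of the foregoing method embodiments. For convenience and brevity, the details are not repeated here.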
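Referring to FIG. 7, which is a schematic block diagram of another apparatus 400 according to an embodiment of the present application. As shown in FIG. 7, the apparatus 400 corresponds to the method for applying an anti-fraud model shown in FIG. 3. It includes units for performing that method and may be configured in a terminal such as a desktop computer, a tablet computer, or a laptop. Specifically, referring to FIG. 7, the apparatus 400 includes a third graph construction unit 401 and a node mapping unit 402.
The third graph construction unit 401 is configured to obtain a data set to be detected from the insurance claims database to generate a target social network graph for detection. The data set to be detected is any one or more data sets to be detected in the insurance claims database.
Specifically, the third graph construction unit 401 is similar to the first graph construction unit 302 in the above embodiment and includes a third graph construction subunit 4011 and a third data processing unit 4012. The main difference is that the data set to be detected obtained by the third graph construction subunit 4011 may be the data of any single case or of multiple data sets to be detected, so the data of every case in the insurance claims database can serve as the data set to be detected. A target social network graph for detection is constructed from this data set. The anti-fraud model generated with the SDNE algorithm is then used to test that graph, which identifies all cases in the insurance claims database that may involve fraud and counters gang fraud. The application process and corresponding functions of the third graph construction unit 401 are similar to those of the first graph construction unit 302 and are not repeated here.
The node mapping unit 402 is configured to use the anti-fraud model generated in steps S110 to S160 of the above embodiment to map the nodes in the target social network graph for detection into a high-dimensional vector space. The user can then analyze how each node is mapped in that space to determine whether it involves fraud. Every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly two nodes are associated, the closer their vectors are.
It should be noted that a person skilled in the art can clearly understand that the specific implementation and effects of the apparatus 400 and its units are described in the corresponding parts of the foregoing method embodiments. For convenience and brevity, the details are not repeated here.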
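Referring to FIG. 8, which is a schematic block diagram of another apparatus 500 according to another embodiment of the present application. As shown in FIG. 8, the apparatus 500 adds a vector operation unit 503 to the above embodiment, so it includes a fourth graph construction unit 501, a node mapping unit 502, and a vector operation unit 503. The fourth graph construction unit 501 is similar to the third graph construction unit 401 in the above embodiment and includes a fourth graph construction subunit 5011 and a fourth data processing unit 5012. The application processes and corresponding functions of the fourth graph construction unit 501 and the node mapping unit 502 are similar to those of the third graph construction unit 401 and the node mapping unit 402, so they are not repeated here.
The vector operation unit 503 is configured to perform computations on the vectors in the high-dimensional vector space according to a vector operation algorithm, to obtain the degree of association between any vectors in the space. The vector operation algorithm includes regression algorithms, classification algorithms, and clustering algorithms.
It should be noted that a person skilled in the art can clearly understand that the specific implementation and effects of the apparatus 500 and its units are described in the corresponding parts of the foregoing method embodiments. For convenience and brevity, the details are not repeated here.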
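The above apparatus may be implemented in the form of a computer program that can run on a computer device as shown in FIG. 9.
Referring to FIG. 9, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 600 may be a terminal or a server. The terminal may be an electronic device such as a smartphone, tablet computer, notebook computer, desktop computer, or personal digital assistant. The server may be an independent server or a server cluster composed of multiple servers.
Referring to FIG. 9, the computer device 600 includes a processor 602, a memory, and a network interface 605 connected through a system bus 601. The memory may include a non-volatile storage medium 603 and an internal memory 604.
The non-volatile storage medium 603 can store an operating system 6031 and a computer program 6032. The computer program 6032 includes program instructions that, when executed, cause the processor 602 to perform a method for generating and applying an anti-fraud model.
The processor 602 provides computing and control capabilities to support the operation of the entire computer device 600.
The internal memory 604 provides an environment for running the computer program 6032 stored in the non-volatile storage medium 603. When executed by the processor 602, the computer program 6032 causes the processor 602 to perform a method for generating and applying an anti-fraud model.
The network interface 605 is used for network communication with other devices. A person skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application. It does not limit the computer device 600 to which the solution is applied. A specific computer device 600 may include more or fewer components than shown, combine certain components, or arrange components differently.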
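The processor 602 is configured to run the computer program 6032 stored in the memory to implement the generation method and the application method of the anti-fraud model described above.
It should be understood that, in the embodiments of the present application, the processor 602 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be carried out by a computer program instructing the relevant hardware. The computer program includes program instructions and may be stored in a storage medium, which is a computer-readable storage medium. At least one processor in the computer system executes the program instructions to implement the process steps of the method embodiments above.
Accordingly, the present application further provides a computer-readable storage medium that stores a computer program including program instructions. When executed by a processor, the program instructions perform the generation method and application method of the anti-fraud model described above.
The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.
The above are merely specific implementations of the present application, and the protection scope of the present application is not limited to them. Any equivalent modification or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed here shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.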

Claims (19)

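  1. A method for generating an anti-fraud model, comprising:
    obtaining a historical data set from an insurance claims database, wherein the historical data set is the set of all case data within a preset time range in the insurance claims database and comprises a training data set and a test data set;
    generating a training target social network graph according to the training data set;
    obtaining an objective function of the SDNE algorithm according to the training target social network graph, and taking the objective function as a first objective function;
    constructing a second objective function according to the first objective function and a preset constraint condition;
    obtaining optimal hyperparameters of the second objective function, and adding the optimal hyperparameters to the second objective function as known quantities of the second objective function to generate an optimal objective function; and
    training the optimal objective function using the training target social network graph to generate the anti-fraud model.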
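  2. The method for generating an anti-fraud model according to claim 1, wherein generating the training target social network graph according to the training data set comprises:
    constructing a training original social network graph according to the training data set, the constructed training original social network graph being represented as a directed graph G=(V,E), wherein V is the node set of the training original social network graph, E is the edge set of the training original social network graph, each node in V represents one data item in the training data set, and each edge in E represents an ordered pair of nodes (s,t), s being a source node and t being a target node; and
    performing data processing on the training original social network graph to generate the training target social network graph, the data processing comprising: filtering nodes of the training original social network graph to obtain homogeneous nodes and filtering out heterogeneous nodes; and obtaining key nodes related to the homogeneous nodes and adding the key nodes to the homogeneous nodes to generate the training target social network graph, the generated training target social network graph being represented as a graph g=(v,e), wherein v is the node set of the training target social network graph and e is the edge set of the training target social network graph.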
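  3. The method for generating an anti-fraud model according to claim 1, wherein constructing the second objective function according to the first objective function and the preset constraint condition comprises:
    obtaining the preset constraint condition; and
    adding the preset constraint condition to the first objective function as a known quantity of the first objective function to construct the second objective function.
  4. The method for generating an anti-fraud model according to claim 1, wherein obtaining the optimal hyperparameters of the second objective function comprises:
    obtaining hyperparameters of the second objective function;
    generating a test target social network graph according to the test data set; and
    cross-validating the second objective function according to the hyperparameters and the test target social network graph to obtain the optimal hyperparameters.
  5. The method for generating an anti-fraud model according to claim 1, wherein the SDNE algorithm comprises an unsupervised algorithm and a supervised algorithm, and the unsupervised algorithm comprises a deep belief network algorithm, an autoencoder neural network algorithm, and a deep Boltzmann machine algorithm.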
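  6. A method for applying an anti-fraud model, comprising:
    obtaining a data set to be detected from an insurance claims database to generate a target social network graph for detection, wherein the data set to be detected is any one or more data sets to be detected in the insurance claims database; and
    using the anti-fraud model according to any one of claims 1 to 5 to map nodes in the target social network graph for detection into a high-dimensional vector space, so that a user can analyze, according to the mapping of a node in the high-dimensional vector space, whether the node involves fraud, wherein every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly nodes are associated, the closer their corresponding vectors are in the high-dimensional vector space.
  7. The method for applying an anti-fraud model according to claim 6, further comprising, after mapping the nodes in the target social network graph for detection into the high-dimensional vector space:
    performing computations on the vectors in the high-dimensional vector space according to a vector operation algorithm to obtain a degree of association between any vectors in the high-dimensional vector space, wherein the vector operation algorithm comprises a regression algorithm, a classification algorithm, and a clustering algorithm.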
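  8. An apparatus for generating an anti-fraud model, comprising:
    a data acquisition unit, configured to obtain a historical data set from an insurance claims database, wherein the historical data set is the set of all case data within a preset time range in the insurance claims database and comprises a training data set and a test data set;
    a first graph construction unit, configured to generate a training target social network graph according to the training data set;
    a function acquisition unit, configured to obtain an objective function of the SDNE algorithm according to the training target social network graph, and take the objective function as a first objective function;
    a function construction unit, configured to construct a second objective function according to the first objective function and a preset constraint condition;
    an optimal function generation unit, configured to obtain optimal hyperparameters of the second objective function, and add the optimal hyperparameters to the second objective function as known quantities of the second objective function to generate an optimal objective function; and
    a model generation unit, configured to train the optimal objective function using the training target social network graph to generate the anti-fraud model.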
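  9. The apparatus for generating an anti-fraud model according to claim 8, wherein the first graph construction unit comprises:
    a first graph construction subunit, configured to construct a training original social network graph according to the training data set, the constructed training original social network graph being represented as a directed graph G=(V,E), wherein V is the node set of the training original social network graph, E is the edge set of the training original social network graph, each node in V represents one data item in the training data set, and each edge in E represents an ordered pair of nodes (s,t), s being a source node and t being a target node; and
    a first data processing unit, configured to perform data processing on the training original social network graph to generate the training target social network graph, the data processing comprising: filtering nodes of the training original social network graph to obtain homogeneous nodes and filtering out heterogeneous nodes; and obtaining key nodes related to the homogeneous nodes and adding the key nodes to the homogeneous nodes to generate the training target social network graph, the generated training target social network graph being represented as a graph g=(v,e), wherein v is the node set of the training target social network graph and e is the edge set of the training target social network graph.
  10. The apparatus for generating an anti-fraud model according to claim 8, wherein the function construction unit comprises:
    a condition acquisition unit, configured to obtain the preset constraint condition; and
    a function construction subunit, configured to add the preset constraint condition to the first objective function as a known quantity of the first objective function to construct the second objective function.
  11. The apparatus for generating an anti-fraud model according to claim 8, wherein the optimal function generation unit comprises:
    a hyperparameter acquisition unit, configured to obtain hyperparameters of the second objective function;
    a second graph construction unit, configured to generate a test target social network graph according to the test data set; and
    a function verification unit, configured to cross-validate the second objective function according to the hyperparameters and the test target social network graph to obtain the optimal hyperparameters.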
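  12. An apparatus for applying an anti-fraud model, comprising:
    a third graph construction unit, configured to obtain a data set to be detected from an insurance claims database to generate a target social network graph for detection, wherein the data set to be detected is any one or more data sets to be detected in the insurance claims database; and
    a node mapping unit, configured to use the anti-fraud model according to any one of claims 1 to 5 to map nodes in the target social network graph for detection into a high-dimensional vector space, so that a user can analyze, according to the mapping of a node in the high-dimensional vector space, whether the node involves fraud, wherein every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly nodes are associated, the closer their corresponding vectors are in the high-dimensional vector space.
  13. The apparatus for applying an anti-fraud model according to claim 12, further comprising:
    a vector operation unit, configured to perform computations on the vectors in the high-dimensional vector space according to a vector operation algorithm to obtain a degree of association between any vectors in the high-dimensional vector space, wherein the vector operation algorithm comprises a regression algorithm, a classification algorithm, and a clustering algorithm.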
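  14. A computer device, comprising a memory and a processor connected to the memory, wherein:
    the memory is configured to store a computer program implementing a method for generating an anti-fraud model; and
    the processor is configured to run the computer program stored in the memory to perform the following steps: obtaining a historical data set from an insurance claims database, wherein the historical data set is the set of all case data within a preset time range in the insurance claims database and comprises a training data set and a test data set; generating a training target social network graph according to the training data set; obtaining an objective function of the SDNE algorithm according to the training target social network graph, and taking the objective function as a first objective function; constructing a second objective function according to the first objective function and a preset constraint condition; obtaining optimal hyperparameters of the second objective function, and adding the optimal hyperparameters to the second objective function as known quantities of the second objective function to generate an optimal objective function; and training the optimal objective function using the training target social network graph to generate the anti-fraud model.
  15. The computer device according to claim 14, wherein, when performing the step of generating the training target social network graph according to the training data set, the processor specifically performs the following steps: constructing a training original social network graph according to the training data set, the constructed training original social network graph being represented as a directed graph G=(V,E), wherein V is the node set of the training original social network graph, E is the edge set of the training original social network graph, each node in V represents one data item in the training data set, and each edge in E represents an ordered pair of nodes (s,t), s being a source node and t being a target node; and performing data processing on the training original social network graph to generate the training target social network graph, the data processing comprising: filtering nodes of the training original social network graph to obtain homogeneous nodes and filtering out heterogeneous nodes; and obtaining key nodes related to the homogeneous nodes and adding the key nodes to the homogeneous nodes to generate the training target social network graph, the generated training target social network graph being represented as a graph g=(v,e), wherein v is the node set of the training target social network graph and e is the edge set of the training target social network graph.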
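  16. A computer device, comprising a memory and a processor connected to the memory, wherein:
    the memory is configured to store a computer program implementing a method for applying an anti-fraud model; and
    the processor is configured to run the computer program stored in the memory to perform the following steps: obtaining a data set to be detected from an insurance claims database to generate a target social network graph for detection, wherein the data set to be detected is any one or more data sets to be detected in the insurance claims database; and using the anti-fraud model according to any one of claims 1 to 5 to map nodes in the target social network graph for detection into a high-dimensional vector space, so that a user can analyze, according to the mapping of a node in the high-dimensional vector space, whether the node involves fraud, wherein every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly nodes are associated, the closer their corresponding vectors are in the high-dimensional vector space.
  17. The computer device according to claim 16, wherein the processor, by running the computer program stored in the memory, further performs the following step: performing computations on the vectors in the high-dimensional vector space according to a vector operation algorithm to obtain a degree of association between any vectors in the high-dimensional vector space, wherein the vector operation algorithm comprises a regression algorithm, a classification algorithm, and a clustering algorithm.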
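  18. A computer-readable storage medium, storing one or more computer programs executable by one or more processors to implement the following steps: obtaining a historical data set from an insurance claims database, wherein the historical data set is the set of all case data within a preset time range in the insurance claims database and comprises a training data set and a test data set; generating a training target social network graph according to the training data set; obtaining an objective function of the SDNE algorithm according to the training target social network graph, and taking the objective function as a first objective function; constructing a second objective function according to the first objective function and a preset constraint condition; obtaining optimal hyperparameters of the second objective function, and adding the optimal hyperparameters to the second objective function as known quantities of the second objective function to generate an optimal objective function; and training the optimal objective function using the training target social network graph to generate the anti-fraud model.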
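  19. A computer-readable storage medium, storing one or more computer programs executable by one or more processors to implement the following steps: obtaining a data set to be detected from an insurance claims database to generate a target social network graph for detection, wherein the data set to be detected is any one or more data sets to be detected in the insurance claims database; and using the anti-fraud model according to any one of claims 1 to 5 to map nodes in the target social network graph for detection into a high-dimensional vector space, so that a user can analyze, according to the mapping of a node in the high-dimensional vector space, whether the node involves fraud, wherein every node of the target social network graph for detection has a unique corresponding vector in the high-dimensional vector space, and the more strongly nodes are associated, the closer their corresponding vectors are in the high-dimensional vector space.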
PCT/CN2018/124819 2018-09-10 2018-12-28 Method, apparatus, device, and storage medium for generating and applying an anti-fraud model WO2020052168A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811051842.4A CN109447658A (zh) 2018-09-10 2018-09-10 Method, apparatus, device, and storage medium for generating and applying an anti-fraud model
CN201811051842.4 2018-09-10

Publications (1)

Publication Number Publication Date
WO2020052168A1 true WO2020052168A1 (zh) 2020-03-19

Family

ID=65533265

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124819 WO2020052168A1 (zh) 2018-09-10 2018-12-28 Method, apparatus, device, and storage medium for generating and applying an anti-fraud model

Country Status (2)

Country Link
CN (1) CN109447658A (zh)
WO (1) WO2020052168A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
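CN110147389B (zh) * 2019-03-14 2023-09-26 Tencent Technology (Shenzhen) Co., Ltd. Account processing method and apparatus, storage medium, and electronic apparatus
CN110263106B (zh) * 2019-06-25 2020-02-21 National University of Defense Technology Collaborative opinion fraud detection method and apparatus
CN110490750B (zh) * 2019-07-23 2022-10-28 Ping An Technology (Shenzhen) Co., Ltd. Data identification method and system, electronic device, and computer storage medium
CN111143684B (zh) * 2019-12-30 2023-03-21 Tencent Technology (Shenzhen) Co., Ltd. Training method and apparatus for an artificial-intelligence-based generalization model
CN111447179A (zh) * 2020-03-03 2020-07-24 Sun Yat-sen University Network representation learning method for Ethernet phishing scams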

Citations (5)

Publication number Priority date Publication date Assignee Title
US20170109827A1 (en) * 2015-10-15 2017-04-20 International Business Machines Corporation Method and system to determine auto insurance risk
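CN106600423A (zh) * 2016-11-18 2017-04-26 Yunshu Information Technology (Shenzhen) Co., Ltd. Machine-learning-based vehicle insurance data processing method, and vehicle insurance fraud identification method and apparatus
CN108257033A (zh) * 2018-01-12 2018-07-06 Ping An Life Insurance Company of China, Ltd. Insurance policy analysis method and apparatus, terminal device, and storage medium
CN108334647A (zh) * 2018-04-12 2018-07-27 Alibaba Group Holding Limited Data processing method, apparatus, device, and server for insurance fraud identification
CN108364233A (zh) * 2018-01-12 2018-08-03 Ping An Life Insurance Company of China, Ltd. Insurance policy risk assessment method and apparatus, terminal device, and storage medium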

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
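CN107292424B (zh) * 2017-06-01 2020-01-21 Sichuan XW Bank Co., Ltd. Anti-fraud and credit risk prediction method based on a complex social network
CN107943879A (zh) * 2017-11-14 2018-04-20 Shanghai Weixin Huizhi Financial Technology Co., Ltd. Fraud group detection method and system based on a social network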

Also Published As

Publication number Publication date
CN109447658A (zh) 2019-03-08

Similar Documents

Publication Publication Date Title
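WO2020052168A1 (zh) Method, apparatus, device, and storage medium for generating and applying an anti-fraud model
CN110009174B (zh) Risk identification model training method, apparatus, and server
Tarawneh et al. Stop oversampling for class imbalance learning: A review
CN107633265B (zh) Data processing method and apparatus for optimizing a credit evaluation model
TWI726341B (zh) Sample attribute evaluation model training method, apparatus, server, and storage medium
CN109087079B (zh) Digital currency transaction information analysis method
US9294497B1 (en) Method and system for behavioral and risk prediction in networks using automatic feature generation and selection using network topolgies
WO2019019630A1 (zh) Anti-fraud identification method, storage medium, server carrying Ping An Brain, and apparatus
Zhao et al. A machine learning based trust evaluation framework for online social networks
CN111309822B (zh) User identity recognition method and apparatus
CN113360580B (zh) Knowledge-graph-based abnormal event detection method, apparatus, device, and medium
US11538044B2 (en) System and method for generation of case-based data for training machine learning classifiers
Sánchez-González et al. A study of the effectiveness of two threshold definition techniques
Habibpour et al. Uncertainty-aware credit card fraud detection using deep learning
CN109313541A (zh) User interface for displaying and comparing attack telemetry resources
CN116204773A (zh) Causal feature screening method, apparatus, device, and storage medium
CN116307671A (zh) Risk early-warning method and apparatus, computer device, and storage medium
CN115545103A (zh) Abnormal data identification method, label identification method, and abnormal data identification apparatus
Umer et al. Ensemble Deep Learning Based Prediction of Fraudulent Cryptocurrency Transactions
CN111245815B (zh) Data processing method and apparatus, storage medium, and electronic device
Hassanat et al. Stop oversampling for class imbalance learning: A critical review
CN111681044A (zh) Method and apparatus for handling cheating in points redemption
CN114418780B (zh) Fraud gang identification method and apparatus, computer device, and storage medium
CN112541765A (zh) Method and apparatus for detecting suspicious transactions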
Bodaghi et al. The detection of professional fraud in automobile insurance using social network analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933425

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18933425

Country of ref document: EP

Kind code of ref document: A1