CN114169897A - Training method, risk identification device, computer equipment and medium - Google Patents

Publication number
CN114169897A
Authority
CN
China
Prior art keywords
user
risk
user relationship
network
risk identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111520883.5A
Other languages
Chinese (zh)
Inventor
石荣华
狄先红
龚剑
邹琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111520883.5A
Publication of CN114169897A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The disclosure provides a training method for a risk identification model and a risk identification method, which can be applied to the field of information security. The training method comprises: acquiring m user relationship networks, where m is a positive integer greater than 2; marking n of the m user relationship networks based on risk user identifiers to obtain n marked user relationship networks, where n is less than m and greater than or equal to 1; and training a risk identification model to be trained with the m user relationship networks to obtain a risk identification model, which is used to perform risk identification on unmarked user relationship networks to be identified. The present disclosure also provides a risk identification apparatus, a device, a storage medium, and a program product.

Description

Training method, risk identification device, computer equipment and medium
Technical Field
The present disclosure relates to the field of information security, in particular to financial fraud prevention, and more particularly to a training method for a risk identification model, a risk identification method, and a corresponding apparatus, device, medium, and program product.
Background
With the spread of information technology and the successive launch of various financial products, financial activities have gradually permeated people's daily lives, and the ways in which risk is carried out have become increasingly diverse. Once a user remits money to a designated account, the money can be moved rapidly, layer by layer, through multiple accounts controlled by a risk user group, completing the risky fund transfer. Traditional anti-fraud models focus only on the fund flows of individual accounts and use the transaction funds of each account in isolation as the key model feature for screening. As a result, they capture only the first layer of the risky fund flow and have difficulty discovering the other accounts controlled by the risk user group.
On the other hand, when the fund chain of an unsafe account is traced backward, normal transactions among the accounts and interactions with other platforms cause serious information loss in the fund-chain relationship network, so suspicious nodes in the network are hard to confirm and cannot be tracked effectively.
Disclosure of Invention
In view of the above, the present disclosure provides a training method, risk identification method, apparatus, device, medium, and program product of a risk identification model that identifies a community of risk users from an intrinsic associative relationship between accounts.
According to a first aspect of the present disclosure, there is provided a training method of a risk recognition model, including: acquiring m user relationship networks, wherein m is a positive integer greater than 2; based on the risk user identification, marking n user relationship networks in the m user relationship networks to obtain n user relationship networks with marks, wherein n is less than m and is greater than or equal to 1; and training a risk identification model to be trained by using the m user relationship networks to obtain a risk identification model, wherein the risk identification model is used for carrying out risk identification on the unmarked user relationship networks to be identified.
According to an embodiment of the present disclosure, the m user relationship networks are knowledge graph networks constructed from target data. The nodes of the m user relationship networks comprise users, user mobile phone numbers, and user network addresses, and the edges of the m user relationship networks represent the number of accesses by the user within a preset time period. The target data comprises user associated data and user transaction data, where the user associated data comprises the user, the user mobile phone number, the user network address, and the number of accesses by the user within the preset time period. The target data includes data generated by a user operating a payment application within the preset time period.
According to the embodiment of the disclosure, training a risk identification model to be trained by using m user relationship networks to obtain the risk identification model comprises: deleting the user relationship networks which do not contain the user nodes, and deleting the invalid data in the m user relationship networks to obtain the user relationship networks to be input; inputting a user relationship network to be input into a risk identification model, and outputting a risk identification result; and when the risk identification model to be trained meets the preset conditions, obtaining the risk identification model for carrying out risk identification on the unmarked user relationship network to be identified.
According to the embodiment of the disclosure, the risk identification model to be trained includes a feature extraction module and a prediction module, the user relationship network to be input is input into the risk identification model, and outputting the risk identification result further includes: inputting a user relationship network to be input into a feature extraction module, and outputting first feature data and second feature data; inputting the first characteristic data and the second characteristic data into a prediction module, and outputting a risk identification result; the first characteristic data is network structure characteristic data comprising the number of nodes and the correlation degree between the nodes, and the second characteristic data is network entity characteristic data comprising user transaction data.
According to an embodiment of the present disclosure, wherein marking n user relationship networks of the m user relationship networks based on the risky user identifier comprises: and marking the risk users in the n user relationship networks as risk nodes, wherein the risk users are users in a risk user list.
According to a second aspect of the present disclosure, there is provided a risk identification method, comprising: acquiring a user relationship network; and inputting the user relationship network into a risk identification model, and outputting an identification result, wherein the risk identification model is obtained by training through the risk identification model training method.
According to an embodiment of the present disclosure, the risk identification method further includes: setting a risk grade threshold value based on the risk identification result; obtaining the risk level of the user relationship network according to the risk level threshold; and setting a processing strategy according to the risk level of the user relationship network.
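The mapping from a risk identification result to a risk level and then to a processing strategy can be illustrated with a minimal sketch. It assumes the identification result is available as a numeric score in [0, 1]; the threshold values, level names, and strategies below are hypothetical, since the text gives no concrete values.

```python
import bisect

# Hypothetical thresholds and strategies; the source does not give concrete values.
THRESHOLDS = [0.3, 0.7]                 # boundaries between adjacent risk levels
LEVELS = ["low", "medium", "high"]
STRATEGIES = {
    "low": "no action",
    "medium": "manual review",
    "high": "restrict transactions",
}


def risk_level(score):
    """Map a risk score in [0, 1] to a risk level using the thresholds."""
    return LEVELS[bisect.bisect_right(THRESHOLDS, score)]


def strategy_for(score):
    """Look up the processing strategy assigned to the score's risk level."""
    return STRATEGIES[risk_level(score)]
```

With these assumed thresholds, a score exactly equal to a boundary falls into the higher level; an implementation could equally choose the opposite convention.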
According to a third aspect of the present disclosure, there is provided a training device for a risk identification model, comprising: a first acquisition module for acquiring m user relationship networks, where m is a positive integer greater than 2; a marking module for marking n of the m user relationship networks based on the risk user identifier to obtain n marked user relationship networks, where n is less than m and greater than or equal to 1; and a training module for training the risk identification model to be trained with the m user relationship networks to obtain a risk identification model, where the risk identification model is used to perform risk identification on unmarked user relationship networks to be identified.
According to a fourth aspect of the present disclosure, there is provided a risk identification device comprising: the second acquisition module is used for acquiring the user relationship network; and the recognition module is used for inputting the user relationship network into a risk recognition model and outputting a recognition result, wherein the risk recognition model is obtained by training through the risk recognition model training method.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: one or more processors; a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described risk identification method.
According to a sixth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described risk identification method.
According to a seventh aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-mentioned risk identification method.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a risk recognition model training method and a risk recognition method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of training a risk recognition model according to an embodiment of the present disclosure;
FIG. 3A schematically illustrates a schematic diagram of a user relationship network according to an embodiment of the disclosure;
FIG. 3B schematically illustrates a node diagram within a user relationship network according to an embodiment of the disclosure;
FIG. 4 schematically shows a flow chart for deriving a risk identification model according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart for outputting a risk identification result according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a risk identification method according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a risk identification method according to another embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus for a risk recognition model according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a risk identification device according to an embodiment of the present disclosure; and
FIG. 10 schematically shows a block diagram of an electronic device adapted to implement a risk identification method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).
The embodiment of the disclosure provides a training method of a risk identification model, which includes: acquiring m user relationship networks, wherein m is a positive integer greater than 2; based on the risk user identification, marking n user relationship networks in the m user relationship networks to obtain n user relationship networks with marks, wherein n is less than m and is greater than or equal to 1; and training a risk identification model to be trained by using the m user relationship networks to obtain a risk identification model, wherein the risk identification model is used for carrying out risk identification on the unmarked user relationship networks to be identified.
Fig. 1 schematically illustrates an application scenario diagram of a risk recognition model training method and a risk recognition method according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the risk identification method and the risk identification model training method provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the risk identification device and the risk identification model training device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The risk identification method and the training method of the risk identification model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the risk identification apparatus and the risk identification model training apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The risk identification method and the training method of the risk identification model of the disclosed embodiment will be described in detail below with reference to figs. 2 to 7, based on the scenario described in fig. 1.
FIG. 2 schematically shows a flow chart of a method of training a risk recognition model according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S203.
In operation S201, m user relationship networks are acquired, where m is a positive integer greater than 2.
According to the embodiment of the disclosure, each user relationship network comprises a plurality of nodes and represents the association relationships among those nodes. For example, an association relationship may be an exchange of information or a transaction between users, a binding relationship between a user and another platform, or a binding relationship among a user's own pieces of information. The m user relationship networks are m networks that are not connected to one another. It should be noted that although the m user relationship networks are not connected, they may have the same or similar features.
In operation S202, based on the risky user identifier, n user relationship networks in the m user relationship networks are marked to obtain n user relationship networks with marks, where n is less than m and is greater than or equal to 1.
According to the embodiment of the disclosure, each user relationship network comprises a plurality of users, and one or more users may be users with risks in the user relationship network, so that n user relationship networks comprising users with risks can be marked based on the identifiers of the users with risks. The m user relationship networks include n user relationship networks with risk markers, and m-n networks without risk markers. The marking of the n user relationship networks may be, for example, tagging the user relationship networks.
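The marking step in operation S202 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `UserNetwork` class, its fields, and the sample identifiers are hypothetical, and a network is treated as marked when it contains at least one user from the risk user list.

```python
from dataclasses import dataclass, field


@dataclass
class UserNetwork:
    """One user relationship network; only its user nodes matter for marking."""
    network_id: str
    user_ids: set
    risk_nodes: set = field(default_factory=set)
    marked: bool = False


def mark_networks(networks, risk_user_list):
    """Mark every network that contains at least one listed risk user."""
    for net in networks:
        net.risk_nodes = net.user_ids & risk_user_list  # risk users become risk nodes
        net.marked = bool(net.risk_nodes)
    return [net for net in networks if net.marked]


# Hypothetical data: m = 3 networks, of which n = 1 ends up marked.
networks = [
    UserNetwork("net-1", {"u1", "u2"}),
    UserNetwork("net-2", {"u3"}),
    UserNetwork("net-3", {"u4", "u5"}),
]
marked = mark_networks(networks, risk_user_list={"u2"})
```

The remaining m - n networks keep `marked = False` and are still passed to training as unmarked samples.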
In operation S203, a risk identification model to be trained is trained by using the m user relationship networks to obtain a risk identification model, where the risk identification model is used to perform risk identification on an unmarked user relationship network to be identified.
According to the embodiment of the disclosure, the risk identification model is trained with both the marked and the unmarked user relationship networks among the m user relationship networks. The trained model can take an unmarked user relationship network as input and output whether that network is at risk. The risk identification model is a semi-supervised machine learning model.
According to the embodiment of the disclosure, user relationship networks are constructed, a small amount of risk user label information is used, and a plurality of user relationship networks are input into the risk identification model with each network treated as one unit. Identifying risk user groups with a semi-supervised machine learning model characterizes the behavior of a risk user group more comprehensively, helps track the whole process of a risky fund transfer, and identifies group risk behavior accurately. In addition, identifying unmarked user relationship networks can uncover suspicious accounts that have not yet been discovered, providing a comprehensive and effective strategy for detecting risk user groups.
FIG. 3A schematically illustrates a schematic diagram of a user relationship network according to an embodiment of the disclosure; FIG. 3B schematically shows a schematic diagram of nodes within a user relationship network according to an embodiment of the present disclosure.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with the relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
According to the embodiment of the present disclosure, as shown in fig. 3A, the user relationship network of the present disclosure may be a knowledge graph network, and entities in the user relationship network may be users, user mobile phone numbers, and user network addresses. The large circles represent user nodes, and the small circles represent non-user nodes, which may be user mobile phone numbers and user network addresses; a network address may be, for example, a MAC address or an IP address. A non-user node may also be an account that the user holds on another platform, for example, an account of a payment APP, an instant messaging APP, or a banking APP. Nodes can be connected to one another: each user node is connected to at least one non-user node, and non-user nodes can also be connected to each other. The edges between nodes represent the number of accesses by the user within a preset time period.
According to the embodiment of the disclosure, the target data is data generated by a user operating a payment application within a preset time period; for example, the payment application may be a mobile banking APP. The target data acquired for the preset time period comprises user associated data and user transaction data, where the user associated data comprises the user, the user mobile phone number, the user network address, and the number of accesses by the user within the preset time period, and the user transaction data includes attribute data and transaction data of the user. The target data for the preset time period can be obtained from a database; the preset time period may be an hour, a day, a month, or a login period (including background running).
According to the embodiment of the disclosure, the target data acquired from the database is in tabular form and yields a target data table, which further includes fields such as login time, user age, and user assets. The user, the user mobile phone number, the user network address, and the number of accesses by the user within the preset time period are extracted directly from the target data table. A user relationship network is then formed with the user, the user mobile phone number, and the user network address as nodes and the number of accesses within the preset time period as edges.
According to the embodiment of the disclosure, after the target data in tabular form is obtained, the target data table is divided into two data groups with access attributes, namely user-address and mobile phone number-address. A user relationship network is formed from these data groups and the number of accesses by the user within the preset time period.
According to the embodiment of the disclosure, as shown in fig. 3B, a mobile banking APP is taken as an example for ease of understanding. Multiple bank accounts can be bound to one mobile phone number, one bank account can be logged in on multiple mobile phones, and multiple bank accounts can be logged in on multiple mobile phones. Bank accounts and mobile phone numbers therefore have a many-to-one relationship, bank accounts and network addresses have a many-to-many relationship, and mobile phone numbers and network addresses also have a many-to-many relationship. In a user relationship network this means that one user can be connected to multiple non-user nodes, and non-user nodes can also be connected to each other. The MAC address of a mobile phone is assigned by the manufacturer and is unique. During mobile banking business, whenever a bank account accesses its associated MAC address, there is necessarily a data record of the mobile phone number bound to that account accessing the same MAC address. In this example, one user node is connected to three non-user nodes: a user mobile phone number node, an IP address node, and a MAC address node. The nodes in the user relationship network are connected with the number of accesses by the user within the preset time period as the edge weight. For example, if the user accesses the mobile banking APP once through the user account within the preset time period, the weight of the connecting edge is 1.
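The construction of a user relationship network from the target data table can be sketched as a weighted adjacency map. The row layout, node names, and the `build_network` helper are assumptions made for illustration; the sketch connects each user to a phone number, an IP address, and a MAC address, and adds the phone number-address edges from the second data group, with access counts accumulated as edge weights.

```python
from collections import defaultdict

# Hypothetical rows of the target data table: (user, phone, ip, mac, access_count)
rows = [
    ("user_a", "phone_1", "ip_1", "mac_1", 1),
    ("user_b", "phone_1", "ip_2", "mac_1", 3),
]


def build_network(rows):
    """Return {node: {neighbor: weight}} with access counts as edge weights."""
    graph = defaultdict(lambda: defaultdict(int))

    def add_edge(a, b, weight):
        # Edges are undirected, so the weight is stored in both directions.
        graph[a][b] += weight
        graph[b][a] += weight

    for user, phone, ip, mac, count in rows:
        for address in (ip, mac):
            add_edge(user, address, count)   # user-address data group
            add_edge(phone, address, count)  # mobile phone number-address data group
        add_edge(user, phone, count)         # user bound to the phone number
    return graph


graph = build_network(rows)
```

Because both sample users share `phone_1` and `mac_1`, the two accounts end up in the same network, which is the kind of indirect association the knowledge graph is meant to expose.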
According to the method, the knowledge graph network is formed by the user, the user mobile phone number and the network address, a robust and reliable user relation network is obtained, the group performance of a risk user group can be comprehensively depicted, the whole process of risk fund transfer can be tracked, the group risk behavior can be accurately identified, more suspicious accounts which are not found and marked are excavated, and a more comprehensive and effective risk user group detection scheme is provided for banks.
FIG. 4 schematically shows a flow chart for deriving a risk identification model according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S403.
In operation S401, the user relationship network not including the user node is deleted, and the invalid data in the m user relationship networks is deleted, so as to obtain the user relationship network to be input.
According to the embodiment of the disclosure, the aim is to identify risky user relationship networks and then the relationships between the users within them, so as to reconstruct the fund chain of the risk process. Because a user relationship network needs to represent relationships between users, user relationship networks that contain no user node are deleted. A user relationship network may also contain invalid mobile phone number nodes and invalid network address nodes. No useful data can be obtained from the connection between an invalid node and a user node, so the invalid nodes in the user relationship network need to be deleted; invalid user attribute information and invalid transaction information of user nodes are deleted as well. After this data is deleted, the user relationship network to be input is obtained.
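The cleaning in operation S401 can be sketched as a filter over adjacency maps. The naming conventions used to recognize user nodes and invalid nodes are hypothetical, and the order chosen here (invalid nodes first, then networks without user nodes) is one possible reading; the removal of invalid attribute and transaction fields is not shown.

```python
def clean_networks(networks, is_user, is_valid):
    """Drop invalid nodes, then drop networks left without any user node.

    Each network is a dict of the form {node: {neighbor: weight}}.
    """
    cleaned = []
    for graph in networks:
        kept = {
            node: {nb: w for nb, w in nbrs.items() if is_valid(nb)}
            for node, nbrs in graph.items()
            if is_valid(node)
        }
        if any(is_user(node) for node in kept):
            cleaned.append(kept)
    return cleaned


# Hypothetical conventions: user nodes start with "user_", invalid nodes end with "_invalid".
def is_user(node):
    return node.startswith("user_")


def is_valid(node):
    return not node.endswith("_invalid")


networks = [
    {
        "user_a": {"phone_1": 2, "ip_invalid": 1},
        "phone_1": {"user_a": 2},
        "ip_invalid": {"user_a": 1},
    },
    {"phone_2": {"ip_2": 1}, "ip_2": {"phone_2": 1}},  # contains no user node
]
to_input = clean_networks(networks, is_user, is_valid)
```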
In operation S402, the user relationship network to be input is input into the risk identification model, and a risk identification result is output.
According to an embodiment of the present disclosure, the risk identification model is a semi-supervised machine learning model. The risk identification model can learn by using n user relationship networks with marks and m-n unmarked user relationship networks, and output a risk identification result so as to realize mining of the unmarked user relationship networks and discover more unmarked suspicious objects. For example, the risk identification model may employ a clustering algorithm.
In operation S403, when the risk identification model to be trained satisfies a preset condition, a risk identification model for performing risk identification on the unmarked user relationship network to be identified is obtained.
According to the embodiment of the disclosure, when the risk identification model to be trained meets the preset condition, the trained risk identification model is obtained, and the risk identification model can carry out risk identification on the unmarked network to be identified. The preset condition may be a preset number of training times, a preset condition for the training parameter being met, or no further change in the training parameter.
According to an embodiment of the present disclosure, the risk identification model to be trained may employ a clustering algorithm, such as the k-means or kNN algorithm.
According to an embodiment of the present disclosure, the k-means algorithm is employed for the risk identification model to be trained. Before clustering, the number k of clusters and the initial cluster centers need to be specified. The user relationship networks fall into a marked type and an unmarked type, so k is equal to 2: the marked type is initially treated as the risk class, and the unmarked type as the risk-free class. The initial cluster center of the risk class is the mean of the marked samples, and the initial cluster center of the risk-free class is the mean of the unmarked samples. In the clustering process, the marked samples are assigned directly to the risk class; the unmarked samples are traversed, and each one is assigned to the class whose cluster center is closer. After all samples have been traversed, the cluster centers are recalculated and the samples are divided again. When the cluster centers no longer change, the clustering result is obtained, and with it the risk identification model.
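The clustering procedure described above can be sketched in plain Python. This is a minimal sketch, not the applicant's implementation: the function names, the use of Euclidean distance, and the `max_iter` cap (standing in for the "preset number of training times") are assumptions, and each network is assumed to have already been reduced to a numeric feature vector.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def seeded_two_means(marked, unmarked, max_iter=100):
    """Two-cluster k-means seeded by the marked samples.

    marked:   feature vectors of marked networks (always kept in the risk class)
    unmarked: feature vectors of unmarked networks (at least one, since n < m)
    Returns (risk_center, safe_center, in_risk) where in_risk[i] is True when
    unmarked[i] ends up in the risk cluster.
    """
    risk_c, safe_c = mean(marked), mean(unmarked)
    in_risk = [False] * len(unmarked)
    for _ in range(max_iter):
        # Assign each unmarked sample to the closer center (Euclidean distance
        # is an assumption; the disclosure only says "closer in distance").
        in_risk = [math.dist(x, risk_c) < math.dist(x, safe_c) for x in unmarked]
        risk_pts = marked + [x for x, r in zip(unmarked, in_risk) if r]
        safe_pts = [x for x, r in zip(unmarked, in_risk) if not r]
        new_risk_c = mean(risk_pts)
        new_safe_c = mean(safe_pts) if safe_pts else safe_c
        # Stop when the cluster centers no longer change.
        if new_risk_c == risk_c and new_safe_c == safe_c:
            break
        risk_c, safe_c = new_risk_c, new_safe_c
    return risk_c, safe_c, in_risk
```

Keeping the marked samples fixed in the risk class is what makes the procedure semi-supervised: the labels constrain one cluster, while the unmarked samples are free to move between the two.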
Fig. 5 schematically shows a flow chart for outputting a risk identification result according to an embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S501 to S502.
In operation S501, a user relationship network to be input is input to the feature extraction module, and first feature data and second feature data are output.
According to an embodiment of the present disclosure, the risk identification model to be trained includes a feature extraction module and a prediction module. The user relationship network to be input is obtained by deleting the invalid data of the user relationship network. Network structure features can be obtained from the user relationship network to be input, and comprehensive features of the whole network group can be obtained by treating the network as a whole. The user relationship network to be input is input into the feature extraction module, which outputs feature data of two dimensions. The first characteristic data is network structure characteristic data comprising the number of nodes and the association degree between the nodes, and the second characteristic data is network entity characteristic data comprising user transaction data.
In operation S502, the first feature data and the second feature data are input to the prediction module, and a risk identification result is output.
According to the embodiment of the disclosure, the first characteristic data comprises the number of nodes and the association degree between the nodes, and can represent how closely the nodes are associated through accesses in a preset time period. The second characteristic data comprises user transaction data; it characterizes the user relationship network as a whole and reflects its basic attributes, such as its external and internal fund transaction conditions in a preset time period. For example, it can reflect the average age of all users in the user relationship network, the total amount of transactions from users in the network to external users, and the total amount of transactions among users in the network. After the first characteristic data and the second characteristic data are input into the prediction module, the prediction module outputs a risk identification result.
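The two kinds of feature data can be illustrated with a short sketch. The function names and input shapes are hypothetical, and using the mean access count per edge as the "association degree" is an assumption, since the disclosure does not give a formula for it.

```python
def structure_features(nodes, edges):
    """First feature data: network structure.

    nodes: collection of node ids
    edges: dict mapping (node_a, node_b) -> access count in the preset period
    Returns [node count, edge count, mean access count per edge].
    """
    total_access = sum(edges.values())
    # Mean access count per edge stands in for the association degree
    # (an assumption made for this sketch).
    mean_access = total_access / len(edges) if edges else 0.0
    return [float(len(nodes)), float(len(edges)), mean_access]


def entity_features(user_ages, transactions, members):
    """Second feature data: aggregated attributes of the network's users.

    user_ages:    dict user id -> age
    transactions: list of (payer, payee, amount) in the preset period
    members:      set of user ids that belong to this network
    Returns [average age, total outbound amount, total internal amount].
    """
    avg_age = sum(user_ages.values()) / len(user_ages)
    # Transfers from a member to a user outside the network.
    outbound = sum(a for p, q, a in transactions if p in members and q not in members)
    # Transfers between two members of the network.
    internal = sum(a for p, q, a in transactions if p in members and q in members)
    return [avg_age, outbound, internal]
```

Concatenating the two lists gives one fixed-length vector per network, which is the form a prediction module such as a clustering step would typically consume.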
According to the embodiment of the disclosure, the risk users in the n user relationship networks are marked as risk nodes according to the risk user identifier; a user relationship network that contains a marked node is thereby itself marked. The risk user identifier may include a risk user list, that is, a list of users already marked as risk users. For example, the risk user list may be called through an external interface and the risk users marked accordingly. Alternatively, a stored risk user list may be acquired from a local database to mark the risk users.
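The marking step can be sketched as a set intersection between each network's users and the risk user list. The function name and the returned dictionary layout are hypothetical; how the list is fetched (external interface or local database) is left out of the sketch.

```python
def mark_networks(networks, risk_user_list):
    """Mark risk users as risk nodes and flag the networks containing them.

    networks:       list of collections of user ids, one per relationship network
    risk_user_list: iterable of user ids known to be risk users
    """
    risk_users = set(risk_user_list)
    marked = []
    for users in networks:
        risk_nodes = set(users) & risk_users
        marked.append(
            {
                "users": set(users),
                "risk_nodes": risk_nodes,
                # A network is marked as soon as it holds one risk node.
                "marked": bool(risk_nodes),
            }
        )
    return marked
```

With m networks as input, the number of entries whose `"marked"` flag is true corresponds to n in the disclosure.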
Fig. 6 schematically shows a flow chart of a risk identification method according to an embodiment of the present disclosure.
As shown in fig. 6, the method includes operations S601 to S602.
In operation S601, a user relationship network is acquired.
According to the embodiment of the disclosure, taking a mobile banking APP as an example, for an unmarked user to be identified, target data of the user and target data of a user related to the user are extracted from a database to obtain a user relationship network.
In operation S602, the user relationship network is input into a risk recognition model, and a recognition result is output, where the risk recognition model is obtained by training the risk recognition model according to the training method.
According to the embodiment of the disclosure, after the user relationship network is input into the risk identification model, the identification result is output. The output identification result can be a numerical value between 0 and 1, when the identification result is close to 1, the user relationship network is considered to have fraud risk, all users in the user relationship network are risk users, and corresponding monitoring measures can be taken for the risk users to carry out management and control.
According to the method, user relationship networks are formed, a small amount of risk user label information is used, and the plurality of user relationship networks are input into the risk identification model with the user relationship network as the unit. Identifying risk user groups through the semi-supervised machine learning model describes the behavior of suspected risk user groups more comprehensively, facilitates tracking of the whole risk fund transfer process, and allows group risk behaviors to be identified accurately. Meanwhile, identifying the unmarked user relationship networks makes it possible to mine more accounts that have not yet been discovered, providing a comprehensive and effective risk user group detection strategy.
Fig. 7 schematically shows a flow chart of a risk identification method according to another embodiment of the present disclosure.
As shown in fig. 7, the method includes operations S701 to S703.
In operation S701, a risk level threshold is set based on the risk recognition result.
According to the embodiment of the disclosure, for the unmarked user relationship network, after the risk identification result is obtained, a risk level threshold value can be set. For example, when the risk identification result is a probability value between 0 and 1, the risk level threshold may be set according to the probability.
In operation S702, a risk level of the user relationship network is obtained according to the risk level threshold.
According to the embodiment of the disclosure, after the risk level threshold is determined, the risk level of the user relationship network is obtained, and the corresponding user in the user relationship network also corresponds to the risk level.
In operation S703, a processing policy is set according to the risk level of the user relationship network.
According to the embodiment of the disclosure, the risk level of a user in a user relationship network is determined according to the risk level of the user relationship network. And after the risk level is obtained, setting a corresponding risk processing strategy corresponding to the risk level.
According to the embodiment of the disclosure, after the risk identification result is obtained, four risk thresholds of 0.2, 0.4, 0.6 and 0.8 are set, and five risk levels are obtained from them: high, relatively high, medium, relatively low and low. For example, a user relationship network whose risk identification result lies between 0 and 0.2 belongs to the high risk level, between 0.2 and 0.4 to the relatively high risk level, between 0.4 and 0.6 to the medium risk level, between 0.6 and 0.8 to the relatively low risk level, and between 0.8 and 1 to the low risk level. Conversely, the levels from 0 to 1 may be set as low, relatively low, medium, relatively high and high, respectively. After the risk level of the user relationship network is determined, the users in that user relationship network have the same risk level. According to the risk level of the user relationship network, the processing policy may be, for example: for users at the high and relatively high risk levels, a normal transaction with another user can proceed only after the user clicks to confirm multiple times; for users at the medium risk level, only the other user is reminded when a normal transaction takes place; and users at the relatively low and low risk levels are only marked. Alternatively, five corresponding control measures can be set for the users of the five levels.
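The mapping from a risk identification result to a level and a processing policy can be sketched as a threshold lookup. The sketch uses the second ordering mentioned above (a higher score means a higher risk); the level names, the policy strings, and the choice to place a score that equals a threshold in the upper band are assumptions made for illustration.

```python
from bisect import bisect_right

# The four thresholds from the embodiment.
THRESHOLDS = [0.2, 0.4, 0.6, 0.8]

# Levels from score 0 to score 1, assuming a higher score means higher risk.
LEVELS = ["low", "relatively low", "medium", "relatively high", "high"]

# Hypothetical policy labels following the three-tier example in the text.
POLICIES = {
    "high": "require repeated confirmation",
    "relatively high": "require repeated confirmation",
    "medium": "remind counterparty",
    "relatively low": "mark only",
    "low": "mark only",
}


def risk_level(score):
    """Map a risk identification result in [0, 1] to one of five levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    # bisect_right counts the thresholds that are <= score, so a score that
    # equals a threshold falls into the upper band (an assumption).
    return LEVELS[bisect_right(THRESHOLDS, score)]


def processing_policy(score):
    """Return the processing policy for a network and all of its users."""
    return POLICIES[risk_level(score)]
```

Reversing `LEVELS` gives the first ordering described in the embodiment, in which results close to 0 correspond to the high risk level.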
After risk identification is carried out on the user relationship network, users in the user relationship network are classified into different risk levels, corresponding processing strategies are adopted, refined management and control on the users with different levels of risk can be achieved, and a bank can establish a comprehensive processing system for fraudulent behaviors.
Fig. 8 schematically shows a block diagram of a training apparatus for a risk recognition model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the risk identification model of this embodiment includes a first obtaining module 801, a labeling module 802, and a training module 803.
A first obtaining module 801, configured to obtain m user relationship networks, where m is a positive integer greater than 2. In an embodiment, the first obtaining module 801 may be configured to perform operation S201 described in fig. 2.
A marking module 802, configured to mark n user relationship networks in the m user relationship networks based on the risk user identifier, to obtain n marked user relationship networks, where n is less than m, and n is greater than or equal to 1. In an embodiment, the marking module 802 may be configured to perform operation S202 described in fig. 2.
The training module 803 is configured to train a risk identification model to be trained by using the m user relationship networks to obtain a risk identification model, where the risk identification model is configured to perform risk identification on an unmarked user relationship network to be identified. In an embodiment, the training module 803 may be configured to perform operation S203 described in fig. 2.
According to an embodiment of the present disclosure, the first obtaining module 801 includes a building unit.
The building unit is used for building the user relationship networks from target data. The m user relationship networks are knowledge graph networks, the nodes of the m user relationship networks comprise users, user mobile phone numbers and user network addresses, and the edges of the m user relationship networks represent the number of accesses by the users in a preset time period. The target data comprises user associated data and user transaction data, wherein the user associated data comprises the user, the user mobile phone number, the user network address and the number of accesses by the user in the preset time period. The target data includes data generated by a user performing operations on the payment application within the preset time period. In an embodiment, the building unit may build user relationship networks as shown in figs. 3A and 3B.
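The graph that the building unit produces can be sketched as a node table plus weighted edges. The record format `(user, phone, address)` per access event, the `user:`/`phone:`/`ip:` id prefixes and the function name are hypothetical; the disclosure only specifies the node types and that edges carry access counts.

```python
from collections import defaultdict


def build_network(access_records):
    """Build nodes and weighted edges from access events in the preset period.

    access_records: iterable of (user_id, phone_number, network_address),
                    one tuple per access event (hypothetical record format).
    Returns (nodes, edges): nodes maps node id -> kind, edges maps
    (user node, other node) -> number of accesses.
    """
    nodes = {}
    edges = defaultdict(int)
    for user, phone, address in access_records:
        u, p, a = f"user:{user}", f"phone:{phone}", f"ip:{address}"
        nodes[u], nodes[p], nodes[a] = "user", "phone", "ip"
        # Each access event strengthens the user's link to the phone number
        # and to the network address it was made from.
        edges[(u, p)] += 1
        edges[(u, a)] += 1
    return nodes, dict(edges)
```

Two users who share a phone number or a network address end up connected through that shared node, which is how relationships between users appear in the graph.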
According to an embodiment of the present disclosure, the training module 803 further comprises a first determining unit, a second determining unit, and a third determining unit.
The first determining unit is used for deleting the user relationship network which does not contain the user node, and deleting the invalid data in the m user relationship networks to obtain the user relationship network to be input. In an embodiment, the first determining unit may be configured to perform operation S401 as described in fig. 4.
The second determining unit is used for inputting the user relationship network to be input into the risk identification model and outputting a risk identification result. In an embodiment, the second determining unit may be configured to perform operation S402 as described in fig. 4.
And the third determining unit is used for obtaining a risk identification model for carrying out risk identification on the unmarked user relationship network to be identified when the risk identification model to be trained meets the preset condition. In an embodiment, the third determining unit may be configured to perform operation S403 as described in fig. 4.
According to an embodiment of the present disclosure, the second determining unit further includes a first output subunit and a second output subunit.
The first output subunit is used for inputting the user relationship network to be input into the feature extraction module and outputting the first feature data and the second feature data. In an embodiment, the first output subunit may be configured to perform operation S501 as described in fig. 5.
And the second output subunit is used for inputting the first characteristic data and the second characteristic data into the prediction module and outputting a risk identification result. In an embodiment, the second output subunit may be configured to perform operation S502 as described in fig. 5.
According to an embodiment of the present disclosure, the marking module 802 further comprises a marking unit.
The marking unit is used for marking the risk users in the n user relationship networks as risk nodes.
Fig. 9 schematically shows a block diagram of a risk identification device according to an embodiment of the present disclosure.
As shown in fig. 9, the risk identification device 900 of this embodiment includes a second obtaining module 901 and an identification module 902.
A second obtaining module 901, configured to obtain a user relationship network. In an embodiment, the second obtaining module 901 may be configured to perform operation S601 described in fig. 6.
And the identification module 902 is configured to input the user relationship network into a risk identification model and output an identification result, where the risk identification model is obtained by training through the risk identification model training method. In an embodiment, the identifying module 902 may be configured to perform operation S602 described in fig. 6.
According to an embodiment of the present disclosure, the risk identification device further includes a first setting module, a second setting module, and a third setting module.
And the first setting module is used for setting a risk grade threshold value based on the risk identification result. In an embodiment, the first setting module may be configured to perform operation S701 described in fig. 7.
And the second setting module is used for obtaining the risk level of the user relationship network according to the risk level threshold value. In an embodiment, the second setting module may be configured to perform operation S702 described in fig. 7.
And the third setting module is used for setting a processing strategy according to the risk level of the user relationship network. In an embodiment, the third setting module may be configured to perform operation S703 described in fig. 7.
According to the embodiment of the present disclosure, any plurality of the first obtaining module 801, the marking module 802, the training module 803, the second obtaining module 901, the identification module 902, the first setting module, the second setting module, the third setting module, the building unit, the first determining unit, the second determining unit, the third determining unit, the first output subunit, the second output subunit and the marking unit may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first obtaining module 801, the marking module 802, the training module 803, the second obtaining module 901, the identification module 902, the first setting module, the second setting module, the third setting module, the building unit, the first determining unit, the second determining unit, the third determining unit, the first output subunit, the second output subunit and the marking unit may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging circuits, or in any one of the three implementations of software, hardware and firmware, or in any suitable combination of them.
Alternatively, at least one of the first obtaining module 801, the marking module 802, the training module 803, the second obtaining module 901, the identification module 902, the first setting module, the second setting module, the third setting module, the building unit, the first determining unit, the second determining unit, the third determining unit, the first output subunit, the second output subunit and the marking unit may be at least partially implemented as a computer program module which, when executed, may perform the corresponding function.
Fig. 10 schematically shows a block diagram of an electronic device adapted to implement a risk identification method according to an embodiment of the present disclosure.
As shown in fig. 10, an electronic device 1000 according to an embodiment of the present disclosure includes a processor 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1001 may also include onboard memory for caching purposes. The processor 1001 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the present disclosure.
In the RAM 1003, various programs and data necessary for the operation of the electronic device 1000 are stored. The processor 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the programs may also be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1000 may also include an input/output (I/O) interface 1005, which is also connected to the bus 1004. The electronic device 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1002 and/or the RAM 1003 described above and/or one or more memories other than the ROM 1002 and the RAM 1003.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods illustrated in the flow charts. When the computer program product runs in a computer system, the program code causes the computer system to implement the training method and the risk identification method provided by the embodiments of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 1001. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal on a network medium, downloaded and installed via the communication section 1009, and/or installed from the removable medium 1011. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. The computer program performs the above-described functions defined in the system of the embodiment of the present disclosure when executed by the processor 1001. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or associations are not expressly recited in the present disclosure. In particular, various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (12)

1. A method of training a risk recognition model, comprising:
acquiring m user relationship networks, wherein m is a positive integer greater than 2;
based on the risk user identification, marking n user relationship networks in the m user relationship networks to obtain n user relationship networks with marks, wherein n is less than m and is greater than or equal to 1; and
training the risk identification model to be trained by using the m user relationship networks to obtain a risk identification model, wherein the risk identification model is used for carrying out risk identification on the unmarked user relationship networks to be identified.
2. The method of claim 1, wherein the m user relationship networks comprise:
the m user relationship networks are knowledge graph networks, nodes of the m user relationship networks comprise users, user mobile phone numbers and user network addresses, and the edges of the m user relationship networks are the access times of the users in a preset time period;
the target data comprises user associated data and user transaction data, wherein the user associated data comprises the user, the user mobile phone number, the user network address and the access times of the user in the preset time period;
the target data comprises data generated by a user performing an operation on the payment application within a preset time period.
3. The method of claim 2, wherein training the risk recognition model to be trained using the m user relationship networks, resulting in a risk recognition model comprises:
deleting the user relationship network which does not contain the user node, and deleting the invalid data in the m user relationship networks to obtain the user relationship network to be input;
inputting the user relationship network to be input into the risk identification model, and outputting a risk identification result; and
when the risk identification model to be trained meets the preset condition, obtaining a risk identification model for carrying out risk identification on the unmarked user relationship network to be identified.
4. The method of claim 3, wherein the risk recognition model to be trained comprises a feature extraction module and a prediction module, the inputting the user relationship network to be input into the risk recognition model, and the outputting the risk recognition result further comprises:
inputting the user relationship network to be input into the feature extraction module, and outputting first feature data and second feature data;
inputting the first characteristic data and the second characteristic data into the prediction module, and outputting a risk identification result;
wherein the first characteristic data is network structure characteristic data comprising the number of nodes and the association degree between the nodes, and the second characteristic data is network entity characteristic data comprising the user transaction data.
5. The method of claim 1, wherein marking n of the m user relationship networks based on the risk user identification comprises: marking the risk users in the n user relationship networks as risk nodes.
6. A risk identification method, comprising:
acquiring a user relationship network;
inputting the user relationship network into a risk recognition model and outputting a recognition result, wherein the risk recognition model is obtained by training through a training method of the risk recognition model according to any one of claims 1 to 5.
7. The method of claim 6, wherein the method further comprises:
setting a risk level threshold value based on the risk identification result;
obtaining the risk level of the user relationship network according to the risk level threshold;
and setting a processing strategy according to the risk level of the user relationship network.
8. A training apparatus for a risk identification model, comprising:
a first acquisition module, configured to acquire m user relationship networks, where m is a positive integer greater than 2;
a marking module, configured to mark n user relationship networks in the m user relationship networks based on the risk user identifier to obtain n marked user relationship networks, where n is less than m and is greater than or equal to 1; and
a training module, configured to train the risk identification model to be trained by using the m user relationship networks to obtain a risk identification model, wherein the risk identification model is used for performing risk identification on an unmarked user relationship network to be identified.
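The marking module of claim 8, together with the node-level marking of claim 5, might be sketched as below. The representation is assumed, not taken from the claims: each user relationship network is a plain set of user ids, and a network counts as marked when it contains at least one user from a hypothetical set of known risk user identifiers. The checks on m and n follow the bounds as worded in claim 8.

```python
def mark_networks(networks, risk_user_ids):
    """Mark the networks that contain known risk users (illustrative only).

    networks: list of sets of user ids (assumed layout).
    risk_user_ids: set of user ids flagged as risk users (assumed input).
    Returns (marked_networks, n), where n is the number of marked networks.
    """
    m = len(networks)
    if m <= 2:
        raise ValueError("m must be greater than 2, as the claim is worded")

    marked = []
    n = 0
    for users in networks:
        risk_nodes = set(users) & set(risk_user_ids)
        marked.append(
            {
                "users": set(users),
                "risk_nodes": risk_nodes,  # risk users marked as risk nodes
                "marked": bool(risk_nodes),
            }
        )
        n += bool(risk_nodes)

    if not 1 <= n < m:
        raise ValueError("n must satisfy 1 <= n < m")
    return marked, n
```

The result mixes marked and unmarked networks, which is the kind of partially labeled input the training module receives; how the model is then trained on it is outside the scope of this sketch.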
9. A risk identification device comprising:
a second acquisition module, configured to acquire a user relationship network; and
a recognition module, configured to input the user relationship network into a risk identification model and output a risk identification result, wherein the risk identification model is obtained by training with the training method of the risk identification model according to any one of claims 1 to 5.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
12. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 7.
CN202111520883.5A 2021-12-13 2021-12-13 Training method, risk identification device, computer equipment and medium Pending CN114169897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111520883.5A CN114169897A (en) 2021-12-13 2021-12-13 Training method, risk identification device, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111520883.5A CN114169897A (en) 2021-12-13 2021-12-13 Training method, risk identification device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN114169897A (en) 2022-03-11

Family

ID=80486081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111520883.5A Pending CN114169897A (en) 2021-12-13 2021-12-13 Training method, risk identification device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN114169897A (en)

Similar Documents

Publication Publication Date Title
US20200389495A1 (en) Secure policy-controlled processing and auditing on regulated data sets
US10438297B2 (en) Anti-money laundering platform for mining and analyzing data to identify money launderers
US10715550B2 (en) Method and device for application information risk management
US10970188B1 (en) System for improving cybersecurity and a method therefor
CN111107048B (en) Phishing website detection method and device and storage medium
AU2022204452B2 (en) Verification of electronic identity components
US11411973B2 (en) Identifying security risks using distributions of characteristic features extracted from a plurality of events
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
US11593811B2 (en) Fraud detection based on community change analysis using a machine learning model
US11574360B2 (en) Fraud detection based on community change analysis
KR102144126B1 (en) Apparatus and method for providing information for enterprise
US20220417221A1 (en) Digital identity network alerts
WO2016188334A1 (en) Method and device for processing application access data
US11093528B2 (en) Automated data supplementation and verification
CN114358147A (en) Training method, identification method, device and equipment of abnormal account identification model
CN114462532A (en) Model training method, device, equipment and medium for predicting transaction risk
CN111383072A (en) User credit scoring method, storage medium and server
Wass et al. Prediction of cyber attacks during coronavirus pandemic by classification techniques and open source intelligence
CN113128773B (en) Training method of address prediction model, address prediction method and device
US20220253509A1 (en) Network-based customized browsing notifications
CN114169897A (en) Training method, risk identification device, computer equipment and medium
CN113159937A (en) Method and device for identifying risks and electronic equipment
KR102471731B1 (en) A method of managing network security for users
KR102416805B1 (en) Apparatus and method for scrapping a data
US20240111892A1 (en) Systems and methods for facilitating on-demand artificial intelligence models for sanitizing sensitive data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination