CN114677202A - Type identification method, training method and device, electronic equipment and storage medium - Google Patents

Publication number
CN114677202A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210455575.7A
Other languages
Chinese (zh)
Inventor
訾晨杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210455575.7A
Publication of CN114677202A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Abstract

The disclosure provides a type identification method, a training method and apparatus, an electronic device, and a storage medium, which can be applied in the technical field of artificial intelligence and also in the technical field of finance. The type identification method comprises: after obtaining authorization from a user to be identified to use personal information, acquiring identity characteristic information of the user to be identified and behavior characteristic information of the user to be identified with respect to a target account; inputting the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree, wherein the decision tree comprises a root node and at least one child node layer, each child node layer comprises at least one child node, the node types of the child nodes include leaf nodes and/or non-leaf nodes, the root node is associated with the at least one child node layer through decision conditions, and the decision conditions are constructed according to the identity characteristic information and the behavior characteristic information; and outputting, through the decision tree, an identification result representing the type of the user to be identified.

Description

Type identification method, training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a type identification method and apparatus, a training method and apparatus, an electronic device, a storage medium, and a program product.
Background
When a financial institution encounters an abnormal transaction, the users participating in that transaction need to be screened to determine whether they are suspicious users in connection with it.
In implementing the disclosed concept, the inventors found at least the following problem in the related art: traditional identification techniques usually make a judgment based only on the user's transaction behavior, but in recent years, as customer behavior has grown more complex and the channels for abnormal transactions have multiplied, the accuracy of such identification has steadily declined, and suspicious users fail to be identified.
Disclosure of Invention
In view of the above, the present disclosure provides a type identification method and apparatus, a training method and apparatus, an electronic device, a storage medium, and a program product.
In one aspect of the present disclosure, there is provided a type identifying method including:
after obtaining authorization from the user to be identified to use personal information, acquiring identity characteristic information of the user to be identified and behavior characteristic information of the user to be identified with respect to a target account;
inputting the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree, wherein the decision tree comprises a root node and at least one child node layer, each child node layer comprises at least one child node, the node types of the child nodes include leaf nodes and/or non-leaf nodes, the root node is associated with the at least one child node layer through decision conditions, and the decision conditions are constructed according to the identity characteristic information and the behavior characteristic information;
and outputting, through the decision tree, an identification result representing the type of the user to be identified.
According to an embodiment of the present disclosure, wherein:
the identification result indicates that the user to be identified is a first-type user or a second-type user;
the identity characteristic information comprises age and place of household registration, and the behavior characteristic information comprises the number of account openings or closings for the target account and the number of online banking logins for the target account;
the decision conditions are constructed according to the age, the place of household registration, the number of account openings or closings, and the number of online banking logins.
According to an embodiment of the present disclosure, wherein:
the at least one child node layer comprises a first child node layer, and the first child node layer comprises a first child node whose node type is a leaf node, a second child node whose node type is a non-leaf node, and a third child node;
the decision conditions comprise a first decision condition, wherein the first decision condition is constructed according to the age and comprises:
in a case where the age is greater than a preset upper age limit, associating the user to be identified with the first child node, wherein the first child node represents that the user to be identified is a first-type user;
in a case where the age is greater than or equal to a preset lower age limit and less than or equal to the preset upper age limit, associating the user to be identified with the second child node;
and in a case where the age is less than the preset lower age limit, associating the user to be identified with the third child node.
According to an embodiment of the present disclosure, wherein:
the at least one child node layer comprises a second child node layer, and the second child node layer comprises a fourth child node whose node type is a leaf node, a fifth child node whose node type is a non-leaf node, a sixth child node whose node type is a leaf node, and a seventh child node;
the decision conditions comprise a second decision condition and a third decision condition, wherein the second decision condition is constructed according to the place of household registration, and the third decision condition is constructed according to the number of account openings or closings;
the second decision condition comprises:
in a case where the place of household registration is not a target region, associating the user to be identified with the fourth child node, wherein the fourth child node represents that the user to be identified is a second-type user;
in a case where the place of household registration is the target region, associating the user to be identified with the fifth child node;
the third decision condition comprises:
in a case where the number of account openings or closings for the target account is less than a first preset count threshold, associating the user to be identified with the sixth child node, wherein the sixth child node represents that the user to be identified is a second-type user;
and in a case where the number of account openings or closings for the target account is greater than or equal to the first preset count threshold, associating the user to be identified with the seventh child node, wherein the seventh child node represents that the user to be identified is a first-type user.
According to an embodiment of the present disclosure, wherein:
the at least one child node layer comprises a third child node layer, and the third child node layer comprises an eighth child node and a ninth child node, wherein the node types of the eighth child node and the ninth child node are leaf nodes;
the decision conditions comprise a fourth decision condition, wherein the fourth decision condition is constructed according to the number of online banking logins and comprises:
in a case where the number of online banking logins for the target account is less than a second preset count threshold, associating the user to be identified with the eighth child node, wherein the eighth child node represents that the user to be identified is a second-type user;
and in a case where the number of online banking logins for the target account is greater than or equal to the second preset count threshold, associating the user to be identified with the ninth child node, wherein the ninth child node represents that the user to be identified is a first-type user.
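The four decision conditions above can be read as nested comparisons. The following is a minimal sketch under stated assumptions, not the disclosed implementation: the text does not say in this section which non-leaf node carries which condition, so the sketch assumes the second child node carries the second condition, the third child node carries the third condition, and the fifth child node carries the fourth condition. All names and threshold values (`Thresholds`, `User`, `classify`, `REGION_X`, and the numbers) are hypothetical placeholders.

```python
from dataclasses import dataclass

FIRST_TYPE = "first_type"
SECOND_TYPE = "second_type"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical placeholder values; the disclosure obtains them by training.
    age_lower: int = 18
    age_upper: int = 60
    target_region: str = "REGION_X"
    open_close_min: int = 3   # first preset count threshold
    login_min: int = 10       # second preset count threshold


@dataclass(frozen=True)
class User:
    age: int
    household_region: str
    open_close_count: int
    login_count: int


def classify(user: User, t: Thresholds = Thresholds()) -> str:
    # First decision condition (age), evaluated at the root.
    if user.age > t.age_upper:
        return FIRST_TYPE                  # first child node (leaf)
    if user.age >= t.age_lower:
        # Second child node (non-leaf). ASSUMPTION: it carries the
        # second decision condition (place of household registration).
        if user.household_region != t.target_region:
            return SECOND_TYPE             # fourth child node (leaf)
        # Fifth child node (non-leaf). ASSUMPTION: it carries the
        # fourth decision condition (online banking logins).
        if user.login_count < t.login_min:
            return SECOND_TYPE             # eighth child node (leaf)
        return FIRST_TYPE                  # ninth child node (leaf)
    # Third child node. ASSUMPTION: it carries the third decision
    # condition (account openings or closings).
    if user.open_close_count < t.open_close_min:
        return SECOND_TYPE                 # sixth child node (leaf)
    return FIRST_TYPE                      # seventh child node (leaf)
```

Every path ends at exactly one leaf, so each user receives exactly one of the two types.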
According to an embodiment of the present disclosure, the method further comprises:
generating early warning information containing the identification result in a case where the identification result is that the user to be identified is a first-type user.
According to an embodiment of the present disclosure, the method further comprises:
sending the early warning information to a service identification system, wherein the service identification system is used for determining, according to the identification result, whether the user to be identified is a target-type user.
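The two optional steps above (generate a warning only for first-type users, then forward it) can be sketched as follows. This is an illustration only: the message fields, the JSON encoding, and the `send` callback standing in for the service identification system are all assumptions, since the disclosure does not specify a message format or transport.

```python
import json
from typing import Callable, Optional


def build_warning(user_id: str, result: str) -> Optional[dict]:
    """Return early warning information only for first-type users."""
    if result != "first_type":
        return None
    # Hypothetical message layout; it only needs to contain the result.
    return {"user_id": user_id, "identification_result": result}


def dispatch_warning(warning: Optional[dict],
                     send: Callable[[str], None]) -> bool:
    """Forward the warning through `send`; report whether anything was sent."""
    if warning is None:
        return False
    send(json.dumps(warning, sort_keys=True))
    return True
```

Passing the transport in as a callable keeps the sketch independent of how the downstream system is actually reached.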
Another aspect of the present disclosure provides a method of training a decision tree for identifying a user type, comprising:
after obtaining authorization from a plurality of users to use their respective personal information, acquiring identity characteristic information of the plurality of users and behavior characteristic information of the plurality of users with respect to their respective target accounts;
determining a training sample set according to the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users with respect to their respective target accounts;
constructing a decision tree skeleton model, wherein the decision tree skeleton model comprises a plurality of skeleton nodes, each skeleton node comprises preset node information, and the preset node information comprises preset node type information of each skeleton node, preset position information of each skeleton node in the decision tree skeleton model, and preset characteristic type information used to split each skeleton node;
and training the decision tree skeleton model by using a preset algorithm and the training sample set to obtain a trained decision tree, wherein the trained decision tree is used for executing the type identification method.
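One way to picture the skeleton model is a tree whose shape, node types, and splitting characteristics are fixed in advance while the decision values are left empty until training. The sketch below is an assumed data layout, not the disclosed one; `SkeletonNode`, its field names, and the path-like `node_id` encoding of position are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SkeletonNode:
    node_id: str                            # preset position (path-like id)
    is_leaf: bool                           # preset node type
    split_feature: Optional[str] = None     # preset characteristic type (non-leaf only)
    label: Optional[str] = None             # user type represented by a leaf
    decision_value: Optional[float] = None  # left empty until training
    children: List["SkeletonNode"] = field(default_factory=list)


def unfitted_nodes(node: SkeletonNode) -> List[SkeletonNode]:
    """Collect non-leaf nodes whose decision value has not been set yet."""
    found = [] if node.is_leaf or node.decision_value is not None else [node]
    for child in node.children:
        found.extend(unfitted_nodes(child))
    return found
```

Training then reduces to filling in `decision_value` for each node returned by `unfitted_nodes`.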
According to an embodiment of the present disclosure, wherein:
the preset node information is generated according to the user type results of a plurality of users.
According to an embodiment of the present disclosure, training the decision tree skeleton model by using the preset algorithm and the training sample set to obtain the trained decision tree comprises:
determining, by using the preset algorithm and the training sample set, a decision value of the preset characteristic used to split each skeleton node;
and combining the decision value of the preset characteristic used to split each skeleton node with the decision tree skeleton model to obtain the trained decision tree.
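The disclosure does not name the preset algorithm in this section. As an illustration only, the sketch below substitutes a CART-style search: for one skeleton node whose splitting characteristic is already fixed, it picks the threshold that minimizes the weighted Gini impurity of the two resulting groups. The function names are hypothetical, and a non-empty sample list is assumed.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a label collection (0.0 for an empty or pure set)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values: Sequence[float], labels: Sequence[str]) -> float:
    """Threshold t minimizing weighted Gini for `value < t` vs `value >= t`."""
    n = len(values)
    candidates = sorted(set(values))
    best_t, best_score = candidates[0], float("inf")
    for t in candidates:
        left = [lab for v, lab in zip(values, labels) if v < t]
        right = [lab for v, lab in zip(values, labels) if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:  # strict: the lowest tied threshold wins
            best_t, best_score = t, score
    return best_t
```

The `< t` / `>= t` convention matches the way the count thresholds are worded in the decision conditions; the returned value would be stored on the skeleton node to complete it.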
According to an embodiment of the present disclosure, determining the training sample set according to the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users with respect to their respective target accounts comprises:
performing data preprocessing on the identity characteristic information of the plurality of users and on the behavior characteristic information of the plurality of users with respect to their respective target accounts to obtain the training sample set.
Another aspect of the present disclosure provides a type identification apparatus including a first obtaining module, an input module, and an output module.
The first obtaining module is used for acquiring, after authorization from the user to be identified to use personal information is obtained, identity characteristic information of the user to be identified and behavior characteristic information of the user to be identified with respect to a target account;
the input module is used for inputting the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree, wherein the decision tree comprises a root node and at least one child node layer, each child node layer comprises at least one child node, the node types of the child nodes include leaf nodes and/or non-leaf nodes, the root node is associated with the at least one child node layer through decision conditions, and the decision conditions are constructed according to the identity characteristic information and the behavior characteristic information;
and the output module is used for outputting, through the decision tree, an identification result representing the type of the user to be identified.
According to an embodiment of the present disclosure, the identification result indicates that the user to be identified is a first-type user or a second-type user; the identity characteristic information comprises age and place of household registration, and the behavior characteristic information comprises the number of account openings or closings for the target account and the number of online banking logins for the target account; the decision conditions are constructed according to the age, the place of household registration, the number of account openings or closings, and the number of online banking logins.
According to an embodiment of the present disclosure, wherein:
the at least one child node layer comprises a first child node layer, and the first child node layer comprises a first child node whose node type is a leaf node, a second child node whose node type is a non-leaf node, and a third child node;
the decision conditions comprise a first decision condition, wherein the first decision condition is constructed according to the age and comprises:
in a case where the age is greater than a preset upper age limit, associating the user to be identified with the first child node, wherein the first child node represents that the user to be identified is a first-type user;
in a case where the age is greater than or equal to a preset lower age limit and less than or equal to the preset upper age limit, associating the user to be identified with the second child node;
and in a case where the age is less than the preset lower age limit, associating the user to be identified with the third child node.
According to an embodiment of the present disclosure, wherein:
the at least one child node layer comprises a second child node layer, and the second child node layer comprises a fourth child node whose node type is a leaf node, a fifth child node whose node type is a non-leaf node, a sixth child node whose node type is a leaf node, and a seventh child node;
the decision conditions comprise a second decision condition and a third decision condition, wherein the second decision condition is constructed according to the place of household registration, and the third decision condition is constructed according to the number of account openings or closings;
the second decision condition comprises:
in a case where the place of household registration is not a target region, associating the user to be identified with the fourth child node, wherein the fourth child node represents that the user to be identified is a second-type user;
in a case where the place of household registration is the target region, associating the user to be identified with the fifth child node;
the third decision condition comprises:
in a case where the number of account openings or closings for the target account is less than a first preset count threshold, associating the user to be identified with the sixth child node, wherein the sixth child node represents that the user to be identified is a second-type user;
and in a case where the number of account openings or closings for the target account is greater than or equal to the first preset count threshold, associating the user to be identified with the seventh child node, wherein the seventh child node represents that the user to be identified is a first-type user.
According to an embodiment of the present disclosure, wherein:
the at least one child node layer comprises a third child node layer, and the third child node layer comprises an eighth child node and a ninth child node, wherein the node types of the eighth child node and the ninth child node are leaf nodes;
the decision conditions comprise a fourth decision condition, wherein the fourth decision condition is constructed according to the number of online banking logins and comprises:
in a case where the number of online banking logins for the target account is less than a second preset count threshold, associating the user to be identified with the eighth child node, wherein the eighth child node represents that the user to be identified is a second-type user;
and in a case where the number of online banking logins for the target account is greater than or equal to the second preset count threshold, associating the user to be identified with the ninth child node, wherein the ninth child node represents that the user to be identified is a first-type user.
According to an embodiment of the present disclosure, the apparatus further comprises a generating module, which is used for generating early warning information containing the identification result in a case where the identification result is that the user to be identified is a first-type user.
According to an embodiment of the present disclosure, the apparatus further comprises a sending module, which is used for sending the early warning information to a service identification system, wherein the service identification system is used for determining, according to the identification result, whether the user to be identified is a target-type user.
Another aspect of the present disclosure provides an apparatus for training a decision tree for identifying a user type, including a second obtaining module, a determining module, a constructing module, and a training module.
The second obtaining module is used for acquiring, after authorization from a plurality of users to use their respective personal information is obtained, identity characteristic information of the plurality of users and behavior characteristic information of the plurality of users with respect to their respective target accounts;
the determining module is used for determining a training sample set according to the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users with respect to their respective target accounts;
the constructing module is used for constructing a decision tree skeleton model, wherein the decision tree skeleton model comprises a plurality of skeleton nodes, each skeleton node comprises preset node information, and the preset node information comprises preset node type information of each skeleton node, preset position information of each skeleton node in the decision tree skeleton model, and preset characteristic type information used to split each skeleton node;
and the training module is used for training the decision tree skeleton model by using a preset algorithm and the training sample set to obtain a trained decision tree, wherein the trained decision tree is used for executing the type identification method.
According to the embodiment of the disclosure, the preset node information is generated according to the user type results of a plurality of users.
According to an embodiment of the present disclosure, the training module includes a determining unit and a combining unit.
The determining unit is used for determining, by using the preset algorithm and the training sample set, a decision value of the preset characteristic used to split each skeleton node;
and the combining unit is used for combining the decision value of the preset characteristic used to split each skeleton node with the decision tree skeleton model to obtain the trained decision tree.
According to an embodiment of the present disclosure, the determining module includes a preprocessing unit, which is used for performing data preprocessing on the identity characteristic information of the plurality of users and on the behavior characteristic information of the plurality of users with respect to their respective target accounts to obtain the training sample set.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described type identification method.
Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described type identification method.
Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described type identification method.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a type identification method, apparatus, device, medium and program product according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a type identification method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a structural diagram of a decision tree according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method of training a decision tree for identifying a user type according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of the structure of a type identification apparatus according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of the structure of an apparatus for training a decision tree for identifying a user type according to an embodiment of the present disclosure; and
FIG. 7 schematically illustrates a block diagram of an electronic device adapted to implement a type identification method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).
When a financial institution encounters an abnormal transaction, the users participating in that transaction need to be screened to determine whether they are suspicious users in connection with it.
Traditional identification techniques generally make a judgment based only on the user's transaction behavior: transactions that hit a suspicious-transaction model are screened out according to the transaction characteristics of a given customer, an early warning is raised for that customer, and the early warning information about the customer is provided for screening. This approach ignores the customer's identity characteristics, such as age, gender, and region, and also ignores the customer's behavior characteristics, such as frequent account opening and closing and other abnormal behaviors. Consequently, in recent years, as customer behavior has grown more complex and the channels for abnormal transactions have multiplied, the accuracy of this approach has steadily declined, and suspicious users fail to be identified.
In view of this, an embodiment of the present disclosure provides a type identification method, including:
after obtaining authorization from the user to be identified to use personal information, acquiring identity characteristic information of the user to be identified and behavior characteristic information of the user to be identified with respect to a target account;
inputting the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree, wherein the decision tree comprises a root node and at least one child node layer, each child node layer comprises at least one child node, the node types of the child nodes include leaf nodes and/or non-leaf nodes, the root node is associated with the at least one child node layer through decision conditions, and the decision conditions are constructed according to the identity characteristic information and the behavior characteristic information;
and outputting, through the decision tree, an identification result representing the type of the user to be identified.
Fig. 1 schematically illustrates an application scenario diagram of a type recognition method, apparatus, device, medium, and program product according to embodiments of the present disclosure.
As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
According to an embodiment of the present disclosure, a user may use the terminal devices 101, 102, 103 to send, through the network 104, a request to the server 105 to obtain an identification result for the type of a user to be identified. In response to the request, the server 105 may perform the type identification method of the embodiments of the present disclosure, for example: first, after obtaining authorization from the user to be identified to use personal information, acquiring the identity characteristic information (such as age and place of household registration) of the user to be identified and the behavior characteristic information (such as the number of account openings and closings and the number of online banking logins) of the user to be identified with respect to a target account; then inputting the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree; and then outputting, through the decision tree, an identification result representing the type of the user to be identified (for example, that the user to be identified is a suspicious user or a non-suspicious user, which is used to screen whether the user to be identified may have engaged in abnormal transaction behavior).
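The server-side flow just described can be sketched as a short function in which the authorization check gates everything else. This is a minimal illustration under assumptions: `has_authorization`, `fetch_features`, and `decision_tree` are hypothetical callables standing in for the consent record, the data source, and the trained decision tree, none of which the disclosure specifies at this level.

```python
from typing import Callable, Mapping, Optional

Features = Mapping[str, object]


def identify_user_type(
    user_id: str,
    has_authorization: Callable[[str], bool],
    fetch_features: Callable[[str], Features],
    decision_tree: Callable[[Features], str],
) -> Optional[str]:
    """Return the identification result, or None when consent is missing."""
    # Step 1: no personal information is read without authorization.
    if not has_authorization(user_id):
        return None
    # Step 2: acquire identity and behavior characteristic information.
    features = fetch_features(user_id)
    # Step 3: feed the features to the trained tree and return its output.
    return decision_tree(features)
```

Because the feature lookup sits behind the authorization check, an unauthorized request never touches the user's data.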
It should be noted that the type identification method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the type recognition apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The type identification method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the type identification apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the type identification method, the training method and the device disclosed by the present disclosure may be used in the technical field of artificial intelligence, may also be used in the technical field of finance, and may also be used in any fields other than the technical field of artificial intelligence and the technical field of finance.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application and other processing of the personal information of the related users all comply with relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
The embodiments of the present disclosure will be described in detail below with reference to fig. 2 to 7 based on the scenario described in fig. 1.
Fig. 2 schematically shows a flow chart of a type identification method according to an embodiment of the present disclosure.
As shown in fig. 2, the type identification method of this embodiment includes operations S210 to S230.
In operation S210, after obtaining the authorization of the to-be-identified user for the personal information, obtaining the identity characteristic information of the to-be-identified user and the behavior characteristic information of the to-be-identified user for the target account;
in operation S220, the identity characteristic information and the behavior characteristic information of the user to be identified are input into the trained decision tree; the decision tree comprises a root node and at least one layer of sub node layers, wherein each layer of sub node layer comprises at least one sub node, the node type of each sub node comprises leaf nodes and/or non-leaf nodes, the root node is associated with at least one layer of sub node layer through a decision condition, and the decision condition is constructed according to identity characteristic information and behavior characteristic information;
in operation S230, a recognition result for characterizing the type of the user to be recognized is output through the decision tree.
According to the embodiments of the present disclosure, before acquiring personal information of a user, consent or authorization of the user may be obtained. For example, in operation S210, a request for obtaining user information may be issued to the user, and the personal information of the user may be obtained only if the user agrees to or authorizes the acquisition. In an application scenario of the present disclosure, the personal information of the user to be identified may include identity characteristic information and behavior characteristic information of the user to be identified for the target account. The identity characteristic information may include, but is not limited to, name, age, nationality, place of household registration, occupation, registered capital of a company (in the case that the user to be identified is a corporate customer), and the like. The behavior characteristic information may include, but is not limited to, the number of account opening and account closing times of the user (which may be obtained statistically from the opening times and closing times of the user's different accounts at a financial institution) and the number of online banking logins of the user to a target account (which may be any locked account); for example, a user has logged in to online banking with the user's account number 10 times from multiple IP addresses.
According to the embodiment of the disclosure, the decision tree for executing the type identification method is obtained by training according to the identity characteristic information and the behavior characteristic information of users, and therefore the decision conditions of the decision tree are constructed according to the identity characteristic information and the behavior characteristic information. In the decision tree, the root node is associated with at least one sub-node layer through decision conditions. When the data in each node to be classified (including the root node and the non-leaf sub-nodes) is classified, the data is classified according to the decision condition set for that node, and after classification the data in the node of the current layer is divided into the sub-nodes in the next layer that are associated with the node to be classified.
According to the embodiment of the disclosure, different attributes are selected as nodes of the decision tree according to a certain rule so as to construct the relationship between attributes and categories. The attribute-category relationship tree is constructed recursively from top to bottom by selecting attributes according to information gain. The leaf nodes of the tree are the categories, the non-leaf nodes are the attributes, and the connecting lines between nodes are the different value ranges of the node attributes. After the decision tree is constructed, the attribute values of an instance that needs a category label are compared from top to bottom, starting from the root node of the decision tree, until a leaf node is reached; the category corresponding to that leaf node is the category of the instance.
According to an embodiment of the present disclosure, in the foregoing scenario, for example, the decision condition for a certain node to be classified may be as follows. When the age is greater than a preset upper age limit (or the number of account opening or account closing times for a target account is greater than or equal to a first preset number threshold), the user to be identified is classified as a suspicious user and is divided into a leaf-type sub-node in the next layer, where the leaf-type sub-node represents the classification result of the user to be identified: a suspicious user. When the age is smaller than the preset lower age limit (or the number of account opening or account closing times for the target account is smaller than the first preset number threshold), the type of the user to be identified cannot yet be determined, and the user is divided into a non-leaf-type sub-node in the next layer, so that the data in that node is further divided into nodes in the next layer through the decision condition set for the non-leaf-type sub-node, and so on.
According to the embodiment of the disclosure, after the identity characteristic information and the behavior characteristic information of the user to be identified are input into the trained decision tree, the identification result used for representing the type of the user to be identified can be output through the decision tree, the type of the user to be identified can be a suspicious user or a non-suspicious user, for example, and the identification result can be used for screening whether the user to be identified possibly has abnormal transaction behaviors.
According to the embodiment of the disclosure, because the decision conditions in the decision tree are constructed according to the identity characteristic information and the behavior characteristic information, the classification result of the user can be obtained by using the trained decision tree according to the identity characteristic information of the user and the behavior characteristic information of the user for the target account.
According to an embodiment of the present disclosure, in the above method, the recognition result output by the decision tree and used for characterizing the type of the user to be recognized may include that the user to be recognized is a first type user or a second type user; the first class of users and the second class of users are different user types, for example, the first class of users may be suspicious users (for characterizing that the user to be identified may have abnormal transaction behaviors), and the second class of users may be non-suspicious users (for characterizing that the user to be identified may not have abnormal transaction behaviors).
According to the embodiment of the disclosure, the identity characteristic information of the user to be identified may specifically include age and place of household registration, and the behavior characteristic information may specifically include the number of account opening or account closing times for the target account and the number of online banking logins for the target account. Furthermore, in the decision tree, the decision conditions can be constructed according to the age, the place of household registration, the number of account opening or account closing times, and the number of online banking logins.
According to an embodiment of the present disclosure, the characteristic information, namely age, place of household registration, number of account opening or account closing times, and number of online banking logins, consists of features that are strongly correlated with the user classification result. When decision conditions are constructed from these features and the user type is identified by a decision tree containing these decision conditions, the accuracy of prediction can be further improved; in addition, the decision tree has a simple structure, is strongly targeted, and predicts quickly.
The embodiment of the disclosure also provides a decision tree structure suitable for user type identification in abnormal transaction scenes.
Fig. 3 schematically shows a structural diagram of a decision tree according to an embodiment of the present disclosure. The structure of the decision tree of the embodiment of the present disclosure is described in detail below with reference to fig. 3.
As shown in FIG. 3, the decision tree includes a root node and at least one level of child nodes, which may include three levels of child nodes, for example. Each layer of sub-node layer comprises at least one sub-node, the node type of the sub-node comprises leaf nodes and/or non-leaf nodes, the root node is associated with at least one layer of sub-node layer through decision conditions, and the decision conditions are constructed according to the age, the place of household registration, the number of account opening or account cancellation and the number of internet banking login.
According to an embodiment of the present disclosure, in the decision tree, the child node layers include a first child node layer, and the first child node layer includes a first child node whose node type is a leaf node, and a second child node and a third child node whose node types are non-leaf nodes.
The decision conditions include a first decision condition, wherein the first decision condition is constructed according to age, and the first decision condition is used for associating the root node with the first child node, the second child node and the third child node.
Specifically, the first decision condition includes: in the case that the age is greater than the preset upper age limit, the user to be identified is associated with the first child node, where the first child node is used for representing that the user to be identified is a first-class user. In this case, the user to be identified is classified as a first-class user (suspicious user) and is divided into the first child node, and this leaf-type child node represents the classification result of the user to be identified: a suspicious user.
The first decision condition further comprises: under the condition that the age is greater than or equal to the preset lower age limit and less than or equal to the preset upper age limit, the user to be identified is associated with the second child node; and in the case that the age is smaller than the preset lower age limit, associating the user to be identified with the third child node. In this case, the classification result of the user to be identified cannot be obtained, and the user needs to be further judged by the next decision condition, and then the classification result is divided into the second child node or the third child node of the non-leaf type according to the numerical range of the age, so that the data in the node is further divided into the nodes of the next layer by the decision condition set for the child node of the non-leaf type.
According to an embodiment of the present disclosure, in the decision tree, the child node layers include a second child node layer, and the second child node layer includes a fourth child node whose node type is a leaf node, a fifth child node whose node type is a non-leaf node, and a sixth child node and a seventh child node whose node types are leaf nodes.
The decision conditions comprise a second decision condition and a third decision condition, wherein the second decision condition is constructed according to the place of the household register, the third decision condition is constructed according to the number of times of opening or canceling the household, and the second decision condition is used for associating the second child node with the fourth child node and the fifth child node; the third decision condition is for associating the third child node with the sixth child node and the seventh child node.
The second decision condition includes: in the case that the place of household registration is not the target region, the user to be identified is associated with the fourth child node, where the fourth child node is used for representing that the user to be identified is a second-class user. In this case, the user to be identified is classified as a second-class user (non-suspicious user) and is divided into the fourth child node, and this leaf-type child node represents the classification result of the user to be identified: a non-suspicious user. In the case that the place of household registration is the target region, the user to be identified is associated with the fifth child node. In this case, the classification result of the user to be identified cannot yet be obtained and a further decision condition is needed, so the user is divided into the fifth child node of the non-leaf type, and the data in that node is further divided into nodes of the next layer according to the decision condition set for the non-leaf child node.
The third decision condition may include: under the condition that the number of account opening or account canceling times for the target account is smaller than a first preset number threshold, the user to be identified is associated with a sixth child node, and the sixth child node is used for representing that the user to be identified is a second-class user; and under the condition that the number of account opening or account canceling times for the target account is greater than or equal to a first preset number threshold, associating the user to be identified with a seventh child node, wherein the seventh child node is used for representing that the user to be identified is the first-class user. In this case, the user to be identified is classified into a first class of user (suspicious user) or a second class of user (non-suspicious user) according to the number of account opening and account cancellation times, and is classified into one of two child nodes (leaf node types) representing different classification results in the next layer.
According to an embodiment of the present disclosure, in the decision tree, the at least one sub-node layer includes a third sub-node layer, and the third sub-node layer includes an eighth sub-node and a ninth sub-node, which have node types of leaf nodes.
The decision conditions comprise a fourth decision condition, wherein the fourth decision condition is constructed according to the number of the internet bank login times, and the fourth decision condition is used for associating the fifth child node with the eighth child node and the ninth child node.
The fourth decision condition may specifically include: under the condition that the number of internet bank login times for the target account is smaller than a second preset number threshold, the user to be identified is associated with an eighth child node, and the eighth child node is used for representing that the user to be identified is a second type of user; and under the condition that the number of internet banking login times aiming at the target account is larger than or equal to a second preset number threshold, associating the user to be identified with a ninth child node, wherein the ninth child node is used for representing that the user to be identified is a first-class user. In this case, the user to be identified is classified into a first class of user (suspicious user) or a second class of user (non-suspicious user) according to the number of internet banking logins, and is classified into one of two child nodes (leaf node types) representing different classification results in the next layer.
According to the embodiment of the disclosure, in the decision tree with the above structure, the decision conditions are constructed according to the age, the location of the household register, the number of times of opening or closing an account, and the number of times of logging in the online bank, and these features are features with strong relevance to the classification result of the user. In the decision tree, the age is used as an initial classification characteristic, the household location, the number of times of opening or canceling an account and the number of times of logging in the internet bank are used as further classification characteristics, the decision tree constructed by the method has strong result directivity, classification results can be obtained quickly, and the decision tree with the structure has a good recognition effect in user type recognition under an abnormal transaction scene.
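The structure described above with reference to fig. 3 can be illustrated by a minimal Python sketch. The threshold values, the region label, and all identifiers (`Thresholds`, `classify`, and so on) are hypothetical placeholders chosen for illustration; the disclosure leaves the actual decision values to training.

```python
from dataclasses import dataclass

SUSPICIOUS = "first-class (suspicious)"
NON_SUSPICIOUS = "second-class (non-suspicious)"


@dataclass(frozen=True)
class Thresholds:
    # All values below are illustrative assumptions, not values from the disclosure.
    age_lower: int = 18               # preset lower age limit
    age_upper: int = 70               # preset upper age limit
    target_region: str = "REGION_X"   # target region for household registration
    open_close_min: int = 5           # first preset number threshold
    login_min: int = 10               # second preset number threshold


def classify(age, household_region, open_close_count, login_count,
             t: Thresholds = Thresholds()):
    """Walk the fig. 3 style tree from the root node to a leaf node."""
    # First decision condition (root node): age.
    if age > t.age_upper:
        return SUSPICIOUS                      # first child node (leaf)
    if age >= t.age_lower:
        # Second child node: second decision condition (household registration).
        if household_region != t.target_region:
            return NON_SUSPICIOUS              # fourth child node (leaf)
        # Fifth child node: fourth decision condition (online banking logins).
        if login_count < t.login_min:
            return NON_SUSPICIOUS              # eighth child node (leaf)
        return SUSPICIOUS                      # ninth child node (leaf)
    # Third child node (age below lower limit): third decision condition.
    if open_close_count < t.open_close_min:
        return NON_SUSPICIOUS                  # sixth child node (leaf)
    return SUSPICIOUS                          # seventh child node (leaf)
```

For instance, `classify(40, "REGION_X", 0, 12)` follows the root node, the second child node, and the fifth child node before reaching the ninth child node.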
According to the embodiment of the disclosure, after the recognition result for representing the type of the user to be recognized is output through the decision tree, the following steps can be executed: if the identification result is that the user to be identified is a first-class user (a suspicious user with abnormal transaction behavior), generating early warning information containing the identification result, for example, combining the identification result of the user with the identity characteristic information and the behavior characteristic information of the user to generate early warning information so as to send the early warning information to a service system for screening.
According to the embodiment of the disclosure, the early warning information is generated, so that the purpose of early warning in time can be achieved, and users with problems can be found in time conveniently.
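As a rough illustration of the early warning step, the following Python sketch combines the identification result with the user's identity and behavior characteristic information. The function name, the result label, and the dictionary keys are assumptions made for this example; the disclosure does not specify a message format.

```python
from typing import Optional


def build_warning(result: str, identity: dict, behavior: dict) -> Optional[dict]:
    """Return early warning information only for first-class (suspicious) users."""
    if result != "suspicious":
        return None  # no warning is generated for second-class users
    return {
        "recognition_result": result,
        "identity_features": dict(identity),   # copied so the inputs are not mutated
        "behavior_features": dict(behavior),
    }
```

The returned dictionary stands in for the early warning information that would then be passed on for further screening.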
According to the embodiment of the disclosure, after the early warning information containing the identification result is generated, the early warning information can be sent to a service identification system, where the service identification system is used for determining, according to the identification result, whether the user to be identified is a target type user (a user with abnormal transactions).
According to the embodiment of the disclosure, since the recognition result output through the decision tree is only the result predicted by the model, and only represents a possibility that the user has the abnormal transaction behavior, it cannot be completely determined that the user really has the abnormal transaction behavior, and further confirmation is required. Therefore, the early warning information is sent to the service identification system for further discrimination, abnormal transaction users can be found in time, and the situation of misidentification is avoided.
Another aspect of the present disclosure provides a method for training the decision tree for identifying a user type, and fig. 4 schematically illustrates a flowchart of a method for training a decision tree for identifying a user type according to an embodiment of the present disclosure.
As shown in fig. 4, the training method of this embodiment includes operations S401 to S404.
In operation S401, after obtaining the authorization of the respective personal information by the multiple users, obtaining the identity characteristic information of the multiple users and the behavior characteristic information of the multiple users for the respective target accounts;
in operation S402, determining a training sample set according to the identity feature information of the plurality of users and the behavior feature information of the plurality of users for respective target accounts;
in operation S403, constructing a decision tree skeleton model, where the decision tree skeleton model includes a plurality of skeleton nodes, where each skeleton node includes preset node information, where the preset node information includes preset node type information of each skeleton node, preset position information of each skeleton node in the decision tree skeleton model, and preset feature type information for segmenting each skeleton node;
in operation S404, a decision tree skeleton model is trained using a preset algorithm and a training sample set to obtain a trained decision tree, where the trained decision tree is used to execute the type recognition method.
According to the embodiment of the present disclosure, in operation S401, the authorization of the users needs to be obtained before their personal information is acquired. For example, a request for obtaining user information is sent to each user before the personal information is obtained, and the personal information of the plurality of users is obtained only when those users agree to or authorize the acquisition. In the application scenario of the present disclosure, the personal information of a user may include identity characteristic information and behavior characteristic information. The identity characteristic information may include, but is not limited to, name, age, nationality, place of household registration, occupation, registered capital of a company (in the case that the user is a corporate customer), and the like. The behavior characteristic information may include, but is not limited to, the number of account opening and account closing times of the user (which may be obtained statistically from the opening times and closing times of the user's different accounts at a financial institution), the number of online banking logins of the user to a target account (which may be any locked account), and the like.
According to the embodiment of the disclosure, the training sample set may be formed by directly combining the acquired identity characteristic information of a plurality of users and behavior characteristic information of the plurality of users for respective target accounts, or may be obtained by preprocessing the data after acquiring the information data.
According to the embodiment of the present disclosure, the decision tree skeleton model constructed in operation S403 may be built in advance according to the node information of the preset skeleton nodes, for example, according to the preset node type information of each skeleton node (root node, leaf node, or non-leaf node), the preset position information of each skeleton node in the decision tree skeleton model (which layer the node is in and which nodes in the upper and lower layers it is associated with), and the preset feature type information for segmenting each skeleton node (which feature each node uses for classification).
The decision tree skeleton model may refer to the decision tree structure shown in fig. 3. For example, the decision tree skeleton model includes a skeleton root node and three skeleton sub-node layers. The first skeleton sub-node layer includes a first skeleton sub-node whose node type is a leaf node, and a second skeleton sub-node and a third skeleton sub-node whose node types are non-leaf nodes. The skeleton root node is associated with the first, second, and third skeleton sub-nodes. The feature used for classification at the root node is age, the feature used at the second skeleton sub-node is the place of household registration, and the feature used at the third skeleton sub-node is the number of account opening or account closing times for the target account.
Further, the second skeleton sub-node layer includes a fourth skeleton sub-node whose node type is a leaf node, a fifth skeleton sub-node whose node type is a non-leaf node, and a sixth skeleton sub-node and a seventh skeleton sub-node whose node types are leaf nodes. The second skeleton sub-node is associated with the fourth skeleton sub-node and the fifth skeleton sub-node; the third skeleton sub-node is associated with the sixth skeleton sub-node and the seventh skeleton sub-node. For the node information of the second skeleton sub-node layer, refer to the figure; it is not described herein again.
According to the embodiment of the present disclosure, a decision value of a preset feature for segmenting each skeleton node is not determined in the decision tree skeleton model, and in operation S404, the decision tree skeleton model is trained by using a preset algorithm and a training sample set, for example, the ID3 algorithm may be used to train the decision tree skeleton model by using the training sample set, so as to obtain a decision value of a preset feature for segmenting each skeleton node, and finally obtain a trained decision tree.
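One way to picture the skeleton model is as a tree of nodes whose type, layer, and splitting feature are preset while the decision value is left empty until training. The sketch below is an assumption about representation only; the class name `SkeletonNode`, its fields, and the feature labels are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SkeletonNode:
    name: str
    node_type: str                           # "root", "non-leaf", or "leaf"
    layer: int                               # 0 for the root node
    split_feature: Optional[str] = None      # preset feature type; None for leaves
    decision_value: Optional[object] = None  # undetermined until training
    children: List["SkeletonNode"] = field(default_factory=list)


def leaf(name: str, layer: int) -> SkeletonNode:
    return SkeletonNode(name, "leaf", layer)


def build_skeleton() -> SkeletonNode:
    """Build a skeleton mirroring the three-layer structure of fig. 3."""
    fifth = SkeletonNode("fifth", "non-leaf", 2, "online_banking_logins",
                         children=[leaf("eighth", 3), leaf("ninth", 3)])
    second = SkeletonNode("second", "non-leaf", 1, "household_registration",
                          children=[leaf("fourth", 2), fifth])
    third = SkeletonNode("third", "non-leaf", 1, "open_close_count",
                         children=[leaf("sixth", 2), leaf("seventh", 2)])
    return SkeletonNode("root", "root", 0, "age",
                        children=[leaf("first", 1), second, third])


def undetermined(node: SkeletonNode) -> List[str]:
    """List the nodes whose decision value still has to be learned."""
    found = [node.name] if node.split_feature and node.decision_value is None else []
    for child in node.children:
        found += undetermined(child)
    return found
```

Before training, `undetermined(build_skeleton())` lists the four splitting nodes, which are exactly the positions where a preset algorithm would have to supply decision values.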
According to the embodiment of the disclosure, the training sample set comprises the identity characteristic information and the behavior characteristic information of users, and the decision tree obtained by training on this set uses the identity characteristic information and the behavior characteristic information of a user as the basis for classifying the user. Compared with the prior art, in which identification and judgment are carried out only according to the transaction behavior of the user, the method considers both the identity characteristic information and the behavior characteristic information of the user. It therefore achieves high accuracy in user identification in complex scenes, can accurately identify suspicious users with abnormal transaction behaviors, provides technical support for a financial institution to identify abnormal transaction users, and improves the coverage and effectiveness of an abnormal transaction identification model.
According to an embodiment of the present disclosure, in the training method, the preset node information is generated according to user type results of a plurality of users.
According to the embodiment of the disclosure, when a decision tree skeleton model is constructed, the preset node information of each skeleton node needs to be selected in advance, and the preset node information can be generated according to the user type results of a plurality of users. Since the classification result of each user in the training sample set is known, these results can guide the choice of which user features are used for classification. Specifically, for example, it may be determined which feature is used for the initial classification, which features are used for further classification on the basis of the initial classification, and so on, so as to determine the preset node information of each skeleton node.
According to the embodiment of the disclosure, the preset node information is generated according to the user type results of a plurality of users, so that a skeleton model constructed by the preset node information is guided by the actual results, and a better primary classification effect is achieved.
According to an embodiment of the present disclosure, in the training process, training the decision tree skeleton model by using a preset algorithm and a training sample set to obtain a trained decision tree includes:
determining a decision value of a preset feature for segmenting each skeleton node by using a preset algorithm and a training sample set; and combining the decision value of the preset characteristics for segmenting each skeleton node with a decision tree skeleton model to obtain a trained decision tree.
According to the embodiment of the disclosure, since the decision value of the preset feature for segmenting each skeleton node is not determined in the decision tree skeleton model, the decision value of each preset feature is determined by using a preset algorithm and the training sample set, and the decision values are then combined with the decision tree skeleton model to obtain the trained decision tree. The preset algorithm may be any decision tree algorithm applicable to the present scenario; for example, the ID3 algorithm may be adopted. The ID3 algorithm takes the rate of decrease of information entropy as the criterion for selecting test attributes: at each node, the attribute with the highest information gain that has not yet been used for partitioning is selected as the partitioning criterion, and the process continues until the generated decision tree can perfectly classify the training examples.
According to an embodiment of the present disclosure, when training is performed by using the ID3 algorithm, the tree starts with a single node representing the training samples. If the samples all belong to the same class, the node becomes a leaf node; otherwise, the algorithm selects the attribute with the highest classification capability as the current node of the decision tree. According to the different values of the attribute at the current decision node, the training sample data set is divided into a plurality of subsets; each value forms a branch, so a plurality of values form a plurality of branches. The previous step is then repeated for each subset obtained, recursively forming a decision tree on each divided subset.
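The information gain criterion mentioned above can be sketched in Python as follows. Classic ID3 splits on categorical attribute values; choosing a numeric threshold by information gain, as done here to obtain a decision value for a count-type feature, is an adaptation assumed for illustration rather than something the disclosure specifies. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting into value < threshold and value >= threshold."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Pick the observed value that maximizes information gain as the decision value."""
    candidates = sorted(set(values))
    return max(candidates, key=lambda t: information_gain(values, labels, t))
```

With a skeleton whose splitting features are already fixed, a search like `best_threshold` would only need to run once per splitting node, which is consistent with the shorter training process the disclosure attributes to the pre-built skeleton.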
According to the embodiment of the disclosure, the decision tree skeleton model is constructed in advance and then trained with the preset algorithm. Compared with a decision tree built entirely by hand, the prediction precision is higher; compared with a decision tree trained entirely by an algorithm, the training process is shorter and the training speed is higher while the prediction precision is maintained.
According to the embodiment of the disclosure, determining the training sample set according to the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users respectively aiming at the respective target accounts comprises: and carrying out data preprocessing on the identity characteristic information of a plurality of users and the behavior characteristic information of the plurality of users aiming at respective target accounts respectively to obtain a training sample set.
According to embodiments of the present disclosure, data preprocessing may include data cleansing, data integration, data transformation.
The data cleaning may, for example, fill in missing attributes of the collected user identity characteristic data and behavior characteristic data (for example, assigning a default value when a customer's age is missing), identify and delete abnormal values or outliers in the data (for example, deleting isolated account-opening records from before a dormant account was cancelled), and then pass the cleaned valid data to the subsequent data integration process.
The data integration may, for example, store data from a plurality of data sources in a unified manner (a user's personal information may come from tables in different business systems), establish an overall customer feature warehouse covering individual customers, corporate customers and the like, and pass the integrated data to the data transformation process.
The data transformation may, for example, convert the data into a form suitable for data mining by smoothing, aggregating, generalizing, and normalizing the attributes of the data, such as converting customers' registered capital denominated in different currencies into renminbi.
According to the embodiment of the disclosure, after the collected identity characteristic data and behavior characteristic data of the user are subjected to data cleaning, data integration and data transformation to generate a training sample set, the training sample set is transferred to a decision tree training process.
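The three preprocessing stages can be sketched as follows. This is only an illustration under stated assumptions: the field names (`user_id`, `age`, `capital`, `currency`, `open_cancel_count`, `login_count`), the default age, the age bounds used to detect abnormal values, and the exchange rate are all hypothetical and do not come from the disclosure.

```python
# Hypothetical conversion rates to renminbi; real rates would come from a reference source.
RATES_TO_CNY = {"CNY": 1.0, "USD": 7.0}
DEFAULT_AGE = 40  # assumed default value assigned when the age is missing

def clean(records):
    """Data cleaning: fill missing ages and drop records with abnormal ages."""
    cleaned = []
    for r in records:
        r = dict(r)                        # work on a copy, leave the input untouched
        if r.get("age") is None:
            r["age"] = DEFAULT_AGE
        if r["age"] < 0 or r["age"] > 120:  # assumed plausibility bounds
            continue
        cleaned.append(r)
    return cleaned

def integrate(identity_table, behavior_table):
    """Data integration: join identity and behavior records on user_id."""
    behavior_by_user = {b["user_id"]: b for b in behavior_table}
    return [{**i, **behavior_by_user[i["user_id"]]}
            for i in identity_table if i["user_id"] in behavior_by_user]

def transform(records):
    """Data transformation: express registered capital in renminbi."""
    out = []
    for r in records:
        r = dict(r)
        r["capital_cny"] = r.pop("capital") * RATES_TO_CNY[r.pop("currency")]
        out.append(r)
    return out

identity = [
    {"user_id": 1, "age": None, "capital": 100.0, "currency": "USD"},
    {"user_id": 2, "age": 35, "capital": 500.0, "currency": "CNY"},
    {"user_id": 3, "age": 400, "capital": 10.0, "currency": "CNY"},  # abnormal age
]
behavior = [
    {"user_id": 1, "open_cancel_count": 4, "login_count": 0},
    {"user_id": 2, "open_cancel_count": 1, "login_count": 12},
    {"user_id": 3, "open_cancel_count": 0, "login_count": 3},
]
samples = transform(integrate(clean(identity), behavior))
```

Each stage returns new records, so the stages can be reordered or replaced independently when the data sources change.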
According to the embodiment of the disclosure, performing data preprocessing on the obtained raw data improves the usability of the training samples, so that the decision tree subsequently trained on those samples classifies more accurately.
Based on the type identification method, the disclosure also provides a type identification device. Fig. 5 schematically shows a block diagram of the structure of a type recognition apparatus according to an embodiment of the present disclosure.
The apparatus will be described in detail below with reference to fig. 5.
As shown in fig. 5, the type identifying apparatus 500 of this embodiment includes a first obtaining module 501, an input module 502 and an output module 503.
The first obtaining module 501 is configured to obtain identity characteristic information of a user to be identified and behavior characteristic information of the user to be identified for a target account after obtaining authorization of the user to be identified for personal information;
an input module 502, configured to input the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree; the decision tree comprises a root node and at least one layer of sub node layers, wherein each layer of sub node layer comprises at least one sub node, the node type of each sub node comprises leaf nodes and/or non-leaf nodes, the root node is associated with at least one layer of sub node layer through a decision condition, and the decision condition is constructed according to identity characteristic information and behavior characteristic information;
and an output module 503, configured to output, through the decision tree, a recognition result for characterizing the type of the user to be recognized.
According to the embodiment of the disclosure, since the decision condition in the decision tree is constructed according to the identity characteristic information and the behavior characteristic information, the first obtaining module 501 acquires the identity characteristic information and the behavior characteristic information of the user, the input module 502 inputs the acquired information into the trained decision tree, and the classification result of the user is thereby obtained.
According to the embodiment of the disclosure, the identification result indicates that the user to be identified is a first type user or a second type user; the identity characteristic information comprises age and household registration location, and the behavior characteristic information comprises the number of account openings or cancellations for the target account and the number of internet banking logins for the target account; the decision condition is constructed according to the age, the household registration location, the number of account openings or cancellations, and the number of internet banking logins.
According to an embodiment of the present disclosure, the at least one sub-node layer includes a first sub-node layer, and the first sub-node layer includes a first sub-node whose node type is a leaf node, a second sub-node whose node type is a non-leaf node, and a third sub-node.
The decision condition comprises a first decision condition, wherein the first decision condition is constructed according to age, and the first decision condition comprises: under the condition that the age is larger than a preset upper age limit, the user to be identified is associated with a first child node, and the first child node is used for representing that the user to be identified is a first class user; under the condition that the age is greater than or equal to the preset lower age limit and less than or equal to the preset upper age limit, the user to be identified is associated with the second child node; and in the case that the age is smaller than the preset lower age limit, associating the user to be identified with the third child node.
According to an embodiment of the present disclosure, the at least one sub-node layer includes a second sub-node layer, and the second sub-node layer includes a fourth sub-node whose node type is a leaf node, a fifth sub-node whose node type is a non-leaf node, a sixth sub-node whose node type is a leaf node, and a seventh sub-node.
The decision conditions comprise a second decision condition and a third decision condition, wherein the second decision condition is constructed according to the household location, the third decision condition is constructed according to the number of account opening or account cancellation times, and the second decision condition comprises: under the condition that the household registration location is not the target region, the user to be identified is associated with a fourth sub-node, wherein the fourth sub-node is used for representing that the user to be identified is a second type of user; and under the condition that the household location is the target region, the user to be identified is associated with the fifth sub-node.
The third decision condition includes: under the condition that the number of account opening or account canceling times for the target account is smaller than a first preset number threshold, the user to be identified is associated with a sixth child node, and the sixth child node is used for representing that the user to be identified is a second-class user; and under the condition that the number of account opening or account canceling times for the target account is greater than or equal to a first preset number threshold, associating the user to be identified with a seventh child node, wherein the seventh child node is used for representing that the user to be identified is the first-class user.
According to an embodiment of the present disclosure, the at least one sub-node layer includes a third sub-node layer, and the third sub-node layer includes an eighth sub-node and a ninth sub-node whose node types are leaf nodes.
The decision conditions comprise a fourth decision condition, wherein the fourth decision condition is constructed according to the number of the internet bank logins, and the fourth decision condition comprises the following steps: under the condition that the number of internet bank login times for the target account is smaller than a second preset number threshold, the user to be identified is associated with an eighth child node, and the eighth child node is used for representing that the user to be identified is a second type of user; and under the condition that the number of internet bank login times for the target account is greater than or equal to a second preset number threshold, associating the user to be identified with a ninth child node, wherein the ninth child node is used for representing that the user to be identified is the first type user.
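The four decision conditions can be read as a small hand-written classifier. The sketch below is illustrative only. The threshold values and the region name are placeholders, since the text only calls them "preset". The wiring of the second child node to the household location condition and of the third child node to the account opening or cancellation condition is an assumption, because the text does not state which non-leaf node carries which condition.

```python
from dataclasses import dataclass

FIRST_TYPE = "first type user"
SECOND_TYPE = "second type user"

@dataclass(frozen=True)
class Thresholds:
    # Placeholder values; the disclosure leaves all of them as preset parameters.
    age_lower: int = 18
    age_upper: int = 60
    target_region: str = "REGION_X"
    open_cancel_min: int = 3      # first preset number threshold
    login_min: int = 5            # second preset number threshold

def classify(age, household_location, open_cancel_count, login_count, t=Thresholds()):
    """Walk the tree from the root and return the type of the user to be identified."""
    # First decision condition (age).
    if age > t.age_upper:
        return FIRST_TYPE                         # first child node (leaf)
    if age >= t.age_lower:
        # Second child node (non-leaf). Assumed to carry the second decision condition.
        if household_location != t.target_region:
            return SECOND_TYPE                    # fourth child node (leaf)
        # Fifth child node (non-leaf): fourth decision condition (internet banking logins).
        if login_count < t.login_min:
            return SECOND_TYPE                    # eighth child node (leaf)
        return FIRST_TYPE                         # ninth child node (leaf)
    # Third child node. Assumed to carry the third decision condition.
    if open_cancel_count < t.open_cancel_min:
        return SECOND_TYPE                        # sixth child node (leaf)
    return FIRST_TYPE                             # seventh child node (leaf)
```

For example, `classify(30, "REGION_X", 0, 10)` follows the second and fifth child nodes and ends at the ninth child node.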
According to the embodiment of the disclosure, the device further comprises a generating module, configured to generate early warning information including the identification result when the identification result is that the user to be identified is the first type of user.
According to the embodiment of the disclosure, the device further comprises a sending module, which is used for sending the early warning information to a service identification system, wherein the service identification system is used for determining whether the user to be identified is the target type user according to the identification result.
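A minimal sketch of the generating and sending steps follows. The message layout, the function names, and the `transport` callable standing in for the link to the service identification system are assumptions made for illustration; the disclosure does not specify them.

```python
import json

FIRST_TYPE = "first type user"

def generate_warning(user_id, identification_result):
    """Return early warning information only for a first type user, otherwise None."""
    if identification_result != FIRST_TYPE:
        return None
    return {"user_id": user_id, "identification_result": identification_result}

def send_warning(warning, transport):
    """Serialize the warning and pass it to a caller-supplied transport callable."""
    transport(json.dumps(warning, sort_keys=True))

# Usage: a plain list stands in for the service identification system's inbox.
outbox = []
warning = generate_warning("u-001", FIRST_TYPE)
if warning is not None:
    send_warning(warning, outbox.append)
```

Keeping the transport as a parameter lets the same code be exercised without a real downstream system.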
Based on the above method for training a decision tree for identifying a user type, another aspect of the present disclosure provides an apparatus for training a decision tree for identifying a user type.
Fig. 6 schematically shows a block diagram of an apparatus for training a decision tree for identifying a user type according to an embodiment of the present disclosure.
The apparatus will be described in detail below with reference to fig. 6.
As shown in fig. 6, the training apparatus 600 of this embodiment includes a second obtaining module 601, a determining module 602, a constructing module 603, and a training module 604.
The second obtaining module 601 is configured to obtain identity feature information of multiple users and behavior feature information of the multiple users for respective target accounts after obtaining the respective authorizations of the multiple users for their respective personal information;
a determining module 602, configured to determine a training sample set according to identity feature information of multiple users and behavior feature information of the multiple users for respective target accounts;
a constructing module 603, configured to construct a decision tree skeleton model, where the decision tree skeleton model includes a plurality of skeleton nodes, where each skeleton node includes preset node information, where the preset node information includes preset node type information of each skeleton node, preset position information of each skeleton node in the decision tree skeleton model, and preset feature type information for segmenting each skeleton node;
the training module 604 is configured to train a decision tree skeleton model by using a preset algorithm and a training sample set to obtain a trained decision tree, where the trained decision tree is used to execute the type recognition method.
According to the embodiment of the disclosure, the training sample set obtained through the second obtaining module 601 and the determining module 602 includes the identity characteristic information and the behavior characteristic information of the users, so the decision tree obtained by the constructing module 603 and the training module 604 through training on this sample set uses the identity characteristic information and the behavior characteristic information of a user as the basis for classifying the user.
According to the embodiment of the disclosure, the preset node information is generated according to the user type results of a plurality of users.
According to an embodiment of the present disclosure, a training module includes a determination unit and a combination unit.
The determining unit is used for determining a decision value of a preset feature for segmenting each skeleton node by using a preset algorithm and a training sample set; and the combination unit is used for combining the decision value of the preset characteristics for segmenting each skeleton node with the decision tree skeleton model to obtain the trained decision tree.
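The division of work between the determining unit and the combination unit can be sketched as follows. This is an assumption-laden illustration, not the disclosed implementation: the skeleton is reduced to a single node, the preset algorithm is taken to be an information-gain threshold search, and the feature name `login_count` and the sample values are hypothetical. In a multi-node skeleton, each node would see only the samples routed to it.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Determining unit: pick t (split value < t versus value >= t) with the highest gain."""
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for t in sorted(set(values))[1:]:      # skip the minimum, which leaves one side empty
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t

def train(skeleton, samples):
    """Combination unit: attach the learned decision values to a copy of the skeleton."""
    labels = [s["label"] for s in samples]
    trained = {}
    for position, node in skeleton.items():
        values = [s[node["feature"]] for s in samples]
        trained[position] = {**node, "decision_value": best_threshold(values, labels)}
    return trained

# Skeleton: position and feature are preset, the decision value is not yet determined.
skeleton = {"root": {"feature": "login_count", "decision_value": None}}
samples = [
    {"login_count": 0, "label": "second"},
    {"login_count": 1, "label": "second"},
    {"login_count": 2, "label": "second"},
    {"login_count": 8, "label": "first"},
    {"login_count": 10, "label": "first"},
]
trained = train(skeleton, samples)
```

Because the structure and features are fixed in advance, training only searches over decision values, which is the source of the shorter training time the text describes.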
According to the embodiment of the disclosure, the determining module includes a preprocessing unit, which is used for preprocessing the identity characteristic information of the users and the behavior characteristic information of the users aiming at the respective target accounts respectively to obtain the training sample set.
According to the embodiment of the present disclosure, any plurality of the first obtaining module 501, the input module 502, the output module 503, the second obtaining module 601, the determining module 602, the constructing module 603, and the training module 604 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first obtaining module 501, the input module 502, the output module 503, the second obtaining module 601, the determining module 602, the constructing module 603, and the training module 604 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them. Alternatively, at least one of the first obtaining module 501, the input module 502, the output module 503, the second obtaining module 601, the determining module 602, the constructing module 603 and the training module 604 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
Fig. 7 schematically shows a block diagram of an electronic device adapted to implement a type recognition method according to an embodiment of the present disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. It is noted that the programs may also be stored in one or more memories other than the ROM 702 and RAM 703. The processor 701 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 700 may also include input/output (I/O) interface 705, which input/output (I/O) interface 705 is also connected to bus 704, according to an embodiment of the present disclosure. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 702 and/or the RAM 703 and/or one or more memories other than the ROM 702 and the RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated by the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the type identification method provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 701. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the processor 701, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or associated in various ways, even if such combinations or associations are not expressly recited in the present disclosure. In particular, various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (16)

1. A type identification method, comprising:
after obtaining the authorization of the user to be identified on the personal information, obtaining the identity characteristic information of the user to be identified and the behavior characteristic information of the user to be identified aiming at the target account;
inputting the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree; the decision tree comprises a root node and at least one layer of sub-node layers, wherein each layer of sub-node layer comprises at least one sub-node, the node type of the sub-node comprises leaf nodes and/or non-leaf nodes, the root node is associated with the at least one layer of sub-node layer through a decision condition, and the decision condition is constructed according to the identity characteristic information and the behavior characteristic information; and
outputting, through the decision tree, a recognition result for representing the type of the user to be identified.
2. The type recognition method of claim 1, wherein:
the identification result comprises that the user to be identified is a first class user or a second class user;
the identity characteristic information comprises age and household registration location, and the behavior characteristic information comprises the number of account openings or cancellations for the target account and the number of internet banking logins for the target account;
the decision condition is constructed according to the age, the household registration location, the number of account openings or cancellations, and the number of internet banking logins.
3. The type recognition method of claim 2, wherein:
the at least one layer of child node layer comprises a first layer of child node layer, and the first layer of child node layer comprises a first child node with a node type being a leaf node, a second child node with a node type being a non-leaf node and a third child node;
the decision condition comprises a first decision condition, wherein the first decision condition is constructed based on the age, the first decision condition comprises:
when the age is larger than a preset upper age limit, the user to be identified is associated with the first child node, and the first child node is used for representing that the user to be identified is the first class user;
under the condition that the age is greater than or equal to a preset lower age limit and less than or equal to a preset upper age limit, the user to be identified is associated with the second child node;
and under the condition that the age is smaller than the preset lower age limit, the user to be identified is associated with the third child node.
4. The type recognition method of claim 3, wherein:
the at least one layer of sub-node layer comprises a second layer of sub-node layer, and the second layer of sub-node layer comprises a fourth sub-node with a node type being a leaf node, a fifth sub-node with a node type being a non-leaf node, a sixth sub-node with a node type being a leaf node, and a seventh sub-node;
the decision conditions comprise a second decision condition and a third decision condition, wherein the second decision condition is constructed according to the household registration location, and the third decision condition is constructed according to the number of account openings or cancellations;
the second decision condition comprises:
under the condition that the household location is not a target region, the user to be identified is associated with the fourth sub-node, wherein the fourth sub-node is used for representing that the user to be identified is the second type user;
under the condition that the household registration location is the target region, the user to be identified is associated with the fifth sub-node;
the third decision condition comprises:
when the number of times of opening or canceling the account for the target account is smaller than a first preset number threshold, associating the user to be identified with the sixth child node, where the sixth child node is used to represent that the user to be identified is the second type of user;
and under the condition that the number of times of opening or canceling the account for the target account is greater than or equal to the first preset number threshold, associating the user to be identified with the seventh child node, wherein the seventh child node is used for representing that the user to be identified is the first class user.
5. The type recognition method of claim 4, wherein:
the at least one layer of sub-node layer comprises a third layer of sub-node layer, and the third layer of sub-node layer comprises an eighth sub-node and a ninth sub-node, wherein the node types of the eighth sub-node and the ninth sub-node are leaf nodes;
the decision conditions comprise a fourth decision condition, wherein the fourth decision condition is constructed according to the internet bank login times, and the fourth decision condition comprises the following steps:
when the number of internet banking login times aiming at the target account is smaller than a second preset number threshold, the user to be identified is associated with the eighth child node, and the eighth child node is used for representing that the user to be identified is the second type user;
and under the condition that the number of internet banking login times aiming at the target account is greater than or equal to the second preset number threshold, associating the user to be identified with the ninth child node, wherein the ninth child node is used for representing that the user to be identified is the first class user.
6. The type recognition method of claim 2, further comprising:
generating early warning information containing the identification result under the condition that the identification result is that the user to be identified is the first type of user.
7. The type recognition method of claim 6, further comprising:
sending the early warning information to a service identification system, wherein the service identification system is used for determining whether the user to be identified is a target type user according to the identification result.
8. A method of training a decision tree for identifying a user type, comprising:
after obtaining the authorization of each of a plurality of users for their respective personal information, obtaining the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users for their respective target accounts;
determining a training sample set according to the identity characteristic information of the users and the behavior characteristic information of the users aiming at respective target accounts;
constructing a decision tree skeleton model, wherein the decision tree skeleton model comprises a plurality of skeleton nodes, each skeleton node comprises preset node information, and the preset node information comprises preset node type information of each skeleton node, preset position information of each skeleton node in the decision tree skeleton model, and preset feature type information for segmenting each skeleton node; and
training the decision tree skeleton model by using a preset algorithm and the training sample set to obtain a trained decision tree, wherein the trained decision tree is used for executing the type recognition method of any one of claims 1 to 7.
9. The method of claim 8, wherein:
the preset node information is generated according to the user type results of the plurality of users.
10. The method of claim 9, wherein training the decision tree skeleton model by using a preset algorithm and the training sample set to obtain a trained decision tree comprises:
determining, by using the preset algorithm and the training sample set, a decision value of the preset feature for splitting each skeleton node; and
combining the decision value of the preset feature for splitting each skeleton node with the decision tree skeleton model to obtain the trained decision tree.
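The skeleton training of claims 8 to 10 can be sketched as follows. The claims do not name the preset algorithm, so this sketch assumes a Gini-impurity threshold search; the feature name `ebank_logins`, the binary label encoding, and the two-way split are likewise assumptions made for illustration. The tree shape and the feature at each node are fixed in advance, and training only fills in the decision value of each node.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (feature values, label); label 1 = first type of user (assumed encoding)
Sample = Tuple[Dict[str, float], int]


@dataclass
class SkeletonNode:
    feature: Optional[str] = None                 # preset split feature; None marks a leaf
    below: Optional["SkeletonNode"] = None        # child for value < threshold
    at_or_above: Optional["SkeletonNode"] = None  # child for value >= threshold
    threshold: Optional[float] = None             # decision value, learned in fit()
    label: Optional[int] = None                   # leaf output, learned in fit()


def gini(samples: List[Sample]) -> float:
    """Gini impurity of a binary-labelled sample list."""
    if not samples:
        return 0.0
    p = sum(label for _, label in samples) / len(samples)
    return 2.0 * p * (1.0 - p)


def fit(node: SkeletonNode, samples: List[Sample]) -> None:
    """Learn decision values and leaf labels; the skeleton shape is never altered."""
    if node.feature is None:  # leaf: majority label of the samples that reach it
        positives = sum(label for _, label in samples)
        node.label = 1 if samples and 2 * positives >= len(samples) else 0
        return
    best_score, best_split = float("inf"), (0.0, [], [])
    for t in sorted({x[node.feature] for x, _ in samples}):
        lo = [s for s in samples if s[0][node.feature] < t]
        hi = [s for s in samples if s[0][node.feature] >= t]
        score = len(lo) * gini(lo) + len(hi) * gini(hi)
        if score < best_score:
            best_score, best_split = score, (t, lo, hi)
    node.threshold, lo, hi = best_split
    fit(node.below, lo)         # assumes every non-leaf node has both children
    fit(node.at_or_above, hi)


def predict(node: SkeletonNode, x: Dict[str, float]) -> int:
    """Walk from the root to a leaf using the learned decision values."""
    while node.feature is not None:
        node = node.below if x[node.feature] < node.threshold else node.at_or_above
    return node.label


# Usage: a one-split skeleton on a hypothetical login-count feature.
skeleton = SkeletonNode(feature="ebank_logins",
                        below=SkeletonNode(), at_or_above=SkeletonNode())
training_set = [({"ebank_logins": 1.0}, 0), ({"ebank_logins": 2.0}, 0),
                ({"ebank_logins": 8.0}, 1), ({"ebank_logins": 9.0}, 1)]
fit(skeleton, training_set)
```

Because the split feature of every node is preset, the search runs over thresholds only, which is what distinguishes this from ordinary decision tree induction where the feature is also chosen per node.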
11. The method of claim 8, wherein determining the training sample set according to the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users for their respective target accounts comprises:
performing data preprocessing on the identity characteristic information of the plurality of users and on the behavior characteristic information of the plurality of users for their respective target accounts to obtain the training sample set.
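Claim 11 does not say what the data preprocessing consists of. One plausible reading, sketched below purely as an assumption, is to merge each user's identity and behavior records, discard records with missing or malformed values, and convert the rest to numbers. The field names `age` and `ebank_logins` and the function `build_training_set` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]
REQUIRED = ("age", "ebank_logins")  # hypothetical feature names


def build_training_set(identity: Dict[str, Record],
                       behavior: Dict[str, Record],
                       labels: Dict[str, int]) -> List[Tuple[Dict[str, float], int]]:
    """Merge per-user records and keep only complete, numeric samples."""
    samples = []
    for user_id, label in labels.items():
        merged = {**identity.get(user_id, {}), **behavior.get(user_id, {})}
        try:
            features = {name: float(merged[name]) for name in REQUIRED}
        except (KeyError, TypeError, ValueError):
            continue  # drop records with missing or malformed values
        samples.append((features, label))
    return samples


# Usage: u2 has a null age and u3 has no login count, so only u1 survives.
identity = {"u1": {"age": "34"}, "u2": {"age": None}, "u3": {"age": "51"}}
behavior = {"u1": {"ebank_logins": "12"}, "u2": {"ebank_logins": "3"}, "u3": {}}
labels = {"u1": 1, "u2": 0, "u3": 0}
training_set = build_training_set(identity, behavior, labels)
```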
12. A type identifying apparatus comprising:
a first acquisition module, configured to acquire, after obtaining authorization from a user to be identified for personal information, the identity characteristic information of the user to be identified and the behavior characteristic information of the user to be identified for a target account;
an input module, configured to input the identity characteristic information and the behavior characteristic information of the user to be identified into a trained decision tree, wherein the decision tree comprises a root node and at least one child node layer, each child node layer comprises at least one child node, the node types of the child nodes comprise leaf nodes and/or non-leaf nodes, the root node is associated with the at least one child node layer through a decision condition, and the decision condition is constructed according to the identity characteristic information and the behavior characteristic information; and
an output module, configured to output, through the decision tree, a recognition result representing the type of the user to be identified.
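The tree structure recited in claim 12 (a root node, layers of child nodes, leaf and non-leaf nodes, and decision conditions linking them) can be sketched as follows. The node layout, the feature names, and the thresholds 18 and 10 are hypothetical and chosen only to make the traversal concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]


@dataclass
class Node:
    result: Optional[str] = None  # set on leaf nodes only
    # (decision condition, child) pairs; empty for leaf nodes
    branches: List[Tuple[Callable[[Features], bool], "Node"]] = field(default_factory=list)


def recognize(root: Node, features: Features) -> str:
    """Follow the first matching decision condition at each layer down to a leaf."""
    node = root
    while node.branches:
        for condition, child in node.branches:
            if condition(features):
                node = child
                break
        else:
            raise ValueError("no decision condition matched")
    if node.result is None:
        raise ValueError("leaf node has no result")
    return node.result


# Usage: a two-layer tree over hypothetical identity and behavior features.
other = Node(result="other")
first_type = Node(result="first_type")
logins_node = Node(branches=[
    (lambda f: f["ebank_logins"] >= 10, first_type),
    (lambda f: f["ebank_logins"] < 10, other),
])
root = Node(branches=[
    (lambda f: f["age"] < 18, other),
    (lambda f: f["age"] >= 18, logins_node),
])
result = recognize(root, {"age": 40, "ebank_logins": 15})
```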
13. An apparatus for training a decision tree for identifying a user type, comprising:
a second acquisition module, configured to acquire, after obtaining authorization from each of a plurality of users for their respective personal information, the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users for their respective target accounts;
a determining module, configured to determine a training sample set according to the identity characteristic information of the plurality of users and the behavior characteristic information of the plurality of users for their respective target accounts;
a construction module, configured to construct a decision tree skeleton model, wherein the decision tree skeleton model comprises a plurality of skeleton nodes, each skeleton node comprises preset node information, and the preset node information comprises preset node type information of the skeleton node, preset position information of the skeleton node in the decision tree skeleton model, and preset characteristic type information for splitting the skeleton node; and
a training module, configured to train the decision tree skeleton model using a preset algorithm and the training sample set to obtain a trained decision tree, where the trained decision tree is used to perform the type recognition method according to any one of claims 1 to 7.
14. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
15. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
16. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number | Publication Number | Priority Date | Filing Date | Title
CN202210455575.7A | CN114677202A (en) | 2022-04-27 | 2022-04-27 | Type identification method, training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number | Publication Number | Priority Date | Filing Date | Title
CN202210455575.7A | CN114677202A (en) | 2022-04-27 | 2022-04-27 | Type identification method, training method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN114677202A (en) | 2022-06-28

Family

ID=82080521

Family Applications (1)

Application Number | Status | Publication Number | Priority Date | Filing Date | Title
CN202210455575.7A | Pending | CN114677202A (en) | 2022-04-27 | 2022-04-27 | Type identification method, training method and device, electronic equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN114677202A (en)

Similar Documents

Publication Publication Date Title
US20200218931A1 (en) Representative-Based Metric Learning for Classification and Few-Shot Object Detection
CN110869962A (en) Data collation based on computer analysis of data
EP3549050B1 (en) Method and computer product and methods for generation and selection of access rules
CN114638695A (en) Credit evaluation method, device, equipment and medium
CN115310510A (en) Target safety identification method and device based on optimization rule decision tree and electronic equipment
CN114579878A (en) Training method of false news discrimination model, false news discrimination method and device
CN113128773B (en) Training method of address prediction model, address prediction method and device
CN114358147A (en) Training method, identification method, device and equipment of abnormal account identification model
US11190470B2 (en) Attachment analytics for electronic communications
CN115760013A (en) Operation and maintenance model construction method and device, electronic equipment and storage medium
CN115795345A (en) Information processing method, device, equipment and storage medium
CN114677202A (en) Type identification method, training method and device, electronic equipment and storage medium
CN115048561A (en) Recommendation information determination method and device, electronic equipment and readable storage medium
CN114723548A (en) Data processing method, apparatus, device, medium, and program product
CN113785321A (en) Company scale estimation system
CN112734352A (en) Document auditing method and device based on data dimensionality
CN116760638B (en) Information processing method, system, electronic device and storage medium
US20210295379A1 (en) System and method for detecting fraudulent advertisement traffic
CN118172142A (en) Resource occupation method, device, apparatus, storage medium, and program product
CN118096170A (en) Risk prediction method and apparatus, device, storage medium, and program product
CN118013506A (en) Test user switching method, device, equipment, storage medium and program product
CN116797024A (en) Service processing method, device, electronic equipment and storage medium
US20190180367A1 (en) Retaining a set of accountholders within a ceiling number radius
CN117114874A (en) Method, device, equipment and storage medium for generating key transaction network
CN118153959A (en) Risk identification method, apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination