CN115082079B - Method and device for identifying associated user, computer equipment and storage medium - Google Patents

Publication number
CN115082079B
Authority
CN
China
Prior art keywords
user
transaction
label
association
characteristic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211003077.5A
Other languages
Chinese (zh)
Other versions
CN115082079A (en
Inventor
黄军文
李俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huafu Technology Co ltd
Original Assignee
Shenzhen Huafu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huafu Information Technology Co ltd
Priority to CN202211003077.5A
Publication of CN115082079A
Application granted
Publication of CN115082079B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G06Q20/4014 Identity check for transactions
    • G06Q20/405 Establishing or using transaction specific rules
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Abstract

The embodiments of the application disclose a method and a device for identifying an associated user, computer equipment and a storage medium. The method comprises the following steps: acquiring a user set to be identified; extracting transaction characteristic data and label characteristic data of each user in the user set to be identified; then acquiring transaction association characteristic data between every two users according to a preset transaction association rule and the transaction characteristic data; meanwhile, acquiring label association characteristic data between every two users according to a preset label association rule and the label characteristic data; and finally, inputting the transaction association characteristic data and the label association characteristic data into a trained scoring model to perform associated user scoring processing on each user, so as to obtain an associated user identification result of each user. With this scheme, the associated user identification result of each user is obtained without manual participation, which improves the efficiency and accuracy of associated user identification.

Description

Method and device for identifying associated user, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for identifying an associated user, a computer device, and a storage medium.
Background
With the increasing demands on risk control in the financial field, banks need to identify not only individual crime risks but also gang crime risks.
In the traditional mode, gangs in a suspicious list are screened manually: a series of expert rules is applied to the associated transaction data and basic information data to judge whether the transactions between every two users in the suspicious list are gang transactions (that is, whether the two users in the suspicious list are associated users). The process is time-consuming and labor-intensive, its accuracy depends on the professional competence of the examiner, and both identification efficiency and accuracy are low.
Disclosure of Invention
The embodiments of the application provide a method and a device for identifying an associated user, computer equipment and a storage medium, which can improve the efficiency and accuracy of associated user identification.
In a first aspect, an embodiment of the present application provides an identification method for an associated user, including:
acquiring a user set to be identified;
extracting transaction characteristic data of each user and label characteristic data of each user in the user set to be identified;
for each user, transaction association calculation is respectively carried out on transaction association between a target user and each non-target user according to preset transaction association rules and the transaction characteristic data to obtain transaction association characteristic data of each user, wherein the target user is a detection subject needing transaction association calculation currently in the user set to be identified, and the non-target user is a user except the target user in the user set to be identified;
for each user, respectively performing label association calculation on label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user;
and inputting the transaction correlation characteristic data and the label correlation characteristic data into a trained scoring model to perform correlation user scoring processing on each user respectively to obtain a correlation user identification result of each user.
In a second aspect, an embodiment of the present application further provides an identification apparatus for an associated user, including:
the acquisition unit is used for acquiring a user set to be identified;
the processing unit is used for extracting transaction characteristic data of each user and label characteristic data of each user in the user set to be identified;
the processing unit is further configured to perform, for each user, transaction association calculation on transaction associations between target users and non-target users respectively according to preset transaction association rules and the transaction feature data to obtain transaction association feature data of each user, where the target users are detection subjects currently needing transaction association calculation in the user set to be identified, and the non-target users are users in the user set to be identified except for the target users;
the processing unit is further configured to perform, for each user, label association calculation on label associations between the target user and each non-target user respectively according to preset label association rules and the label feature data to obtain label association feature data of each user;
the processing unit is further configured to input the transaction association feature data and the tag association feature data into the trained scoring model to perform associated user scoring processing on each user, so as to obtain an associated user identification result of each user.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium, in which a computer program is stored, the computer program including program instructions, which when executed by a processor, implement the above method.
The embodiments of the application provide a method and a device for identifying an associated user, computer equipment and a storage medium. The method comprises the following steps: firstly, acquiring a user set to be identified; extracting transaction characteristic data and label characteristic data of each user in the user set to be identified; then, for each user, performing transaction association calculation on the transaction association between a target user and each non-target user according to a preset transaction association rule and the transaction characteristic data to obtain transaction association characteristic data of each user, wherein the target user is a detection subject in the user set to be identified that currently needs transaction association calculation, and the non-target user is a user in the user set to be identified other than the target user; meanwhile, for each user, performing label association calculation on the label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user; and finally, inputting the transaction association characteristic data and the label association characteristic data into a trained scoring model to perform associated user scoring processing on each user, so as to obtain an associated user identification result of each user.
According to the scheme, the transaction associated characteristic data of each user and the label associated characteristic data of each user are generated, and then the transaction associated characteristic data of each user and the label associated characteristic data of each user are input into the scoring model, so that the associated user identification result of each user can be obtained, manual participation is not needed, and the efficiency and accuracy of associated user identification are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic view of an application scenario of an identification method for an associated user according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an identification method for an associated user according to an embodiment of the present application;
FIG. 3 is a diagram illustrating some of the association diagrams of the transaction association feature data provided by embodiments of the present application;
fig. 4 is a schematic sub-flow diagram of an identification method for an associated user according to an embodiment of the present application;
fig. 5 is a flowchart illustrating an identification method for associated users according to another embodiment of the present application;
FIG. 6 is a schematic block diagram of an identification apparatus for an associated user according to an embodiment of the present application;
fig. 7 is a schematic block diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
The embodiments of the application provide a method and a device for identifying an associated user, computer equipment and a storage medium.
The method for identifying an associated user may be executed by the identification device for an associated user provided in the embodiments of the present application, or by a computer device integrated with that identification device. The identification device may be implemented in hardware or software, the computer device may be a terminal or a server, and the terminal may be a smart phone, a tablet computer, a palmtop computer, a notebook computer, or the like.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of an identification method for an associated user according to an embodiment of the present application. The identification method of the associated user is applied to the computer device 10 in fig. 1, and the computer device 10 acquires a set of users to be identified; extracting transaction characteristic data of each user and label characteristic data of each user in the user set to be identified; for each user, transaction association calculation is respectively carried out on transaction association between a target user and each non-target user according to a preset transaction association rule and the transaction characteristic data to obtain transaction association characteristic data of each user, the target user is a detection subject which needs to carry out the transaction association calculation currently in the user set to be identified, and the non-target user is a user except the target user in the user set to be identified; for each user, respectively performing label association calculation on label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user; inputting the transaction associated feature data and the label associated feature data into a trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user.
Fig. 2 is a schematic flowchart of an identification method for an associated user according to an embodiment of the present application. As shown in fig. 2, the method comprises the following steps S110-S150.
S110, acquiring a user set to be identified.
In some embodiments, the user set to be identified is a list of suspicious users. The list includes a plurality of suspicious users, and the suspicious users are users for whom a gang determination is needed, that is, the association between the suspicious users needs to be judged.
S120, extracting the transaction characteristic data of each user and the label characteristic data of each user in the user set to be identified.
In some embodiments, the user set to be identified includes account numbers of a plurality of suspicious users, transaction feature data corresponding to each suspicious user, and tag feature data corresponding to each suspicious user.
The transaction characteristic data includes the transaction flow, the transaction mode (including internet banking, Automatic Teller Machine (ATM), third-party payment tools, and the like), the transaction address, account characteristics (corporate account, personal account, company-name characteristics, and the like), and the like.
In this case, step S120 specifically includes: extracting initial transaction characteristic data of each user and initial label characteristic data of each user from the user set to be identified according to a preset data acquisition type; and performing data cleaning processing on the initial transaction characteristic data and the initial label characteristic data to obtain the transaction characteristic data and the label characteristic data.
The data acquisition type specifies the data types required for the transaction association calculation and the label association calculation, namely the types corresponding to the initial transaction characteristic data and the types corresponding to the initial label characteristic data.
In other embodiments, the user set to be identified only includes account numbers of a plurality of suspicious users, and then transaction characteristic data and label characteristic data corresponding to each suspicious user are extracted from other databases storing transaction characteristic data and label characteristic data of the suspicious accounts according to the account numbers of the suspicious users.
In order to ensure the quality of data, after the initial transaction characteristic data and the initial tag characteristic data are obtained, data cleaning processing is further performed on the initial transaction characteristic data and the initial tag characteristic data.
The data cleaning is mainly used for processing dirty data, missing values and abnormal values in original data (initial transaction characteristic data and initial label characteristic data).
Regarding the processing of missing values, the present application uses a missing-rate criterion: the processing applies where the missing rate exceeds a certain threshold (the threshold is configurable and may be 30%, 50%, 90%, or the like).
Regarding the processing of abnormal values, samples containing abnormal values are filtered out directly. The main consideration is that abnormal values have a large influence on the model, while the sample data volume is considerable, so filtering out the abnormal samples does not greatly reduce the volume of training samples.
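The cleaning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the function names, the choice to drop features whose missing rate exceeds the threshold, and the z-score criterion for abnormal values are all hypothetical, since the text does not specify them.

```python
from statistics import mean, pstdev


def drop_sparse_features(rows, max_missing_rate=0.5):
    """Keep only the features whose missing rate is within the threshold.

    Assumption: a feature is "missing" in a row when its value is None, and
    features above the threshold are dropped (the source does not say how
    they are handled).
    """
    if not rows:
        return []
    keys = {key for row in rows for key in row}
    keep = []
    for key in keys:
        missing = sum(1 for row in rows if row.get(key) is None)
        if missing / len(rows) <= max_missing_rate:
            keep.append(key)
    return [{key: row.get(key) for key in sorted(keep)} for row in rows]


def filter_outliers(rows, field, z_max=3.0):
    """Filter out samples whose value in `field` is an outlier.

    Assumption: an outlier is a value more than z_max population standard
    deviations from the mean; the source does not define the criterion.
    """
    values = [row[field] for row in rows if row.get(field) is not None]
    if len(values) < 2:
        return list(rows)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(rows)
    return [
        row for row in rows
        if row.get(field) is None or abs(row[field] - mu) / sigma <= z_max
    ]


# Hypothetical sample data: the "ip" feature is missing in 4 of 5 rows.
rows = [
    {"amount": 10, "ip": None},
    {"amount": 12, "ip": None},
    {"amount": 11, "ip": "1.1.1.1"},
    {"amount": 9, "ip": None},
    {"amount": 1000, "ip": None},
]
cleaned = drop_sparse_features(rows, max_missing_rate=0.5)
filtered = filter_outliers(cleaned, "amount", z_max=1.5)
```

With such a small sample a z-score limit of 3 would never trigger, which is why the usage lines pass a lower, purely illustrative limit.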
S130, for each user, performing transaction association calculation on the transaction association between the target user and each non-target user according to a preset transaction association rule and the transaction characteristic data to obtain the transaction association characteristic data of each user.
The target user is a detection subject in the user set to be identified that currently needs transaction association calculation, and the non-target user is a user in the user set to be identified other than the target user.
In some embodiments, the target user is a part of the designated users in the set of users to be identified, and not all users in the set of users to be identified, at this time, each user in the part of the designated users is a detection subject currently needing transaction association calculation, that is, only the calculation in steps S130 and S140 needs to be performed for the part of the designated users.
In other embodiments, the target user is any one user in the set of users to be identified, at this time, each user in the set of users to be identified is a detection subject that needs to perform transaction correlation calculation currently, that is, the calculation in step S130 and the calculation in step S140 need to be performed for each user in the set of users to be identified.
In this embodiment, the transaction association feature data of each user includes the transaction association feature data between every two users in the user set to be identified. For example, if the user set to be identified includes user A, user B, …, and user F, the transaction association feature data of user A includes the transaction association feature data between user A and each of user B, user C, user D, user E and user F; the transaction association feature data of user B includes the transaction association feature data between user B and each of user A, user C, user D, user E and user F; ……; and the transaction association feature data of user F includes the transaction association feature data between user F and each of user A, user B, user C, user D and user E.
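The pairing of each target user with every non-target user can be illustrated with a short sketch. The function name `target_pairs` and the optional `targets` argument (covering the embodiment in which only designated users are detection subjects) are hypothetical.

```python
def target_pairs(users, targets=None):
    """Map each target user to the list of non-target users to compare with.

    If `targets` is None, every user in the set is treated as a target user.
    """
    targets = list(users) if targets is None else list(targets)
    return {target: [u for u in users if u != target] for target in targets}


users = ["A", "B", "C", "D", "E", "F"]
all_pairs = target_pairs(users)                        # every user is a target
some_pairs = target_pairs(users, targets=["A", "B"])   # designated targets only
```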
In some embodiments, specifically, step S130 includes: extracting transaction flow data from the transaction characteristic data; and performing graph association calculation on the transaction association between the target user and each non-target user according to the distributed graph processing framework Spark GraphX and the transaction flow data to obtain the transaction association characteristic data.
Specifically, in some embodiments, Spark GraphX converts the processed data into a graph structure (Graph) composed of nodes and directed edges. A node represents a fund transfer-out party or a fund transfer-in party in a transaction record. An edge represents a transaction relationship: its starting point is the fund transfer-out party and its end point is the fund transfer-in party, so the direction of the edge represents the direction in which the funds flow. The edge weight is a transaction weight, which may be the total amount of transaction funds, the number of transactions, or the average amount of transaction funds, as required. A transaction network is established in this way.
In this case, the transaction association feature data in this embodiment is an association diagram constructed from the users' transaction flow. Referring to fig. 3, fig. 3 shows some association diagrams in which user A and user B have a transaction relationship.
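The graph structure described above (nodes for the transfer-out and transfer-in parties, directed edges for fund flow, and a configurable edge weight) can be sketched in plain Python. This is only an illustration of the data structure, not the Spark GraphX implementation; the function names, the record layout `(payer, payee, amount)`, and the direct-edge association check are assumptions.

```python
from collections import defaultdict


def build_transaction_graph(records, weight="total_amount"):
    """Build a directed, weighted transaction graph.

    `records` is an iterable of (payer, payee, amount) tuples (hypothetical
    layout). The result maps each directed edge (payer, payee) to its weight:
    the total amount, the transaction count, or the average amount.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for payer, payee, amount in records:
        totals[(payer, payee)] += amount
        counts[(payer, payee)] += 1
    if weight == "total_amount":
        return dict(totals)
    if weight == "count":
        return {edge: float(n) for edge, n in counts.items()}
    if weight == "average_amount":
        return {edge: totals[edge] / counts[edge] for edge in totals}
    raise ValueError(f"unknown weight: {weight}")


def has_transaction_association(graph, user_a, user_b):
    """Assumption: two users are associated if an edge exists in either direction."""
    return (user_a, user_b) in graph or (user_b, user_a) in graph


records = [("A", "B", 100.0), ("A", "B", 50.0), ("B", "C", 30.0)]
by_total = build_transaction_graph(records, "total_amount")
by_count = build_transaction_graph(records, "count")
by_average = build_transaction_graph(records, "average_amount")
```

The association diagrams in fig. 3 may include patterns other than a direct edge; the check here covers only the direct case.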
S140, for each user, performing label association calculation on the label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user.
In some embodiments, the tag feature data includes a plurality of sub-tag feature data, and specifically, step S140 includes: determining sub-label association rules respectively corresponding to the sub-label characteristic data from the label association rules; and for each sub-label feature data, respectively performing label association calculation on label association between the target user and each non-target user according to the sub-label feature data and the corresponding sub-label association rule to obtain a plurality of sub-label association feature data respectively corresponding to each user.
Specifically, the Spark calculation engine is used to calculate the association of the label features between users; the label features of the users are generally categorical features and continuous features.
In this embodiment, the tag association feature data of each user includes the tag association feature data between every two users in the user set to be identified. For example, if the user set to be identified includes user A, user B, …, and user F, the tag association feature data of user A includes the tag association feature data between user A and each of user B, user C, user D, user E and user F; the tag association feature data of user B includes the tag association feature data between user B and each of user A, user C, user D, user E and user F; ……; and the tag association feature data of user F includes the tag association feature data between user F and each of user A, user B, user C, user D and user E.
For ease of understanding, the present step is described in detail below with a specific example.
In some embodiments, the sub-tag feature data comprises: surname, identification number, ethnicity, occupation, reserved telephone number, number of device logins (in the last 1, 7, 30, 60 and 180 days), resident city, transaction Internet Protocol (IP) address, transaction address, transaction time, central customer type, transaction amount, number of transactions, transaction type, and the like.
In some embodiments, when performing the tag association comparison between two users, the sub-tag association rule of each tag is as follows:
surname: whether the surnames are consistent; if so, associated, otherwise not associated;
identification number: whether the first 6 digits of the identity card numbers are consistent; if so, associated, otherwise not associated;
ethnicity: whether the ethnicities are consistent; if so, associated, otherwise not associated;
occupation: whether the occupations are consistent; if so, associated, otherwise not associated;
reserved telephone number: whether the reserved telephone numbers are the same; if so, associated, otherwise not associated;
number of device logins: whether the numbers of device logins in the last 1, 7, 30, 60 or 180 days are the same; if so, associated, otherwise not associated;
resident city: whether the resident cities are the same; if so, associated, otherwise not associated;
transaction IP: the proportion of consistent transaction IPs; if the proportion does not exceed a preset proportion threshold, associated, otherwise not associated;
transaction address: the proportion of consistent transaction addresses; if the proportion does not exceed a preset proportion threshold, associated, otherwise not associated;
transaction time: the proportion of consistent transaction times; if the proportion does not exceed a preset proportion threshold, associated, otherwise not associated;
central customer type: whether the central customer is a corporate account; if so, associated, otherwise not associated;
transaction amount: the transaction amount proportion; if the proportion does not exceed a preset proportion threshold, associated, otherwise not associated;
number of transactions: the transaction count proportion; if the proportion does not exceed a preset proportion threshold, associated, otherwise not associated;
transaction type: the transaction type proportion; if the proportion does not exceed a preset proportion threshold, associated, otherwise not associated.
The proportion threshold differs for each kind of sub-label feature data; the specific value can be set by the user and is not limited here.
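A few of the sub-label association rules listed above can be sketched as small predicate functions. The names (`same_value`, `same_id_prefix`, `ratio_rule`, `RULES`, `label_association`) and the dictionary layout of a user are hypothetical. The proportion rule follows the wording of the text (associated when the proportion does not exceed the threshold) and takes a precomputed proportion, because the text does not say how the proportion is calculated.

```python
def same_value(a, b):
    """Equality rule, e.g. surname, ethnicity, occupation, resident city."""
    return a is not None and a == b


def same_id_prefix(id_a, id_b, digits=6):
    """Identification-number rule: the first 6 digits are consistent."""
    return bool(id_a) and bool(id_b) and id_a[:digits] == id_b[:digits]


def ratio_rule(ratio, threshold):
    """Proportion rule as worded in the text: associated if ratio <= threshold."""
    return ratio <= threshold


# Hypothetical subset of the rule table, keyed by sub-label name.
RULES = {
    "surname": same_value,
    "id_number": same_id_prefix,
    "resident_city": same_value,
}


def label_association(user_a, user_b, rules=RULES):
    """Return 1 (associated) or 0 (not associated) for each sub-label."""
    return {
        name: int(rule(user_a.get(name), user_b.get(name)))
        for name, rule in rules.items()
    }


user_a = {"surname": "Li", "id_number": "440301199001011234", "resident_city": "Shenzhen"}
user_b = {"surname": "Li", "id_number": "440301198505054321", "resident_city": "Guangzhou"}
result = label_association(user_a, user_b)
```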
It should be noted that, in this embodiment, the execution sequence of step S130 and step S140 is not limited, that is, step S140 may be executed before step S130, or may be executed simultaneously with step S130, and the specific details are not limited herein.
S150, inputting the transaction correlation characteristic data and the label correlation characteristic data into the trained scoring model to perform correlation user scoring processing on each user respectively to obtain a correlation user identification result of each user.
The scoring model in this embodiment may be a scoring card model, or may be another model having a scoring function, and is not limited herein.
In some embodiments, when the scoring model is a scoring card model, the tag associated feature data includes a plurality of sub-tag associated feature data, please refer to fig. 4, specifically, step S150 includes:
S1501, determining a first total score of the transaction association feature data and a first weight corresponding to the transaction association feature data according to the trained scoring card model.
In this embodiment, if the association diagram of the transaction association feature data shows that there is a transaction association between the two users corresponding to the transaction association feature data, the first total score for the two users is determined to be 1; otherwise, it is 0.
For example, if it is determined that there is a transaction association between the user a and the user B according to the transaction association feature data, it is determined that the total score of the corresponding transaction association feature data between the user a and the user B is 1, otherwise, it is 0.
The trained scoring card model is provided with a first weight corresponding to the transaction associated feature data and a second weight corresponding to each sub-label associated feature data, and each type of sub-label associated feature data is provided with the corresponding weight.
S1502, determining a first score corresponding to the transaction association feature data according to the first total score and the first weight.
Specifically, the first total score is multiplied by the first weight to obtain the first score. For example, if the first total score is 1 and the first weight is 0.3, the first score is 1 × 0.3 = 0.3.
And S1503, determining a second total score of the associated feature data of each sub-label and a second weight corresponding to the associated feature data of each sub-label according to the trained scoring card model.
In this embodiment, if the sub-tag associated feature data between two users corresponding to the sub-tag associated feature data has an association relationship (that is, conforms to the corresponding sub-tag association rule), it is determined that the total score corresponding to the sub-tag associated feature data is 1, and otherwise, it is 0.
For example, if sub-label a of user A and user B is associated, the second total score of the sub-label association feature data corresponding to sub-label a between user A and user B is 1; if sub-label b of user A and user B is not associated, the second total score of the sub-label association feature data corresponding to sub-label b between user A and user B is 0; ...; if sub-label g of user A and user B is associated, the second total score of the sub-label association feature data corresponding to sub-label g between user A and user B is 1.
In this embodiment, the second total score of each sub-tag associated feature data and the second weight corresponding to each sub-tag associated feature data need to be determined respectively.
S1504, determining a second score corresponding to each sub-label associated characteristic data according to the second total score and the second weight.
Specifically, the second total score is multiplied by a second weight to obtain a second score, for example, when the second total score corresponding to a certain sub-label related feature data is 1 and the second weight is 0.04, the second score is 1 × 0.04=0.04.
In this embodiment, the second scores corresponding to the associated feature data of the sub-tags between two users need to be determined.
S1505, determining the associated user identification result of each user according to the first score and the second score.
Specifically, the associated user identification result of the corresponding user is determined according to the sum of the first score and the second score.
In a specific embodiment, the correlation identification result of the user a includes identification results between the user a and each of the user B, the user C, the user D, the user E, and the user F;
for example, for user A and user B, if the corresponding first score is 0.3 and the second scores corresponding to the individual sub-label association feature data are 0.04, 0.05, 0.07, 0.09, 0.1, and 0.15, the association score between user A and user B is: 0.3+0.04+0.05+0.07+0.09+0.1+0.15=0.8. The association scores between user A and user C, ..., and between user A and user F are then obtained through similar calculations.
It can be seen that the associated user identification result of each user includes an association score between every two users in the user set to be identified. The association score reflects the degree of association between the users: the higher the score, the higher the degree of association.
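The calculation in steps S1501 to S1505 can be sketched as follows. This is a minimal illustration, assuming binary total scores (1 if associated, 0 otherwise) and weights read from a trained scoring card model. The function name `association_score` and the sub-label names are hypothetical; the weights are the example values from the text.

```python
def association_score(transaction_total, transaction_weight,
                      sub_label_totals, sub_label_weights):
    """Sum of the first score and all second scores for one pair of users."""
    # First score: transaction association total score times its weight.
    first_score = transaction_total * transaction_weight
    # Second scores: one per sub-label, total score times that sub-label's weight.
    second_scores = [
        sub_label_totals[name] * sub_label_weights[name]
        for name in sub_label_weights
    ]
    return first_score + sum(second_scores)


# Example values from the text: user A and user B are associated on every feature.
weights = {"a": 0.04, "b": 0.05, "c": 0.07, "d": 0.09, "e": 0.1, "f": 0.15}
totals = {name: 1 for name in weights}
score_ab = association_score(1, 0.3, totals, weights)
print(round(score_ab, 2))  # 0.8
```

Setting any total score to 0 removes that feature's weight from the sum, which is how a missing association lowers the score.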
Further, the present application may extract a group member list associated with a detection subject from the associated user identification result, for example, extract a group member list associated with user A, or a group member list associated with user B, according to the association scores in the associated user identification result.
In some embodiments, a user with an association score greater than a preset association threshold may be determined as an associated user, where the association threshold may be 0.5, or may be another numerical value, which is not limited herein.
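Extracting the group member list by threshold can be sketched as below. The function name `associated_users` and the sample scores are hypothetical; only the threshold of 0.5 and the "greater than" comparison come from the text.

```python
def associated_users(scores, threshold=0.5):
    """Return users whose association score exceeds the threshold, highest first."""
    hits = [(user, score) for user, score in scores.items() if score > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [user for user, _ in hits]


# Hypothetical association scores between user A and the other users.
scores_for_a = {"B": 0.8, "C": 0.35, "D": 0.55, "E": 0.5, "F": 0.1}
group_of_a = associated_users(scores_for_a)
print(group_of_a)  # ['B', 'D']
```

User E is excluded because its score equals the threshold rather than exceeding it.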
In some embodiments, the user may further check for associated users with an association score greater than an association threshold, improving screening efficiency.
It should be noted that, in some embodiments, the associated user identification result includes not only the association score between two users, but also a score description corresponding to the association score, where the score description includes a transaction-flow association description and a label association description.
The identification method of the associated user provided by the application can be applied to identification of suspicious transaction groups, such as financial group fraud, group fraud and the like.
In summary, the present application first obtains a set of users to be identified; extracting the transaction characteristic data of each user and the label characteristic data of each user in the user set to be identified; then, for each user, transaction association calculation is respectively carried out on transaction association between a target user and each non-target user according to a preset transaction association rule and the transaction characteristic data to obtain transaction association characteristic data of each user, wherein the target user is a detection subject which needs transaction association calculation currently in the user set to be identified, and the non-target user is a user except the target user in the user set to be identified; meanwhile, for each user, according to a preset label association rule and the label characteristic data, label association calculation is respectively carried out on label association between the target user and each non-target user to obtain the label association characteristic data of each user; and finally, inputting the transaction associated characteristic data and the label associated characteristic data into a trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user. According to the scheme, the transaction association characteristic data of each user and the label association characteristic data of each user are generated, and then the transaction association characteristic data of each user and the label association characteristic data of each user are input into the scoring model, so that the associated user identification result of each user can be obtained, manual participation is not needed, and the efficiency and the accuracy of associated user identification are improved.
In some embodiments, two users with an association are defined as a group, and the application has the following advantages:
1. The group identification process is data-driven, process-oriented, standardized and intelligent: expert knowledge is converted into calculation logic, which greatly improves the efficiency and accuracy of group identification.
2. The service party only needs to provide a detection subject (a set of users to be identified) to obtain the associated group list, and the whole group identification process is completed by a group identification system (an identification device of the associated user).
3. The method reduces the participation of experts and workers, and avoids the conditions of misjudgment, missed judgment and the like caused by the experience defects of the identification personnel.
4. The system has expandability and extensibility, the calculation logic of the group recognition is determined by the expert knowledge, a new group judgment condition can be fused into a model, and the efficiency and the accuracy of the group recognition are continuously iterated.
Fig. 5 is a flowchart illustrating an identification method for an associated user according to another embodiment of the present application. In this embodiment, the scoring model is a scoring card model, and as shown in fig. 5, the identification method for associated users of this embodiment includes steps S210 to S280. Steps S240 to S280 are similar to steps S110 to S150 in the above embodiments, and are not described herein again. The following describes in detail steps S210 to S230 added in this embodiment, and steps S210 to S230 mainly describe a training process of the score card model, in which:
and S210, acquiring a plurality of positive samples and a plurality of negative samples.
A positive sample consists of two users in the group account cases that have transactions between them, and a negative sample consists of two users that have no transactions between them.
This step is the collection of historical data. In some embodiments:
the system collects information related to bank transactions through a big data collection framework, and the collected data includes the following:
1. past group data (i.e., users who have an association relationship with each other) including group account information and group IDentity (ID);
2. Transaction data of the group accounts: the longer the period covered by the transaction data, the more accurately the suspicious transaction characteristics are calculated. Transaction data from the past 12 to 18 months is typically required.
3. Data related to transaction characteristics of the group account, such as transaction modes (online banking, ATM, third-party payment tools and the like), transaction places (login IP, ATM addresses and the like), and account characteristics (corporate or personal account, company-name characteristics of the transaction counterparty and the like).
4. Base dimension data for a group account.
Typically, the samples are constructed from the group account cases, the transaction flow table, and the deposit table.
Wherein:
Group account cases: these are the basis on which the samples for training the scoring card model are formed. Every two accounts that have transactions with each other form a sample; if the two accounts belong to the same group (that is, they are associated), the sample is a positive sample, and accounts without such a relationship form negative samples;
Transaction flow table: this is used to subsequently build a user graph from the transaction flow data (that is, the transaction association feature data samples). The transaction relationship between two users in the graph serves as the sample data. The unique identifier of a sample is the user pair, such as 'user A-user B', together with whether the pair is associated (positive sample) or not. The transaction association feature data sample of the two users is the sample feature, and the transaction association feature data samples of all samples form one part of the sample;
Deposit table: this provides the account-related information of the users, such as label information, as the basic data for calculating the sample features. The degree of association of the label data between users is subsequently calculated from it as the label association feature data sample, which also forms a part of the sample.
It should be noted that after the data in the transaction flow meter and the deposit table are acquired, data cleaning is performed on the data, wherein the data cleaning is mainly to process dirty data, missing values and abnormal values in the original data.
For missing values, variables whose missing rate exceeds a certain threshold are deleted (the threshold is user-defined and may be 30%, 50%, 90%, etc.); for abnormal values, the samples containing abnormal values are filtered out.
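The missing-value rule can be sketched as follows. This is an illustration only: the function name `drop_sparse_variables`, the record layout (a list of dictionaries with `None` for missing values) and the sample columns are assumptions. Filtering of abnormal values is not shown because the text does not specify the rule used.

```python
def drop_sparse_variables(rows, max_missing_rate=0.3):
    """Remove variables (columns) whose share of missing values exceeds the threshold."""
    if not rows:
        return []
    keep = []
    for column in rows[0].keys():
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing / len(rows) <= max_missing_rate:
            keep.append(column)
    return [{column: row.get(column) for column in keep} for row in rows]


# Hypothetical raw records; None marks a missing value.
raw = [
    {"account": "u1", "channel": "ATM", "login_ip": None},
    {"account": "u2", "channel": None, "login_ip": None},
    {"account": "u3", "channel": "online", "login_ip": "10.0.0.1"},
    {"account": "u4", "channel": "ATM", "login_ip": None},
]
cleaned = drop_sparse_variables(raw, max_missing_rate=0.3)
print(sorted(cleaned[0].keys()))  # ['account', 'channel']
```

Here `login_ip` is missing in 75% of the records and is dropped, while `channel` (25% missing) is kept.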
S220, obtaining a first transaction correlation characteristic data sample among users in each positive sample, a first label correlation characteristic data sample among users in each positive sample, a second transaction correlation characteristic data sample among users in each negative sample and a second label correlation characteristic data sample among users in each negative sample.
The steps are divided into graph calculation characteristics and non-graph calculation characteristics according to calculation types, wherein the graph calculation characteristics are transaction associated characteristic data, and the non-graph calculation characteristics are label associated characteristic data.
In this embodiment, graph calculation and non-graph calculation are respectively performed on the positive sample and the negative sample, and a first transaction related feature data sample among users of each positive sample, a first label related feature data sample among users of each positive sample, a second transaction related feature data sample among users of each negative sample, and a second label related feature data sample among users of each negative sample are respectively obtained.
Graph computation features:
the graph calculation features are mainly characterized in that the incidence relation among users on different features is calculated through the graph, and the incidence relation of each feature becomes a component of a feature matrix.
When graph computation is performed, most of the algorithms are implemented with the distributed graph computation model Pregel. In some embodiments, the ring association computation specifically includes:
1) The graph is computed using the Pregel iterative algorithm.
2) At each iteration, for each edge in the graph, the attribute of the source (src) vertex (an Array containing all paths that reach the src vertex) is sent to the destination (dst) vertex. Each path is an Array of edge tuples (srcID, dstID). Before sending, it is judged whether dstID already occurs in each path; if so, that path is not sent to the dst vertex again.
3) The dst vertex receives all the attributes (Attr) sent by src, appends its own edge (srcID, dstID) to each path, and sends the paths on along each of its outgoing edges.
4) Ring extraction: from the Pregel result, the Attr of each vertex (all paths reaching that vertex) is obtained, each path is traversed, and the paths in which the srcID of the first tuple equals the dstID of the last tuple are selected.
5) The ring association index is calculated as a × (the number of rings in which the sample is located), where a is an impact factor adjusted according to business needs.
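The ring computation above can be imitated in plain Python to show the message flow. This is a single-machine sketch, not the Spark GraphX Pregel API. The names `find_rings` and `ring_association_index` are hypothetical, and "the sample is located in a ring" is interpreted here as both users of the pair being vertices of that ring, which is an assumption.

```python
from collections import defaultdict


def find_rings(edges, max_supersteps=10):
    """Propagate paths along directed edges, then keep the paths that close into a ring."""
    edges = list(set(edges))
    attr = defaultdict(set)      # vertex -> all paths (tuples of (src, dst) edges) reaching it
    frontier = defaultdict(set)  # paths that arrived in the previous superstep
    for src, dst in edges:
        frontier[dst].add(((src, dst),))
        attr[dst].add(((src, dst),))
    for _step in range(max_supersteps):
        arrived = defaultdict(set)
        for src, dst in edges:
            for path in frontier.get(src, ()):
                # Do not send a path to a vertex it has already reached.
                if any(dst == edge_dst for _edge_src, edge_dst in path):
                    continue
                arrived[dst].add(path + ((src, dst),))
        if not arrived:
            break
        for vertex, paths in arrived.items():
            attr[vertex] |= paths
        frontier = arrived
    # A ring is a path whose first srcID equals its last dstID.
    # Rotations of the same ring share one edge set, so they collapse into one entry.
    return {frozenset(path) for paths in attr.values()
            for path in paths if path[0][0] == path[-1][1]}


def ring_association_index(rings, user_x, user_y, a=1.0):
    """a times the number of rings that contain both users (a is the impact factor)."""
    count = 0
    for ring in rings:
        vertices = {vertex for edge in ring for vertex in edge}
        if user_x in vertices and user_y in vertices:
            count += 1
    return a * count


# Hypothetical transaction edges: A -> B -> C -> A forms one ring, D hangs off C.
rings = find_rings([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(len(rings), ring_association_index(rings, "A", "B", a=0.5))
```

The number of stored paths can grow quickly on dense graphs, which is one reason a bound such as `max_supersteps` is useful.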
In some embodiments, as shown in fig. 3, the graph obtained by graph calculation for user A and user B may be a connection graph, a ring graph, or a special graph, or there may be no graph association between the two users. For the connection graph, the connected-components function of the distributed graph processing framework Spark GraphX is used to query the connectivity of the users, and a connection index is determined according to the connectivity. For the ring graph, the Spark GraphX Pregel iterative algorithm is used to obtain the ring structures of the users, and a ring association index is determined according to the ring association. For the special graph, the Spark GraphX Pregel iterative algorithm is used to obtain other special structures, and a special association index is determined according to business rules.
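The connectivity query can be illustrated with a small union-find routine. This is a single-machine stand-in for a distributed connected-components computation, not Spark GraphX code. The names `connected_components` and `connection_index`, and the choice of a 1/0 connection index, are assumptions made for illustration.

```python
def connected_components(edges):
    """Map every vertex to a representative of its connected component (edges treated as undirected)."""
    parent = {}

    def find(vertex):
        parent.setdefault(vertex, vertex)
        while parent[vertex] != vertex:
            parent[vertex] = parent[parent[vertex]]  # path halving
            vertex = parent[vertex]
        return vertex

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst
    return {vertex: find(vertex) for vertex in list(parent)}


def connection_index(edges, user_x, user_y):
    """1 if the two users are in the same connected component, otherwise 0."""
    component = connected_components(edges)
    if user_x not in component or user_y not in component:
        return 0
    return 1 if component[user_x] == component[user_y] else 0


# Hypothetical transaction edges forming two separate components.
flows = [("A", "B"), ("B", "C"), ("D", "E")]
print(connection_index(flows, "A", "C"), connection_index(flows, "A", "D"))  # 1 0
```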
Non-graph computation feature:
the non-graph computation features are mainly computed with the Spark computation engine, which calculates the label features between users, generally classification features and continuous features.
For example, for two transaction counterparties, the transaction characteristics are calculated and scored, as shown in table 1:
TABLE 1
Figure 128430DEST_PATH_IMAGE001
S230, training a preset scoring card model according to the first transaction associated feature data sample, the first label associated feature data sample, the second transaction associated feature data sample and the second label associated feature data sample to obtain a trained scoring card model.
All features in the score card model should be discrete features, since the score card model assigns scores to each level value of each feature, continuous features in this case being not favorable for score assignment;
therefore, before performing the training, the continuous features are discretized first, in which case step S230 includes:
1. Performing a binning operation on the first transaction associated feature data sample, the first label associated feature data sample, the second transaction associated feature data sample and the second label associated feature data sample respectively.
Specifically, in some embodiments, the feature-value binning operation is performed as follows:
the raw data is discretized by equal or optimal binning, and the Weight of Evidence (WOE) of each bin of each feature is calculated from the historical samples:
$$\mathrm{WOE}_i = \ln\left(\frac{B_i / B}{G_i / G}\right)$$
where $B_i$ represents the number of negative samples under the value item, $G_i$ represents the number of positive samples under the value item, $B$ represents the sum of all negative samples, and $G$ represents the sum of all positive samples.
For example, by analyzing the status of the customer by age, the following results and calculated values of WOE are shown in table 2:
TABLE 2
Figure 591717DEST_PATH_IMAGE005
The WOE reflects how the ratio of negative to positive samples under a given value item compares with the overall ratio. If the share of negative samples under the item is higher than the overall level, the WOE value is positive, and the larger the share of negative samples, the higher the WOE value.
The higher the WOE, the more strongly the value item points toward a negative-sample decision.
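A WOE calculation consistent with the description above (positive when a value item holds a larger share of the negative samples than of the positive samples) can be sketched as follows. The function name `weight_of_evidence` and the bin counts are hypothetical and are not the values of table 2.

```python
import math


def weight_of_evidence(bins):
    """bins maps a value item to (negative_count, positive_count); returns the WOE per item."""
    total_negative = sum(negative for negative, _ in bins.values())
    total_positive = sum(positive for _, positive in bins.values())
    # Assumes every bin has at least one negative and one positive sample,
    # otherwise the logarithm is undefined.
    return {
        item: math.log((negative / total_negative) / (positive / total_positive))
        for item, (negative, positive) in bins.items()
    }


# Hypothetical age bins: (number of negative samples, number of positive samples).
age_bins = {"<30": (40, 60), "30-50": (30, 140), ">50": (30, 200)}
woe = weight_of_evidence(age_bins)
for item, value in woe.items():
    print(item, round(value, 3))
```

In this made-up data the "<30" bin holds 40% of the negative samples but only 15% of the positive samples, so its WOE is positive; the other two bins come out negative.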
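In addition, indexes with poor predictive ability are eliminated by calculating the Information Value (IV):
$$\mathrm{IV} = \sum_i \left(\frac{B_i}{B} - \frac{G_i}{G}\right)\mathrm{WOE}_i$$
where $B_i/B$ and $G_i/G$ are the shares of all negative samples and of all positive samples that fall under value item $i$, and $\mathrm{WOE}_i$ is the WOE of that value item.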
Empirical parameters generally exist in the IV value, wherein the IV value is less than 0.02, which indicates that the index has no prediction capability, 0.02-0.1 indicates that the prediction capability is weak, 0.1-0.3 indicates that the prediction capability is medium, and more than 0.3 indicates that the prediction capability is strong.
By calculating the IV value, the binned indexes with strong predictive ability can be selected.
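The IV screening can be sketched as below, using the standard definition of IV as the sum over bins of (negative share − positive share) × WOE and the empirical ranges quoted above. The function names and the bin counts are hypothetical, and the treatment of values exactly on a boundary is an assumption.

```python
import math


def information_value(bins):
    """bins maps a value item to (negative_count, positive_count)."""
    total_negative = sum(negative for negative, _ in bins.values())
    total_positive = sum(positive for _, positive in bins.values())
    iv = 0.0
    for negative, positive in bins.values():
        negative_share = negative / total_negative
        positive_share = positive / total_positive
        # Each term is non-negative: the difference and the log always share a sign.
        iv += (negative_share - positive_share) * math.log(negative_share / positive_share)
    return iv


def predictive_power(iv):
    """Empirical IV ranges from the text; boundary handling is an assumption."""
    if iv < 0.02:
        return "none"
    if iv < 0.1:
        return "weak"
    if iv <= 0.3:
        return "medium"
    return "strong"


# Hypothetical age bins: (number of negative samples, number of positive samples).
age_bins = {"<30": (40, 60), "30-50": (30, 140), ">50": (30, 200)}
iv_age = information_value(age_bins)
print(round(iv_age, 3), predictive_power(iv_age))
```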
2. Fitting the binned first transaction associated feature data sample, the binned first label associated feature data sample, the binned second transaction associated feature data sample and the binned second label associated feature data sample by using a logistic regression model to obtain the trained scoring card model.
Specifically, in some embodiments, the WOE value is used to replace the original characteristic value, the input value of each variable obtained by the above method is the corresponding WOE value, and then the WOE value of each variable is substituted into the logistic regression algorithm, so as to obtain the output value of the scoring card model.
More specifically, the n +1 parameters of the following formula are found by a logistic regression algorithm:
Figure 644173DEST_PATH_IMAGE007
where x_i is the WOE value of each feature, x_0 = 0, p represents the positive sample probability and 1 - p represents the negative sample probability; a larger value indicates a larger negative sample probability.
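Fitting a logistic regression on WOE-substituted features can be sketched with plain batch gradient descent. This is an illustrative stand-in for whatever solver is actually used. The function name `fit_logistic_regression`, the learning rate, the number of epochs and the toy data are assumptions, and the label `1` simply marks the class whose log-odds are being modeled.

```python
import math


def fit_logistic_regression(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit an intercept plus one coefficient per WOE feature (n + 1 parameters)."""
    n_features = len(rows[0])
    beta = [0.0] * (n_features + 1)  # beta[0] is the intercept
    for _epoch in range(epochs):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            z = beta[0] + sum(b * value for b, value in zip(beta[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))
            gradient[0] += p - y
            for j, value in enumerate(x):
                gradient[j + 1] += (p - y) * value
        beta = [b - learning_rate * g / len(rows) for b, g in zip(beta, gradient)]
    return beta


# Hypothetical data: one feature already replaced by its WOE value per sample.
woe_rows = [[0.98], [0.98], [0.98], [-0.15], [-0.15], [-0.51], [-0.51], [-0.51]]
labels = [1, 1, 0, 0, 1, 0, 0, 0]
beta = fit_logistic_regression(woe_rows, labels)
print([round(b, 3) for b in beta])
```

Because label 1 is more frequent at higher WOE values in this toy data, the fitted feature coefficient comes out positive.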
Finally, the model effect is evaluated, for example by evaluating the predictive ability of the model with indexes such as the area under the ROC curve (AUC) and the Kolmogorov-Smirnov (KS) discrimination index.
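Both indexes can be computed directly from model scores and labels. The sketch below uses the pairwise definition of AUC and the maximum gap between the two cumulative score distributions for KS. The function names and the sample scores are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random label-1 sample scores higher than a random label-0 sample."""
    ones = [s for s, y in zip(scores, labels) if y == 1]
    zeros = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for one in ones:
        for zero in zeros:
            if one > zero:
                wins += 1.0
            elif one == zero:
                wins += 0.5  # ties count as half a win
    return wins / (len(ones) * len(zeros))


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of the two classes."""
    ones = [s for s, y in zip(scores, labels) if y == 1]
    zeros = [s for s, y in zip(scores, labels) if y == 0]
    largest_gap = 0.0
    for threshold in sorted(set(scores)):
        cdf_ones = sum(s <= threshold for s in ones) / len(ones)
        cdf_zeros = sum(s <= threshold for s in zeros) / len(zeros)
        largest_gap = max(largest_gap, abs(cdf_ones - cdf_zeros))
    return largest_gap


# Hypothetical model scores and true labels.
model_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
true_labels = [1, 1, 0, 1, 0, 0]
print(round(auc(model_scores, true_labels), 3), round(ks_statistic(model_scores, true_labels), 3))
```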
In this embodiment, continuous data is not handled by data standardization, which would restrict the value range of the data. The main consideration is that the value range is determined by the feature scores of the scoring card model, and determining the feature scores is the last step of constructing the scoring card model. For logically continuous variables, data binning is therefore used instead.
In some embodiments, the scoring card model is the core model for group identification; its input is a feature matrix, and its output is the group label of the sample (i.e., the associated user identification result). The construction of the scoring card model is mainly divided into 3 steps: 1. Determining the feature weight of each feature of the scoring card: the regression coefficients are determined through a logistic regression model. 2. Determining the score of each feature of the scoring card: the total score and the feature weights are determined, and each feature score is obtained from the proportion of that feature's weight relative to the total score. 3. Determining the score of each level of a feature: positive scores are obtained from feature score / number of positive levels + positive level rank, and negative scores are obtained from feature score / number of negative levels + negative level rank.
In this embodiment, a logistic regression model is used to fit training sample data, and the weight of the feature of the scoring card model is determined according to the regression coefficient of the model.
Considering the generalization capability of the scoring card model, in the training process this solution selects the 5 models with the best fitting effect and averages their regression coefficients to obtain the feature weights of the scoring card model. The logistic regression model is selected mainly because it is simple, stable, highly interpretable, technically mature, and easy to test and deploy.
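Averaging the coefficients of the best-fitting models and turning the resulting weights into feature scores can be sketched as follows. The text does not say how "best fitting" is measured or how signs are handled, so the fit metric (higher is better), the use of absolute weights, the total score of 100 and both function names are assumptions.

```python
def average_coefficients(models, top_k=5):
    """models is a list of (fit_metric, coefficients); average the top_k best-fitting ones."""
    best = sorted(models, key=lambda model: model[0], reverse=True)[:top_k]
    n_coefficients = len(best[0][1])
    return [
        sum(coefficients[i] for _, coefficients in best) / len(best)
        for i in range(n_coefficients)
    ]


def feature_scores(weights, total_score=100.0):
    """Split the total score across features in proportion to their absolute weight."""
    norm = sum(abs(weight) for weight in weights)
    return [total_score * abs(weight) / norm for weight in weights]


# Hypothetical candidate models: (fit metric, per-feature regression coefficients).
candidates = [(0.80, [0.2, 0.6]), (0.90, [0.4, 0.8]), (0.70, [9.0, 9.0])]
averaged = average_coefficients(candidates, top_k=2)
print([round(w, 2) for w in averaged], [round(s, 1) for s in feature_scores(averaged)])
```

With `top_k=2` the poorly fitting third model is ignored, so its outlying coefficients do not distort the averaged weights.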
Fig. 6 is a schematic block diagram of an identification apparatus associated with a user according to an embodiment of the present application. As shown in fig. 6, the present application also provides an identification apparatus for associated users, corresponding to the above identification method for associated users. The device for identifying the associated user includes a unit for executing the method for identifying the associated user, and the device may be configured in a desktop computer, a tablet computer, a laptop computer, or the like. Specifically, referring to fig. 6, the device 600 for identifying an associated user includes an obtaining unit 601 and a processing unit 602.
An obtaining unit 601, configured to obtain a set of users to be identified;
the processing unit 602 is configured to extract transaction feature data of each user and tag feature data of each user in the to-be-identified user set;
the processing unit 602 is further configured to perform, for each user, transaction association calculation on transaction associations between target users and non-target users respectively according to preset transaction association rules and the transaction feature data, so as to obtain transaction association feature data of each user, where the target users are detection subjects currently needing transaction association calculation in the user set to be identified, and the non-target users are users in the user set to be identified except for the target users;
the processing unit 602 is further configured to perform, for each user, tag association calculation on the tag associations between the target users and the non-target users according to a preset tag association rule and the tag feature data, so as to obtain tag association feature data of each user;
the processing unit 602 is further configured to input the transaction related feature data and the label related feature data into a trained scoring model to perform related user scoring processing on each user respectively, so as to obtain a related user identification result of each user.
In some embodiments, when the processing unit 602 executes the step of performing transaction association calculation on the transaction associations between the target users and the non-target users respectively according to the preset transaction association rule and the transaction characteristic data to obtain the transaction association characteristic data of each user, the processing unit is specifically configured to:
extracting transaction flow data in the transaction characteristic data;
and respectively carrying out graph association calculation on the transaction association between the target user and each non-target user according to the distributed graph processing framework Spark GraphX and the transaction flow data to obtain the transaction association characteristic data.
In some embodiments, the tag feature data includes a plurality of sub-tag feature data, and when the step of performing the tag association calculation on the tag association between the target user and each non-target user according to the preset tag association rule and the tag feature data to obtain the tag association feature data of each user is executed by the processing unit 602, the processing unit is specifically configured to:
determining sub-label association rules respectively corresponding to the sub-label characteristic data from the label association rules;
and for each sub-label feature data, respectively performing label association calculation on label association between the target user and each non-target user according to the sub-label feature data and the corresponding sub-label association rule to obtain a plurality of sub-label association feature data respectively corresponding to each user.
In some embodiments, the scoring model is a scoring card model, the tag association feature data includes a plurality of sub-tag association feature data, and the processing unit 602 is specifically configured to, when the step of inputting the transaction association feature data and the tag association feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user:
determining a first total score of the transaction associated characteristic data and a first weight corresponding to the transaction associated characteristic data according to the trained scoring card model;
determining a first score corresponding to the transaction association characteristic data according to the first total score and the first weight;
determining a second total score of the associated feature data of each sub-label and a second weight corresponding to the associated feature data of each sub-label according to the trained scoring card model;
determining a second score corresponding to each sub-label associated feature data according to the second total score and the second weight;
and determining the associated user identification result of each user according to the first score and the second score.
In some embodiments, when the step of extracting the transaction characteristic data of each user and the tag characteristic data of each user in the set of users to be identified is executed by the processing unit 602, the processing unit is specifically configured to:
extracting initial transaction characteristic data of each user and initial label characteristic data of each user from the user set to be identified according to a preset data acquisition type;
and performing data cleaning processing on the initial transaction characteristic data and the initial label characteristic data to obtain the transaction characteristic data and the label characteristic data.
In some embodiments, the scoring model is a scoring card model, and the processing unit 602, before performing the step of inputting the transaction related feature data and the tag related feature data into the trained scoring model to perform related user scoring processing on each user respectively to obtain a related user identification result of each user, is further configured to:
obtaining a plurality of positive examples and a plurality of negative examples, wherein the positive examples comprise two users having transactions between the positive examples and the negative examples comprise two users having no transactions between the negative examples;
acquiring a first transaction associated characteristic data sample among users in each positive sample, a first label associated characteristic data sample among users in each positive sample, a second transaction associated characteristic data sample among users in each negative sample and a second label associated characteristic data sample among users in each negative sample;
and training a preset scoring card model according to the first transaction associated characteristic data sample, the first label associated characteristic data sample, the second transaction associated characteristic data sample and the second label associated characteristic data sample to obtain a trained scoring card model.
In some embodiments, when the step of training a preset score card model according to the first transaction related feature data sample, the first label related feature data sample, the second transaction related feature data sample, and the second label related feature data sample to obtain a trained score card model is executed by the processing unit 602, the processing unit is specifically configured to:
performing binning operation on the first transaction related characteristic data sample, the first label related characteristic data sample, the second transaction related characteristic data sample and the second label related characteristic data sample respectively;
and fitting the binned first transaction associated characteristic data sample, the binned first label associated characteristic data sample, the binned second transaction associated characteristic data sample and the binned second label associated characteristic data sample by using a logistic regression model to obtain the trained scoring card model.
It should be noted that, as can be clearly understood by those skilled in the art, for the specific implementation process of the above apparatus for identifying associated users and of each of its units, reference may be made to the corresponding description in the foregoing method embodiment; for convenience and conciseness of description, no further description is provided herein.
The above-mentioned identification means of the associated user may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 7.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 700 may be a terminal or a server, where the terminal may be an electronic device with a communication function, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, and a wearable device. The server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 7, the computer device 700 includes a processor 702, memory, and a network interface 705 coupled via a system bus 701, where the memory may include a non-volatile storage medium 703 and an internal memory 704.
The non-volatile storage medium 703 may store an operating system 7031 and computer programs 7032. The computer program 7032 comprises program instructions that, when executed, cause the processor 702 to perform an identification method associated with a user.
The processor 702 is configured to provide computing and control capabilities to support the operation of the overall computer device 700.
The internal memory 704 provides an environment for the execution of a computer program 7032 on the non-volatile storage medium 703, which computer program 7032, when executed by the processor 702, causes the processor 702 to perform a method of identifying an associated user.
The network interface 705 is used for network communication with other devices. It will be appreciated by those skilled in the art that the configuration shown in fig. 7 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the scope of the computer device 700 to which the present application may be applied, as a particular computer device 700 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 702 is configured to run a computer program 7032 stored in the memory to perform the steps of:
acquiring a user set to be identified;
extracting the transaction characteristic data of each user and the label characteristic data of each user in the user set to be identified;
for each user, performing transaction association calculation on the transaction association between a target user and each non-target user according to a preset transaction association rule and the transaction characteristic data to obtain transaction association characteristic data of each user, wherein the target user is the detection subject in the user set to be identified for which the transaction association calculation currently needs to be performed, and the non-target users are the users in the user set to be identified other than the target user;
for each user, respectively performing label association calculation on label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user;
inputting the transaction associated feature data and the label associated feature data into a trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user.
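The five steps above can be pictured as a small pipeline. The following Python sketch is illustrative only and rests on assumptions that are not part of the application: the record layout of `USERS`, the helper names `transaction_association`, `label_association` and `score_pair`, the fixed weights, and the threshold are all hypothetical stand-ins for the preset rules and the trained scoring model.

```python
from itertools import permutations

# Hypothetical user set: each user has transaction flows and label features.
USERS = {
    "u1": {"flows": [("u1", "u2", 100.0), ("u1", "u2", 50.0)],
           "labels": {"device": "d1", "city": "SZ"}},
    "u2": {"flows": [("u2", "u1", 30.0)],
           "labels": {"device": "d1", "city": "SZ"}},
    "u3": {"flows": [], "labels": {"device": "d9", "city": "BJ"}},
}

def transaction_association(target, other, users):
    """Assumed rule: number of flows between the two users, in either direction."""
    flows = users[target]["flows"] + users[other]["flows"]
    return sum(1 for src, dst, _ in flows if {src, dst} == {target, other})

def label_association(target, other, users):
    """Assumed rule: number of label fields whose values are identical."""
    a, b = users[target]["labels"], users[other]["labels"]
    return sum(1 for key in a if key in b and a[key] == b[key])

def score_pair(tx_feature, label_feature):
    """Stand-in for the trained scoring model: a fixed weighted sum (assumption)."""
    return 10.0 * tx_feature + 20.0 * label_feature

def identify_associated_users(users, threshold=50.0):
    results = {}
    # Each user takes a turn as the target user; all others are non-target users.
    for target, other in permutations(users, 2):
        tx = transaction_association(target, other, users)
        lb = label_association(target, other, users)
        score = score_pair(tx, lb)
        results[(target, other)] = (score, score >= threshold)
    return results

results = identify_associated_users(USERS)
```

Here `results` maps each ordered (target user, non-target user) pair to an association score and a flag, which corresponds loosely to an identification result holding a score between every two users in the set.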
In some embodiments, when the processor 702 implements the step of performing transaction association calculation on the transaction association between the target user and each non-target user according to the preset transaction association rule and the transaction characteristic data to obtain the transaction association characteristic data of each user, the following steps are specifically implemented:
extracting transaction flow data in the transaction characteristic data;
and performing graph association calculation on the transaction association between the target user and each non-target user according to the distributed graph processing framework Spark GraphX and the transaction flow data to obtain the transaction association characteristic data.
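As an illustration of what a graph association calculation over transaction flow data might produce, the following Python sketch uses a plain in-memory adjacency map. It does not use the distributed graph processing framework named in the step above; it only shows the idea on a single machine. The two features (`direct_count` and `shared_counterparties`) and the flow tuple layout `(payer, payee, amount)` are assumptions chosen for illustration.

```python
from collections import defaultdict

def build_graph(flows):
    """Build an undirected multigraph: graph[a][b] = number of flows between a and b."""
    graph = defaultdict(lambda: defaultdict(int))
    for payer, payee, _amount in flows:
        graph[payer][payee] += 1
        graph[payee][payer] += 1
    return graph

def transaction_association_features(graph, target, users):
    """For one target user, compute pairwise features against every non-target user."""
    features = {}
    target_neighbors = set(graph[target]) if target in graph else set()
    for other in users:
        if other == target:
            continue
        other_neighbors = set(graph[other]) if other in graph else set()
        features[other] = {
            # Number of flows directly between the two users.
            "direct_count": graph[target][other] if other in target_neighbors else 0,
            # Number of third parties that both users have transacted with.
            "shared_counterparties": len(
                (target_neighbors & other_neighbors) - {target, other}
            ),
        }
    return features

flows = [("a", "b", 10), ("b", "a", 5), ("a", "c", 7), ("b", "c", 3), ("d", "c", 1)]
graph = build_graph(flows)
features_for_a = transaction_association_features(graph, "a", ["a", "b", "c", "d"])
```

In a distributed setting the same quantities would be computed over partitioned vertex and edge collections rather than a Python dictionary.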
In some embodiments, the tag feature data includes a plurality of sub-tag feature data, and when the processor 702 implements the step of performing tag association calculation on the tag association between the target user and each non-target user according to the preset tag association rule and the tag feature data to obtain the tag association feature data of each user, the following steps are specifically implemented:
determining sub-label association rules respectively corresponding to the sub-label characteristic data from the label association rules;
and for each sub-label feature data, respectively performing label association calculation on label association between the target user and each non-target user according to the sub-label feature data and the corresponding sub-label association rule to obtain a plurality of sub-label association feature data respectively corresponding to each user.
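The per-sub-label rule lookup described above can be sketched as a mapping from sub-label name to a comparison function. The sub-label names (`device_id`, `address`, `age`), the rules themselves, and the treatment of missing values are hypothetical; the application does not specify them.

```python
# Hypothetical sub-label association rules, one per sub-label (assumptions).
SUB_LABEL_RULES = {
    "device_id": lambda a, b: 1.0 if a == b else 0.0,
    "address": lambda a, b: 1.0 if a.strip().lower() == b.strip().lower() else 0.0,
    "age": lambda a, b: 1.0 if abs(a - b) <= 5 else 0.0,
}

def sub_label_association(target_labels, other_labels, rules=SUB_LABEL_RULES):
    """Apply each sub-label's own rule to the target/non-target pair."""
    features = {}
    for name, rule in rules.items():
        if name in target_labels and name in other_labels:
            features[name] = rule(target_labels[name], other_labels[name])
        else:
            # Assumption: a missing sub-label counts as no association.
            features[name] = 0.0
    return features

target = {"device_id": "d1", "address": " 1 Main St ", "age": 30}
other = {"device_id": "d1", "address": "1 main st", "age": 40}
pair_features = sub_label_association(target, other)
partial_features = sub_label_association(target, {"device_id": "d2"})
```

Keeping one rule per sub-label means each sub-label yields its own association feature, which is what allows each to receive its own weight in a later scoring step.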
In some embodiments, the scoring model is a scoring card model, the tag association feature data includes a plurality of sub-tag association feature data, and when the processor 702 implements the step of inputting the transaction association feature data and the tag association feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user, the following steps are specifically implemented:
determining a first total score of the transaction correlation characteristic data and a first weight corresponding to the transaction correlation characteristic data according to the trained scoring card model;
determining a first score corresponding to the transaction association feature data according to the first total score and the first weight;
determining a second total score of the associated feature data of each sub-label and a second weight corresponding to the associated feature data of each sub-label according to the trained scoring card model;
determining a second score corresponding to each sub-label associated feature data according to the second total score and the second weight;
and determining the associated user identification result of each user according to the first score and the second score.
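The scoring steps above do not state how a total score and a weight are combined, or how the first and second scores are merged. The following Python sketch assumes the simplest reading: each score is the product of its total score and its weight, and the final association score is the sum of the first score and all second scores. The function name, the numeric values and this combination rule are assumptions.

```python
def associated_user_score(tx_total, tx_weight, sub_label_totals, sub_label_weights):
    """Combine per-feature total scores and weights into one association score.

    Assumption: score = total * weight, and the final score is the plain sum.
    """
    first_score = tx_total * tx_weight
    second_scores = {
        name: sub_label_totals[name] * sub_label_weights[name]
        for name in sub_label_totals
    }
    final_score = first_score + sum(second_scores.values())
    return final_score, first_score, second_scores

final_score, first_score, second_scores = associated_user_score(
    tx_total=100.0,
    tx_weight=0.5,
    sub_label_totals={"device_id": 100.0, "address": 80.0},
    sub_label_weights={"device_id": 0.25, "address": 0.25},
)
```

With these illustrative values the transaction feature contributes 50.0 and the two sub-labels contribute 25.0 and 20.0, giving a final score of 95.0.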
In some embodiments, when the processor 702 implements the step of extracting the transaction characteristic data of each user and the tag characteristic data of each user in the set of users to be identified, the following steps are implemented:
extracting initial transaction characteristic data of each user and initial label characteristic data of each user from the user set to be identified according to a preset data acquisition type;
and performing data cleaning processing on the initial transaction characteristic data and the initial label characteristic data to obtain the transaction characteristic data and the label characteristic data.
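The application does not say what the data cleaning processing consists of. As one plausible sketch, the Python function below trims whitespace, drops records with missing required fields, and removes exact duplicates. The function name `clean_records`, the record layout and the cleaning rules are assumptions.

```python
def clean_records(records, required_fields):
    """Strip string whitespace, drop incomplete records, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Skip records where a required field is absent, None, or empty.
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue
        # Use the sorted items as a hashable fingerprint for duplicate detection.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned

raw = [
    {"user": "u1 ", "amount": 10.0},
    {"user": "u1", "amount": 10.0},   # duplicate once whitespace is stripped
    {"user": "", "amount": 5.0},      # missing user
    {"user": "u2", "amount": None},   # missing amount
    {"user": "u3", "amount": 7.5},
]
cleaned = clean_records(raw, required_fields=["user", "amount"])
```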
In some embodiments, the scoring model is a scoring card model, and before the step of inputting the transaction related feature data and the tag related feature data into the trained scoring model to score the associated users respectively to obtain the associated user identification results of the users, the processor 702 further implements the following steps:
obtaining a plurality of positive samples and a plurality of negative samples, wherein each positive sample comprises two users who have transacted with each other, and each negative sample comprises two users who have not transacted with each other;
acquiring a first transaction associated characteristic data sample among users in each positive sample, a first label associated characteristic data sample among users in each positive sample, a second transaction associated characteristic data sample among users in each negative sample and a second label associated characteristic data sample among users in each negative sample;
and training a preset scoring card model according to the first transaction associated characteristic data sample, the first label associated characteristic data sample, the second transaction associated characteristic data sample and the second label associated characteristic data sample to obtain a trained scoring card model.
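The sample construction described above, where a positive sample is a pair of users with a transaction between them and a negative sample is a pair without one, can be sketched in Python as follows. The function name `build_samples` and the input layout are assumptions; how pairs are actually selected or sampled is not specified in the application.

```python
from itertools import combinations

def build_samples(users, transacted_pairs):
    """Split every unordered user pair into positive and negative samples."""
    transacted = {frozenset(pair) for pair in transacted_pairs}
    positives, negatives = [], []
    for a, b in combinations(sorted(users), 2):
        if frozenset((a, b)) in transacted:
            positives.append((a, b))   # the two users have transacted
        else:
            negatives.append((a, b))   # no transaction between the two users
    return positives, negatives

positives, negatives = build_samples({"u1", "u2", "u3"}, [("u2", "u1")])
```

The association feature samples for training would then be computed per pair, separately for the positive and the negative list.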
In some embodiments, when the processor 702 implements the step of training a preset score card model according to the first transaction-related feature data sample, the first label-related feature data sample, the second transaction-related feature data sample, and the second label-related feature data sample to obtain a trained score card model, the following steps are specifically implemented:
performing binning operation on the first transaction related characteristic data sample, the first label related characteristic data sample, the second transaction related characteristic data sample and the second label related characteristic data sample respectively;
and fitting the first transaction associated characteristic data sample after the binning operation, the first label associated characteristic data sample after the binning operation, the second transaction associated characteristic data sample after the binning operation and the second label associated characteristic data sample after the binning operation by using a logistic regression model to obtain the trained score card model.
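The two training steps above, binning followed by a logistic regression fit, can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions: equal-width binning, bin indices used directly as model inputs, and batch gradient descent are choices made here for brevity; the application does not specify the binning method, the encoding, or the optimizer, and the toy data are invented.

```python
import math

def equal_width_bin(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Toy association features: first four pairs are positive samples, last four negative.
tx_feature = [5, 4, 6, 3, 0, 1, 0, 2]
label_feature = [2, 3, 2, 3, 0, 1, 1, 0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

tx_bins = equal_width_bin(tx_feature, 3)
label_bins = equal_width_bin(label_feature, 3)
rows = list(zip(tx_bins, label_bins))
weights, bias = fit_logistic(rows, labels)
```

In scorecard practice the fitted coefficients are usually converted into per-bin points; that conversion is not described in this passage and is omitted here.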
It should be understood that, in the embodiments of the present application, the processor 702 may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program comprises program instructions. The program instructions, when executed by the processor, cause the processor to perform the steps of:
acquiring a user set to be identified;
extracting the transaction characteristic data of each user and the label characteristic data of each user in the user set to be identified;
for each user, performing transaction association calculation on the transaction association between a target user and each non-target user according to a preset transaction association rule and the transaction characteristic data to obtain transaction association characteristic data of each user, wherein the target user is the detection subject in the user set to be identified for which the transaction association calculation currently needs to be performed, and the non-target users are the users in the user set to be identified other than the target user;
for each user, respectively performing label association calculation on label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user;
inputting the transaction associated feature data and the label associated feature data into a trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user.
In some embodiments, when the processor executes the program instruction to implement the step of performing transaction association calculation on the transaction associations between the target users and the non-target users respectively according to the preset transaction association rule and the transaction characteristic data to obtain the transaction association characteristic data of each user, the following steps are specifically implemented:
extracting transaction flow data in the transaction characteristic data;
and performing graph association calculation on the transaction association between the target user and each non-target user according to the distributed graph processing framework Spark GraphX and the transaction flow data to obtain the transaction association characteristic data.
In some embodiments, the tag feature data includes multiple sub-tag feature data, and when the processor executes the program instruction to implement the step of performing tag association calculation on tag associations between the target users and the non-target users respectively according to a preset tag association rule and the tag feature data to obtain the tag association feature data of each user, the following steps are specifically implemented:
determining sub-label association rules respectively corresponding to the sub-label characteristic data from the label association rules;
and for each sub-label feature data, respectively performing label association calculation on label association between the target user and each non-target user according to the sub-label feature data and the corresponding sub-label association rule to obtain a plurality of sub-label association feature data respectively corresponding to each user.
In some embodiments, the scoring model is a scoring card model, the tag association feature data includes a plurality of sub-tag association feature data, and when the processor executes the program instructions to implement the step of inputting the transaction association feature data and the tag association feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user, the following steps are specifically implemented:
determining a first total score of the transaction correlation characteristic data and a first weight corresponding to the transaction correlation characteristic data according to the trained scoring card model;
determining a first score corresponding to the transaction association feature data according to the first total score and the first weight;
determining a second total score of the associated feature data of each sub-label and a second weight corresponding to the associated feature data of each sub-label according to the trained scoring card model;
determining a second score corresponding to each sub-label associated feature data according to the second total score and the second weight;
and determining the associated user identification result of each user according to the first score and the second score.
In some embodiments, when the step of extracting the transaction characteristic data of each user and the tag characteristic data of each user in the set of users to be identified is implemented by the processor by executing the program instructions, the following steps are specifically implemented:
extracting initial transaction characteristic data of each user and initial label characteristic data of each user from the user set to be identified according to a preset data acquisition type;
and performing data cleaning processing on the initial transaction characteristic data and the initial label characteristic data to obtain the transaction characteristic data and the label characteristic data.
In some embodiments, the scoring model is a scoring card model, and the processor, before executing the program instructions to implement the step of inputting the transaction-related feature data and the tag-related feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user, further implements the following steps:
obtaining a plurality of positive samples and a plurality of negative samples, wherein each positive sample comprises two users who have transacted with each other, and each negative sample comprises two users who have not transacted with each other;
acquiring a first transaction correlation characteristic data sample among users in each positive sample, a first label correlation characteristic data sample among users in each positive sample, a second transaction correlation characteristic data sample among users in each negative sample and a second label correlation characteristic data sample among users in each negative sample;
and training a preset scoring card model according to the first transaction correlation characteristic data sample, the first label correlation characteristic data sample, the second transaction correlation characteristic data sample and the second label correlation characteristic data sample to obtain a trained scoring card model.
In some embodiments, when the processor executes the program instructions to implement the step of training a preset score card model according to the first transaction-related feature data sample, the first label-related feature data sample, the second transaction-related feature data sample, and the second label-related feature data sample, and obtaining a trained score card model, the following steps are specifically implemented:
performing binning operation on the first transaction related characteristic data sample, the first label related characteristic data sample, the second transaction related characteristic data sample and the second label related characteristic data sample respectively;
and fitting the first transaction associated characteristic data sample after the binning operation, the first label associated characteristic data sample after the binning operation, the second transaction associated characteristic data sample after the binning operation and the second label associated characteristic data sample after the binning operation by using a logistic regression model to obtain the trained scoring card model.
The storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium capable of storing program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.
The order of the steps in the methods of the embodiments of the present application may be adjusted, and steps may be combined or deleted, according to actual needs. The units in the apparatus of the embodiments of the present application may be combined, divided and deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An identification method for associated users, comprising:
acquiring a user set to be identified;
extracting the transaction characteristic data of each user and the label characteristic data of each user in the user set to be identified;
for each user, transaction association calculation is respectively carried out on transaction association between a target user and each non-target user according to preset transaction association rules and the transaction characteristic data to obtain transaction association characteristic data of each user, wherein the target user is a detection subject needing transaction association calculation currently in the user set to be identified, and the non-target user is a user except the target user in the user set to be identified;
for each user, performing label association calculation on label association between the target user and each non-target user according to a preset label association rule and the label characteristic data to obtain the label association characteristic data of each user;
inputting the transaction association characteristic data and the label association characteristic data into a trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user, wherein the associated user identification result of each user comprises an association score between every two users in the user set to be identified;
wherein the scoring model is a scoring card model, the tag association feature data comprises a plurality of sub-tag association feature data, and the inputting the transaction association feature data and the tag association feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain the associated user identification result of each user comprises:
determining a first total score of the transaction correlation characteristic data and a first weight corresponding to the transaction correlation characteristic data according to the trained scoring card model;
determining a first score corresponding to the transaction association characteristic data according to the first total score and the first weight;
determining a second total score of the associated feature data of each sub-label and a second weight corresponding to the associated feature data of each sub-label according to the trained scoring card model;
determining a second score corresponding to each sub-label associated feature data according to the second total score and the second weight;
and determining the associated user identification result of each user according to the first score and the second score.
2. The method according to claim 1, wherein the performing transaction association calculation on the transaction association between the target user and each non-target user according to a preset transaction association rule and the transaction feature data to obtain the transaction association feature data of each user comprises:
extracting transaction flow data in the transaction characteristic data;
and performing graph association calculation on the transaction association between the target user and each non-target user according to the distributed graph processing framework Spark GraphX and the transaction flow data to obtain the transaction association characteristic data.
3. The method according to claim 1, wherein the tag feature data includes a plurality of sub-tag feature data, and the tag association between the target user and each non-target user is respectively calculated according to a preset tag association rule and the tag feature data to obtain the tag association feature data of each user, including:
determining sub-label association rules respectively corresponding to the sub-label characteristic data from the label association rules;
and for each sub-label feature data, respectively performing label association calculation on label association between the target user and each non-target user according to the sub-label feature data and the corresponding sub-label association rule to obtain a plurality of sub-label association feature data respectively corresponding to each user.
4. The method of claim 1, wherein the extracting transaction characteristic data of each user and tag characteristic data of each user in the set of users to be identified comprises:
extracting initial transaction characteristic data of each user and initial label characteristic data of each user from the user set to be identified according to a preset data acquisition type;
and performing data cleaning processing on the initial transaction characteristic data and the initial label characteristic data to obtain the transaction characteristic data and the label characteristic data.
5. The method according to any one of claims 1 to 4, wherein the scoring model is a scoring card model, and before the inputting the transaction association feature data and the tag association feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain the associated user identification result of each user, the method further comprises:
obtaining a plurality of positive samples and a plurality of negative samples, wherein each positive sample comprises two users who have transacted with each other, and each negative sample comprises two users who have not transacted with each other;
acquiring a first transaction correlation characteristic data sample among users in each positive sample, a first label correlation characteristic data sample among users in each positive sample, a second transaction correlation characteristic data sample among users in each negative sample and a second label correlation characteristic data sample among users in each negative sample;
and training a preset scoring card model according to the first transaction correlation characteristic data sample, the first label correlation characteristic data sample, the second transaction correlation characteristic data sample and the second label correlation characteristic data sample to obtain a trained scoring card model.
6. The method of claim 5, wherein the training a preset scorecard model according to the first transaction-related feature data sample, the first tag-related feature data sample, the second transaction-related feature data sample, and the second tag-related feature data sample to obtain a trained scorecard model comprises:
performing binning operation on the first transaction related characteristic data sample, the first label related characteristic data sample, the second transaction related characteristic data sample and the second label related characteristic data sample respectively;
and fitting the first transaction associated characteristic data sample after the binning operation, the first label associated characteristic data sample after the binning operation, the second transaction associated characteristic data sample after the binning operation and the second label associated characteristic data sample after the binning operation by using a logistic regression model to obtain the trained score card model.
7. An apparatus for identifying an associated user, comprising:
an acquisition unit, configured to acquire a user set to be identified;
the processing unit is used for extracting the transaction characteristic data of each user and the label characteristic data of each user in the user set to be identified;
the processing unit is further configured to perform transaction association calculation on transaction associations between target users and non-target users respectively according to preset transaction association rules and the transaction characteristic data for the users to obtain transaction association characteristic data of the users, where the target users are detection subjects needing to perform the transaction association calculation currently in the user set to be identified, and the non-target users are users except the target users in the user set to be identified;
the processing unit is further configured to perform label association calculation on label associations between the target users and the non-target users respectively according to preset label association rules and the label feature data for the users to obtain label association feature data of the users;
the processing unit is further configured to input the transaction association feature data and the label association feature data into a trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user, where the associated user identification result of each user includes an association score between every two users in the set of users to be identified;
the scoring model is a scoring card model, the tag associated feature data includes multiple sub-tag associated feature data, and the processing unit is specifically configured to, when executing the step of inputting the transaction associated feature data and the tag associated feature data into the trained scoring model to perform associated user scoring processing on each user respectively to obtain an associated user identification result of each user:
determining a first total score of the transaction associated characteristic data and a first weight corresponding to the transaction associated characteristic data according to the trained scoring card model;
determining a first score corresponding to the transaction association feature data according to the first total score and the first weight;
determining a second total score of the associated feature data of each sub-label and a second weight corresponding to the associated feature data of each sub-label according to the trained scoring card model;
determining a second score corresponding to each sub-label associated feature data according to the second total score and the second weight;
and determining the associated user identification result of each user according to the first score and the second score.
8. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program, and the processor implementing the method according to any one of claims 1-6 when executing the computer program.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program comprising program instructions which, when executed by a processor, implement the method according to any one of claims 1-6.
CN202211003077.5A 2022-08-22 2022-08-22 Method and device for identifying associated user, computer equipment and storage medium Active CN115082079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211003077.5A CN115082079B (en) 2022-08-22 2022-08-22 Method and device for identifying associated user, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211003077.5A CN115082079B (en) 2022-08-22 2022-08-22 Method and device for identifying associated user, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115082079A CN115082079A (en) 2022-09-20
CN115082079B CN115082079B (en) 2022-12-09

Family

ID=83244305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211003077.5A Active CN115082079B (en) 2022-08-22 2022-08-22 Method and device for identifying associated user, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115082079B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118316B (en) * 2018-06-25 2022-04-26 创新先进技术有限公司 Method and device for identifying authenticity of online shop
US10922390B1 (en) * 2018-08-02 2021-02-16 Facebook, Inc. Training a classifier to identify unknown users of an online system
CN109949154A (en) * 2018-12-17 2019-06-28 深圳平安综合金融服务有限公司 Customer information classification method, device, computer equipment and storage medium
CN112348520A (en) * 2020-10-21 2021-02-09 上海淇玥信息技术有限公司 XGBoost-based risk assessment method and device and electronic equipment
CN113177585B (en) * 2021-04-23 2024-04-05 上海晓途网络科技有限公司 User classification method, device, electronic equipment and storage medium
CN113850669A (en) * 2021-09-29 2021-12-28 平安科技(深圳)有限公司 User grouping method and device, computer equipment and computer readable storage medium
CN113987182A (en) * 2021-10-28 2022-01-28 深圳永安在线科技有限公司 Fraud entity identification method, device and related equipment based on security intelligence
CN114463119A (en) * 2022-02-14 2022-05-10 中国工商银行股份有限公司 Credit assessment method and device and electronic equipment
CN114926282A (en) * 2022-05-27 2022-08-19 平安银行股份有限公司 Abnormal transaction identification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN115082079A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN110188198B (en) Anti-fraud method and device based on knowledge graph
CN110334737B (en) Customer risk index screening method and system based on random forest
CN108960833B (en) Abnormal transaction identification method, equipment and storage medium based on heterogeneous financial characteristics
CN111291816B (en) Method and device for carrying out feature processing aiming at user classification model
US8768914B2 (en) System and method for searching and matching databases
JP5735969B2 (en) System and method for analyzing social graph data for determining connections within a community
WO2015135321A1 (en) Method and device for mining social relationship based on financial data
US20160364794A1 (en) Scoring transactional fraud using features of transaction payment relationship graphs
WO2011106897A1 (en) Systems and methods for conducting more reliable assessments with connectivity statistics
WO2011134086A1 (en) Systems and methods for conducting reliable assessments with connectivity information
WO2022199185A1 (en) User operation inspection method and program product
CN112488716B (en) Abnormal event detection system
CN110930038A (en) Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium
CN115378629A (en) Ethereum network anomaly detection method and system based on graph neural network, and storage medium
CN113988613A (en) Decision system and method based on enterprise credit
KR20180089479A (en) User data sharing method and device
CN110348516B (en) Data processing method, data processing device, storage medium and electronic equipment
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
US20180101913A1 (en) Entropic link filter for automatic network generation
WO2021213069A1 (en) Account identification method, device, electronic apparatus, and computer readable medium
WO2019023406A1 (en) System and method for detecting and responding to transaction patterns
CN111582722B (en) Risk identification method and device, electronic equipment and readable storage medium
CN115082079B (en) Method and device for identifying associated user, computer equipment and storage medium
CN111277433A (en) Network service abnormity detection method and device based on attribute network characterization learning
US11348115B2 (en) Method and apparatus for identifying risky vertices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Patentee after: Shenzhen Huafu Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong

Patentee before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co.,Ltd.