CN110046783B - Method and device for identifying fraudulent account, electronic equipment and storage medium - Google Patents

Publication number
CN110046783B
Authority
CN
China
Prior art keywords
account
value
sample
variable
woe
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811525692.6A
Other languages
Chinese (zh)
Other versions
CN110046783A (en
Inventor
杨丹
洪丹
徐轶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201811525692.6A priority Critical patent/CN110046783B/en
Publication of CN110046783A publication Critical patent/CN110046783A/en
Application granted granted Critical
Publication of CN110046783B publication Critical patent/CN110046783B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services
    • G06Q50/265Personal security, identity or safety

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure provide a fraudulent account identification method, a fraudulent account identification apparatus, an electronic device and a storage medium. The method comprises: acquiring account address conflict variables of a plurality of sample accounts; processing the account address conflict variables with a scorecard model to obtain their WOE values, and determining a sample learning score value of each sample account from those WOE values; correcting the sample learning score value according to the importance of the account address conflict variables and the WOE values to obtain a sample learning score correction value; and determining a first target value interval and a second target value interval according to the known result of whether each sample account is fraudulently used, the sample learning score value and the sample learning score correction value, so that the proportion of fraudulently used accounts among the sample accounts covered by the first target value interval and the second target value interval reaches a preset proportion. The embodiments of the disclosure can identify the risk of fraudulent use of an account quickly and accurately and improve identification performance.

Description

Method and device for identifying fraudulent account, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of the Internet, and in particular to a method and an apparatus for identifying a fraudulent account, an electronic device and a storage medium.
Background
Identity impersonation can be understood as follows: black-market operators collect identity information in bulk from all over the country and use the identity verification information in their hands to impersonate other people in online authentication operations that require no face-to-face signing. Limited resources and cost-benefit constraints prevent the operators from travelling to each location and forging a complete, high-fidelity set of information for every impersonated identity, so the location attributes of the authentication material and of the identity information conflict with each other to some extent. An account authenticated with another person's information is an identity-impersonation (fraudulent) account. Accurately identifying such accounts is of significant value to businesses such as risk control and credit. For example, many overdue accounts are in fact fraudulent accounts (black-market operators generally do not let their own credit become overdue; they use accounts opened under other people's identities, so that the overdue record damages someone else's credit while the credit value can be cashed out in bulk).
Fraudulent accounts that black-market operators have authenticated with information collected from all over the country can be identified by a location attribute conflict identification system, and the results stored. The identification results may be used to: 1. proactively raise the admission threshold of authenticated accounts; 2. provide risk consultation in other business scenarios, such as credit admission, merchant onboarding, and prevention and control of cheating in marketing campaigns.
Account scoring based on location attribute conflicts is currently done mainly in two ways. Identity impersonation model score: in addition to location attribute conflicts, a multidimensional feature system is built from user behavior features and the relationship network, and the likelihood that an account is fraudulently used is estimated through data-mining modeling. Account theft location conflict score: the risk of account theft is identified by detecting a change of the account's user.
The inventors of the present disclosure found that the two types of approaches to account scoring based on location attribute conflicts have the following drawbacks:
Identity impersonation model score: the score weighs many factors together and is therefore hard to interpret, and because the data it depends on is large and complex, the score cannot be produced in real time. It therefore cannot meet the requirement for real-time prevention and control of identity impersonation, nor the interpretability requirement of derived risk prevention and control.
Account theft location conflict score: the risk of account theft is identified by detecting a change of the account's user. A fraudulent account, however, is in the black-market operator's hands from the start, so its user never changes and this score cannot characterize the risk of a fraudulent account.
Disclosure of Invention
The embodiment of the disclosure provides an imposter account identification method, an imposter account identification device, electronic equipment and a computer readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a method for identifying a fraudulent account.
Specifically, the fraudulent account identification method includes:
acquiring account address conflict variables of a plurality of sample accounts;
obtaining WOE values of the account address conflict variables according to variable processing modes in the scoring card model, and determining sample learning scoring values of the sample accounts according to the WOE values;
correcting the sample learning score value according to the importance degree of the account address conflict variable and the WOE value to obtain a sample learning score correction value;
and determining a first target value interval and a second target value interval according to a known result of whether the sample account is falsely used, the sample learning score value and the sample learning score correction value, so that the proportion of falsely used accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion.
Further, the account address conflict variable includes at least one of:
identifying a first type of variable of conflict between a location element of the sample account that is related to an account identity and a location element that is related to an account operator;
a second type of variable that identifies a conflict between different location elements of the sample account that are both related to the account operator;
a third type of variable that identifies a conflict between a location element of the sample account that is related to an account operator and a location element that is related to a different account identity.
Further, the obtaining the WOE value of the account address conflict variable according to the variable processing mode in the scoring card model, and determining the sample learning scoring value of the sample account according to the WOE value includes:
determining WOE values and IV values of the account address conflict variables by using a variable processing mode in a scoring card model;
screening at least one characteristic significant variable from the account address conflict variables according to the IV value;
and determining the sample learning scoring value of the sample account according to the WOE value of the characteristic significant variable.
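The IV-based screening step can be illustrated with a minimal sketch. The source does not give an IV formula or a threshold, so the sketch assumes the commonly used definition IV = Σ (py_i − pn_i) · ln(py_i / pn_i) over the WOE bins; the variable names, per-bin counts and the threshold of 0.1 are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def information_value(y_counts: List[int], n_counts: List[int]) -> float:
    """IV = sum_i (py_i - pn_i) * ln(py_i / pn_i) over the WOE bins.

    Assumes every bin holds at least one positive and one negative sample;
    real data would need smoothing or bin merging for empty cells.
    """
    y_total, n_total = sum(y_counts), sum(n_counts)
    iv = 0.0
    for y_i, n_i in zip(y_counts, n_counts):
        py, pn = y_i / y_total, n_i / n_total
        iv += (py - pn) * math.log(py / pn)
    return iv


def screen_variables(
    counts: Dict[str, Tuple[List[int], List[int]]], iv_threshold: float
) -> List[str]:
    """Keep the variables whose IV reaches the (assumed) threshold."""
    return [
        name
        for name, (y_counts, n_counts) in counts.items()
        if information_value(y_counts, n_counts) >= iv_threshold
    ]


# Hypothetical per-bin counts: (positive counts, negative counts).
counts = {
    "id_city_vs_login_city": ([1, 3], [2, 2]),
    "id_city_vs_device_city": ([2, 2], [3, 3]),
}
selected = screen_variables(counts, iv_threshold=0.1)
```

In this toy data the second variable has identical positive and negative distributions across its bins, so its IV is zero and it is screened out.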
Further, the correcting the sample learning score value according to the importance degree of the account address conflict variable and the WOE value to obtain a sample learning score correction value includes:
determining a weight value of the feature significant variable;
and weighting WOE values of the feature significant variables by using the weight values, and determining the sample learning score correction value.
Further, the determining a first target value interval and a second target value interval according to the known result of whether the sample account is falsified, the sample learning score value and the sample learning score correction value, so that the proportion of falsified accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion, includes:
determining a plurality of candidate value interval pairs according to the known results of the plurality of sample accounts, the sample learning score values and the sample learning score correction values; each candidate value interval pair comprises a first candidate value interval and a second candidate value interval, and among the sample accounts whose sample learning score value lies in the first candidate value interval of the pair and whose sample learning score correction value lies in the second candidate value interval of the pair, the proportion of fraudulently used accounts reaches the preset proportion;
and selecting, from the plurality of candidate value interval pairs, the pair with the largest risk coverage rate as the first target value interval and the second target value interval.
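The interval-pair selection can be sketched as a brute-force search. This is only an illustration under several assumptions not stated in the source: candidate intervals are taken to be one-sided ranges [t, +∞) whose lower bounds are the observed scores, and "risk coverage rate" is read as the share of all known fraudulent sample accounts that the pair covers. The function name and sample data are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple


def select_interval_pair(
    scores: List[float],
    corrected: List[float],
    is_fraud: List[int],
    preset_ratio: float,
) -> Tuple[Optional[Tuple[float, float]], float]:
    """Return the lower bounds (t1, t2) of the best pair and its coverage.

    A pair qualifies when the fraud proportion among covered accounts
    reaches preset_ratio; among qualifying pairs the one covering the
    largest share of all fraudulent accounts wins.
    """
    total_fraud = sum(is_fraud)
    best: Optional[Tuple[float, float]] = None
    best_coverage = 0.0
    for t1, t2 in product(sorted(set(scores)), sorted(set(corrected))):
        covered = [
            f for s, c, f in zip(scores, corrected, is_fraud) if s >= t1 and c >= t2
        ]
        if not covered or total_fraud == 0:
            continue
        fraud_ratio = sum(covered) / len(covered)
        coverage = sum(covered) / total_fraud
        if fraud_ratio >= preset_ratio and (best is None or coverage > best_coverage):
            best, best_coverage = (t1, t2), coverage
    return best, best_coverage


# Hypothetical sample accounts: score, corrected score, known fraud label.
scores = [0.2, 0.9, 1.1, 1.5, 0.4, 1.3]
corrected = [5, 40, 60, 90, 10, 20]
is_fraud = [0, 1, 1, 1, 0, 0]
pair, coverage = select_interval_pair(scores, corrected, is_fraud, preset_ratio=0.9)
```

The search is quadratic in the number of distinct scores, which is acceptable for an offline training step but would need pruning on large samples.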
Further, the fraudulent account identification method further includes:
determining a to-be-identified score value and a to-be-identified score correction value of an account to be identified;
and determining whether the account to be identified is a fraudulent account according to whether the to-be-identified score value and the to-be-identified score correction value lie in the first target value interval and the second target value interval, respectively.
Further, the determining a to-be-identified score value and a to-be-identified score correction value of the account to be identified includes:
determining WOE values of account address conflict variables of the accounts to be identified;
and determining the score value to be identified according to the WOE value of the account address conflict variable of the account to be identified, and further determining the score correction value to be identified according to the WOE value of the account address conflict variable of the account to be identified and the importance of the account address conflict variable of the account to be identified.
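Once the two target value intervals are known, checking a new account reduces to two range tests. The sketch below assumes closed intervals and assumes that an account is flagged only when both values fall inside their respective intervals; the source says the decision depends on whether the two values lie in the two intervals but does not spell out the combination rule. All names and numbers are hypothetical.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), both inclusive


def is_fraudulent_account(
    score: float,
    corrected_score: float,
    first_interval: Interval,
    second_interval: Interval,
) -> bool:
    """Flag the account when both values fall in their target intervals."""
    lo1, hi1 = first_interval
    lo2, hi2 = second_interval
    return lo1 <= score <= hi1 and lo2 <= corrected_score <= hi2


# Hypothetical target intervals learned from the sample accounts.
first_target = (0.9, math.inf)
second_target = (40.0, math.inf)

flag_a = is_fraudulent_account(1.2, 55.0, first_target, second_target)
flag_b = is_fraudulent_account(1.2, 10.0, first_target, second_target)
```

Because only two comparisons are needed at scoring time, this check is cheap enough to run in real time.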
In a second aspect, an embodiment of the disclosure provides an apparatus for identifying a fraudulent account.
Specifically, the fraudulent account identification apparatus includes:
an acquisition module configured to acquire account address conflict variables of a plurality of sample accounts;
the first determining module is configured to obtain the WOE value of the account address conflict variable according to the variable processing mode in the scoring card model, and determine the sample learning scoring value of the sample account according to the WOE value;
the correction module is configured to correct the sample learning score value according to the importance degree of the account address conflict variable and the WOE value so as to obtain a sample learning score correction value;
the second determining module is configured to determine a first target value interval and a second target value interval according to a known result of whether the sample account is falsified or not, the sample learning score value and the sample learning score correction value, so that the proportion of falsified accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion.
Further, the account address conflict variable includes at least one of:
identifying a first type of variable of conflict between a location element of the sample account that is related to an account identity and a location element that is related to an account operator;
a second type of variable that identifies a conflict between different location elements of the sample account that are both related to the account operator;
a third type of variable that identifies a conflict between a location element of the sample account that is related to an account operator and a location element that is related to a different account identity.
Further, the first determining module includes:
the first determining submodule is configured to determine WOE values and IV values of the account address conflict variables by utilizing a variable processing mode in a grading card model;
a screening sub-module configured to screen at least one feature significant variable from the account address conflict variables according to the IV value;
a second determination submodule configured to determine the sample learning score value of the sample account according to the WOE value of the feature significant variable.
Further, the correction module includes:
a third determination submodule configured to determine a weight value of the feature significant variable;
a fourth determination submodule configured to determine the sample learning score correction value after weighting the WOE value of the feature significant variable with the weight value.
Further, the second determining module includes:
a fifth determination submodule configured to determine a plurality of candidate value interval pairs according to the known results of the plurality of sample accounts, the sample learning score values and the sample learning score correction values; each candidate value interval pair comprises a first candidate value interval and a second candidate value interval, and among the sample accounts whose sample learning score value lies in the first candidate value interval of the pair and whose sample learning score correction value lies in the second candidate value interval of the pair, the proportion of fraudulently used accounts reaches the preset proportion;
a sixth determination submodule configured to select, from the plurality of candidate value interval pairs, the pair with the largest risk coverage rate as the first target value interval and the second target value interval.
Further, the fraudulent account identification apparatus further includes:
the third determining module is configured to determine a to-be-identified score value and a to-be-identified score correction value of the account to be identified;
and the fourth determining module is configured to determine whether the account to be identified is an imposter account according to whether the score value to be identified and the score correction value to be identified are respectively positioned in the first target value interval and the second target value interval.
Further, the third determining module includes:
determining WOE values of account address conflict variables of the accounts to be identified;
and determining the score value to be identified according to the WOE value of the account address conflict variable of the account to be identified, and further determining the score correction value to be identified according to the WOE value of the account address conflict variable of the account to be identified and the importance of the account address conflict variable of the account to be identified.
The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the fraudulent account identification apparatus includes a memory and a processor; the memory stores one or more computer instructions that support the apparatus in performing the method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface through which it communicates with other devices or communication networks.
In a third aspect, embodiments of the present disclosure provide an electronic device comprising a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer readable storage medium storing computer instructions for use by a fraudulent account identification apparatus, comprising computer instructions for performing the method of fraudulent account identification of the first aspect described above.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
In the embodiments of the disclosure, account address conflict variables of the sample accounts are acquired and scored with a scorecard model to obtain a sample learning score value for each sample account; the sample learning score value is corrected according to the importance of the account address conflict variables to obtain a sample learning score correction value; and a target value interval pair is then found from the known result of whether each sample account is a fraudulent account, together with the sample learning score values and the sample learning score correction values, such that the fraudulent account identification rate among sample accounts whose sample learning score value and sample learning score correction value fall into the target value interval pair reaches a preset threshold. The target value interval pair so determined allows the risk of an account to be identified to be scored in real time, so that the risk of fraudulent use can be identified quickly and accurately, which facilitates rapid business response and derived risk identification. In addition, because the target value interval pair is determined with a real-time scoring combination of sample learning and expert-experience correction, the identification performance for fraudulent account use is greatly improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow chart of an imposter account identification method according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of step S102 according to the embodiment shown in FIG. 1;
FIG. 3 shows a flow chart of step S103 according to the embodiment shown in FIG. 1;
FIG. 4 shows a flow chart of step S104 according to the embodiment shown in FIG. 1;
FIG. 5 illustrates a flow chart of an imposter account identification method according to another embodiment of the present disclosure;
FIG. 6 shows a flow chart of step S501 according to the embodiment shown in FIG. 5;
FIG. 7 shows a block diagram of an imposter account identification apparatus according to an embodiment of the present disclosure;
FIG. 8 illustrates a block diagram of a first determination module 702 according to the embodiment illustrated in FIG. 7;
FIG. 9 shows a block diagram of the modification module 703 according to the embodiment shown in FIG. 7;
FIG. 10 shows a block diagram of a second determination module 704 according to the embodiment shown in FIG. 7;
FIG. 11 shows a block diagram of an imposter account identification apparatus according to another embodiment of the present disclosure;
FIG. 12 shows a block diagram of a third determination module 1101 according to the embodiment shown in FIG. 11;
FIG. 13 is a schematic diagram of an electronic device suitable for use in implementing the fraudulent account identification method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.
In this disclosure, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, portions, or combinations thereof are present or added.
In addition, it should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates a flow chart of an imposter account identification method according to one embodiment of the present disclosure. As shown in fig. 1, the fraudulent account identification method includes the following steps S101 to S104:
in step S101, account address conflict variables of a plurality of sample accounts are acquired;
in step S102, obtaining a WOE value of the account address conflict variable according to a variable processing mode in the scoring card model, and determining a sample learning scoring value of the sample account according to the WOE value;
in step S103, the sample learning score value is corrected according to the importance degree of the account address conflict variable and the WOE value, so as to obtain a sample learning score correction value;
in step S104, a first target value interval and a second target value interval are determined according to the known result of whether the sample account is falsified, the sample learning score value and the sample learning score correction value, so that the proportion of falsified accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion.
In this embodiment, an account address conflict variable is a variable that indicates whether a conflict occurs between two preset location elements among a series of preset location elements related to an account; a conflict between two such elements can indicate that the account is at risk of being fraudulently used. One example is a variable indicating whether the current address of an account's user conflicts with the current login address of that account. An account address conflict means that two addresses that should be consistent with each other are in fact not. A single account has many preset location elements, such as elements representing the user's current location, historical locations and frequently used locations, each login location of the account, and location elements related to other accounts with different identities. The pairs of location elements whose conflict indicates a risk of fraudulent use can therefore be determined in advance, and account address conflict variables constructed for them.
The account address conflict variables of a sample account can be obtained by collecting each preset location element of each sample account on the network over a period of time and using them to determine the values of the pre-constructed account address conflict variables. The value can be constructed in several ways. For example, the value may be binary, such as 1 when the two preset location elements of account address conflict variable A conflict and 0 otherwise. Alternatively, the account address conflict variables of a sample account may be grouped into classes, and for each class either the number (a positive integer) of variables in the class whose two location elements conflict, or the proportion (for example, 0-100%) of such variables among all variables in the class, may be used as the value.
After the account address conflict variables of all sample accounts have been obtained, they can be processed with the variable processing of the scorecard model to obtain the WOE value of each account address conflict variable in each WOE bin, and the sample learning score value of each sample account is then determined from these WOE values. In one embodiment, the sample learning score value of a sample account is the sum of the WOE values of the bins in which that account's account address conflict variables fall.
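The three value constructions described here (binary flag, conflict count per class, conflict proportion per class) can be sketched as follows. The sketch assumes location elements are compared as city-level strings and that a conflict simply means the two strings differ; the element names and sample data are hypothetical.

```python
from typing import List, Tuple


def binary_conflict(element_a: str, element_b: str) -> int:
    """Return 1 when the two location elements disagree, otherwise 0."""
    return int(element_a != element_b)


def class_conflict_stats(pairs: List[Tuple[str, str]]) -> Tuple[int, float]:
    """Return (count, proportion) of conflicting pairs within one class."""
    flags = [binary_conflict(a, b) for a, b in pairs]
    count = sum(flags)
    proportion = count / len(flags) if flags else 0.0
    return count, proportion


# Hypothetical location-element pairs for one class of one sample account.
pairs = [
    ("Hangzhou", "Hangzhou"),  # ID-card city vs. login city
    ("Hangzhou", "Chengdu"),   # ID-card city vs. device city
    ("Hangzhou", "Shenzhen"),  # ID-card city vs. shipping city
    ("Hangzhou", "Hangzhou"),  # ID-card city vs. bank-card city
]
count, proportion = class_conflict_stats(pairs)
```

In practice the comparison would likely need a coarser or fuzzier notion of "same place" than string equality.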
The scorecard model is a mature prediction method widely used in credit risk assessment and financial risk control. It is a generalized linear model for binary classification: the model variables are discretized and WOE-encoded, and a logistic regression model is then applied. WOE stands for Weight of Evidence and expresses the effect on the proportion of default samples when a model variable takes a given value. In the embodiments of the disclosure the model variables are the account address conflict variables. WOE binning of an account address conflict variable discretizes its value range, that is, divides the range into intervals. For example, if the variable takes discrete values such as 0 or 1, it can be divided into two intervals (two WOE bins), one for 0 and one for 1. If the variable takes a continuous proportion, for example the proportion of account address conflict variables of the same account whose two location elements conflict among all of its account address conflict variables, a minimum proportion (0) and a maximum proportion (100%) can be preset and the range between them divided into several intervals, that is, several WOE bins, for example the three bins 0-30%, 30-50% and 50-100%. If the variable is the number of account address conflict variables, within one class of the same sample account, whose two location elements conflict, a maximum number can be set and the range from the minimum number to the maximum number divided into several intervals, that is, several WOE bins. WOE binning is a known technique and is not described in detail here.
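Assigning a value to its WOE bin can be sketched with a sorted list of inner bin edges. The example reuses the 0-30%, 30-50% and 50-100% bins from the text; which side of each boundary is closed is not stated in the source, so the sketch assumes left-closed intervals. The function and constant names are hypothetical.

```python
from bisect import bisect_right
from typing import List

# Inner edges only: bins are [0, 0.3), [0.3, 0.5) and [0.5, 1.0].
RATIO_EDGES: List[float] = [0.3, 0.5]
# A 0/1 variable gets two bins, one on each side of 0.5.
BINARY_EDGES: List[float] = [0.5]


def woe_bin_index(value: float, inner_edges: List[float]) -> int:
    """Return the index of the bin that contains value."""
    return bisect_right(inner_edges, value)


ratio_bins = [woe_bin_index(v, RATIO_EDGES) for v in (0.0, 0.1, 0.3, 0.45, 0.5, 1.0)]
binary_bins = [woe_bin_index(v, BINARY_EDGES) for v in (0, 1)]
```

A count-valued variable can be binned the same way by supplying integer edges up to the chosen maximum number.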
The calculation of the WOE value of the account address conflict variable may be performed by the following formula:
$$\mathrm{WOE}_i = \ln\frac{py_i}{pn_i} = \ln\frac{\#y_i / \#y_T}{\#n_i / \#n_T}$$
where WOE_i is the WOE value of the current account address conflict variable in the i-th WOE bin; py_i is the proportion of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which a conflict occurs; pn_i is the proportion of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which no conflict occurs; #y_i is the number of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which a conflict occurs; #n_i is the number of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which no conflict occurs; #y_T is the number of sample accounts in which a conflict occurs in the current account address conflict variable; and #n_T is the number of sample accounts in which no conflict occurs in the current account address conflict variable.
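The per-bin calculation can be sketched in Python as follows. This is only a sketch under stated assumptions: it uses the conventional WOE definition WOE_i = ln(py_i / pn_i) with py_i = #y_i / #y_T and pn_i = #n_i / #n_T, the function name `woe_per_bin` is hypothetical, and every bin is assumed to contain at least one account of each kind so that the logarithm is defined.

```python
import math


def woe_per_bin(y_counts, n_counts):
    """Return the list of WOE_i values, one per WOE bin.

    y_counts[i] plays the role of #y_i and n_counts[i] the role of #n_i.
    Assumption: every count is non-zero, so no smoothing is applied.
    """
    y_total = sum(y_counts)  # #y_T
    n_total = sum(n_counts)  # #n_T
    woe_values = []
    for y_i, n_i in zip(y_counts, n_counts):
        py_i = y_i / y_total
        pn_i = n_i / n_total
        woe_values.append(math.log(py_i / pn_i))
    return woe_values
```

In practice a small smoothing constant is often added to empty bins; that refinement is outside what the text describes and is omitted here.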
The WOE_i corresponding to each WOE bin of each account address conflict variable can be calculated through the above formula. The WOE value of each account address conflict variable of a sample account is the WOE_i of the WOE bin in which the value of that account address conflict variable is located. The sample learning score value of the sample account may be determined based on WOE_i, for example by adding the WOE_i values of the WOE bins in which the respective account address conflict variables of the sample account are located.
The sample learning score correction value may be obtained by weighting the WOE values corresponding to the account address conflict variables of the sample account. For example, a weight coefficient may be set according to the importance of each account address conflict variable, the weight coefficient is multiplied by the WOE value of that account address conflict variable, and the products are added to determine the sample learning score correction value. The importance of an account address conflict variable may be determined based on expert experience; e.g., if account address conflict variable 1 is considered important based on business expert experience, its weight may be set to 100, while if account address conflict variable 2 is considered less important than account address conflict variable 1, its weight may be set to 10.
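The two scores described above can be sketched as a plain sum and a weighted sum of WOE values. The function names and the dictionary layout (variable name mapped to WOE value or weight) are assumptions made for illustration; the weights 100 and 10 in the usage note are the example weights from the text, while the WOE values there are made up.

```python
def sample_learning_score(woe_values):
    """Sum the WOE values of the bins in which an account's variables fall."""
    return sum(woe_values.values())


def sample_learning_score_correction(woe_values, weights):
    """Weight each WOE value by the importance of its variable, then sum."""
    return sum(weights[name] * woe for name, woe in woe_values.items())
```

For instance, with hypothetical WOE values `{"var1": 0.5, "var2": -0.2}` and weights `{"var1": 100, "var2": 10}`, the correction value is 100 × 0.5 + 10 × (-0.2).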
Sample accounts are typically collected from historical data. For example, accounts known to have been fraudulently used may be collected as negative sample accounts, and accounts of some normal users may be collected as positive sample accounts, so that it is known whether each sample account has been fraudulently used; that is, the known result indicates whether the sample account has been fraudulently used. Based on the known results and on the sample learning score value and the sample learning score correction value of each sample account, a first target value interval, corresponding to the sample learning score value, and a second target value interval, corresponding to the sample learning score correction value, can be determined. Specifically, the sample learning score values and sample learning score correction values of the plurality of sample accounts can be used to find, by counting, a combination of a first target value interval and a second target value interval for which the fraudulent account identification accuracy reaches a preset proportion, where the fraudulent account identification accuracy is the proportion of fraudulently used accounts among the sample accounts that fall in the first target value interval and the second target value interval at the same time. The preset proportion can be set in advance according to actual conditions and the desired strength of risk control, for example to 99%.
After the first target value interval and the second target value interval are determined, they can be used for online identification: when the score value and the score correction value of the account currently being identified fall into the first target value interval and the second target value interval respectively, that account can be considered a risk account, namely a fraudulently used account, and subsequent related processing can be performed.
According to the embodiments of the present disclosure, the account address conflict variables of the sample accounts are obtained and scored by using the scoring card model to obtain the sample learning score value of each sample account; the sample learning score value is corrected according to the importance of the account address conflict variables to obtain the sample learning score correction value; and a target value interval pair is then found according to the known result of whether each sample account is a fraudulently used account, the sample learning score value and the sample learning score correction value, such that the fraudulent account identification rate among the sample accounts whose sample learning score value and sample learning score correction value fall into the target value interval pair reaches a preset threshold. The target value interval pair determined by the embodiments of the present disclosure allows the risk of an account to be identified to be scored in real time and the risk of fraudulent use of the account to be identified quickly and accurately, which facilitates rapid business response and the identification of derivative risks. Meanwhile, because the target value interval pair is determined by combining sample learning with correction based on expert experience, the performance of fraudulent account identification is greatly improved.
In an alternative implementation of this embodiment, the account address conflict variable includes at least one of:
a variable identifying a conflict between a location element related to the account identity of the sample account and a location element related to the account operator;
a variable identifying a conflict between different location elements related to the account operator of the sample account;
a variable identifying a conflict between a location element related to the account operator of the sample account and a location element related to the account identity of a different account.
In this alternative implementation, the location elements related to the account identity include, but are not limited to, locations related to the operator of the account (e.g., an IP address) and locations related to the operator currently operating the account (e.g., a mobile phone location address, a SIM card location address, or the home location of the issuer of an authenticated bank card). The location elements related to the identity of a different account include, but are not limited to, a series of location elements related to other accounts of the user who owns the account.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step S102, that is, the step of obtaining the WOE value of the account address conflict variable according to the variable processing manner in the scoring card model, and determining the sample learning scoring value of the sample account according to the WOE value, further includes the following steps S201-S203:
in step S201, determining the WOE value and IV value of the account address conflict variable by using a variable processing mode in the scoring card model;
in step S202, at least one feature significant variable is selected from the account address conflict variables according to the IV value;
in step S203, the sample learning score value of the sample account is determined according to the WOE value of the feature significant variable.
In this alternative implementation, the IV (Information Value), commonly referred to as the information value, may be used to represent the importance of an account address conflict variable. The IV value may be determined from the WOE values of the account address conflict variable in each bin after WOE binning, and may specifically be calculated according to the following formulas:
$$IV_i = (py_i - pn_i) \times \mathrm{WOE}_i$$
$$IV = \sum_{i=1}^{n} IV_i$$
where IV_i is the IV value of the current account address conflict variable in the i-th WOE bin, n is the number of bins after WOE binning, and IV is the IV value of the current account address conflict variable.
Because the IV value can represent the importance of an account address conflict variable, the account address conflict variables can be ranked by their IV values. The more important account address conflict variables are retained for calculating the sample learning score value and the sample learning score correction value, and the unimportant ones are removed, which reduces interference with the scoring results. The significant feature variables are the more important part of all account address conflict variables and can be determined by a preset threshold; that is, the account address conflict variables whose IV value is larger than the threshold are selected as the significant feature variables. The threshold may be set empirically or in another manner, and is not limited herein.
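The IV calculation and the threshold-based screening can be sketched in Python as below. This assumes the conventional definitions IV_i = (py_i - pn_i) × WOE_i and IV = Σ IV_i, with py_i = #y_i / #y_T and pn_i = #n_i / #n_T; the function names are hypothetical, and the threshold is left as a parameter because the text says only that it may be set empirically.

```python
import math


def information_value(y_counts, n_counts):
    """Return the IV of one variable from its per-bin counts #y_i and #n_i.

    Assumption: every count is non-zero, so each WOE_i is defined.
    """
    y_total = sum(y_counts)
    n_total = sum(n_counts)
    iv = 0.0
    for y_i, n_i in zip(y_counts, n_counts):
        py_i = y_i / y_total
        pn_i = n_i / n_total
        iv += (py_i - pn_i) * math.log(py_i / pn_i)  # IV_i
    return iv


def select_significant(iv_by_variable, threshold):
    """Keep the variables whose IV value is strictly larger than the threshold."""
    return [name for name, iv in iv_by_variable.items() if iv > threshold]
```

Each IV_i is non-negative because (py_i - pn_i) and ln(py_i / pn_i) always share the same sign, so a variable whose bins do not separate the two groups at all has an IV of zero.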
In an optional implementation manner of this embodiment, as shown in fig. 3, the step S103, that is, the step of correcting the sample learning score value according to the importance level of the account address conflict variable and the WOE value to obtain a sample learning score correction value, further includes the following steps S301 to S302:
in step S301, determining a weight value of the feature significant variable;
in step S302, the WOE value of the feature significant variable is weighted by the weight value, and the sample learning score correction value is determined.
In this alternative implementation manner, the weight value of each feature significant variable among the account address conflict variables may be determined according to expert experience and/or past historical data, and the WOE value of the feature significant variable may be weighted by that weight value to determine the sample learning score correction value. In one embodiment, the sample learning score correction value of the sample account may be obtained by multiplying the WOE value of each of a plurality of feature significant variables by the corresponding weight value and adding the products.
In an optional implementation manner of this embodiment, as shown in fig. 4, the step S104, that is, the step of determining a first target value interval and a second target value interval according to the known result of whether the sample account is fraudulently used, the sample learning score value and the sample learning score correction value, so that the proportion of fraudulently used accounts among the sample accounts covered by the first target value interval and the second target value interval reaches a preset proportion, further includes the following steps S401-S402:
in step S401, a plurality of candidate value interval pairs are determined according to the known results of a plurality of sample accounts, the sample learning score values and the sample learning score correction values; each candidate value interval pair comprises a first candidate value interval and a second candidate value interval, and the proportion of fraudulently used accounts among the sample accounts whose sample learning score value is in the first candidate value interval and whose sample learning score correction value is in the second candidate value interval of the candidate value interval pair reaches the preset proportion;
in step S402, the candidate value interval pair with the largest risk coverage rate is selected from the plurality of candidate value interval pairs, and its two intervals are taken as the first target value interval and the second target value interval.
In this alternative implementation manner, the maximum value ranges of the sample learning score value and of the sample learning score correction value may each be divided into intervals, and the interval pairs in which the proportion of fraudulently used accounts reaches the preset proportion are taken as candidate value interval pairs. For example, suppose the preset proportion is 99%. For an interval pair A defined as {sample learning score value ≥ 10, sample learning score correction value ≥ 190}, if 99.5% of all sample accounts whose sample learning score value is at least 10 and whose sample learning score correction value is at least 190 are fraudulently used accounts, interval pair A may be set as a candidate value interval pair. For an interval pair B defined as {sample learning score value ≥ 10, sample learning score correction value between 130 and 190}, if 99% of all sample accounts whose sample learning score value is at least 10 and whose sample learning score correction value is between 130 and 190 are fraudulently used accounts, interval pair B may also be set as a candidate value interval pair. The preset proportion can be regarded as the fraudulent account identification accuracy; that is, among the accounts that fall into the first candidate value interval (or the first target value interval) and the second candidate value interval (or the second target value interval) selected by the present disclosure, at least the preset proportion are fraudulently used accounts.
In one embodiment, the candidate value interval pairs may be determined by optimal WOE binning. There are three WOE binning modes: equal-frequency, equal-width and optimal. In the optimal WOE binning mode, the value ranges of the sample learning score value and the sample learning score correction value are discretized, that is, divided into intervals; the WOE values of the sample learning score value and the sample learning score correction value in each interval are calculated, and their IV values are then calculated; an optimal binning is selected by means of the IV values; and the candidate value interval pairs are finally selected. WOE binning belongs to the prior art and is not described here again.
In this alternative implementation, not only the fraudulent account identification accuracy but also the risk coverage rate is considered: the pair with the largest risk coverage rate is selected from the plurality of candidate value interval pairs as the first target value interval and the second target value interval, so that the risk coverage rate is maximized on the basis that the fraudulent account identification accuracy reaches the preset proportion. The risk coverage rate is the proportion of the fraudulently used accounts whose sample learning score value is in the first target value interval and whose sample learning score correction value is in the second target value interval to all fraudulently used accounts among all sample accounts. For example, if the risk coverage rate of candidate value interval pair A is 7% and that of candidate value interval pair B is 17%, the first candidate value interval of candidate value interval pair B may be selected as the first target value interval, and its second candidate value interval may be selected as the second target value interval.
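The selection logic of steps S401-S402 can be sketched as follows: among the interval pairs whose identification accuracy reaches the preset proportion, keep the one with the largest risk coverage rate. This is a minimal sketch, not the patented procedure itself; the function name, the tuple layout of the samples, the closed-interval convention and the assumption that the interval pairs to test are supplied by the caller (for example from a prior binning step) are all choices made for illustration.

```python
def choose_target_pair(samples, interval_pairs, preset_ratio):
    """Pick the qualifying interval pair with the largest risk coverage rate.

    samples: list of (score, corrected_score, is_fraud) tuples.
    interval_pairs: list of ((low1, high1), (low2, high2)), closed intervals.
    Returns (best_pair, coverage), or (None, -1.0) if no pair qualifies.
    """
    total_fraud = sum(1 for _, _, fraud in samples if fraud)
    best_pair, best_coverage = None, -1.0
    for first, second in interval_pairs:
        # Labels of the sample accounts that fall into both intervals.
        covered = [
            fraud
            for score, corrected, fraud in samples
            if first[0] <= score <= first[1] and second[0] <= corrected <= second[1]
        ]
        if not covered or total_fraud == 0:
            continue
        hits = sum(covered)
        accuracy = hits / len(covered)   # identification accuracy
        coverage = hits / total_fraud    # risk coverage rate
        if accuracy >= preset_ratio and coverage > best_coverage:
            best_pair, best_coverage = (first, second), coverage
    return best_pair, best_coverage
```

An unbounded interval such as "score ≥ 10" can be passed as `(10, float("inf"))`.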
In an alternative implementation manner of this embodiment, as shown in fig. 5, the method further includes the following steps S501 to S502:
in step S501, a score value to be recognized and a score correction value to be recognized of an account to be recognized are determined;
in step S502, whether the account to be identified is an imposter account is determined according to whether the score to be identified and the score correction value to be identified are located in the first target value interval and the second target value interval respectively.
In this alternative implementation manner, after the first target value interval and the second target value interval are determined, online identification may be performed. In the online identification process, the score value to be identified and the score correction value to be identified of the account to be identified are first determined; they are calculated in the same way as the sample learning score value and the sample learning score correction value of a sample account, respectively. Whether the account to be identified is a fraudulently used account, that is, whether it is being used by another person rather than by the user whose identity matches the account, is then determined according to whether the score value to be identified and the score correction value to be identified are located in the first target value interval and the second target value interval at the same time.
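The online check reduces to two interval membership tests. In the sketch below, the `Interval` class and the function name `is_fraudulently_used` are hypothetical, and closed intervals are assumed because the text does not say whether the end points are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        # Assumption: both end points belong to the interval.
        return self.low <= value <= self.high


def is_fraudulently_used(score, corrected_score, first_target, second_target):
    """Flag the account only if both scores fall in their target intervals."""
    return first_target.contains(score) and second_target.contains(corrected_score)
```

For instance, the example interval pair {score value ≥ 10, score correction value between 130 and 190} can be expressed as `Interval(10, float("inf"))` and `Interval(130, 190)`.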
In an alternative implementation manner of the present embodiment, as shown in fig. 6, the step S501, that is, the step of determining the learning score value to be recognized and the learning score correction value to be recognized of the account to be recognized, further includes the following steps S601-S602:
in step S601, determining a WOE value of an account address conflict variable of the account to be identified;
in step S602, the value of the score to be identified is determined according to the WOE value of the account address conflict variable of the account to be identified, and the correction value of the score to be identified is further determined according to the WOE value of the account address conflict variable of the account to be identified and the importance of the account address conflict variable of the account to be identified.
In this alternative implementation, the WOE value of an account address conflict variable of the account to be identified is the WOE_i of the WOE bin into which the value of that account address conflict variable falls, and the value of WOE_i may be the value determined from the account address conflict variables of the sample accounts in the embodiment shown in FIG. 1. The score value to be identified of the account to be identified is obtained by accumulating the WOE values of all account address conflict variables or significant feature variables of the account to be identified. The score correction value to be identified can be obtained by accumulating the weighted WOE values of all account address conflict variables or significant feature variables of the account to be identified, where the weights reflect the importance of the account address conflict variables or significant feature variables.
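Scoring an account to be identified therefore amounts to a table lookup followed by two accumulations. The sketch below assumes that the bin edges, per-bin WOE values and weights learned from the sample accounts are available as dictionaries keyed by variable name; that layout, the function name `score_account` and the boundary convention of `bisect_right` are assumptions made for illustration.

```python
from bisect import bisect_right


def score_account(values, bin_edges, woe_table, weights):
    """Return (score value, score correction value) for one account.

    values: variable name -> observed value for the account to be identified.
    bin_edges: variable name -> sorted cut points between WOE bins.
    woe_table: variable name -> WOE_i per bin, learned from the sample accounts.
    weights: variable name -> importance weight of the variable.
    """
    score = 0.0
    corrected = 0.0
    for name, value in values.items():
        bin_index = bisect_right(bin_edges[name], value)
        woe = woe_table[name][bin_index]
        score += woe                     # plain accumulation
        corrected += weights[name] * woe  # weighted accumulation
    return score, corrected
```

Restricting `values` to the significant feature variables gives the variant in which only those variables contribute to the two scores.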
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure.
Fig. 7 shows a block diagram of a fraudulent account identification apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 7, the fraudulent account identification apparatus includes:
an acquisition module 701 configured to acquire account address conflict variables of a plurality of sample accounts;
a first determining module 702 configured to obtain a WOE value of the account address conflict variable according to a variable processing manner in a scoring card model, and determine a sample learning scoring value of the sample account according to the WOE value;
a correction module 703 configured to correct the sample learning score value according to the importance level of the account address conflict variable and the WOE value to obtain a sample learning score correction value;
a second determining module 704 configured to determine a first target value interval and a second target value interval according to the known result of whether the sample account is fraudulently used, the sample learning score value and the sample learning score correction value, so that the proportion of fraudulently used accounts among the sample accounts covered by the first target value interval and the second target value interval reaches a preset proportion.
In this embodiment, an account address conflict variable is a variable indicating whether a conflict occurs between two preset position elements in a series of preset position elements related to an account, where the occurrence of such a conflict can indicate that the account is at risk of being fraudulently used; an example is a variable indicating whether a conflict occurs between the current address of the user of an account and the current login address of that account. An account address conflict means that two addresses that should be identical to each other are not actually identical. Because the same account has a plurality of preset position elements, such as position elements representing the current position, historical positions and frequently used positions of the user of the account, each login position of the account, and position elements related to other accounts with different identities, it can be determined in advance which conflicts between position elements indicate a risk of fraudulent use of the account, and account address conflict variables are then pre-constructed for those position elements.
The account address conflict variables of the sample accounts can be obtained by collecting each preset position element of each sample account in the network over a period of time and thereby determining the values of the pre-constructed account address conflict variables. The value of an account address conflict variable can be constructed in several ways. For example, account address conflict variable A may be set to 1 when a conflict occurs between the two preset position elements of that variable, and to 0 otherwise. In addition, the plurality of account address conflict variables of a sample account can be classified, and, for each class, either the number (a positive integer) of account address conflict variables in which the two address elements conflict, or the proportion (for example, 0-100%) of such account address conflict variables among all account address conflict variables in the class, can be counted and used as the value of the account address conflict variable.
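The three ways of constructing a value described above (a 0/1 flag, a count per class, a proportion per class) can be sketched in Python as follows. The function names are hypothetical, and treating a conflict as simple inequality of two location strings is a simplifying assumption; a real system would need its own rule for deciding when two position elements conflict.

```python
def conflict_flag(location_a, location_b):
    """Return 1 if the two position elements conflict, otherwise 0.

    Assumption: a conflict is modelled as plain inequality.
    """
    return 1 if location_a != location_b else 0


def conflict_count_and_ratio(location_pairs):
    """Return (number of conflicting pairs, their proportion) for one class."""
    flags = [conflict_flag(a, b) for a, b in location_pairs]
    count = sum(flags)
    ratio = count / len(flags) if flags else 0.0
    return count, ratio
```

Either the count or the ratio can then be binned and WOE-encoded in the same way as any other account address conflict variable.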
After the acquisition module 701 obtains the multiple account address conflict variables of all the sample accounts, the first determining module 702 may process the account address conflict variables through the variable processing manner in the scoring card model, so as to obtain the WOE value of each account address conflict variable in each WOE bin, and further determine the sample learning score value of each sample account according to the WOE values. In one embodiment, the sample learning score value of each sample account is the sum of the WOE values of the bins in which the respective account address conflict variables of the sample account are located.
The scoring card model is a mature prediction method that is widely used in credit risk assessment and financial risk control. It is a generalized linear model for binary classification: the model variables are discretized and WOE-encoded, and a logistic regression model is then applied. WOE stands for Weight of Evidence and represents the effect on the default proportion when a model variable takes a certain value. When the scoring card model is applied to the embodiments of the present disclosure, the model variable is the account address conflict variable. The WOE binning operation on an account address conflict variable discretizes the value range of that variable, that is, divides the value range into different intervals. For example, if the value of the account address conflict variable is discrete, such as 0 or 1, the value range can be divided into two intervals (i.e., two WOE bins), namely 0 and 1. If the value of the account address conflict variable is a continuous proportion, for example the proportion, among all account address conflict variables of the same account, of those variables in which the two position elements conflict, a minimum proportion (0) and a maximum proportion (100%) can be preset, and the range from the minimum proportion to the maximum proportion is discretized, that is, divided into a plurality of different intervals, i.e., a plurality of WOE bins, for example the three WOE bins 0-30%, 30-50% and 50-100%. The value of the account address conflict variable can also be the number of account address conflict variables, within one class of account address conflict variables of the same sample account, in which the two position elements conflict; in that case a maximum number can be set, and the range from the minimum number to the maximum number is discretized, that is, divided into a plurality of different intervals, i.e., a plurality of WOE bins. WOE binning is a technique known in the art and is not described in detail herein.
The calculation of the WOE value of the account address conflict variable may be performed by the following formula:
$$\mathrm{WOE}_i = \ln\frac{py_i}{pn_i} = \ln\frac{\#y_i / \#y_T}{\#n_i / \#n_T}$$
where WOE_i is the WOE value of the current account address conflict variable in the i-th WOE bin; py_i is the proportion of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which a conflict occurs; pn_i is the proportion of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which no conflict occurs; #y_i is the number of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which a conflict occurs; #n_i is the number of sample accounts whose value of the current account address conflict variable is located in the i-th WOE bin and in which no conflict occurs; #y_T is the number of sample accounts in which a conflict occurs in the current account address conflict variable; and #n_T is the number of sample accounts in which no conflict occurs in the current account address conflict variable.
The WOE_i corresponding to each WOE bin of each account address conflict variable can be calculated through the above formula. The WOE value of each account address conflict variable of a sample account is the WOE_i of the WOE bin in which the value of that account address conflict variable is located. The sample learning score value of the sample account may be determined based on WOE_i, for example by adding the WOE_i values of the WOE bins in which the respective account address conflict variables of the sample account are located.
The sample learning score correction value may be obtained by weighting the WOE values corresponding to the account address conflict variables of the sample account. For example, a weight coefficient may be set according to the importance of each account address conflict variable, the weight coefficient is multiplied by the WOE value of that account address conflict variable, and the products are added to determine the sample learning score correction value. The importance of an account address conflict variable may be determined based on expert experience; e.g., if account address conflict variable 1 is considered important based on business expert experience, its weight may be set to 100, while if account address conflict variable 2 is considered less important than account address conflict variable 1, its weight may be set to 10.
Sample accounts are typically collected from historical data. For example, accounts known to have been fraudulently used may be collected as negative sample accounts, and accounts of some normal users may be collected as positive sample accounts, so that it is known whether each sample account has been fraudulently used; that is, the known result indicates whether the sample account has been fraudulently used. Based on the known results and on the sample learning score value and the sample learning score correction value of each sample account, a first target value interval, corresponding to the sample learning score value, and a second target value interval, corresponding to the sample learning score correction value, can be determined. Specifically, the sample learning score values and sample learning score correction values of the plurality of sample accounts can be used to find, by counting, a combination of a first target value interval and a second target value interval for which the fraudulent account identification accuracy reaches a preset proportion, where the fraudulent account identification accuracy is the proportion of fraudulently used accounts among the sample accounts that fall in the first target value interval and the second target value interval at the same time. The preset proportion can be set in advance according to actual conditions and the desired strength of risk control, for example to 99%.
After the first target value interval and the second target value interval are determined, they can be used for online identification: when the score value and the score correction value of the account currently being identified fall into the first target value interval and the second target value interval respectively, that account can be considered a risk account, namely a fraudulently used account, and subsequent related processing can be performed.
According to the embodiments of the present disclosure, the account address conflict variables of the sample accounts are obtained and scored by using the scoring card model to obtain the sample learning score value of each sample account; the correction module 703 corrects the sample learning score value according to the importance of the account address conflict variables to obtain the sample learning score correction value; and the second determining module 704 then finds a target value interval pair according to the known result of whether each sample account is a fraudulently used account, the sample learning score value and the sample learning score correction value, such that the fraudulent account identification rate among the sample accounts whose sample learning score value and sample learning score correction value fall into the target value interval pair reaches a preset threshold. The target value interval pair determined by the embodiments of the present disclosure allows the risk of an account to be identified to be scored in real time and the risk of fraudulent use of the account to be identified quickly and accurately, which facilitates rapid business response and the identification of derivative risks. Meanwhile, because the target value interval pair is determined by combining sample learning with correction based on expert experience, the performance of fraudulent account identification is greatly improved.
In an alternative implementation of this embodiment, the account address conflict variable includes at least one of:
a variable identifying a conflict between a location element of the sample account related to the account identity and a location element related to the account operator;
a variable identifying a conflict among the location elements of the sample account related to the account operator;
a variable identifying a conflict between a location element of the sample account associated with an account operator and a location element associated with a different identity account identity.
In this alternative implementation, the location elements related to the account identity include, but are not limited to, the location related to the operator of the account (e.g., IP address) and the location related to the operator currently operating the account (e.g., mobile phone number home location, SIM card home location, home location of the issuer of the authenticated bank card). The location elements related to a different account identity include, but are not limited to, a series of location elements related to other accounts of the user who owns the account.
In an alternative implementation manner of this embodiment, as shown in fig. 8, the first determining module 702 includes:
a first determining submodule 801 configured to determine WOE values and IV values of the account address conflict variables by means of variable handling in a scoring card model;
a screening sub-module 802 configured to screen at least one feature significant variable from the account address conflict variables according to the IV value;
a second determination submodule 803 configured to determine the sample learning score value of the sample account from the WOE value of the feature significant variable.
In this alternative implementation, the IV (Information Value) may be used to represent the importance of an account address conflict variable. The IV value may be determined from the WOE value of the account address conflict variable in each bin after WOE binning, and may be calculated according to the following formulas, given here in the standard scoring-card form:

$$IV_i = (p_{y_i} - p_{n_i}) \times WOE_i$$

$$IV = \sum_{i=1}^{n} IV_i$$

where $IV_i$ is the IV value of the current account address conflict variable in the $i$-th WOE bin, $n$ is the number of bins after WOE binning, and $IV$ is the IV value of the current account address conflict variable. $p_{y_i}$ and $p_{n_i}$ are the proportions of the two classes of sample accounts (imposter and non-imposter) falling in the $i$-th bin, each taken relative to all sample accounts of that class, with $WOE_i = \ln(p_{y_i} / p_{n_i})$.
Because the IV value represents the importance degree of an account address conflict variable, the account address conflict variables can be ranked by IV value. The more important variables are retained for calculating the sample learning score value and the sample learning score correction value, and the unimportant ones are removed, which reduces interference with the scoring result. The feature significant variables are the more important part of all account address conflict variables and can be determined by a preset threshold: account address conflict variables whose IV value is larger than the threshold are selected as feature significant variables. The threshold may be set empirically or in another way, and is not limited herein.
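The IV-based screening described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the standard scoring-card definitions (WOE as the log-ratio of the two class proportions in a bin, IV as the sum over bins), assumes every bin contains accounts of both classes, and uses hypothetical variable names, counts, and a hypothetical threshold of 0.1.

```python
import math

def woe_and_iv(bins):
    """Return (per-bin WOE values, IV) for one variable.

    bins: list of (imposter_count, normal_count) per WOE bin.
    Assumes every count is non-zero (no smoothing is applied).
    """
    total_y = sum(y for y, _ in bins)
    total_n = sum(n for _, n in bins)
    woe_values, iv = [], 0.0
    for y, n in bins:
        p_y = y / total_y              # share of all imposter accounts in this bin
        p_n = n / total_n              # share of all normal accounts in this bin
        woe_i = math.log(p_y / p_n)
        woe_values.append(woe_i)
        iv += (p_y - p_n) * woe_i      # IV_i, accumulated into IV
    return woe_values, iv

def screen_by_iv(iv_by_variable, threshold):
    """Keep variables whose IV exceeds the threshold, most important first."""
    kept = [(name, iv) for name, iv in iv_by_variable.items() if iv > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]

# Hypothetical binned counts for two binary conflict variables.
bins_by_variable = {
    "ip_vs_identity_city": [(80, 20), (20, 980)],
    "device_vs_ip_city": [(52, 500), (48, 500)],
}
iv_by_variable = {name: woe_and_iv(b)[1] for name, b in bins_by_variable.items()}
significant = screen_by_iv(iv_by_variable, threshold=0.1)
```

With these made-up counts, only the first variable separates the two classes strongly enough to pass the threshold; the second is discarded as uninformative.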
In an alternative implementation of the present embodiment, as shown in fig. 9, the correction module 703 includes:
a third determination sub-module 901 configured to determine a weight value of the feature significant variable;
a fourth determination sub-module 902 is configured to determine the sample learning score correction value after weighting the WOE value of the feature significant variable with the weight value.
In this alternative implementation, the weight values of the feature significant variables among the account address conflict variables may be determined according to expert experience and/or historical data, and the WOE values of the feature significant variables may be weighted by these weight values to determine the sample learning score correction value. In one embodiment, the sample learning score correction value of a sample account may be obtained by multiplying the WOE value of each of a plurality of feature significant variables by its corresponding weight value and summing the products.
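The weighting step can be sketched as below. This is an assumption-laden illustration: the function name, the variable names, and the WOE and weight values are hypothetical, and the unweighted score is taken to be the plain sum of WOE values.

```python
def sample_scores(woe_values, weights):
    """Return (sample learning score, sample learning score correction value).

    woe_values: WOE value of each feature significant variable for one account.
    weights: expert-assigned weight of each variable (hypothetical values).
    """
    score = sum(woe_values.values())
    corrected = sum(weights[name] * woe for name, woe in woe_values.items())
    return score, corrected

woe_values = {"ip_vs_identity_city": 2.0, "device_vs_ip_city": -0.5}
weights = {"ip_vs_identity_city": 3.0, "device_vs_ip_city": 1.0}
score, corrected = sample_scores(woe_values, weights)
```

Keeping the two scores separate lets the later interval search constrain the learned score and the expert-corrected score independently.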
In an alternative implementation of this embodiment, as shown in fig. 10, the second determining module 704 includes:
a fifth determination submodule 1001 configured to determine a plurality of candidate value interval pairs according to the known results of a plurality of the sample accounts, the sample learning score values, and the sample learning score correction values; each candidate value interval pair comprises a first candidate value interval and a second candidate value interval, and among the sample accounts whose sample learning score value is in the first candidate value interval and whose sample learning score correction value is in the second candidate value interval of the pair, the proportion of imposter accounts reaches the preset proportion;
a sixth determination submodule 1002 configured to select, from the plurality of candidate value interval pairs, the pair with the largest risk coverage, and to take its first candidate value interval and second candidate value interval as the first target value interval and the second target value interval, respectively.
In this alternative implementation, the maximum value ranges of the sample learning score value and the sample learning score correction value may be divided into intervals, and the interval pairs in which the proportion of imposter accounts reaches a preset proportion are taken as candidate value interval pairs. For example, suppose the preset proportion is 99%. For interval pair A, {sample learning score value equal to or greater than 10, sample learning score correction value equal to or greater than 190}, if 99.5% of all sample accounts with a sample learning score value equal to or greater than 10 and a sample learning score correction value equal to or greater than 190 are imposter accounts, then interval pair A may be set as a candidate value interval pair. For interval pair B, {sample learning score value equal to or greater than 10, sample learning score correction value between 130 and 190}, if 99% of all sample accounts with a sample learning score value equal to or greater than 10 and a sample learning score correction value between 130 and 190 are imposter accounts, then interval pair B may also be set as a candidate value interval pair. The preset proportion can be regarded as the identification accuracy for imposter accounts; that is, the first candidate value interval (or first target value interval) and the second candidate value interval (or second target value interval) selected by the present disclosure identify imposter accounts with an accuracy of at least the preset proportion.
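The candidate test in this example can be sketched as follows. The sample data are fabricated to mirror the 99.5% and 99% figures above; the half-open interval convention, the function names, and the tuple layout are assumptions made for illustration.

```python
import math

def imposter_ratio(samples, first, second):
    """Share of imposter accounts among samples falling in both intervals.

    samples: iterable of (score, corrected_score, is_imposter).
    first, second: half-open intervals (low, high) for the two scores.
    """
    hits = [is_imposter for score, corrected, is_imposter in samples
            if first[0] <= score < first[1] and second[0] <= corrected < second[1]]
    return sum(hits) / len(hits) if hits else 0.0

def is_candidate(samples, first, second, preset=0.99):
    """An interval pair is a candidate if its imposter ratio reaches the preset."""
    return imposter_ratio(samples, first, second) >= preset

# Fabricated sample accounts: (score, corrected score, known imposter label).
samples = (
    [(12, 200, True)] * 199 + [(12, 200, False)] * 1      # falls in pair A
    + [(12, 150, True)] * 99 + [(12, 150, False)] * 1     # falls in pair B
    + [(5, 50, False)] * 500                              # low-score accounts
)
pair_a = ((10, math.inf), (190, math.inf))
pair_b = ((10, math.inf), (130, 190))
pair_low = ((0, 10), (0, 130))
```

Here pairs A and B both qualify as candidates, while the low-score pair does not because none of its accounts are imposter accounts.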
In one embodiment, the candidate value interval pairs may be determined by optimal WOE binning. There are three WOE binning methods: equal-frequency, equal-width, and optimal. Optimal WOE binning discretizes the value ranges of the sample learning score value and the sample learning score correction value, that is, divides them into intervals; calculates the WOE value of the sample learning score value and the sample learning score correction value in each interval; further calculates their IV values; selects an optimal binning through the IV value; and finally selects the candidate value interval pairs. WOE binning is prior art and is not described in detail here.
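For orientation, the two simpler binning methods named above can be sketched as below; optimal binning would additionally compare alternative cut points by the IV they yield, which is not shown. The function names and the score values are hypothetical.

```python
def equal_width_edges(values, k):
    """Interior cut points that split [min, max] into k bins of equal width."""
    low, high = min(values), max(values)
    step = (high - low) / k
    return [low + i * step for i in range(1, k)]

def equal_frequency_edges(values, k):
    """Interior cut points that put roughly the same number of values in each bin."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // k] for i in range(1, k)]

scores = [1, 2, 3, 4, 5, 6, 7, 100]
width_edges = equal_width_edges(scores, 4)
frequency_edges = equal_frequency_edges(scores, 4)
```

On skewed data such as this, equal-width binning leaves most bins nearly empty, whereas equal-frequency binning spreads the accounts evenly, which is one reason an IV-driven choice between binnings can matter.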
This alternative implementation considers not only the identification accuracy for imposter accounts but also the risk coverage: by selecting, from the plurality of candidate value interval pairs, the pair with the largest risk coverage as the first target value interval and the second target value interval, the risk coverage is maximized on the basis that the identification accuracy reaches the preset proportion. The risk coverage is the proportion, among all imposter accounts in the sample accounts, of the imposter accounts whose sample learning score value is in the first target value interval and whose sample learning score correction value is in the second target value interval. For example, if the risk coverage of candidate value interval pair A is 7% and that of candidate value interval pair B is 17%, the first candidate value interval of pair B may be selected as the first target value interval and its second candidate value interval as the second target value interval.
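The selection by risk coverage can be sketched as follows. The data are fabricated to reproduce the 7% and 17% figures of the example, and the helper names and half-open interval convention are assumptions.

```python
import math

def in_pair(score, corrected, pair):
    """True if both scores fall in their respective half-open intervals."""
    (low1, high1), (low2, high2) = pair
    return low1 <= score < high1 and low2 <= corrected < high2

def risk_coverage(samples, pair):
    """Share of all imposter accounts that the interval pair captures."""
    total = sum(1 for _, _, is_imposter in samples if is_imposter)
    covered = sum(1 for score, corrected, is_imposter in samples
                  if is_imposter and in_pair(score, corrected, pair))
    return covered / total if total else 0.0

def pick_target_pair(samples, candidate_pairs):
    """Choose the candidate pair with the largest risk coverage."""
    return max(candidate_pairs, key=lambda pair: risk_coverage(samples, pair))

# Fabricated sample accounts: 100 imposter accounts in total.
samples = (
    [(12, 200, True)] * 7        # captured by pair A
    + [(12, 150, True)] * 17     # captured by pair B
    + [(5, 50, True)] * 76       # imposter accounts outside both pairs
    + [(5, 50, False)] * 900
)
pair_a = ((10, math.inf), (190, math.inf))
pair_b = ((10, math.inf), (130, 190))
target_first, target_second = pick_target_pair(samples, [pair_a, pair_b])
```

The sketch assumes the pairs passed in have already met the preset accuracy, so coverage is the only remaining criterion.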
In an optional implementation manner of this embodiment, as shown in fig. 11, the fraudulent account identifying apparatus further includes:
a third determining module 1101 configured to determine a score value to be identified and a score correction value to be identified of the account to be identified;
a fourth determining module 1102 configured to determine whether the account to be identified is an imposter account according to whether the score value to be identified and the score correction value to be identified are located in the first target value interval and the second target value interval, respectively.
In this alternative implementation, after the first target value interval and the second target value interval are determined, online identification may be performed. In the online identification process, the score value to be identified and the score correction value to be identified of the account to be identified are determined first; they are calculated in the same way as the sample learning score value and the sample learning score correction value of a sample account, respectively. Whether the account to be identified is an imposter account, that is, whether it is being fraudulently used by another person rather than by the user whose identity matches the account, is then determined according to whether the score value to be identified and the score correction value to be identified are simultaneously located in the first target value interval and the second target value interval, respectively.
In an alternative implementation manner of the present embodiment, as shown in fig. 12, the third determining module 1101 includes:
a seventh determination submodule 1201 configured to determine a WOE value of an account address conflict variable of the account to be identified;
an eighth determination submodule 1202 is configured to determine the score value to be identified according to the WOE value of the account address conflict variable of the account to be identified, and further determine the score correction value to be identified according to the WOE value of the account address conflict variable of the account to be identified and the importance of the account address conflict variable of the account to be identified.
In this alternative implementation, the WOE value of an account address conflict variable of the account to be identified is the value WOE_i of the WOE bin into which the value of that variable falls; WOE_i may be the value determined from the account address conflict variables of the sample accounts in the embodiment shown in FIG. 1. The score value to be identified is obtained by accumulating the WOE values of all account address conflict variables, or of the feature significant variables, of the account to be identified. The score correction value to be identified may be obtained by accumulating the weighted WOE values of the same variables, where each weight reflects the importance of the corresponding account address conflict variable or feature significant variable.
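The online identification flow can be sketched as below. Everything named here is hypothetical: the model layout (cut points, per-bin WOE values, weights), the variable names, and the target intervals are illustrative stand-ins for values that would come from the sample-learning stage.

```python
import bisect
import math

def woe_lookup(value, edges, woe_per_bin):
    """Return the WOE of the bin that `value` falls into.

    edges: ascending interior cut points; bin i covers [edges[i-1], edges[i]).
    """
    return woe_per_bin[bisect.bisect_right(edges, value)]

def identify(account, model, first_target, second_target):
    """Return (is_imposter, score, corrected score) for one account."""
    score = corrected = 0.0
    for name, spec in model.items():
        woe = woe_lookup(account[name], spec["edges"], spec["woe"])
        score += woe                        # score value to be identified
        corrected += spec["weight"] * woe   # score correction value to be identified
    in_first = first_target[0] <= score < first_target[1]
    in_second = second_target[0] <= corrected < second_target[1]
    return in_first and in_second, score, corrected

# Hypothetical model learned from sample accounts (binary conflict flags).
model = {
    "ip_vs_identity_city": {"edges": [0.5], "woe": [-1.0, 3.0], "weight": 2.0},
    "device_vs_ip_city": {"edges": [0.5], "woe": [-0.5, 2.0], "weight": 1.5},
}
first_target, second_target = (4.0, math.inf), (8.0, math.inf)

flagged = identify({"ip_vs_identity_city": 1, "device_vs_ip_city": 1},
                   model, first_target, second_target)
clean = identify({"ip_vs_identity_city": 0, "device_vs_ip_city": 0},
                 model, first_target, second_target)
```

Because the bin boundaries, WOE values, and weights are fixed offline, the online step reduces to table lookups and two interval checks, which is consistent with the real-time scoring the description aims for.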
Fig. 13 is a schematic diagram of an electronic device suitable for use in implementing the fraudulent account identification method according to an embodiment of the present disclosure.
As shown in fig. 13, the electronic apparatus 1300 includes a Central Processing Unit (CPU) 1301, which can execute various processes in the embodiment shown in fig. 1 described above in accordance with a program stored in a Read Only Memory (ROM) 1302 or a program loaded from a storage section 1308 into a Random Access Memory (RAM) 1303. In the RAM 1303, various programs and data necessary for the operation of the electronic apparatus 1300 are also stored. The CPU 1301, ROM 1302, and RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the I/O interface 1305: an input section 1306 including a keyboard, a mouse, and the like; an output portion 1307 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 1308 including a hard disk or the like; and a communication section 1309 including a network interface card such as a LAN card, a modem, or the like. The communication section 1309 performs a communication process via a network such as the internet. The drive 1310 is also connected to the I/O interface 1305 as needed. Removable media 1311, such as magnetic disks, optical disks, magneto-optical disks, semiconductor memory, and the like, is installed as needed on drive 1310 so that a computer program read therefrom is installed as needed into storage portion 1308.
In particular, the method described above with reference to fig. 1 may be implemented as a computer software program according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method shown in fig. 1. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1309 and/or installed from the removable medium 1311.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or by hardware. The units or modules described may also be provided in a processor; in some cases, the names of these units or modules do not constitute a limitation on the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the apparatus described in the above embodiment; or may be a computer-readable storage medium, alone, that is not assembled into a device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention referred to in this disclosure is not limited to technical solutions formed by the specific combination of the features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by mutually substituting the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (14)

1. An imposter account identification method is characterized in that,
comprising the following steps:
acquiring account address conflict variables of a plurality of sample accounts; wherein the account address conflict variable comprises at least one of: a first type of variable that identifies a conflict between a location element of the sample account that is related to an account identity and a location element that is related to an account operator; a second type of variable that identifies a conflict among the location elements of the sample account that are related to the account operator; a third type of variable that identifies a conflict between a location element of the sample account that is related to an account operator and a location element that is related to a different account identity;
obtaining WOE values of the account address conflict variables according to variable processing modes in the scoring card model, and determining sample learning scoring values of the sample accounts according to the WOE values;
correcting the sample learning score value according to the importance degree of the account address conflict variable and the WOE value to obtain a sample learning score correction value;
and determining a first target value interval and a second target value interval according to a known result of whether the sample account is falsely used, the sample learning score value and the sample learning score correction value, so that the proportion of falsely used accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion.
2. The method of claim 1, characterized in that,
obtaining the WOE value of the account address conflict variable according to the variable processing mode in the scoring card model, and determining the sample learning scoring value of the sample account according to the WOE value, wherein the method comprises the following steps:
determining WOE values and IV values of the account address conflict variables by using a variable processing mode in a scoring card model;
screening at least one characteristic significant variable from the account address conflict variables according to the IV value;
and determining the sample learning scoring value of the sample account according to the WOE value of the characteristic significant variable.
3. The method of claim 2, characterized in that,
correcting the sample learning score value according to the importance degree of the account address conflict variable and the WOE value to obtain a sample learning score correction value, including:
determining a weight value of the feature significant variable;
and weighting WOE values of the feature significant variables by using the weight values, and determining the sample learning score correction value.
4. A method according to claim 1 or 3, wherein,
determining a first target value interval and a second target value interval according to a known result of whether the sample account is falsely used, the sample learning score value and the sample learning score correction value, so that the proportion of falsely used accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion, and the method comprises the following steps:
determining a plurality of candidate value interval pairs according to the known results of a plurality of sample accounts, the sample learning score values and the sample learning score correction values; each candidate value interval pair comprises a first candidate value interval and a second candidate value interval, and among the sample accounts whose sample learning score value is in the first candidate value interval and whose sample learning score correction value is in the second candidate value interval of the pair, the proportion of imposter accounts reaches the preset proportion;
and selecting the first target value interval and the second target value interval with the largest risk coverage rate from the plurality of candidate value interval pairs.
5. The method of claim 1, characterized in that the method
further comprises:
determining a score value to be identified and a score correction value to be identified of an account to be identified;
and determining whether the account to be identified is an imposter account according to whether the score to be identified and the score correction value to be identified are respectively located in the first target value interval and the second target value interval.
6. The method of claim 5, characterized in that,
determining the score value to be identified and the score correction value to be identified of the account to be identified includes:
determining WOE values of account address conflict variables of the accounts to be identified;
and determining the score value to be identified according to the WOE value of the account address conflict variable of the account to be identified, and further determining the score correction value to be identified according to the WOE value of the account address conflict variable of the account to be identified and the importance of the account address conflict variable of the account to be identified.
7. An imposter account identification device, characterized by
comprising:
an acquisition module configured to acquire account address conflict variables of a plurality of sample accounts; wherein the account address conflict variable comprises at least one of: a first type of variable that identifies a conflict between a location element of the sample account that is related to an account identity and a location element that is related to an account operator; a second type of variable that identifies a conflict among the location elements of the sample account that are related to the account operator; a third type of variable that identifies a conflict between a location element of the sample account that is related to an account operator and a location element that is related to a different account identity;
The first determining module is configured to obtain the WOE value of the account address conflict variable according to the variable processing mode in the scoring card model, and determine the sample learning scoring value of the sample account according to the WOE value;
the correction module is configured to correct the sample learning score value according to the importance degree of the account address conflict variable and the WOE value so as to obtain a sample learning score correction value;
the second determining module is configured to determine a first target value interval and a second target value interval according to a known result of whether the sample account is falsified or not, the sample learning score value and the sample learning score correction value, so that the proportion of falsified accounts in the sample account covered by the first target value interval and the second target value interval can reach a preset proportion.
8. The apparatus of claim 7, characterized in that,
the first determining module includes:
the first determining submodule is configured to determine WOE values and IV values of the account address conflict variables by utilizing a variable processing mode in a grading card model;
a screening sub-module configured to screen at least one feature significant variable from the account address conflict variables according to the IV value;
A second determination submodule configured to determine the sample learning score value of the sample account according to the WOE value of the feature significant variable.
9. The apparatus of claim 8, characterized in that,
the correction module includes:
a third determination submodule configured to determine a weight value of the feature significant variable;
a fourth determination submodule configured to determine the sample learning score correction value after weighting the WOE value of the feature significant variable with the weight value.
10. The device according to any one of claims 7 and 9, wherein,
a second determination module comprising:
a fifth determination submodule configured to determine a plurality of candidate value interval pairs according to the known results of a plurality of the sample accounts, the sample learning score values, and the sample learning score correction values; each candidate value interval pair comprises a first candidate value interval and a second candidate value interval, and among the sample accounts whose sample learning score value is in the first candidate value interval and whose sample learning score correction value is in the second candidate value interval of the pair, the proportion of imposter accounts reaches the preset proportion;
a sixth determination submodule configured to select, from the plurality of candidate value interval pairs, the pair with the largest risk coverage, and to take its first candidate value interval and second candidate value interval as the first target value interval and the second target value interval, respectively.
11. The apparatus of claim 7, characterized in that the apparatus
further comprises:
a third determining module configured to determine a score value to be identified and a score correction value to be identified of an account to be identified;
and the fourth determining module is configured to determine whether the account to be identified is an imposter account according to whether the score value to be identified and the score correction value to be identified are respectively positioned in the first target value interval and the second target value interval.
12. The apparatus of claim 11, characterized in that,
the third determining module includes:
a seventh determination submodule configured to determine a WOE value of an account address conflict variable of the account to be identified;
and an eighth determination submodule, configured to determine the score value to be identified according to the WOE value of the account address conflict variable of the account to be identified, and further determine the score correction value to be identified according to the WOE value of the account address conflict variable of the account to be identified and the importance of the account address conflict variable of the account to be identified.
13. An electronic device, characterized in that,
comprising a memory and a processor; wherein,
the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement the method steps of any one of claims 1-6.
14. A computer-readable storage medium having stored thereon computer instructions, characterized in that,
which when executed by a processor carries out the method steps of any of claims 1-6.
CN201811525692.6A 2018-12-13 2018-12-13 Method and device for identifying fraudulent account, electronic equipment and storage medium Active CN110046783B (en)

Publications (2)

Publication Number Publication Date
CN110046783A CN110046783A (en) 2019-07-23
CN110046783B true CN110046783B (en) 2023-04-28

Family

ID=67273713

Country Status (1)

Country Link
CN (1) CN110046783B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177585B (en) * 2021-04-23 2024-04-05 上海晓途网络科技有限公司 User classification method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130012169A (en) * 2011-06-14 2013-02-01 박한칠 History managing method for steal-proofing user account and system therefor
CN105740280A (en) * 2014-12-10 2016-07-06 阿里巴巴集团控股有限公司 Variable importance detection method and apparatus
CN105809502A (en) * 2014-12-30 2016-07-27 阿里巴巴集团控股有限公司 Transaction risk detection method and apparatus
CN107872436A (en) * 2016-09-27 2018-04-03 阿里巴巴集团控股有限公司 A kind of account recognition methods, apparatus and system
CN108416495A (en) * 2018-01-30 2018-08-17 杭州排列科技有限公司 Scorecard method for establishing model based on machine learning and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7644433B2 (en) * 2002-12-23 2010-01-05 Authernative, Inc. Authentication system and method based upon random partial pattern recognition

Also Published As

Publication number Publication date
CN110046783A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
CN109922032B (en) Method, device, equipment and storage medium for determining risk of logging in account
US11676087B2 (en) Systems and methods for vulnerability assessment and remedy identification
CN110992167B (en) Bank customer business intention recognition method and device
CN106875078B (en) Transaction risk detection method, device and equipment
CN110147967B (en) Risk prevention and control method and device
WO2010037030A1 (en) Evaluating loan access using online business transaction data
CN111629010B (en) Malicious user identification method and device
WO2016018286A1 (en) Product risk profile
CN111460312A (en) Method and device for identifying empty-shell enterprise and computer equipment
CN109376766B (en) Portrait prediction classification method, device and equipment
CN110930218B (en) Method and device for identifying fraudulent clients and electronic equipment
CN114492605A (en) Federated learning feature selection method, device and system and electronic equipment
CN110796269A (en) Method and device for generating model, and method and device for processing information
CN113553583A (en) Information system asset security risk assessment method and device
CN116823428A (en) Anti-fraud detection method, device, equipment and storage medium
CN110046783B (en) Method and device for identifying fraudulent account, electronic equipment and storage medium
CN106375259B (en) Same-user account identification method and device
CN113792298A (en) Method and device for detecting vehicle safety risk
CN113159637A (en) Malicious user determination method and device, storage medium and electronic device
CN109345376A (en) E-bank anti-fraud method and system
CN112465632A (en) New financial AI intelligent risk control decision method and system
Bui et al. A clustering-based shrink autoencoder for detecting anomalies in intrusion detection systems
CN114820219B (en) Complex network-based fraud community identification method and system
CN114022154A (en) Bank intelligent counter transaction risk control method and device
CN113657808A (en) Personnel evaluation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
GR01 Patent grant