CN112150153A - Telecommunication fraud user identification method and device - Google Patents

Telecommunication fraud user identification method and device


Publication number
CN112150153A
Authority
CN
China
Prior art keywords
user
information
transaction detail
probability
fraud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011083252.7A
Other languages
Chinese (zh)
Inventor
严欢
唐浩雲
梁奇
蒋洪伟
李科
汤浩
丁笑远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202011083252.7A
Publication of CN112150153A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/382 Payment protocols; Details thereof insuring higher security of transaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Abstract

The application provides a telecommunication fraud user identification method and device. In the method, a first prediction model processes a transaction detail time series of a user to obtain first characteristic information indicating that the user is a telecommunication fraud user; the first prediction model determines the first characteristic information based on the features of each piece of transaction detail information in the time series and the context features among those pieces. A second prediction model determines, based on the financial attribute information of the user, second characteristic information indicating that the user is a telecommunication fraud user. The probability that the user is a telecommunication fraud user is determined from the first characteristic information and the second characteristic information, and the user is determined to be at risk of telecommunication fraud when that probability is greater than a set threshold. The scheme of the application can effectively identify users at risk of telecommunication fraud.

Description

Telecommunication fraud user identification method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for identifying a telecommunication fraud user.
Background
Telecommunication fraud refers to a criminal act in which an offender fabricates false information and, by telephone, over a network or by short message, defrauds a victim remotely and without contact, inducing the victim to pay or transfer money to the offender.
Telecommunication fraud seriously threatens the security of people's property. To reduce it, users at risk of telecommunication fraud need to be identified more effectively. How to do so is therefore a technical problem to be solved in this field.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present application provide a method and an apparatus for identifying a telecommunication fraud user, so as to identify a user with a telecommunication fraud risk more effectively.
In one aspect, the present application provides a telecommunication fraud user identification method, including:
obtaining financial attribute information of a user to be analyzed and a transaction detail time series, the transaction detail time series comprising: the transaction detail information of a plurality of different time points corresponding to the user, and the financial attribute information is information used for reflecting the user attribute and the financial account characteristics;
inputting the transaction detail time series into a trained first prediction model to obtain first characteristic information of the user belonging to a telecommunication fraud user predicted by the first prediction model, wherein the first prediction model determines the first characteristic information based on the characteristics of each transaction detail information in the transaction detail time series and the context characteristics among each transaction detail information;
inputting the financial attribute information into a trained second prediction model to obtain second characteristic information of the user, predicted by the second prediction model, belonging to the telecommunication fraud user;
determining a probability that the user belongs to a telecom fraud user based on the first characteristic information and the second characteristic information;
determining that the user is at a risk of telecom fraud if the probability that the user belongs to a telecom fraud user is greater than a set threshold.
Preferably, the determining the probability that the user belongs to a telecom fraud user based on the first characteristic information and the second characteristic information comprises:
determining the probability that the user belongs to a telecommunication fraud user based on the first characteristic information and the second characteristic information by using a trained classification model.
Preferably, the first feature information is a first feature vector, and the second feature information is a second feature vector;
the determining the probability that the user belongs to the telecommunication fraud user based on the first characteristic information and the second characteristic information and by using the trained classification model comprises the following steps:
summing the first feature vector and the second feature vector to obtain a third feature vector;
inputting the third feature vector into a fully-connected network of a classification model, so as to obtain a fourth feature vector for representing that the user belongs to a telecom fraud user, wherein the classification model comprises the fully-connected network and a normalization function layer;
inputting the fourth feature vector into a normalization function layer of a classification model, and obtaining the probability that the user belongs to a telecom fraud user, which is output by the normalization function layer.
Preferably, the first prediction model, the second prediction model and the classification model are obtained by synchronously training financial attribute information samples and transaction detail time series samples of a plurality of positive sample users and financial attribute information samples and transaction detail time series samples of a plurality of negative sample users;
wherein, the positive sample user is an annotated telecom fraud user, and the negative sample user is an annotated non-telecom fraud user;
the transaction detail time series samples of the positive sample user and the negative sample user are input information of the first prediction model, the financial attribute information samples of the positive sample user and the negative sample user are input information of the second prediction model, the transaction detail time series samples comprise a plurality of transaction detail information samples, and the financial attribute information samples are used for reflecting user attributes and information of financial account characteristics.
Preferably, the first prediction model includes, in sequence: at least one layer of bidirectional long short-term memory network, and a fully connected network layer connected to the at least one layer of bidirectional long short-term memory network.
Preferably, the first prediction model further includes: a batch normalization layer connected to the output of each bidirectional long short-term memory network in the at least one layer of bidirectional long short-term memory network.
Preferably, before the inputting the transaction detail time series into the trained first prediction model, the method further includes:
for each type of transaction detail information in the transaction detail time sequence, if the transaction detail information is numerical, standardizing and smoothing the numerical value of the transaction detail information in the transaction detail time sequence;
if the transaction detail information is non-numerical, determining the vector of the transaction detail information at each time point in the transaction detail time sequence.
In yet another aspect, the present application further provides a telecommunication fraud user identification apparatus, including:
an information obtaining unit configured to obtain financial attribute information of a user to be analyzed and a transaction detail time series, the transaction detail time series including: the transaction detail information of a plurality of different time points corresponding to the user, and the financial attribute information is information used for reflecting the user attribute and the financial account characteristics;
a first feature prediction unit, configured to input the transaction detail time series into a trained first prediction model, and obtain first feature information that is predicted by the first prediction model and belongs to a telecom fraud user, where the first prediction model determines the first feature information based on features of each transaction detail information in the transaction detail time series and context features between each transaction detail information;
the second characteristic prediction unit is used for inputting the financial attribute information into a trained second prediction model to obtain second characteristic information of the user, predicted by the second prediction model, belonging to the telecommunication fraud user;
a probability determination unit, configured to determine a probability that the user belongs to a telecom fraud user based on the first characteristic information and the second characteristic information;
a risky user identification unit for determining that the user is at risk of telecom fraud if the probability that the user belongs to a telecom fraud user is greater than a set threshold.
Preferably, the probability determining unit is specifically configured to determine the probability that the user belongs to the telecommunication fraud user based on the first characteristic information and the second characteristic information and by using a trained classification model.
Preferably, the first feature information obtained by the first prediction unit is a first feature vector, and the second feature information obtained by the second prediction unit is a second feature vector;
the probability determination unit includes:
the vector summation subunit is used for summing the first feature vector and the second feature vector to obtain a third feature vector;
the vector prediction subunit is used for inputting the third feature vector into a fully-connected network of a classification model to obtain a fourth feature vector for representing that the user belongs to a telecom fraud user, and the classification model comprises the fully-connected network and a normalization function layer;
and the probability determining subunit is used for inputting the fourth feature vector to a normalization function layer of a classification model to obtain the probability that the user output by the normalization function layer belongs to the telecom fraud user.
As can be seen from the above, the characteristic information indicating that a user is at risk of telecommunication fraud is analyzed along two dimensions, namely the financial attribute information of the user and the finance-related transaction detail time series, which helps to assess more comprehensively the likelihood that the user is at risk of telecommunication fraud.
Meanwhile, by combining different characteristics of the two dimensional information, the transaction detail time sequence is analyzed by utilizing a first prediction model capable of combining context characteristics, so that the predicted first characteristic information more accurately reflects the condition that the user has the telecom fraud risk; and analyzing the second characteristic information of the telecom fraud risk reflected by the financial attribute information by using a second prediction model, and on the basis, identifying the telecom fraud users according to the characteristic information predicted from two dimensions, thereby realizing more effective identification of the telecom fraud users and improving the reliability and accuracy of identification.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart illustrating an embodiment of a telecommunication fraud user identification method of the present application;
FIG. 2 is a schematic diagram of a network model architecture provided herein;
FIG. 3 is a schematic block diagram illustrating an exemplary telecommunication fraud user identification apparatus according to the present application.
Detailed Description
The scheme of the application is suitable for analyzing the users of banks or other financial institutions and discovering users at risk of telecommunication fraud.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 shows a schematic flow chart of an embodiment of the telecommunication fraud user identification method of the present application. The method of this embodiment may include:
S101, obtaining financial attribute information and a transaction detail time series of a user to be analyzed.
The transaction detail time series includes the transaction detail information of the user at a plurality of different time points. A time series is a sequence of data arranged in chronological order, so the transaction detail information in the transaction detail time series is likewise arranged in the order of the corresponding time points. For example, a transaction detail time series can be built from the transaction detail information of the transactions made at different time points over the last half year.
The transaction detail information refers to information generated by a user related to financial transaction behaviors, and the transaction detail information at each time point refers to detail information of one transaction behavior generated at the time point.
The transaction detail information of the user at each time point may include one or more kinds of detail information related to one or more kinds of transaction behaviors, and optionally, each time point may correspond to multiple kinds of transaction detail information.
For example, the transaction detail information corresponding to each time point may include: transaction amount, account balance, transaction channel, transaction place, counterparty account number, transaction type, and counterparty balance after the transaction. A transaction at the financial institution may include, among other things, making a transfer and receiving a transfer.
The transaction channels can be divided into electronic banking and Automatic Teller Machine (ATM) channels.
The transaction place records where the transaction occurs: if the transaction channel is electronic banking, the IP address and the Media Access Control (MAC) address of the transaction device are recorded; if the transaction channel is an ATM, the device number of the ATM is recorded.
The counterparty account in the transaction detail identifies the other party to the user's transaction behavior. For example, in a transfer, the counterparty account may be a financial account, such as a bank account, of the user who receives the transfer. It is understood that the counterparty account number can be associated with the age, education level, account balance and similar information of the counterparty, and this information correlates strongly with whether the user under analysis is engaged in fraud. For example, the older the counterparty, the lower the counterparty's education level and the higher the counterparty's account balance, the higher the probability that the counterparty is being defrauded and, correspondingly, the higher the probability that the user under analysis is engaged in fraud.
The transaction types may include NetsUnion and UnionPay transactions. NetsUnion transactions mainly comprise NetsUnion protocol payment, NetsUnion gateway payment, NetsUnion commissioned business and NetsUnion payment; UnionPay transactions mainly comprise UnionPay installment payment, UnionPay protocol payment and UnionPay commissioned business.
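The record layout and ordering described for S101 can be sketched as follows. This is a minimal illustration only: the class name, field names and channel labels (`"ebank"`, `"atm"`) are assumptions introduced here and are not taken from the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class TransactionDetail:
    """One transaction at one time point (all names are illustrative)."""
    timestamp: datetime
    amount: float                        # transaction amount
    balance: float                       # account balance
    channel: str                         # "ebank" or "atm" (hypothetical labels)
    counterparty_account: str
    transaction_type: str
    counterparty_balance: float          # counterparty balance after the transaction
    ip_address: Optional[str] = None     # recorded for the electronic banking channel
    mac_address: Optional[str] = None    # recorded for the electronic banking channel
    atm_device_no: Optional[str] = None  # recorded for the ATM channel

    def location(self) -> str:
        """Return the transaction place, which depends on the channel."""
        if self.channel == "ebank":
            return f"{self.ip_address}/{self.mac_address}"
        if self.channel == "atm":
            return str(self.atm_device_no)
        raise ValueError(f"unknown channel: {self.channel}")


def build_time_series(details: List[TransactionDetail]) -> List[TransactionDetail]:
    """Arrange the records in the order of their time points, earliest first."""
    return sorted(details, key=lambda d: d.timestamp)
```

Keeping the place as three optional fields, resolved by `location()`, mirrors the rule that the recorded place depends on the channel; other layouts would serve equally well.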
As an alternative, the application may analyze the user's account transfer transaction behavior. Accordingly, the transaction amount, the transaction channel and the like all belong to the relevant information of the transfer transaction.
The financial attribute information of the user is information used for reflecting the user attribute and the financial account characteristics. For example, the financial attribute information of the user may include the user attribute information recorded by a financial institution such as a bank, and the characteristic information of the financial account related to the user stored by the financial institution.
For example, the user attribute information recorded by the financial institution may include the age, education and credit information of the user, and may further include user attribute information related to financial behavior, such as historical financial crime information, the total number of bank cards held, and card change frequency information.
The financial account characteristics may include historical freeze information, historical debit information, transaction frequency information, cash withdrawal frequency information, account balance information, and the like for financial accounts such as the user's bank account. The information can be extracted by analyzing the account information of the user.
It can be understood that the more cards a user holds, the more often the user changes cards, the more freeze and deduction records the user's bank account has, and the more often cash is withdrawn from it, the higher the probability of fraud. The ages and education levels of fraudsters also show a certain clustering; with the newer forms of online fraud, fraudsters tend to be younger, and their age is strongly related to their education level. The financial attribute information of the user can therefore reflect characteristics indicating that the user is at risk of telecommunication fraud.
S102, inputting the transaction detail time series into a trained first prediction model, and obtaining first characteristic information of the user, predicted by the first prediction model, belonging to the telecommunication fraud user.
The first prediction model determines the first characteristic information based on the features of each piece of transaction detail information in the transaction detail time series and the context features among the pieces of transaction detail information. For example, the first prediction model may include at least one Bi-directional Long Short-Term Memory (Bi-LSTM) network, also referred to as a bidirectional long short-term memory network, and a fully connected network layer connected to the at least one Bi-LSTM network.
It can be understood that the Bi-LSTM network can analyze the context relationship between the data at each time point in the time series, so as to extract the characteristic information in the time series more accurately, and therefore, the Bi-LSTM-based network can determine the characteristic information for characterizing the user as a telecom fraud user more accurately. For the sake of convenience of distinction, the feature information predicted by the first prediction model is referred to as first feature information.
It will be appreciated that each time point in the transaction detail time series has at least one type of transaction detail information and typically several; for example, each time point may have the 7 types of transaction detail information mentioned above. Transaction detail information can be classified as numerical or non-numerical. For example, an account amount is a number, so transaction detail information that is an account amount is numerical. A counterparty account, by contrast, effectively carries information such as the sex and education level of the counterparty associated with that account, which is not numerical, so this transaction detail information is non-numerical.
In order to improve the prediction accuracy of the first prediction model, for any type of transaction detail information in the transaction detail time series, if the transaction detail information is numerical, the numerical value of the transaction detail information in the transaction detail time series can be standardized and smoothed. If the transaction detail information is non-numerical, determining the vector of the transaction detail information at each time point in the transaction detail time sequence.
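For the non-numerical case, the application does not say at this point how the vector is determined. One common choice is one-hot encoding, sketched below as an assumption; the function names and the sample channel labels are hypothetical.

```python
from typing import Dict, List, Sequence


def build_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    """Assign an index to each distinct category, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


def one_hot_series(values: Sequence[str], vocab: Dict[str, int]) -> List[List[float]]:
    """Turn the category at each time point into a one-hot vector."""
    vectors = []
    for value in values:
        vector = [0.0] * len(vocab)
        vector[vocab[value]] = 1.0
        vectors.append(vector)
    return vectors


# Hypothetical series of transaction channels at three time points.
channels = ["ebank", "atm", "ebank"]
vocab = build_vocabulary(channels)
encoded = one_hot_series(channels, vocab)
```

A learned embedding would be another reasonable way to obtain the vectors; one-hot is used here only because it needs no training.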
The numerical transaction detail information in the transaction detail time series can be smoothed and standardized in various ways, and the application does not limit the method.
For example, a first-order difference method may be used to smooth the time series formed by one type of transaction detail information in the transaction detail time series. Specifically, for that type of transaction detail information, its value at each time point in the transaction detail time series is extracted, giving a time series that contains only that type of information at the different time points. For this time series, the difference between every two adjacent values is calculated by first-order differencing, and the resulting differences form the first-order difference sequence.
Standardizing the time series of any numerical transaction detail information means scaling each value in the time series into a certain range, so as to eliminate the influence of the unit of measurement on the result.
For example:
Assume that the time series before smoothing is X(t) = [10, 20, 35, 20, 15]; the sequence after first-order differencing is then X1(t) = [10, 15, -15, -5].
For the first-order difference sequence X1(t) = [10, 15, -15, -5], a value less than zero is standardized to the ratio of that value to the smallest value in the first-order difference sequence, multiplied by minus one. A value not less than zero is standardized to the ratio of that value to the largest value in the first-order difference sequence. Standardizing the first-order difference sequence in this way gives X2(t) = [0.67, 1, -1, -0.33].
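The worked example above can be reproduced with a short sketch. The function names are illustrative, and the handling of a sequence whose largest value is not positive is an assumption added here, since the text does not cover that case.

```python
from typing import List


def first_order_difference(series: List[float]) -> List[float]:
    """Difference between each value and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def standardize_signed(diffs: List[float]) -> List[float]:
    """Scale negatives by the smallest value and non-negatives by the largest."""
    largest = max(diffs)
    smallest = min(diffs)
    result = []
    for d in diffs:
        if d < 0:
            # ratio to the smallest value, multiplied by minus one
            result.append(-1.0 * d / smallest)
        else:
            # Assumption: if no value is positive, a non-negative value
            # must be zero, so it is left at 0.0 to avoid dividing by zero.
            result.append(d / largest if largest > 0 else 0.0)
    return result


x = [10, 20, 35, 20, 15]
x1 = first_order_difference(x)                      # [10, 15, -15, -5]
x2 = [round(v, 2) for v in standardize_signed(x1)]  # [0.67, 1.0, -1.0, -0.33]
```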
S103, inputting the financial attribute information into the trained second prediction model to obtain second characteristic information of the user, which is predicted by the second prediction model and belongs to the telecom fraud user.
Wherein the second prediction model may be any neural network model.
Optionally, the second prediction model may be a fully connected network. Because the items of financial attribute information are independent of one another and are not accompanied by related items of the same kind, a fully connected network can extract the feature information indicating that the user is a telecommunication fraud user more efficiently.
For the sake of convenience of distinction, the feature information extracted by the second prediction model is referred to as second feature information.
It should be noted that the order of steps S102 and S103 is not limited to that shown in FIG. 1; in practical applications, the two steps may be interchanged or executed simultaneously.
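Since S102 and S103 do not depend on each other, they can be run side by side. The sketch below uses a thread pool for this; the two models are passed in as plain callables, and the function name and type aliases are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def extract_features(
    first_model: Callable[[Sequence[Sequence[float]]], Vector],
    second_model: Callable[[Sequence[float]], Vector],
    time_series: Sequence[Sequence[float]],
    attributes: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Run S102 and S103 concurrently and return both feature vectors."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(first_model, time_series)   # S102
        second_future = pool.submit(second_model, attributes)  # S103
        return first_future.result(), second_future.result()
```

Running the two calls one after the other, in either order, gives the same pair of feature vectors.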
S104, determining the probability that the user belongs to the telecom fraud user based on the first characteristic information and the second characteristic information.
The probability can be determined from the first characteristic information and the second characteristic information in several ways.
For example, when the first characteristic information and the second characteristic information are both vectors, that is, the first characteristic information is represented by a first feature vector and the second characteristic information by a second feature vector, the two vectors can be averaged and the result normalized with a normalization function to obtain the probability that the user belongs to a telecom fraud user.
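A minimal sketch of this averaging approach is given below. It assumes that both feature vectors have two elements, that index 1 corresponds to the "fraud" class, and that the normalization function is softmax; none of these details is fixed by the text.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fraud_probability_by_average(first: List[float], second: List[float]) -> float:
    """Average the two feature vectors element-wise, then normalize."""
    if len(first) != len(second):
        raise ValueError("feature vectors must have the same length")
    mean = [(a + b) / 2.0 for a, b in zip(first, second)]
    return softmax(mean)[1]  # assumption: index 1 is the fraud class
```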
In yet another possible case, the probability that the user belongs to the telecom fraud user can be determined based on the first characteristic information and the second characteristic information and by using the trained classification model.
The classification model may take a variety of forms.
Optionally, the classification model comprises a fully connected network and a normalization function layer. In this case, if the first characteristic information is a first feature vector and the second characteristic information is a second feature vector, the first feature vector and the second feature vector may be summed to obtain a third feature vector, and the third feature vector is input to the fully connected network of the classification model to obtain a fourth feature vector that characterizes the user as belonging to a telecom fraud user. The fourth feature vector is then input into the normalization function layer of the classification model, which outputs the probability that the user belongs to a telecom fraud user.
It can be understood that, in practical applications, the normalization function layer outputs both the probability that the user belongs to a telecom fraud user and the probability that the user does not, and the two probabilities sum to 1. The present application only needs the probability that the user belongs to a telecom fraud user for its analysis.
It can be understood that feature information predicted along two dimensions of user-related information reflects the user's telecom fraud risk more comprehensively, so combining the feature information of the two dimensions gives a more accurate probability that the user belongs to a telecom fraud user.
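The sum, fully connected and normalization steps can be traced in plain Python. The sketch assumes a single linear layer with two outputs, softmax as the normalization function, and index 1 as the "fraud" class; the weights shown in the usage note are made up for illustration and are not trained values.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of the first and second feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def fully_connected(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One linear layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(values: Vector) -> Vector:
    """Numerically stable softmax; the outputs sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(first: Vector, second: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Return [P(not fraud), P(fraud)] for one user."""
    third = add_vectors(first, second)               # third feature vector
    fourth = fully_connected(third, weights, bias)   # fourth feature vector
    return softmax(fourth)                           # normalization function layer
```

For instance, `classify([1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])` returns two probabilities that sum to 1, and only the second is used in the later threshold comparison.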
S105, determining that the user has a telecom fraud risk under the condition that the probability that the user belongs to the telecom fraud user is greater than a set threshold value.
The set threshold may be set as needed, for example, the set threshold may be 0.8.
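Step S105 amounts to a strict comparison against the set threshold. A minimal sketch, using the example threshold of 0.8 as a default and a hypothetical function name:

```python
def has_telecom_fraud_risk(fraud_probability, threshold=0.8):
    """Return True when the probability is strictly greater than the threshold."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return fraud_probability > threshold


flagged = has_telecom_fraud_risk(0.85)
not_flagged = has_telecom_fraud_risk(0.8)  # equal to the threshold, so not flagged
```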
According to the method and the device, the characteristic information of the telecommunication fraud risk of the user is analyzed from the two dimensions of the financial attribute information of the user and the financial-related transaction detail time sequence, and the possibility of the telecommunication fraud risk of the user is analyzed more comprehensively.
Meanwhile, by combining different characteristics of the two dimensional information, the transaction detail time sequence is analyzed by utilizing a first prediction model capable of combining context characteristics, so that the predicted first characteristic information more accurately reflects the condition that the user has the telecom fraud risk; and analyzing the second characteristic information of the telecom fraud risk reflected by the financial attribute information by using a second prediction model, and on the basis, identifying the telecom fraud users according to the characteristic information predicted from two dimensions, so that the reliability and the accuracy of identifying the telecom fraud users can be improved.
It can be understood that, in the case where the first feature information and the second feature information are processed by using the classification model, in order to improve the accuracy of the probability that is finally predicted, the first prediction model, the second prediction model, and the classification model may be synchronously trained.
Specifically, the first prediction model, the second prediction model and the classification model are obtained by synchronous training with financial attribute information samples and transaction detail time series samples of a plurality of positive sample users and financial attribute information samples and transaction detail time series samples of a plurality of negative sample users.
Wherein, the positive sample user is an annotated telecom fraud user, and the negative sample user is an annotated non-telecom fraud user.
The transaction detail time series samples of the positive sample user and the negative sample user are input information of the first prediction model, and the financial attribute information samples of the positive sample user and the negative sample user are input information of the second prediction model. The transaction detail time series samples comprise a plurality of transaction detail information samples, and the financial attribute information samples are used for reflecting the user attribute and the information of the financial account characteristics.
The transaction detail information sample has the same meaning and contained information as the previous transaction detail information of the user, and the financial attribute information sample may also refer to the related description of the previous financial attribute information, which is not described herein again.
It is understood that, in the case that the transaction detail time series samples of the positive sample user and the negative sample user are input information of the first prediction model, and the financial attribute information samples of the positive sample user and the negative sample user are input information of the second prediction model, there are many possible ways to train the first prediction model, the second prediction model and the classification model, which is not limited in this application.
For example, for any sample user (a positive sample user or a negative sample user), the transaction detail time series sample of the sample user can be input into the first prediction model to obtain first predicted characteristic information output by the first prediction model; meanwhile, the financial attribute information sample of the sample user is input into the second prediction model to obtain second predicted characteristic information output by the second prediction model. The first predicted characteristic information and the second predicted characteristic information are added and input into the classification model to obtain the probability, predicted by the classification model, that the sample user belongs to a telecom fraud user. If the probability is greater than a set threshold, the sample user is predicted to be a telecom fraud user.
Whether the overall model formed by the first prediction model, the second prediction model and the classification model has converged is then judged by combining the actually labeled information of each sample user with the predicted result. If it has not converged, the internal parameters of the three models can be adjusted and the training repeated until convergence.
When judging whether the three models have converged, a loss function value can be calculated by using a loss function. For example, the loss function may be a cross-entropy loss function, and the cross-entropy loss function value L may be expressed as the following formula one:
$$L = -\sum_{i} y_i \log \hat{y}_i$$
where $y$ is the true result vector of the sample user, $\hat{y}$ is the result vector predicted by the classification model, and $y_i$ and $\hat{y}_i$ are their $i$-th components. If the user belongs to a telecom fraud user, the true result vector can be (1, 0); if the predicted probability that the user belongs to a telecom fraud user is 0.8 and the predicted probability that the user does not belong to a telecom fraud user is 0.2, the predicted result vector can be (0.8, 0.2).
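As a worked illustration for the example vectors above, the following sketch evaluates the cross-entropy loss, assuming the standard per-sample form L = -Σ y_i·log(ŷ_i) with the natural logarithm. The small `eps` clamp is an added safeguard against log(0) and is not part of the present application.

```python
import math


def cross_entropy(true_vec, predicted_vec, eps=1e-12):
    """Cross-entropy between a true result vector and a predicted result vector."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(true_vec, predicted_vec))


# True result vector (1, 0) and predicted result vector (0.8, 0.2).
loss = cross_entropy((1, 0), (0.8, 0.2))
```

Only the component for the true class contributes, so the loss equals -ln(0.8), which is about 0.223; a perfect prediction of (1, 0) would give a loss of 0.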
It is understood that, in order to avoid overfitting of the model, in the present application a validation set and a test set are obtained in addition to the training set. The validation set and the test set likewise include transaction detail time series samples and financial attribute information samples of positive sample users and negative sample users. The parameters in the three network models are continuously adjusted by using the validation set, for example, in combination with a regularization method. Finally, the classification accuracy of the overall model formed by the three models is verified by using the test set.
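One possible way to obtain the training, validation, and test sets is a split that keeps positive and negative sample users in each set. The sketch below is an assumption for illustration only: the 70/15/15 proportions, the fixed random seed, and the function name are hypothetical and are not specified in the present application.

```python
import random


def stratified_split(samples, labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split samples into (train, validation, test), class by class."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(set(labels)):
        group = [s for s, l in zip(samples, labels) if l == label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend((s, label) for s in group[:n_train])
        val.extend((s, label) for s in group[n_train:n_train + n_val])
        test.extend((s, label) for s in group[n_train + n_val:])
    return train, val, test


# 20 positive sample users (label 1) and 20 negative sample users (label 0).
users = list(range(40))
labels = [1] * 20 + [0] * 20
train, val, test = stratified_split(users, labels)
```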
In order to facilitate understanding of the scheme of the present application, a description will be given below taking as an example one of the constituent structures of the first prediction model, the second prediction model, and the classification model in the present application.
Fig. 2 is a schematic diagram showing a model structure according to the present application.
In fig. 2, the model comprises a first prediction model 201, a second prediction model 202 and a classification model 203. The first prediction model comprises two bidirectional long short-term memory (Bi-LSTM) networks and one fully-connected network layer. For ease of distinction, the fully-connected network layer in the first prediction model is referred to as the first fully-connected network layer.
In order to increase the convergence rate of the model, improve the generalization capability of the model, and effectively avoid overfitting of the network model on the training set, the first prediction model in the network model architecture of fig. 2 further includes a batch normalization (BatchNormalization) layer connected to the output end of each Bi-LSTM network in the at least one layer of Bi-LSTM network.
As shown in fig. 2, the output end of each Bi-LSTM network is connected to a BatchNormalization layer, and the output result of the last BatchNormalization layer is input to the first fully-connected network layer in the first prediction model.
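As a minimal sketch of what a batch normalization layer computes at training time, the following normalizes each feature column of a batch to zero mean and unit variance and then applies a scale and shift. The scalar `gamma` and `beta` and the value of `eps` are simplifying assumptions; in practice these are learned per-feature parameters, and running statistics are used at inference time.

```python
import math


def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of a batch, then scale and shift."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((x - m) ** 2 for x in col) / n
                 for col, m in zip(columns, means)]
    return [
        [gamma * (x - m) / math.sqrt(v + eps) + beta
         for x, m, v in zip(row, means, variances)]
        for row in batch
    ]


# Hypothetical Bi-LSTM outputs for a batch of three samples, two features each.
normalized = batch_normalize([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
```

After normalization the two columns have the same values even though their original scales differ by a factor of ten, which is the property that helps stabilize training.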
The second prediction model is a fully-connected network layer, which is labeled as the second fully-connected network layer in fig. 2 for ease of distinction.
The classification model comprises a third fully-connected network layer and a normalization function layer.
On the basis of fig. 2, the transaction detail time series of the user (for example, the transaction detail time series after the transaction detail information has been normalized and smoothed or converted into vectors) is input into the first prediction model. The transaction detail time series in fig. 2 includes the transaction detail information at a plurality of time points from time 1 (t1 in the figure) to time N (tn in the figure), where N is the total number of time points in the time series. The first Bi-LSTM of the first prediction model converts the transaction detail time series and inputs the converted vectors into a BatchNormalization layer; similarly, after processing by the second Bi-LSTM and its BatchNormalization layer in sequence, risk vectors corresponding to the different transaction detail information in the transaction detail time series are obtained. Each risk vector is a context vector relating one piece of transaction detail information to the other transaction detail information, and the context vector can reflect the characteristic that the user has a telecom fraud risk. The risk vectors are input into the first fully-connected network to obtain the extracted first feature information.
Correspondingly, the financial attribute information of the user is input into the second fully-connected network. Of course, the financial attribute information of the user may be converted into vectors before being input into the second fully-connected network; for example, the financial attribute information of each dimension is converted into a vector by using a word encoding method, and a matrix containing the vectors is constructed. The second fully-connected network may extract the second feature information based on the financial attribute information.
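The conversion of financial attribute information into a matrix of vectors might look like the following sketch. It assumes one-hot encoding as the word encoding method and pads shorter vectors with zeros so that the rows form a rectangular matrix; the attribute names and vocabularies are hypothetical and do not come from the present application.

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as a one-hot vector over its vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_financial_attributes(attributes, vocabularies):
    """Build a matrix with one zero-padded one-hot row per attribute dimension."""
    width = max(len(vocab) for vocab in vocabularies.values())
    matrix = []
    for name, vocab in vocabularies.items():
        row = one_hot(attributes[name], vocab)
        matrix.append(row + [0.0] * (width - len(row)))
    return matrix


# Hypothetical attribute dimensions and their allowed values.
vocabularies = {
    "account_type": ["debit", "credit"],
    "age_band": ["18-30", "31-50", "51+"],
}
attributes = {"account_type": "credit", "age_band": "18-30"}
matrix = encode_financial_attributes(attributes, vocabularies)
```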
As can be seen from fig. 2, the first feature information output by the first fully-connected network and the second feature information output by the second fully-connected network are added and then input to a third fully-connected network of the classification model, and the vector output by the third fully-connected network is normalized. Wherein the normalization layer can comprise two normalization neurons, so that the probability that the user belongs to a telecom fraud user (e.g., fraud probability in FIG. 2) and the probability that the user does not belong to a telecom fraud user (e.g., non-fraud probability in FIG. 2) can be obtained.
While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present application is not limited by the order of acts or acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Corresponding to the telecommunication fraud user identification method of the present application, the present application further provides a telecommunication fraud user identification apparatus. Fig. 3 shows a schematic structural diagram of an embodiment of the telecommunication fraud user identification apparatus of the present application.
As can be seen in fig. 3, the apparatus may include:
an information obtaining unit 301, configured to obtain financial attribute information of a user to be analyzed and a transaction detail time series, where the transaction detail time series includes: the transaction detail information of a plurality of different time points corresponding to the user, and the financial attribute information is information used for reflecting the user attribute and the financial account characteristics;
a first feature prediction unit 302, configured to input the transaction detail time series into a trained first prediction model, so as to obtain first feature information that is predicted by the first prediction model and belongs to a telecom fraud user, where the first prediction model determines the first feature information based on features of each transaction detail information in the transaction detail time series and context features between each transaction detail information;
a second feature prediction unit 303, configured to input the financial attribute information into a trained second prediction model, so as to obtain second feature information of the user, which is predicted by the second prediction model and belongs to a telecom fraud user;
a probability determination unit 304, configured to determine a probability that the user belongs to a telecom fraud user based on the first characteristic information and the second characteristic information;
a risky user identifying unit 305 for determining that the user is at risk of telecom fraud if the probability that the user belongs to a telecom fraud user is greater than a set threshold.
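The cooperation of units 301 to 305 can be sketched as a simple pipeline. In the sketch below each unit is represented by a callable supplied by the caller; the class name, the callable signatures, and the stub values are assumptions made for illustration and say nothing about how the units are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class FraudIdentificationApparatus:
    obtain_info: Callable[[str], Tuple[dict, list]]  # information obtaining unit 301
    first_model: Callable[[list], Vector]            # first feature prediction unit 302
    second_model: Callable[[dict], Vector]           # second feature prediction unit 303
    probability: Callable[[Vector, Vector], float]   # probability determination unit 304
    threshold: float = 0.8                           # used by risky user identifying unit 305

    def identify(self, user_id):
        """Return (fraud probability, whether the user has telecom fraud risk)."""
        attributes, series = self.obtain_info(user_id)
        first = self.first_model(series)
        second = self.second_model(attributes)
        p = self.probability(first, second)
        return p, p > self.threshold


# Stub units with fixed outputs, standing in for trained models.
apparatus = FraudIdentificationApparatus(
    obtain_info=lambda uid: ({"account_type": "debit"}, [100.0, 250.0]),
    first_model=lambda series: [1.0],
    second_model=lambda attrs: [0.75],
    probability=lambda f, s: (f[0] + s[0]) / 2.0,
)
prob, at_risk = apparatus.identify("user-001")
```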
In a possible implementation manner, the probability determining unit in the apparatus is specifically configured to determine the probability that the user belongs to a telecom fraud user based on the first feature information and the second feature information and by using a trained classification model.
As an optional manner, the first feature information obtained by the first prediction unit is a first feature vector, and the second feature information obtained by the second prediction unit is a second feature vector;
accordingly, the probability determination unit includes:
the vector summation subunit is used for summing the first feature vector and the second feature vector to obtain a third feature vector;
the vector prediction subunit is used for inputting the third feature vector into a fully-connected network of a classification model to obtain a fourth feature vector for representing that the user belongs to a telecom fraud user, and the classification model comprises the fully-connected network and a normalization function layer;
and the probability determining subunit is used for inputting the fourth feature vector to a normalization function layer of a classification model to obtain the probability that the user output by the normalization function layer belongs to the telecom fraud user.
In a possible implementation manner, the first prediction model, the second prediction model and the classification model are obtained by synchronously training financial attribute information samples and transaction detail time series samples of a plurality of positive sample users and financial attribute information samples and transaction detail time series samples of a plurality of negative sample users;
wherein, the positive sample user is an annotated telecom fraud user, and the negative sample user is an annotated non-telecom fraud user;
the transaction detail time series samples of the positive sample user and the negative sample user are input information of the first prediction model, the financial attribute information samples of the positive sample user and the negative sample user are input information of the second prediction model, the transaction detail time series samples comprise a plurality of transaction detail information samples, and the financial attribute information samples are used for reflecting user attributes and information of financial account characteristics.
In one possible implementation, the first prediction model in the first feature prediction unit sequentially includes: at least one layer of bidirectional long short-term memory network, and a fully-connected network layer connected to the at least one layer of bidirectional long short-term memory network.
Optionally, the first prediction model further includes: a batch normalization layer connected to the output end of each bidirectional long short-term memory network in the at least one layer of bidirectional long short-term memory network.
In another possible implementation manner, the apparatus further includes:
a first data processing unit, configured to, before the first feature prediction unit inputs the transaction detail time series into the trained first prediction model, for each type of transaction detail information in the transaction detail time series, normalize and smooth the values of that type of transaction detail information in the transaction detail time series if the transaction detail information is numerical;
and a second data processing unit, configured to determine a vector of the transaction detail information at each time point in the transaction detail time series if the transaction detail information is non-numerical.
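For the numerical branch handled by the first data processing unit, one possible reading is sketched below: z-score standardization followed by a trailing moving average. Both techniques and the window size are assumptions chosen for illustration, and the function names are hypothetical; the non-numerical branch is not covered by this sketch.

```python
import math


def standardize(values):
    """Rescale a numeric series to zero mean and unit variance (z-score)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0] * len(values)  # a constant series carries no variation
    return [(v - mean) / std for v in values]


def moving_average(values, window=3):
    """Smooth a series with a trailing moving average over up to `window` points."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical transaction amounts at three time points.
amounts = [2.0, 4.0, 6.0]
prepared = moving_average(standardize(amounts), window=2)
```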
It should be noted that the embodiments in the present specification are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to each other. Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief, and for relevant points reference may be made to the corresponding description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
Meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A telecommunication fraud user identification method, comprising:
obtaining financial attribute information of a user to be analyzed and a transaction detail time series, the transaction detail time series comprising: the transaction detail information of a plurality of different time points corresponding to the user, and the financial attribute information is information used for reflecting the user attribute and the financial account characteristics;
inputting the transaction detail time series into a trained first prediction model to obtain first characteristic information of the user belonging to a telecommunication fraud user predicted by the first prediction model, wherein the first prediction model determines the first characteristic information based on the characteristics of each transaction detail information in the transaction detail time series and the context characteristics among each transaction detail information;
inputting the financial attribute information into a trained second prediction model to obtain second characteristic information of the user, predicted by the second prediction model, belonging to the telecommunication fraud user;
determining a probability that the user belongs to a telecom fraud user based on the first characteristic information and the second characteristic information;
determining that the user is at a risk of telecom fraud if the probability that the user belongs to a telecom fraud user is greater than a set threshold.
2. The method as recited in claim 1, wherein said determining, based on said first characteristic information and second characteristic information, a probability that said user belongs to a telecom fraud user comprises:
and determining the probability that the user belongs to the telecommunication fraud user based on the first characteristic information and the second characteristic information and by using the trained classification model.
3. The method of claim 2, wherein the first feature information is a first feature vector and the second feature information is a second feature vector;
the determining the probability that the user belongs to the telecommunication fraud user based on the first characteristic information and the second characteristic information and by using the trained classification model comprises the following steps:
summing the first feature vector and the second feature vector to obtain a third feature vector;
inputting the third feature vector into a fully-connected network of a classification model, so as to obtain a fourth feature vector for representing that the user belongs to a telecom fraud user, wherein the classification model comprises the fully-connected network and a normalization function layer;
inputting the fourth feature vector into a normalization function layer of a classification model, and obtaining the probability that the user belongs to a telecom fraud user, which is output by the normalization function layer.
4. The method according to any one of claims 2 or 3, wherein the first prediction model, the second prediction model and the classification model are obtained by synchronously training financial attribute information samples and transaction detail time series samples of a plurality of positive sample users and financial attribute information samples and transaction detail time series samples of a plurality of negative sample users;
wherein, the positive sample user is an annotated telecom fraud user, and the negative sample user is an annotated non-telecom fraud user;
the transaction detail time series samples of the positive sample user and the negative sample user are input information of the first prediction model, the financial attribute information samples of the positive sample user and the negative sample user are input information of the second prediction model, the transaction detail time series samples comprise a plurality of transaction detail information samples, and the financial attribute information samples are used for reflecting user attributes and information of financial account characteristics.
5. The method of claim 1, wherein the first prediction model comprises, in order: at least one layer of bidirectional long short-term memory network, and a fully-connected network layer connected to the at least one layer of bidirectional long short-term memory network.
6. The method of claim 5, wherein the first prediction model further comprises: a batch normalization layer connected to the output end of each bidirectional long short-term memory network in the at least one layer of bidirectional long short-term memory network.
7. The method of claim 1, wherein prior to said inputting said transaction detail time series into a trained first predictive model, further comprising:
for each type of transaction detail information in the transaction detail time sequence, if the transaction detail information is numerical, standardizing and smoothing the numerical value of the transaction detail information in the transaction detail time sequence;
if the transaction detail information is non-numerical, determining the vector of the transaction detail information at each time point in the transaction detail time sequence.
8. A telecommunication fraud user identification apparatus, comprising:
an information obtaining unit configured to obtain financial attribute information of a user to be analyzed and a transaction detail time series, the transaction detail time series including: the transaction detail information of a plurality of different time points corresponding to the user, and the financial attribute information is information used for reflecting the user attribute and the financial account characteristics;
a first feature prediction unit, configured to input the transaction detail time series into a trained first prediction model, and obtain first feature information that is predicted by the first prediction model and belongs to a telecom fraud user, where the first prediction model determines the first feature information based on features of each transaction detail information in the transaction detail time series and context features between each transaction detail information;
the second characteristic prediction unit is used for inputting the financial attribute information into a trained second prediction model to obtain second characteristic information of the user, predicted by the second prediction model, belonging to the telecommunication fraud user;
a probability determination unit, configured to determine a probability that the user belongs to a telecom fraud user based on the first characteristic information and the second characteristic information;
a risky user identification unit for determining that the user is at risk of telecom fraud if the probability that the user belongs to a telecom fraud user is greater than a set threshold.
9. The apparatus as claimed in claim 8, wherein the probability determining unit is specifically configured to determine the probability that the user belongs to a telecom fraud user based on the first feature information and the second feature information and using a trained classification model.
10. The apparatus according to claim 9, wherein the first feature information obtained by the first prediction unit is a first feature vector, and the second feature information obtained by the second prediction unit is a second feature vector;
the probability determination unit includes:
the vector summation subunit is used for summing the first feature vector and the second feature vector to obtain a third feature vector;
the vector prediction subunit is used for inputting the third feature vector into a fully-connected network of a classification model to obtain a fourth feature vector for representing that the user belongs to a telecom fraud user, and the classification model comprises the fully-connected network and a normalization function layer;
and the probability determining subunit is used for inputting the fourth feature vector to a normalization function layer of a classification model to obtain the probability that the user output by the normalization function layer belongs to the telecom fraud user.
CN202011083252.7A 2020-10-12 2020-10-12 Telecommunication fraud user identification method and device Pending CN112150153A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011083252.7A CN112150153A (en) 2020-10-12 2020-10-12 Telecommunication fraud user identification method and device

Publications (1)

Publication Number Publication Date
CN112150153A true CN112150153A (en) 2020-12-29

Family

ID=73951441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011083252.7A Pending CN112150153A (en) 2020-10-12 2020-10-12 Telecommunication fraud user identification method and device

Country Status (1)

Country Link
CN (1) CN112150153A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130104231A (en) * 2012-03-13 2013-09-25 주식회사 한국프라임테크놀로지 Financial fraud suspicious transaction monitoring system and a method thereof
CN109410036A (en) * 2018-10-09 2019-03-01 北京芯盾时代科技有限公司 A kind of fraud detection model training method and device and fraud detection method and device
CN110458576A (en) * 2019-07-31 2019-11-15 同济大学 The network trading that detects is counter in a kind of fusion ex ante forecasting and thing cheats method
CN110718223A (en) * 2019-10-28 2020-01-21 百度在线网络技术(北京)有限公司 Method, apparatus, device and medium for voice interaction control
CN111222026A (en) * 2020-01-09 2020-06-02 支付宝(杭州)信息技术有限公司 Training method of user category identification model and user category identification method
CN111401906A (en) * 2020-03-05 2020-07-10 中国工商银行股份有限公司 Transfer risk detection method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011884A (en) * 2021-01-29 2021-06-22 腾讯科技(深圳)有限公司 Account feature extraction method, device and equipment and readable storage medium
CN113011884B (en) * 2021-01-29 2023-08-04 腾讯科技(深圳)有限公司 Account feature extraction method, device, equipment and readable storage medium
CN114066490A (en) * 2022-01-17 2022-02-18 浙江鹏信信息科技股份有限公司 GoIP fraud nest point identification method, system and computer readable storage medium

Similar Documents

Publication Publication Date Title
CA3065807C (en) System and method for issuing a loan to a consumer determined to be creditworthy
US20220020026A1 (en) Anti-money laundering methods and systems for predicting suspicious transactions using artifical intelligence
US9185095B1 (en) Behavioral profiling method and system to authenticate a user
Shen et al. Application of classification models on credit card fraud detection
Kim et al. Classification cost: An empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost
US20130124393A1 (en) Connecting decisions through customer transaction profiles
US20140279527A1 (en) Enterprise Cascade Models
CN112150153A (en) Telecommunication fraud user identification method and device
Ruiz et al. Credit scoring in microfinance using non-traditional data
Yeşilkanat et al. An adaptive approach on credit card fraud detection using transaction aggregation and word embeddings
EP4060563A1 (en) Automatic profile extraction in data streams using recurrent neural networks
Diwate et al. Loan Approval Prediction Using Machine Learning
Thisarani et al. Artificial intelligence for futuristic banking
Devika et al. Credit card fraud detection using logistic regression
Makolo et al. Credit card fraud detection system using machine learning
Abdulghani et al. Credit card fraud detection using XGBoost algorithm
CN110458684A (en) Financial anti-fraud detection method based on bidirectional long short-term memory neural network
CN113269629A (en) Credit limit determining method, electronic equipment and related product
Kang Fraud Detection in Mobile Money Transactions Using Machine Learning
Wu Real-time Predictive Analysis of Loan Risk with Intelligent Monitoring and Machine Learning Technique
CN117391709B (en) Internet payment management method
US11900385B1 (en) Computerized-method and system for predicting a probability of fraudulent financial-account access
US20230088840A1 (en) Dynamic assessment of cryptocurrency transactions and technology adaptation metrics
US11694208B2 (en) Self learning machine learning transaction scores adjustment via normalization thereof accounting for underlying transaction score bases relating to an occurrence of fraud in a transaction
Smiles et al. Data mining based hybrid latent representation induced ensemble model towards fraud prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination