CN112488163A - Abnormal account identification method and device, computer equipment and storage medium - Google Patents

Abnormal account identification method and device, computer equipment and storage medium

Info

Publication number
CN112488163A
Authority
CN
China
Prior art keywords
account
target
data
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011286403.9A
Other languages
Chinese (zh)
Inventor
殷振滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202011286403.9A
Publication of CN112488163A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application belongs to the field of artificial intelligence and relates to a method for identifying an abnormal account, which comprises the following steps: acquiring account information data and account operation data of a target account; inputting the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and inputting the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account; and combining the first probability and the second probability, inputting the combined probabilities into a target LR model, and outputting a judgment result of whether the target account is an abnormal account. The application also provides a device for identifying an abnormal account, computer equipment and a storage medium. In addition, the application also relates to blockchain technology: to make data storage stable and secure, the account information data and the account operation data can be stored in a blockchain.

Description

Abnormal account identification method and device, computer equipment and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an abnormal account identification method and device, computer equipment and a storage medium.
Background
In the internet era, users hold accounts on many different internet platforms. When an account is not kept properly, is attacked, or its user performs illegal operations, such as publishing false or illegal information or violating platform rules, the system usually marks the account as an abnormal account to prevent loss of benefits and leakage of privacy for the platform or the account owner.
Currently, account abnormality is generally determined by comparing the currently acquired information about an account with legal information set for the account in advance, or by determining whether the current account operation is a legal operation. For example, for a social network account, it is common to acquire the current login information (e.g., login location, login device) when the user logs in, and then determine whether the logged-in account is abnormal by comparing the current login information with the preset legal login information.
However, the above methods are difficult to apply in scenarios where an account abnormality is not simply a matter of whether the account information is abnormal or whether the account operation is legal. For example, in the scenario of identifying "wool party" accounts (accounts used to exploit promotional offers in bulk), a wool party user uses equipment such as a modem pool (literally a "cat pool") to create a large number of virtual accounts and participates in many discount activities to reap large profits. These virtual accounts, i.e. the wool party accounts, usually appear normal in terms of account information, and participating in discount activities is usually a legal behavior allowed by the system.
Disclosure of Invention
An embodiment of the application aims to provide an abnormal account identification method, an abnormal account identification device, a computer device and a storage medium, so as to solve the technical problem that the abnormal account cannot be effectively identified in a complex scene in the prior art.
In order to solve the above technical problem, an embodiment of the present application provides a method for identifying an abnormal account, which adopts the following technical scheme:
acquiring account information data and account operation data of a target account;
inputting the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and inputting the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account;
and combining the first probability and the second probability, inputting the combined probabilities into a target LR model, and outputting a judgment result of whether the target account is an abnormal account.
In order to solve the above technical problem, an embodiment of the present application further provides an apparatus for identifying an abnormal account, which adopts the following technical scheme:
the first acquisition module is used for acquiring account information data and account operation data of the target account;
the calculation module is used for inputting the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and inputting the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account;
and the judging module is used for combining the first probability and the second probability, inputting the combined probabilities into a target LR model, and outputting a judgment result of whether the target account is an abnormal account.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory having computer readable instructions stored therein and a processor, the processor implementing the steps of the above-described abnormal account number identification method when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium, having computer-readable instructions stored thereon, which, when executed by a processor, implement the steps of the above-described abnormal account identification method.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
according to the method and the device, after account information data and account operation data of the target account are obtained, the account operation data are input into the target LSTM model, the account information data are input into the target DeepFM model, the output results of the target LSTM model and the target DeepFM model are combined and input into the target LR model, and finally the target LR model outputs the judgment result of whether the target account is an abnormal account. In summary, the embodiment of the present application provides an integrated model that processes the account information data and account operation data of a target account and outputs a determination result of whether the target account is an abnormal account.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flowchart illustrating an embodiment of an abnormal account identification method according to the present application;
FIG. 3 is a schematic structural diagram of an integrated model applied in an embodiment of the present application;
FIG. 4 is a flowchart illustrating an embodiment before step S202 in FIG. 2;
FIG. 5 is a schematic diagram of an embodiment of an abnormal account identification apparatus 500 according to the present application;
FIG. 6 is a schematic diagram of another embodiment of an abnormal account identification apparatus 500 according to the present application;
FIG. 7 is a schematic diagram of one embodiment of a computer device 700 of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the method for identifying an abnormal account provided in the embodiment of the present application is generally executed by the server/terminal device, and correspondingly, the device for identifying an abnormal account is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2, a flow diagram of one embodiment of a method for abnormal account number identification in accordance with the present application is shown. The identification method of the abnormal account number comprises the following steps:
step S201, account information data and account operation data of the target account are acquired.
In this embodiment, the first electronic device on which the abnormal account identification method runs (for example, the server/terminal device shown in fig. 1) may interact with an external device through a wired or wireless connection, and receive account information data and account operation data of the target account that are sent by, or actively acquired from, the external device. The account information data of the target account may include current account information data or historical account information data. The account operation data of the target account may be used to indicate various operation data of the target account in the system, such as account login record data, usage data of some functions in the system, record data of some events in the system, and the like; the specific data contained may be set by the user in advance and is not specifically limited here. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, an Ultra Wideband (UWB) connection, and other wireless connection types now known or developed in the future.
In one possible implementation manner, before step S201, the target account may be an account that has been determined to be possibly abnormal after screening with a simple preliminary screening condition. Specifically, account information data or account operation data of a plurality of accounts may be obtained in advance, and a preset preliminary screening condition is then used to select, from these accounts, the target accounts that meet the abnormal account preliminary screening condition. For example, if the abnormal account is a wool party account, the account operation data mainly includes records of the account participating in preferential activities, and the preliminary screening condition may include the frequency with which the account participates in preferential activities or the degree of discount obtained. For instance, if an account participates in many preferential activities at a high frequency, the account may belong to the wool party; likewise, if the average discount across the activities it has participated in is too high, the account may also belong to the wool party. Because wool party accounts are after all a minority, and identification with the integrated model is computationally expensive, setting some simple abnormal account preliminary screening conditions in this embodiment reduces the amount of data to be collected and lightens the computational burden.
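The preliminary screening described above can be sketched as a simple rule filter. This is only an illustrative sketch: the field names, the `ActivitySummary` type and both threshold values are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass


@dataclass
class ActivitySummary:
    account_id: str
    participations_per_day: float  # how often the account joins preferential activities
    mean_discount_rate: float      # average discount over joined activities, in [0, 1]


# Hypothetical thresholds; real values would be tuned for each platform.
MAX_PARTICIPATIONS_PER_DAY = 20.0
MAX_MEAN_DISCOUNT_RATE = 0.5


def passes_prescreen(summary: ActivitySummary) -> bool:
    """Return True if the account is suspicious enough to be sent to the integrated model."""
    return (summary.participations_per_day > MAX_PARTICIPATIONS_PER_DAY
            or summary.mean_discount_rate > MAX_MEAN_DISCOUNT_RATE)


def select_target_accounts(summaries):
    """Keep only the account IDs that meet the preliminary screening condition."""
    return [s.account_id for s in summaries if passes_prescreen(s)]


accounts = [
    ActivitySummary("u1", 2.0, 0.10),   # ordinary user
    ActivitySummary("u2", 35.0, 0.20),  # participates very frequently
    ActivitySummary("u3", 3.0, 0.80),   # unusually high average discount
]
targets = select_target_accounts(accounts)
```

Only the accounts returned by `select_target_accounts` would then go through steps S201 to S203, which is how the screening reduces the load on the integrated model.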
In a specific implementation manner, the account information data and the account operation data of the target account may be obtained through event tracking (also called data instrumentation, literally "data point burying"). Collecting information through event tracking is also generally called log collection; specifically, code can be embedded in an APP or web product to monitor and record account operation events (for example, clicking to participate in a lottery activity, claiming a coupon, etc.). Once an event is triggered, the information about the event that the tracking code defines for upload is uploaded. The specific activities to be recorded and the content of the account information data and account operation data may be preset by the user. For example, the account information data may include one or more of a user ID, a user device ID, IP information of the user device, friend information of the user, address information of the user, and the like, and the account operation data may be one or more of a current event code, a trigger time, a record of the events the user has historically participated in, and the like.
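As a rough illustration of what an uploaded tracking record might look like, the sketch below assembles the fields listed above into a JSON payload. The function name `build_event`, the key names, the sample values and the date are all assumptions made for illustration; the application does not define a record format.

```python
import json
from datetime import datetime, timezone


def build_event(user_id, device_id, ip, event_code, history, now=None):
    """Assemble the payload that tracking code could upload when an event fires."""
    now = now or datetime.now(timezone.utc)
    return {
        # Account information data
        "account_info": {"user_id": user_id, "device_id": device_id, "ip": ip},
        # Account operation data
        "account_operation": {
            "event_code": event_code,
            "trigger_time": now.isoformat(),
            "history": list(history),
        },
    }


event = build_event(
    "u42", "dev-7", "203.0.113.5", "coupon_claim",
    history=["login"],
    now=datetime(2020, 11, 17, 9, 52, 25, tzinfo=timezone.utc),  # made-up timestamp
)
payload = json.dumps(event)  # string that would be sent to the log collection service
```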
Step S202, inputting the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and inputting the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account.
In order to obtain an identification model with excellent generalization performance, in the embodiment of the present application, an integrated model with a network structure including multiple levels and multiple sub-models, which is obtained through pre-training, is preset locally in the first electronic device based on an idea of ensemble learning (ensemble learning) in machine learning.
Specifically, existing integrated models may adopt various fusion methods, such as average fusion, weighted fusion and voting fusion; the integrated model used in this embodiment and subsequent embodiments may have a stacking structure. Stacking is an ensemble learning method that performs heterogeneous integration of a plurality of base learners, with the aim of improving the generalization performance of the model. In this application, the integrated model obtained by training has a two-layer structure, whose schematic structural diagram is shown in fig. 3, and the integrated model may include:
a low-level target long short-term memory (LSTM) model and target DeepFM model, and a high-level target Logistic Regression (LR) model. After the original data are input into the target LSTM model and the target DeepFM model respectively, each model outputs a preliminary recognition result. Finally, the preliminary recognition results output in the previous step are combined and used as the input of the target LR model, which outputs the final recognition result.
Referring to the schematic structural diagram shown in fig. 3, after the account information data and the account operation data of the target account are obtained, the account operation data may be used as the input of the target LSTM model to output a first probability that the target account is an abnormal account, and the account information data may be used as the input of the target DeepFM model to output a second probability that the target account is an abnormal account.
Step S203, combining the first probability and the second probability, inputting the combined probabilities into a target LR model, and outputting a judgment result of whether the target account is an abnormal account.
In this embodiment, referring to the schematic diagram shown in fig. 3, after the first probability and the second probability are obtained, they may be combined, that is, input into the target LR model as one group of data to be judged, and the final judgment result of whether the target account is an abnormal account is output. For example, if the first probability is 0.8 and the second probability is 0.75, the two are combined into the pair (0.8, 0.75), which is input to the target LR model to output the final judgment result.
In a possible implementation manner, if the target LSTM model and the target DeepFM model are assigned different weight values, the first probability and the second probability are combined by weighting before being input into the target LR model. Specifically, assuming that the weights of the target LSTM model and the target DeepFM model are 1.0 and 0.8 respectively, the first probability is 0.8, and the second probability is 0.75, the weighted combination is (1.0 × 0.8, 0.8 × 0.75), that is, (0.8, 0.6); this pair is input to the target LR model, and the final judgment result is output.
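The combination step can be sketched in a few lines. In this sketch the LR coefficients, intercept and decision threshold are hypothetical placeholders (in the application they would come from training the target LR model), and the two input probabilities are taken as given instead of being produced by real LSTM and DeepFM models.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def combine(p_lstm, p_deepfm, w_lstm=1.0, w_deepfm=1.0):
    """Form the LR input from the two base-model probabilities, optionally weighted."""
    return (w_lstm * p_lstm, w_deepfm * p_deepfm)


# Hypothetical LR parameters, standing in for a trained target LR model.
LR_COEF = (3.0, 3.0)
LR_INTERCEPT = -3.0


def lr_predict(features, threshold=0.5):
    """Logistic regression: return (probability, is_abnormal)."""
    z = LR_INTERCEPT + sum(c * x for c, x in zip(LR_COEF, features))
    p = sigmoid(z)
    return p, p >= threshold


# Worked example from the text: weights 1.0 and 0.8, probabilities 0.8 and 0.75.
features = combine(0.8, 0.75, w_lstm=1.0, w_deepfm=0.8)
probability, is_abnormal = lr_predict(features)
```

With the numbers from the text, `features` is (0.8, 0.6) up to floating-point rounding; whether the account is then judged abnormal depends entirely on the learned LR parameters.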
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
according to the method and the device, after account information data and account operation data of the target account are obtained, the account operation data are input into the target LSTM model, the account information data are input into the target DeepFM model, the output results of the target LSTM model and the target DeepFM model are combined and input into the target LR model, and finally the target LR model outputs the judgment result of whether the target account is an abnormal account. In summary, the embodiment of the present application provides an integrated model that processes the account information data and account operation data of a target account and outputs a determination result of whether the target account is an abnormal account.
In some possible implementations, the pre-training of the integrated model, i.e., the target LSTM model, the target DeepFM model, and the target LR model, may be performed on the first electronic device, or on a separate second electronic device, which may be a high-performance computing device (e.g., a workstation or a dedicated server); in the latter case, the trained model is pushed to the first electronic device by the second electronic device after training is completed. In the embodiment of the present application, the training process of the target integrated model is described taking the first electronic device as an example; training on a second electronic device is similar and is not described in detail here. For the specific training process, refer to fig. 4, which is a flowchart of an embodiment before step S202 in fig. 2. The method for identifying an abnormal account may further include:
step S401, an account operation data set and an account information data set are constructed.
In this embodiment, the data are acquired as in step S201, through event tracking, and the acquired data may be stored in a database. If the database already holds labeled data about abnormal-account users and normal users, account operation data and account information data of abnormal accounts and normal users that meet preset requirements can be extracted from that data according to the types of account operation data and account information data, so as to obtain an account operation data set and an account information data set. The preset requirements may be a preset number or proportion; for example, the total amount of required data is 10000, comprising 5000 abnormal accounts and 5000 normal users. If the database contains no such labeled data, or the existing data do not meet the preset requirements, the missing data can subsequently be collected and labeled. Specifically, account operation data and account information data of all users are extracted from the database and labels can be set for the data manually, so that account operation data sets and account information data sets meeting the preset requirements are finally obtained.
It should be noted that each piece of account operation data in the account operation data set corresponds one-to-one to a piece of account information data in the account information data set, and each such pair belongs to the same account. The labels of the account operation data and the account information data of the same user are the same in both data sets; if the same user is found to have different labels in the account operation data set and the account information data set, that user's data can be taken out separately, manually re-labeled, and then added back to the data sets.
In an actual application scenario, the abnormal account related to the embodiment of the present application may include a wool party account, in which case identifying the abnormal account means identifying the wool party account. One of the most prominent features of wool party accounts is that the time intervals between their obtaining coupons and participating in various activities to collect rewards are shorter than those of normal users. Therefore, the time differences between each account's participation in the preset preferential activities can be collected through event tracking and assembled into a time difference sequence, and the time difference sequence data are used as the account operation data. The time difference sequence data may be exemplified as follows:
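The label consistency check described above can be sketched as follows. The representation of each data set as a dictionary from account ID to label (1 for abnormal, 0 for normal) is an assumption made for illustration only.

```python
def find_label_conflicts(operation_labels, info_labels):
    """Return the accounts whose label differs between the two data sets."""
    shared = operation_labels.keys() & info_labels.keys()
    return sorted(acc for acc in shared if operation_labels[acc] != info_labels[acc])


# Hypothetical labels: 1 = abnormal account, 0 = normal account.
operation_labels = {"u1": 0, "u2": 1, "u3": 1}
info_labels = {"u1": 0, "u2": 0, "u3": 1}

conflicts = find_label_conflicts(operation_labels, info_labels)

# Conflicting accounts are taken out of both sets until they are manually re-labeled.
clean_operation = {a: y for a, y in operation_labels.items() if a not in conflicts}
clean_info = {a: y for a, y in info_labels.items() if a not in conflicts}
```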
(1) user X logs in to the APP at 09:52:20;
(2) user X participates in campaign 1 at 09:52:25;
(3) user X participates in campaign 2 at 09:52:28;
(4) user X exits the APP at 09:52:30.
The resulting time difference sequence, in seconds, is [5, 3, 2].
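The example above can be reproduced with a short helper. This is a minimal sketch that assumes all events fall on the same day and are already in chronological order; the function name is hypothetical.

```python
from datetime import datetime


def time_difference_sequence(timestamps):
    """Convert ordered HH:MM:SS event times into the gaps between them, in seconds."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# Login, campaign 1, campaign 2, exit, as in the example for user X.
events = ["09:52:20", "09:52:25", "09:52:28", "09:52:30"]
print(time_difference_sequence(events))  # [5, 3, 2]
```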
It should be emphasized that, in order to further ensure the privacy and security of the account operation data and the account information data, the account operation data and the account information data may also be stored in a node of a blockchain, where the blockchain may be a private chain or a consortium chain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each of which contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
At this time, the constructing the account operation data set in step S401 may include:
and after time difference value sequence data of a plurality of accounts participating in a plurality of preset preferential activities are collected, an account operation data set containing the time difference value sequence data is constructed.
Step S402, the account operation data set is divided into a first training set and a first testing set, and the account information data set is divided into a second training set and a second testing set.
In this embodiment, there may be a plurality of ways to divide the data set into the training set and the test set, the division ratio may be preset, for example, may be 8:2, and the specific division method is a common prior art at present, and is not described herein again.
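One simple way to perform the 8:2 division is sketched below. The application leaves the division method open; the choice made here, shuffling once and reusing the same indices for both data sets so that the one-to-one correspondence between account operation data and account information data is preserved, is an assumption, as are the placeholder records.

```python
import random


def split_indices(n, train_ratio=0.8, seed=0):
    """Shuffle the indices 0..n-1 and cut them into training and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_ratio)
    return idx[:cut], idx[cut:]


# Placeholder records; entry i of both lists belongs to the same account.
operation_data = [f"op_{i}" for i in range(10)]
info_data = [f"info_{i}" for i in range(10)]

train_idx, test_idx = split_indices(len(operation_data))

first_train = [operation_data[i] for i in train_idx]   # first training set
first_test = [operation_data[i] for i in test_idx]     # first test set
second_train = [info_data[i] for i in train_idx]       # second training set
second_test = [info_data[i] for i in test_idx]         # second test set
```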
Step S403, performing cross training on the initial LSTM model by using the first training set after data segmentation, predicting the first training set and the first testing set by using the target LSTM model obtained by the cross training, and respectively outputting LSTM training set prediction data and LSTM testing set prediction data.
LSTM is a special kind of Recurrent Neural Network (RNN) that can learn long-term dependencies. There are many variants of LSTM, an important one being the GRU (Gated Recurrent Unit); the present application does not limit the specific variant of the LSTM model.
The data segmentation of the first training set and the cross training of the initial LSTM model correspond to the cross-validation procedure of the Stacking integrated model, which is as follows:
(1) After the account operation data set is divided into a first training set and a first test set, the first training set is divided into K subsets containing the same amount of data (K is a user-defined positive integer). Each subset in turn is taken as the verification set and the remaining subsets as sub-training sets, forming K different training combinations.
(2) The initial LSTM model is trained in turn with the sub-training sets of each training combination, and the target LSTM model obtained from each training is used to predict the verification set of that training combination and the test set, yielding K groups of verification set prediction data and K groups of test set prediction data.
(3) The K groups of verification set prediction data are merged as the LSTM training set prediction data, and the average of the K groups of test set prediction data is taken as the final LSTM test set prediction data.
The following example illustrates the process:
Assume the account operation data set contains 3500 groups of data in total; after division, the training set has 3000 groups and the test set has 500 groups. With the number of sets K set to 3, the training process is as follows:
Step A: take groups 1-1000 of the 3000 groups as the verification set and train the initial LSTM model with groups 1001-3000 as the sub-training set. After training, use the target LSTM model obtained in the first round to predict the 1-1000 verification set and the test set respectively, obtaining LSTM verification set prediction data for groups 1-1000 in the form of a 1000 × 1 matrix and first-round LSTM test set prediction data in the form of a 500 × 1 matrix;
Step B: take groups 1001-2000 as the verification set and train the initial LSTM model with groups 1-1000 and 2001-3000 as the sub-training set. After training, use the target LSTM model obtained in the second round to predict the 1001-2000 verification set and the test set respectively, obtaining LSTM verification set prediction data for groups 1001-2000 in the form of a 1000 × 1 matrix and second-round LSTM test set prediction data in the form of a 500 × 1 matrix;
Step C: take groups 2001-3000 as the verification set and train the initial LSTM model with groups 1-2000 as the sub-training set. After training, use the target LSTM model obtained in the third round to predict the 2001-3000 verification set and the test set respectively, obtaining LSTM verification set prediction data for groups 2001-3000 in the form of a 1000 × 1 matrix and third-round LSTM test set prediction data in the form of a 500 × 1 matrix;
Step D: merge the 3 groups of LSTM verification set prediction data from the steps above to obtain LSTM training set prediction data for groups 1-3000 in the form of a 3000 × 1 matrix, and average the 3 rounds of LSTM test set prediction data to obtain the final LSTM test set prediction data in the form of a 500 × 1 matrix.
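Steps A to D can be sketched as the following loop. To keep the example self-contained, a trivial stand-in model (`train_mean_model`, which always predicts the mean training label) takes the place of the LSTM; it is not an LSTM, and all names, the sample layout, and the synthetic data are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]             # (feature, label); illustrative shape
Predictor = Callable[[float], float]   # feature -> abnormal-account probability


def train_mean_model(sub_train: Sequence[Sample]) -> Predictor:
    """Stand-in for training the initial LSTM model (NOT an LSTM).

    It ignores the feature and always predicts the mean training label.
    """
    mean_label = sum(label for _, label in sub_train) / len(sub_train)
    return lambda feature: mean_label


def cross_train(
    train_set: List[Sample],
    test_set: List[Sample],
    k: int,
    train_fn: Callable[[Sequence[Sample]], Predictor],
) -> Tuple[List[float], List[float]]:
    """Return (training set prediction data, averaged test set prediction data)."""
    fold_size = len(train_set) // k
    train_preds = [0.0] * len(train_set)
    test_rounds: List[List[float]] = []
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when len(train_set) % k != 0.
        stop = len(train_set) if fold == k - 1 else start + fold_size
        sub_train = train_set[:start] + train_set[stop:]
        model = train_fn(sub_train)
        # Predict the held-out verification set and store it in its own slot.
        for i in range(start, stop):
            train_preds[i] = model(train_set[i][0])
        # Predict the full test set in every round.
        test_rounds.append([model(feature) for feature, _ in test_set])
    # Average the K rounds of test set predictions element-wise.
    test_preds = [sum(column) / k for column in zip(*test_rounds)]
    return train_preds, test_preds


# Synthetic data with the sizes used in the example: 3000 training, 500 test, K = 3.
train_set = [(float(i), i % 2) for i in range(3000)]
test_set = [(float(i), 0) for i in range(500)]
lstm_train_preds, lstm_test_preds = cross_train(train_set, test_set, 3, train_mean_model)
```

Because every training sample is predicted only by a model that never saw it, the 3000 training set predictions can later be fed to a second-level model without leaking the labels.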
Step S404, performing cross training on the initial DeepFM model by using the second training set after data segmentation, predicting the second training set and the second test set by using the target DeepFM model obtained through the cross training, and respectively outputting DeepFM training set prediction data and DeepFM test set prediction data.
The DeepFM algorithm effectively combines the advantages of factorization machines and neural networks in feature learning: it extracts low-order and high-order combined features at the same time, and is therefore increasingly widely used. In DeepFM, the FM component is responsible for extracting first-order features and the second-order features formed by pairwise combination of first-order features; the DNN component is responsible for extracting the high-order features formed by operations such as full connection over the input first-order features. DeepFM combines the advantages of wide and deep models: the FM model and the DNN model are trained jointly, so low-order and high-order feature combinations can be learned simultaneously; the model is end-to-end and requires no manual feature engineering; and the FM and DNN components share the same input and embedding vectors, which makes training more efficient.
In this embodiment, the data segmentation of the second training set and the cross training of the initial DeepFM model are similar to the corresponding processes for the first training set and the initial LSTM model in step S403, and are not described here again.
Step S405, the LSTM training set prediction data is merged with the DeepFM training set prediction data, and the LSTM test set prediction data is merged with the DeepFM test set prediction data, to obtain training set prediction data and test set prediction data respectively.
In this embodiment, merging increases the data dimension. Referring to the example in step S403, if the training set contains 3000 groups and the test set 500 groups, and the LSTM training set prediction data and DeepFM training set prediction data obtained from steps S403 and S404 are both 3000 × 1 matrices, then the merged training set prediction data is a 3000 × 2 matrix; correspondingly, the merged test set prediction data is a 500 × 2 matrix.
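The change in shape can be illustrated with a short sketch that pairs the two models' per-sample predictions into rows. The function name `merge_predictions` is hypothetical, and the constant 0.5 values are placeholders rather than real model outputs.

```python
from typing import List, Sequence


def merge_predictions(lstm_preds: Sequence[float], deepfm_preds: Sequence[float]) -> List[List[float]]:
    """Pair per-sample predictions into rows of [lstm, deepfm] (an N x 2 matrix)."""
    if len(lstm_preds) != len(deepfm_preds):
        raise ValueError("both models must predict the same samples")
    return [[lstm_p, deepfm_p] for lstm_p, deepfm_p in zip(lstm_preds, deepfm_preds)]


# Placeholder values only; real predictions would come from steps S403 and S404.
training_set_predictions = merge_predictions([0.5] * 3000, [0.5] * 3000)
test_set_predictions = merge_predictions([0.5] * 500, [0.5] * 500)
```

Row i of each merged matrix must describe the same account in both columns, so the two data sets need to be split and ordered consistently before merging.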
In some possible implementations, if the target LSTM model and the target DeepFM model are each assigned a weight, the prediction data may be weighted when merged. For example, if the weights of the target LSTM model and the target DeepFM model are 1.0 and 0.8 respectively, and for the same account the probability value from the target LSTM model is 0.9 and that from the target DeepFM model is 0.8, then the merged data is (0.9 × 1.0, 0.8 × 0.8), that is, (0.9, 0.64). The specific weight values are set by the user and are not limited here. If weights are set, the first probability and the second probability must also be weighted when they are merged in step S203.
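The weighted merge for a single account reduces to scaling each probability by its model's weight. The sketch below reuses the weights 1.0 and 0.8 from the example; the function name `weighted_merge` is hypothetical.

```python
from typing import Tuple


def weighted_merge(
    first_probability: float,
    second_probability: float,
    lstm_weight: float = 1.0,
    deepfm_weight: float = 0.8,
) -> Tuple[float, float]:
    """Scale each model's probability by its weight and return the pair."""
    return (first_probability * lstm_weight, second_probability * deepfm_weight)


# LSTM probability 0.9 and DeepFM probability 0.8, as in the example above.
pair = weighted_merge(0.9, 0.8)
```

Up to floating-point rounding, `pair` is (0.9, 0.64). The same weights have to be applied at training time and at prediction time, otherwise the LR model sees inputs on a different scale from the ones it was fitted on.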
Step S406, training an initial LR model by using the training set prediction data and the test set prediction data to obtain the target LR model.
Specifically, the LR model is trained using the 3000 × 2 probability matrix as the training set, with the test set prediction data used as a reference to check the training result, and the target LR model is finally obtained.
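A minimal sketch of this second-level training is given below, using plain gradient descent on the logistic loss. The learning rate, epoch count, four-row toy training matrix, and the use of test-set accuracy as the "reference" are all assumptions; the text does not specify how the LR model is optimized or evaluated.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (LSTM probability, DeepFM probability)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(rows: Sequence[Row], labels: Sequence[int],
             learning_rate: float = 1.0, epochs: int = 2000) -> Tuple[float, float, float]:
    """Fit a two-feature logistic regression by batch gradient descent."""
    w1 = w2 = bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (p1, p2), y in zip(rows, labels):
            error = sigmoid(w1 * p1 + w2 * p2 + bias) - y
            g1 += error * p1
            g2 += error * p2
            gb += error
        w1 -= learning_rate * g1 / n
        w2 -= learning_rate * g2 / n
        bias -= learning_rate * gb / n
    return w1, w2, bias


def predict_lr(params: Tuple[float, float, float], row: Row) -> float:
    w1, w2, bias = params
    return sigmoid(w1 * row[0] + w2 * row[1] + bias)


# Toy stand-ins for the merged training and test set prediction data.
train_rows: List[Row] = [(0.9, 0.8), (0.8, 0.9), (0.2, 0.1), (0.1, 0.3)]
train_labels = [1, 1, 0, 0]
test_rows: List[Row] = [(0.85, 0.75), (0.15, 0.2)]
test_labels = [1, 0]

target_lr = train_lr(train_rows, train_labels)
# Assumed use of the test set: check accuracy at a 0.5 threshold.
test_accuracy = sum(
    (predict_lr(target_lr, row) >= 0.5) == bool(label)
    for row, label in zip(test_rows, test_labels)
) / len(test_rows)
```

In practice a library implementation of logistic regression would normally replace the hand-written loop; it is written out here only to keep the sketch dependency-free.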
In some possible implementations, the number of target models obtained from the initial LSTM model and the initial DeepFM model at the end of training equals K, the number of sets into which the training set was divided during cross-validation training; that is, there are K parallel target LSTM models and K parallel target DeepFM models.
In this case, combining the first probability and the second probability, inputting them into the target LR model, and outputting the determination result as to whether the target account is an abnormal account in step S203 may be handled in two ways:
(1) Input the account operation data and account information data of the target account into the K target LSTM models and the K target DeepFM models respectively, to obtain K first probabilities and K second probabilities; first average the K first probabilities and the K second probabilities separately, then merge (or weight and merge) the two mean values into one probability pair; input the pair into the target LR model, and output the final determination result as to whether the target account is an abnormal account.
(2) Input the account operation data and account information data of the target account into the K target LSTM models and the K target DeepFM models respectively, to obtain K first probabilities and K second probabilities; merge (or weight and merge) them to obtain K probability pairs; input the K probability pairs into the target LR model to obtain K probability values; and average the K probability values to obtain the final determination result as to whether the target account is an abnormal account.
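The two processing manners differ only in where the averaging happens, as the sketch below shows. The LR coefficients in `lr_probability`, the 0.5 decision threshold, and the sample probabilities are made-up values for illustration; none of them come from this application.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def lr_probability(pair: Tuple[float, float],
                   weights: Tuple[float, float] = (2.0, 2.0),
                   bias: float = -2.0) -> float:
    """Hypothetical target LR model with made-up coefficients."""
    z = weights[0] * pair[0] + weights[1] * pair[1] + bias
    return 1.0 / (1.0 + math.exp(-z))


def judge_average_first(first_probs: Sequence[float], second_probs: Sequence[float],
                        threshold: float = 0.5) -> bool:
    """Manner (1): average the K probabilities per model, then apply LR once."""
    pair = (mean(first_probs), mean(second_probs))
    return lr_probability(pair) >= threshold


def judge_average_last(first_probs: Sequence[float], second_probs: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Manner (2): apply LR to each of the K pairs, then average the K outputs."""
    scores = [lr_probability(pair) for pair in zip(first_probs, second_probs)]
    return mean(scores) >= threshold


# K = 3 outputs from the parallel target LSTM and target DeepFM models.
first_probabilities = [0.9, 0.8, 0.85]
second_probabilities = [0.7, 0.9, 0.8]
result_1 = judge_average_first(first_probabilities, second_probabilities)
result_2 = judge_average_last(first_probabilities, second_probabilities)
```

Because the logistic function is nonlinear, the two manners can give slightly different scores for the same inputs; manner (1) needs only one LR evaluation, while manner (2) keeps each fold's model pairing intact.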
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
in an embodiment of the present application, a method for performing integrated model training on a first electronic device is provided.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware associated with computer readable instructions, which can be stored in a computer readable storage medium, and when executed, the processes of the embodiments of the methods described above can be included. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 5, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an apparatus for identifying an abnormal account, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for identifying an abnormal account according to this embodiment includes:
a first obtaining module 501, configured to obtain account information data and account operation data of a target account;
a calculating module 502, configured to input the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and input the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account;
the determining module 503 is configured to combine the first probability and the second probability and input the combined probability into the target LR model, and output a determination result that the target account is an abnormal account.
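The division of work among modules 501 to 503 could be mirrored in code roughly as follows. This is only a structural sketch: the class name, the callable-based interfaces, the 0.5 threshold, and the stub models in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AbnormalAccountIdentifier:
    # Module 501: returns (account information data, account operation data).
    fetch_account_data: Callable[[str], Tuple[Dict, List[float]]]
    # Module 502: the two first-level models, each returning a probability.
    lstm_model: Callable[[List[float]], float]
    deepfm_model: Callable[[Dict], float]
    # Module 503: the second-level LR model applied to the merged pair.
    lr_model: Callable[[Tuple[float, float]], float]
    threshold: float = 0.5  # assumed decision threshold

    def identify(self, account_id: str) -> bool:
        info, operations = self.fetch_account_data(account_id)
        first_probability = self.lstm_model(operations)
        second_probability = self.deepfm_model(info)
        return self.lr_model((first_probability, second_probability)) >= self.threshold


# Stub callables stand in for the data source and the trained models.
identifier = AbnormalAccountIdentifier(
    fetch_account_data=lambda account_id: ({"age_days": 1}, [1.0, 1.0, 1.0]),
    lstm_model=lambda operations: 0.9,
    deepfm_model=lambda info: 0.8,
    lr_model=lambda pair: (pair[0] + pair[1]) / 2,
)
is_abnormal = identifier.identify("acct_a")
```

Passing the models in as callables keeps the pipeline independent of any particular deep learning framework, so real trained models can be swapped in for the stubs.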
In some possible implementations, with further reference to fig. 6, fig. 6 is a schematic diagram of another embodiment of an apparatus 500 for identifying an abnormal account according to an embodiment of the present application, where the apparatus 500 for identifying an abnormal account may further include:
a data set construction module 504, configured to construct an account operation data set and an account information data set;
a data dividing module 505, configured to divide the account operation data set into a first training set and a first test set, and divide the account information data set into a second training set and a second test set;
the first training module 506 is configured to perform cross training on the initial LSTM model by using the first training set after data segmentation to obtain the target LSTM model, predict the first training set and the first test set by using the target LSTM model, and output LSTM training set prediction data and LSTM test set prediction data respectively;
the second training module 507 is configured to perform cross training on the initial DeepFM model by using the second training set after data segmentation, predict the second training set and the second test set by using the target DeepFM model obtained through the cross training, and output DeepFM training set prediction data and DeepFM test set prediction data respectively;
a merging module 508, configured to merge the LSTM training set prediction data and the DeepFM training set prediction data, the LSTM testing set prediction data and the DeepFM testing set prediction data, respectively, to obtain training set prediction data and testing set prediction data;
a third training module 509, configured to train the initial LR model with the training set prediction data and the test set prediction data to obtain the target LR model.
In some possible implementations, the data set constructing module 504 is specifically configured to obtain time difference value sequence data of each activity that a user participates in, and construct an account operation data set including the time difference value sequence data and an account information data set including the account information data.
In some possible implementations, the apparatus 500 for identifying an abnormal account may further include:
the second acquisition module is used for acquiring account information data or account operation data of a plurality of accounts;
and the screening module is used for determining the target account which meets the primary screening condition of the abnormal account in the account information data or the account operation data of the plurality of accounts.
In some possible implementation manners, the determining module 503 is specifically configured to weight and combine the first probability and the second probability according to weight values respectively set for the target LSTM model and the target DeepFM model in advance, input the weighted and combined first probability and second probability into the target LR model, and output a determination result that the target account is an abnormal account.
In some possible implementations, the first training module 506 specifically includes:
the data segmentation submodule is used for dividing the first training set into K sets with consistent data size, taking one set as a verification set, and taking the rest sets as sub-training sets to obtain K different training combinations, wherein K is a positive integer;
the cross training submodule is used for training the initial LSTM model by utilizing the sub-training sets in each training combination in a cross mode, and predicting the verification set and the test set of the current round by utilizing the target LSTM model obtained by each round of training to obtain K groups of verification set prediction data and K groups of test set prediction data;
and the merging submodule is used for merging the K groups of verification set prediction data as the LSTM training set prediction data, and averaging the K groups of test set prediction data to obtain the final LSTM test set prediction data.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
in the embodiment of the application, after acquiring the account information data and the account operation data of the target account, the abnormal account identification apparatus 500 inputs the account operation data into the target LSTM model and the account information data into the target DeepFM model, merges the output results of the two models, and inputs the merged result into the target LR model, which finally outputs the determination result as to whether the target account is an abnormal account. In summary, the embodiment of the present application provides an ensemble model that processes the account information data and account operation data of a target account and outputs a determination result of whether the target account is an abnormal account.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 7, fig. 7 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 700 includes a memory 701, a processor 702, and a network interface 703 communicatively coupled to each other via a system bus. It is noted that only a computer device 700 having components 701-703 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 701 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 701 may be an internal storage unit of the computer device 700, such as a hard disk or a memory of the computer device 700. In other embodiments, the memory 701 may also be an external storage device of the computer device 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 700. Of course, the memory 701 may also include both internal and external storage units of the computer device 700. In this embodiment, the memory 701 is generally used for storing an operating system and various application software installed on the computer device 700, such as the computer readable instructions of the above-mentioned method for identifying an abnormal account. In addition, the memory 701 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 702 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 702 is generally configured to control the overall operation of the computer device 700. In this embodiment, the processor 702 is configured to execute computer readable instructions stored in the memory 701 or process data, for example, execute computer readable instructions of the identification method of the abnormal account.
The network interface 703 may include a wireless network interface or a wired network interface, and the network interface 703 is generally used to establish a communication connection between the computer device 700 and other electronic devices.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
in the embodiment of the application, after the account information data and account operation data of the target account are obtained, the account operation data is input into the target LSTM model and the account information data is input into the target DeepFM model; the output results of the two models are merged and input into the target LR model, which finally outputs the determination result as to whether the target account is an abnormal account. In summary, the embodiment of the present application provides an ensemble model that processes the account information data and account operation data of a target account and outputs a determination result of whether the target account is an abnormal account.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions, which can be executed by at least one processor, so as to cause the at least one processor to execute the steps of the method for identifying an abnormal account number as described above.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
in the embodiment of the application, after the account information data and account operation data of the target account are obtained, the account operation data is input into the target LSTM model and the account information data is input into the target DeepFM model; the output results of the two models are merged and input into the target LR model, which finally outputs the determination result as to whether the target account is an abnormal account. In summary, the embodiment of the present application provides an ensemble model that processes the account information data and account operation data of a target account and outputs a determination result of whether the target account is an abnormal account.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thoroughly understood. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (10)

1. A method for identifying an abnormal account is characterized by comprising the following steps:
acquiring account information data and account operation data of a target account;
inputting the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and inputting the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account;
and combining the first probability and the second probability, inputting the combined probability into a target LR model, and outputting a judgment result that the target account is an abnormal account.
2. The method of claim 1, wherein before the step of inputting the account operation data into a target LSTM model to obtain a first probability and inputting the account information data into a target DeepFM model to obtain a second probability, the method further comprises:
an account operation data set and an account information data set are constructed;
dividing the account operation data set into a first training set and a first testing set, and dividing the account information data set into a second training set and a second testing set;
performing cross training on an initial LSTM model by using the first training set after data segmentation to obtain a target LSTM model, predicting the first training set and the first testing set by using the target LSTM model, and respectively outputting LSTM training set prediction data and LSTM testing set prediction data;
performing cross training on the initial DeepFM model by using the second training set after data segmentation, predicting the second training set and the second testing set by using the target DeepFM model obtained by the cross training, and respectively outputting DeepFM training set prediction data and DeepFM testing set prediction data;
respectively merging the LSTM training set prediction data and the DeepFM training set prediction data, the LSTM test set prediction data and the DeepFM test set prediction data to obtain training set prediction data and test set prediction data;
and training an initial LR model by using the training set prediction data and the test set prediction data to obtain the target LR model.
3. The identification method according to claim 2, wherein the step of constructing the account operation dataset and the account information dataset specifically comprises:
acquiring time difference value sequence data of each activity participated by a user, and constructing an account operation data set containing the time difference value sequence data and an account information data set containing the account information data.
4. The identification method according to any one of claims 1 to 3, wherein before the obtaining of the account information data and the account operation data of the target account, the identification method further comprises:
acquiring account information data or account operation data of a plurality of accounts;
and determining the target account satisfying the condition of primary screening of the abnormal account in the account information data or the account operation data of the plurality of accounts.
5. The identification method according to any one of claims 1 to 3, wherein the combining the first probability and the second probability into a target LR model and outputting the result of the determination that the target account is an abnormal account comprises:
and according to weight values which are respectively set for the target LSTM model and the target DeepFM model in advance, weighting and combining the first probability and the second probability, inputting the first probability and the second probability into a target LR model, and outputting a judgment result that the target account number is an abnormal account number.
6. The identification method according to any one of claims 1 to 3, wherein the cross-training an initial LSTM model with the first training set after data segmentation to obtain the target LSTM model, predicting the first training set and the first test set with the target LSTM model, and outputting LSTM training set prediction data and LSTM test set prediction data respectively, includes:
dividing the first training set into K sets with consistent data size, taking one set as a verification set and taking the rest sets as sub-training sets to obtain K different training combinations, wherein K is a positive integer;
training the initial LSTM model by alternately utilizing the sub-training sets in each training combination, and predicting the verification set and the test set of the current round by utilizing the target LSTM model obtained by each round of training to obtain K groups of verification set prediction data and K groups of test set prediction data;
and merging the K groups of verification set prediction data as LSTM training set prediction data, and averaging the K groups of test set prediction data to obtain final LSTM test set prediction data.
7. An abnormal account number recognition device, characterized in that, the recognition device includes:
the first acquisition module is used for acquiring account information data and account operation data of the target account;
the calculation module is used for inputting the account operation data into a target LSTM model to obtain a first probability that the target account is an abnormal account, and inputting the account information data into a target DeepFM model to obtain a second probability that the target account is an abnormal account;
and the judging module is used for combining the first probability and the second probability and inputting the combined probability into a target LR model, and outputting a judging result that the target account is an abnormal account.
8. The identification device of claim 7, further comprising:
the data set construction module is used for constructing an account operation data set and an account information data set;
the data segmentation module is used for dividing the account operation data set into a first training set and a first testing set and dividing the account information data set into a second training set and a second testing set;
the first training module is used for performing cross training on an initial LSTM model by using the first training set after data segmentation to obtain a target LSTM model, predicting the first training set and the first testing set by using the target LSTM model, and respectively outputting LSTM training set prediction data and LSTM testing set prediction data;
the second training module is used for performing cross training on the initial DeepFM model by using the second training set after data segmentation, predicting the second training set and the second testing set by using the target DeepFM model obtained through the cross training, and respectively outputting DeepFM training set prediction data and DeepFM testing set prediction data;
a merging module, configured to merge the LSTM training set prediction data and the DeepFM training set prediction data, the LSTM testing set prediction data and the DeepFM testing set prediction data, respectively, to obtain training set prediction data and testing set prediction data;
and the third training module is used for training an initial LR model by utilizing the training set prediction data and the test set prediction data to obtain the target LR model.
9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions, the processor implementing the steps of the method for identifying an abnormal account number according to any one of claims 1 to 6 when executing the computer-readable instructions.
10. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when executed by a processor, the computer-readable instructions implement the steps of the method for identifying an abnormal account according to any one of claims 1 to 6.
CN202011286403.9A 2020-11-17 2020-11-17 Abnormal account identification method and device, computer equipment and storage medium Pending CN112488163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011286403.9A CN112488163A (en) 2020-11-17 2020-11-17 Abnormal account identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112488163A (en) 2021-03-12

Family

ID=74931358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011286403.9A Pending CN112488163A (en) 2020-11-17 2020-11-17 Abnormal account identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112488163A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507470A (en) * 2020-03-02 2020-08-07 上海金仕达软件科技有限公司 Abnormal account identification method and device
CN111538873A (en) * 2019-12-23 2020-08-14 浙江大学 Telecommunication customer churn probability prediction method and system based on end-to-end model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Peng Xiyuan et al.: "Data-Driven Fault Prediction" (数据驱动的故障预测), 31 March 2016, Harbin Institute of Technology Press, pages: 232 - 236 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762585A (en) * 2021-05-17 2021-12-07 腾讯科技(深圳)有限公司 Data processing method, account type identification method and device
CN113705682A (en) * 2021-08-27 2021-11-26 微民保险代理有限公司 User behavior feature processing method and device
CN113705682B (en) * 2021-08-27 2024-05-14 微民保险代理有限公司 User behavior feature processing method and device
CN115982664A (en) * 2023-03-09 2023-04-18 北京芯盾时代科技有限公司 Abnormal account identification method, device, equipment and storage medium
CN115982664B (en) * 2023-03-09 2023-08-04 北京芯盾时代科技有限公司 Abnormal account identification method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111784348B (en) Account risk identification method and device
Halvaiee et al. A novel model for credit card fraud detection using Artificial Immune Systems
CN112488163A (en) Abnormal account identification method and device, computer equipment and storage medium
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
CN112749749B (en) Classification decision tree model-based classification method and device and electronic equipment
CN113947215A (en) Federated learning management method and device, computer equipment and storage medium
CN111831675A (en) Storage model training method and device, computer equipment and storage medium
CN112035549B (en) Data mining method, device, computer equipment and storage medium
CN112132676B (en) Method and device for determining contribution degree of joint training target model and terminal equipment
CN112508118B (en) Target object behavior prediction method aiming at data offset and related equipment thereof
CN112861662B (en) Target object behavior prediction method based on face and interactive text and related equipment
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN113726784A (en) Network data security monitoring method, device, equipment and storage medium
CN115757075A (en) Task abnormity detection method and device, computer equipment and storage medium
Awosika et al. Transparency and privacy: the role of explainable ai and federated learning in financial fraud detection
CN113761375A (en) Message recommendation method, device, equipment and storage medium based on neural network
CN113704637A (en) Object recommendation method, device and storage medium based on artificial intelligence
CN117114901A (en) Method, device, equipment and medium for processing insurance data based on artificial intelligence
CN116777646A (en) Artificial intelligence-based risk identification method, apparatus, device and storage medium
CN114925275A (en) Product recommendation method and device, computer equipment and storage medium
CN114090407A (en) Interface performance early warning method based on linear regression model and related equipment thereof
CN111737319A (en) User cluster prediction method and device, computer equipment and storage medium
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium
CN117350461B (en) Enterprise abnormal behavior early warning method, system, computer equipment and storage medium
CN111598159B (en) Training method, device, equipment and storage medium of machine learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination