CN116502132A - Account set identification method, device, equipment, medium and computer program product - Google Patents


Info

Publication number
CN116502132A
Authority
CN
China
Prior art keywords
resource transfer
account
transfer type
node
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210056606.1A
Other languages
Chinese (zh)
Inventor
李高
刘肖
李志颖
吴鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application relates to an account set identification method and apparatus, a computer device, a storage medium and a computer program product. The method can be applied to identifying abnormal payment accounts and comprises the following steps: acquiring, for each account, a resource transfer type sequence formed by the resource transfer types used when resources are transferred based on the account; constructing a resource transfer type directed graph from the resource transfer type sequences of the accounts; training a neural-network-based word vector model with extended resource transfer type sequences obtained from the resource transfer type directed graph, to obtain a word vector for each resource transfer type; obtaining an account vector for each account from the word vectors, and determining, based on the account vectors, account sets formed by accounts whose resource transfer types change in similar patterns; and identifying, from the account sets, the account sets with abnormal resource transfer according to the resource transfer type sequences of the accounts in each account set. The method can reduce the labeling cost of training, perceive unknown risks, and increase model training speed.

Description

Account set identification method, device, equipment, medium and computer program product
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for identifying an account set, a computer device, a storage medium, and a computer program product.
Background
With the rapid development of computer and internet technology, social tools have become indispensable for everyday communication and bring many conveniences to people's life and work. In recent years, however, social tools have been used to build malicious online communities or to move illicit funds for illegal purposes, and such groups need to be identified promptly in order to keep the social platform safe.
Conventionally, a supervised model is used to mine known risks: a data mining method extracts the behavior patterns of malicious objects from historical object behavior data (that is, behavior data of known malicious objects and of normal objects), and a classification model for objects is then built on those behavior patterns. Training such a model requires a large amount of labeled data, and collecting and annotating labels is expensive. A supervised model can also only mine known risks and has weak ability to perceive unknown ones. Moreover, because the supervised model is trained on objects and object behavior data, the number of model parameters is extremely large and the model is difficult to train.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an account set identification method, apparatus, computer device, computer-readable storage medium and computer program product that can reduce the labeling cost of training, perceive unknown risks, and increase model training speed.
In a first aspect, the present application provides an account set identification method. The method comprises the following steps:
acquiring a resource transfer type sequence formed by resource transfer types for transferring resources based on an account;
constructing a resource transfer type directed graph according to the resource transfer type sequences of the account numbers, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer;
training a word vector model based on a neural network by using an extended resource transfer type sequence obtained based on the resource transfer type directed graph to obtain word vectors corresponding to each resource transfer type;
obtaining account vectors of all accounts according to word vectors corresponding to all the resource transfer types, and determining an account set formed by accounts with similar change modes of the resource transfer types based on the account vectors;
and identifying the account number set with abnormal resource transfer from each account number set according to the resource transfer type sequence of the account numbers in each account number set.
In a second aspect, the application further provides an account set identification device. The device comprises:
the acquisition module is used for acquiring a resource transfer type sequence formed by resource transfer types for resource transfer based on the account;
the construction module is used for constructing a resource transfer type directed graph according to the resource transfer type sequences of the account numbers, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer;
the word vector acquisition module is used for training a word vector model based on a neural network by using the extended resource transfer type sequence obtained based on the resource transfer type directed graph to obtain word vectors corresponding to all the resource transfer types;
the account set determining module is used for obtaining account vectors of all accounts according to word vectors corresponding to all the resource transfer types, and determining account sets formed by accounts with similar change modes of the resource transfer types based on the account vectors;
the identification module is used for identifying an account number set with abnormal resource transfer from each account number set according to the resource transfer type sequence of the account numbers in each account number set.
In one embodiment, the obtaining module is further configured to obtain resource transfer data of resource transfer of each account in a historical period of time; and for each account, sequencing the corresponding resource transfer data according to the time sequence of the resource transfer, and generating a resource transfer type sequence of each account according to the resource transfer type in each resource transfer data after sequencing.
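As an illustration of the embodiment above, the following minimal Python sketch groups transfer records by account, sorts them chronologically and keeps only the transfer type. The record layout `(account, timestamp, transfer_type)`, the function name `build_type_sequences` and the sample values are assumptions made for this example and do not come from the application.

```python
from collections import defaultdict


def build_type_sequences(records):
    """Turn (account, timestamp, transfer_type) records into one
    chronologically ordered transfer type sequence per account."""
    by_account = defaultdict(list)
    for account, timestamp, transfer_type in records:
        by_account[account].append((timestamp, transfer_type))
    # Sort each account's records by time, then drop the timestamp.
    return {
        account: [t for _, t in sorted(rows, key=lambda row: row[0])]
        for account, rows in by_account.items()
    }


# Hypothetical records within one historical time period.
records = [
    ("acct_1", 3, "withdrawal"),
    ("acct_1", 1, "red_packet_in"),
    ("acct_2", 5, "transfer_out"),
    ("acct_1", 2, "transfer_in"),
]
sequences = build_type_sequences(records)
```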
In one embodiment, the building module is further configured to count co-occurrence times of two adjacent resource transfer types in the resource transfer type sequence; and constructing a resource transfer type directed graph by taking each resource transfer type as a node and taking the co-occurrence times of two resource transfer types represented by two nodes in the resource transfer type sequence as the edge weight of a directed edge between the two nodes.
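The construction described above can be sketched as follows. This is a minimal example rather than the applicant's implementation: the graph is stored as a nested dictionary `{source: {target: weight}}`, and the name `build_directed_graph` and the sample sequences are hypothetical.

```python
from collections import Counter


def build_directed_graph(sequences):
    """Build {source_type: {target_type: co-occurrence count}} from
    per-account transfer type sequences."""
    edge_weights = Counter()
    for seq in sequences:
        # Each pair of adjacent types contributes one co-occurrence.
        for src, dst in zip(seq, seq[1:]):
            edge_weights[(src, dst)] += 1
    graph = {}
    for seq in sequences:
        for node in seq:
            graph.setdefault(node, {})  # every type becomes a node
    for (src, dst), weight in edge_weights.items():
        graph[src][dst] = weight
    return graph


graph = build_directed_graph([["A", "B", "C"], ["A", "B", "A"], ["B", "C"]])
```

Here the edge A→B has weight 2 because the adjacent pair (A, B) appears in two sequences, while B→A has weight 1.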
In one embodiment, the word vector obtaining module is further configured to take each node in the resource transfer type directed graph as a starting node and walk from the starting node to a node adjacent to it; for the node reached by the current step, determine each of its neighbor nodes; when a neighbor node is the node reached by the previous step, obtain the return walk probability between the current node and that neighbor node; when a neighbor node is also a neighbor of the node reached by the previous step, obtain the edge weight of the directed edge from the current node to that neighbor node; when a neighbor node is neither the node reached by the previous step nor a neighbor of it, obtain the in-out walk probability between the current node and that neighbor node; walk from the current node to the neighbor node indicated by the largest of the return walk probability, the edge weight and the in-out walk probability; and record the nodes passed through from the starting node until their number reaches a preset walk length, and obtain an extended resource transfer type sequence from the recorded nodes.
In one embodiment, the word vector obtaining module is further configured to determine, for each target node in the resource transfer type directed graph, the corresponding neighbor nodes; calculate the return walk probability from the target node to a neighbor node according to the edge weight of the directed edge from the target node to the neighbor node and a preset return parameter; and calculate the in-out walk probability from the target node to a neighbor node according to the edge weight of the directed edge from the target node to the neighbor node and a preset in-out parameter.
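A minimal sketch of such a walk is given below. It makes several assumptions that the text does not state: the return and in-out probabilities are taken to be the edge weight divided by the return parameter `p` and the in-out parameter `q` respectively (the usual node2vec convention), "neighbor" means out-neighbor in the directed graph, and the first step goes to the highest-weight neighbor. The names `transition_score` and `greedy_biased_walk` are hypothetical.

```python
def transition_score(graph, prev, cur, nxt, p, q):
    """Score for stepping cur -> nxt, given the previous node prev."""
    weight = graph[cur][nxt]
    if nxt == prev:          # stepping back: return walk probability
        return weight / p    # assumption: weight / return parameter
    if nxt in graph[prev]:   # nxt is also a neighbor of prev
        return weight        # plain edge weight
    return weight / q        # assumption: weight / in-out parameter


def greedy_biased_walk(graph, start, walk_length, p=1.0, q=1.0):
    """Walk from start, always moving to the highest-scoring neighbor,
    until walk_length nodes are recorded or a dead end is reached."""
    walk = [start]
    if walk_length < 2 or not graph[start]:
        return walk
    prev, cur = start, max(graph[start], key=graph[start].get)
    walk.append(cur)
    while len(walk) < walk_length and graph[cur]:
        nxt = max(graph[cur],
                  key=lambda n: transition_score(graph, prev, cur, n, p, q))
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk


graph = {"A": {"B": 2}, "B": {"C": 2, "A": 1}, "C": {"A": 1}}
walk_default = greedy_biased_walk(graph, "A", walk_length=5)
walk_returning = greedy_biased_walk(graph, "A", walk_length=5, p=0.25)
```

With `p = q = 1` the walk follows the heaviest edges (A, B, C, A, B). A small return parameter raises the score of stepping back, so the second walk oscillates between A and B. Note that node2vec as commonly described samples the next node in proportion to these scores instead of taking the maximum; the greedy choice here follows the wording of the embodiment above.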
In one embodiment, the word vector obtaining module is further configured to obtain a vocabulary formed by the resource transfer types and a vocabulary vector for each resource transfer type; acquire the extended resource transfer type sequence; input the vocabulary vector of a target resource transfer type in the extended resource transfer type sequence into the neural-network-based word vector model, which outputs, for each resource transfer type in the vocabulary, the probability that it is a context resource transfer type of the target resource transfer type; update the weight matrix of the word vector model with the objective of maximizing the probabilities of the resource transfer types adjacent to the target resource transfer type in the extended resource transfer type sequence, to obtain a trained word vector model; and map the vocabulary vector of each resource transfer type to the corresponding word vector through the weight matrix of the trained word vector model.
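The training step above resembles a Skip-Gram model with a full softmax output. The pure-Python sketch below is illustrative only and assumes that the vocabulary vector is a one-hot vector, so multiplying it by the input weight matrix simply selects one row. The function name `train_skip_gram` and all hyperparameters (`dim`, `window`, `epochs`, `lr`) are hypothetical.

```python
import math
import random


def train_skip_gram(sequences, dim=4, window=1, epochs=50, lr=0.05, seed=0):
    """Learn one vector per transfer type from extended sequences."""
    rng = random.Random(seed)
    vocab = sorted({t for seq in sequences for t in seq})
    index = {t: i for i, t in enumerate(vocab)}
    size = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    # (target, context) pairs: types adjacent within the window.
    pairs = [(index[seq[i]], index[seq[j]])
             for seq in sequences for i in range(len(seq))
             for j in range(max(0, i - window), min(len(seq), i + window + 1))
             if j != i]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for target, context in pairs:
            hidden = w_in[target]  # one-hot input selects this row
            scores = [sum(hidden[k] * w_out[c][k] for k in range(dim))
                      for c in range(size)]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            grad_hidden = [0.0] * dim
            for c in range(size):
                # Gradient of -log P(context | target) w.r.t. score c.
                err = exps[c] / total - (1.0 if c == context else 0.0)
                for k in range(dim):
                    grad_hidden[k] += err * w_out[c][k]
                    w_out[c][k] -= lr * err * hidden[k]
            for k in range(dim):
                hidden[k] -= lr * grad_hidden[k]
    # Rows of the input weight matrix serve as the word vectors.
    return {t: list(w_in[index[t]]) for t in vocab}


walks = [["A", "B", "C", "A", "B"], ["B", "C", "A", "B", "C"]]
vectors = train_skip_gram(walks)
```

A production system would more likely use an existing word2vec implementation with negative sampling or hierarchical softmax; the full softmax is used here only because the vocabulary of transfer types is small.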
In one embodiment, the account set determining module is further configured to obtain a plurality of word vectors corresponding to the resource transfer type sequence of the account according to each resource transfer type in the resource transfer type sequence of the account and the word vectors corresponding to each resource transfer type; and averaging the word vectors to obtain account vectors of all the accounts.
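The averaging step can be sketched in a few lines. The function name `account_vector` and the two-dimensional sample vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def account_vector(sequence, word_vectors):
    """Average the word vectors of the transfer types in one account's
    sequence, counting repeated types once per occurrence."""
    vectors = [word_vectors[t] for t in sequence]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


word_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
vec = account_vector(["A", "B", "A", "A"], word_vectors)
```

Because type A occurs three times out of four, the resulting account vector is `[0.75, 0.25]`.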
In one embodiment, the account set determining module is further configured to perform clustering processing on each account based on the similarity between account vectors, to obtain a plurality of clusters; and obtaining an account number set formed by the account numbers with similar resource transfer modes according to the account numbers in each cluster.
In one embodiment, the account set determining module is further configured to randomly select k accounts from the accounts and use the account vectors of the selected k accounts as cluster centers, where k is a natural number greater than 1; assign each account to the cluster of its nearest cluster center according to the distances from the account's account vector to the k cluster centers; and calculate the mean of the account vectors in each cluster to obtain k updated cluster centers, then return to the step of assigning each account to the cluster of its nearest cluster center according to the distances from its account vector to the k cluster centers, and continue until the k cluster centers are no longer updated.
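The clustering loop above corresponds to the standard K-Means procedure. The following sketch assumes squared Euclidean distance, which the text does not specify, and adds a `max_iter` safeguard. The names `k_means` and `squared_distance` and the sample vectors are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, k, seed=0, max_iter=100):
    """Cluster account vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    # Use the vectors of k randomly chosen accounts as initial centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = []
    for _ in range(max_iter):
        # Assign every account to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(v, centers[c]))
                  for v in vectors]
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # centers no longer updated
            break
        centers = new_centers
    return labels, centers


account_vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = k_means(account_vectors, k=2)
```

The accounts that share a label form one candidate account set.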
In one embodiment, the identification module is further configured to obtain, for each account set, the resource transfer sequence of each account in the account set; mine frequent resource transfer patterns from the resource transfer sequences of the accounts to obtain the frequent resource transfer patterns of the accounts in the account set; and when the frequent resource transfer patterns of the account set include an abnormal resource transfer pattern, determine that the account set is an account set with abnormal resource transfer.
In one embodiment, the identification module is further configured to count the support of each resource transfer type according to the resource transfer sequences of the accounts; take the resource transfer types whose support is greater than a preset threshold as the frequent 1-item sequences of the account set; recursively perform the following steps until a recursion termination condition is met: for each frequent i-item sequence, determine the projected sequences in the account set that have that frequent i-item sequence as a prefix, determine the single items whose support in the projected sequences is greater than the preset threshold, merge the frequent i-item sequence with each such single item to output the frequent (i+1)-item sequences of the account set, and set i = i + 1; and obtain, from the output frequent items, the frequent resource transfer patterns of the account set.
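The recursive mining above follows the general shape of the PrefixSpan algorithm for sequences of single items. The sketch below is a simplified illustration: support is counted as the number of sequences containing an item, the recursion ends when no item in the projected sequences exceeds the threshold, and the name `prefixspan` and the sample data are hypothetical.

```python
def prefixspan(sequences, threshold):
    """Return (pattern, support) for every sequential pattern whose
    support is strictly greater than threshold."""
    results = []

    def mine(prefix, projected):
        # Support of an item = number of projected sequences containing it.
        support = {}
        for seq in projected:
            for item in set(seq):
                support[item] = support.get(item, 0) + 1
        for item in sorted(support):
            if support[item] <= threshold:
                continue
            pattern = prefix + [item]
            results.append((pattern, support[item]))
            # Project: keep the suffix after the first occurrence of item.
            suffixes = [seq[seq.index(item) + 1:]
                        for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine([], sequences)
    return results


account_set_sequences = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
patterns = prefixspan(account_set_sequences, threshold=1)
```

An account set could then be flagged when any mined pattern matches a known abnormal pattern, for example by checking whether `["A", "C"]` appears among the returned patterns.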
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a resource transfer type sequence formed by resource transfer types for transferring resources based on an account;
constructing a resource transfer type directed graph according to the resource transfer type sequences of the account numbers, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer;
training a word vector model based on a neural network by using an extended resource transfer type sequence obtained based on the resource transfer type directed graph to obtain word vectors corresponding to each resource transfer type;
obtaining account vectors of all accounts according to word vectors corresponding to all the resource transfer types, and determining an account set formed by accounts with similar change modes of the resource transfer types based on the account vectors;
and identifying the account number set with abnormal resource transfer from each account number set according to the resource transfer type sequence of the account numbers in each account number set.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a resource transfer type sequence formed by resource transfer types for transferring resources based on an account;
constructing a resource transfer type directed graph according to the resource transfer type sequences of the account numbers, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer;
training a word vector model based on a neural network by using an extended resource transfer type sequence obtained based on the resource transfer type directed graph to obtain word vectors corresponding to each resource transfer type;
obtaining account vectors of all accounts according to word vectors corresponding to all the resource transfer types, and determining an account set formed by accounts with similar change modes of the resource transfer types based on the account vectors;
and identifying the account number set with abnormal resource transfer from each account number set according to the resource transfer type sequence of the account numbers in each account number set.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring a resource transfer type sequence formed by resource transfer types for transferring resources based on an account;
constructing a resource transfer type directed graph according to the resource transfer type sequences of the account numbers, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer;
training a word vector model based on a neural network by using an extended resource transfer type sequence obtained based on the resource transfer type directed graph to obtain word vectors corresponding to each resource transfer type;
obtaining account vectors of all accounts according to word vectors corresponding to all the resource transfer types, and determining an account set formed by accounts with similar change modes of the resource transfer types based on the account vectors;
and identifying the account number set with abnormal resource transfer from each account number set according to the resource transfer type sequence of the account numbers in each account number set.
According to the account set identification method, apparatus, computer device, storage medium and computer program product described above, account sets are mined from a massive number of accounts in an unsupervised manner, so that suspicious account sets are perceived and flagged promptly and the network security of the platform is maintained. Specifically, each account is represented by the resource transfer type sequence formed by the resource transfer types of its resource transfers, and the features of the object itself are discarded. The number of words to be trained is thereby reduced from the order of the number of objects to the order of the number of resource transfer types, which can greatly increase model training speed. The resource transfer type sequences are used to construct a resource transfer type directed graph, and the extended resource transfer type sequences regenerated from the directed graph take both node features and node relationship features into account. As a result, the word vectors output by a word vector model trained on the extended resource transfer type sequences represent the nodes well, which improves the accuracy with which resource transfer types are expressed. The account vector of each account is obtained directly from the word vectors of the resource transfer types, so it is quick and simple to compute and represents the account well. Determining account sets formed by accounts whose resource transfer types change in similar patterns, based on the account vectors, makes it possible to mine account sets of unknown types and thus perceive unknown risks.
After the account sets are determined, the account sets with abnormal resource transfer are further identified among them, so that abnormal account sets can be judged quickly. Compared with manually checking and analyzing the behavior of each account set in turn, this effectively improves identification efficiency. With this efficient, cost-free, accurate and effective identification process, groups performing abnormal resource transfers in the network can be detected promptly, and network security is maintained.
Drawings
FIG. 1 is an application environment diagram of an account set identification method in one embodiment;
FIG. 2 is a flowchart of a method for identifying an account set in one embodiment;
FIG. 3 is a schematic diagram of a resource transfer type sequence for each account in one embodiment;
FIG. 4 is a schematic diagram of a resource transfer type directed graph in one embodiment;
FIG. 5 is a schematic diagram of an extended resource transfer type sequence generated in one embodiment;
FIG. 6 is a schematic diagram of a walk within the neighborhood of a graph node or along the connection depth of a graph node in one embodiment;
FIG. 7 is a schematic diagram of a random walk generation sequence based on graph nodes in one embodiment;
FIG. 8 is a schematic diagram of a word vector model in one embodiment;
FIG. 9 is a schematic diagram of a word vector model in another embodiment;
FIG. 10 is a schematic overall flow chart of an account identification method in one embodiment;
FIG. 11 is a flowchart of a method for identifying an account set in a specific embodiment;
FIG. 12 is a schematic diagram of a resource transfer type directed graph in one specific example;
FIG. 13 is a schematic diagram of an account set obtained after clustering in a specific example;
FIG. 14 is a block diagram illustrating an apparatus for identifying an account set in one embodiment;
FIG. 15 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Some terms involved in this application are described:
node2vec: an algorithm model for training embedding vectors of the nodes in a graph data structure.
K-Means: a commonly used distance-based clustering algorithm that can be applied to clustering large-scale data.
Prefixspan: a frequent sequence pattern mining algorithm.
CBOW: an algorithm for computing an embedding vector of a word.
Skip-Gram: an algorithm for computing an embedding vector of a word.
Embedding: representing a word by a vector computed by an algorithm; this process is called embedding.
A Graph (Graph) is a data structure representing the relationship between a series of objects.
Nodes (vertices) in a graph are the objects to be analyzed in a network. For example, in a community network each object corresponds to one node in the graph, and each object can be represented by an object identifier, the object's payment account, and so on. As another example, in the present application a node in the resource transfer type directed graph represents a resource transfer type.
Edges in a graph are the lines between two nodes and represent the relationship between them. In a community network, each node represents one object, and the edges between nodes represent relationships between objects, such as friend relationships or payment relationships. In the present application, an edge between two nodes in the resource transfer type directed graph indicates that the two resource transfer types occur one after the other. For example, when one resource transfer is of type A, the next is of type B, and the one after that is of type C, the resource transfer type directed graph includes the subgraph A→B→C, which represents this pattern of resource transfer type change.
A directed graph (Directed Graph) is a graph whose edges have directions; for example, a follow relationship between objects is directed. In the present application, the pattern of resource transfer type change is likewise directional: type A followed by type B and type B followed by type A represent different patterns of resource transfer type change.
Artificial intelligence (Artificial Intelligence, AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
The account set identification method provided by the embodiments of the present application involves natural language processing techniques of artificial intelligence. For example, in the embodiments of the present application, after extended resource transfer type sequences are obtained from the resource transfer type directed graph, a neural-network-based word vector model is trained with them to obtain a word vector for each resource transfer type. These word vectors are computable, structured representations that accurately express the resource transfer types, so the pattern of resource transfer type change of an account is expressed accurately, and accounts with similar patterns of resource transfer type change are clustered together to determine account sets.
The account set identification method can be applied in financial risk-control products to identify abnormal groups such as illegal credit intermediaries, cash-out groups, multi-platform borrowers and illegal community organizations.
The account number set identification method provided by the embodiment of the invention can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server.
In one embodiment, the server 104 obtains the resource transfer data of the account number from the terminal 102 and extracts the resource transfer type therefrom. The server 104 acquires a resource transfer type sequence formed by resource transfer types for transferring resources based on the account; constructing a resource transfer type directed graph according to the resource transfer type sequences of the account numbers, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer; training a word vector model based on a neural network by using an extended resource transfer type sequence obtained based on a resource transfer type directed graph to obtain word vectors corresponding to each resource transfer type; obtaining account vectors of all accounts according to word vectors corresponding to all the resource transfer types, and determining account sets formed by accounts with similar resource transfer modes based on the account vectors; and identifying the account number set with abnormal resource transfer from each account number set according to the resource transfer type sequence of the account numbers in each account number set.
In one embodiment, the server 104 is a computer device with a relatively strong data processing capability, and may obtain, from a data storage system, resource transfer data of an account for resource transfer, and extract a resource transfer type from the resource transfer data to obtain a resource transfer type sequence of each account, so as to execute the account set identification method provided in the embodiment of the present application.
The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a method for identifying an account set is provided, which is illustrated by using the method applied to the computer device (for example, the server 104) in fig. 1 as an example, and includes the following steps:
Step 202, obtaining a resource transfer type sequence composed of the resource transfer types used when resources are transferred based on an account.
The account may be used to uniquely identify an object in the payment platform. The account may be an object identifier, such as the mobile phone number or mailbox of the object, or a platform-internal identifier that the payment platform creates for each object to uniquely identify that object.
Resource transfer refers to online receipt or payment based on an account, that is, the process of transferring resources in or out online. The resource may be funds, virtual coins, virtual funds, points, coupons, and the like. The resource transfer type is the type used for a resource transfer. For malicious groups, the way the resource transfer type changes when each account transfers resources shows a certain similarity and regularity, so malicious groups can be identified by mining how the resource transfer type changes for each account. The resource transfer types may be, for example, payment scenarios, including cash withdrawal, change purchase, change transfer out, receipt of a red packet in a single chat session, receipt of a red packet in a group session, face-to-face transfer (out), face-to-face transfer (in), business payment to change, business payment (business red packet), business payment (swipe card), business payment (applet payment), binding a bank card, general transfer (out), general transfer (in), and the like.
The resource transfer type sequence is a sequence composed of consecutive resource transfer types, that is, resource transfer types ordered by resource transfer time. For example, if the account USER performs 10 resource transfers within a day, one resource transfer type sequence of the account USER for that day can be obtained from the resource transfer type (payment scenario) used by each transfer. The 10 resource transfers can also be divided into several groups according to how concentrated they are in time, yielding multiple resource transfer type sequences for the account USER in that day. In other words, each account may also correspond to multiple resource transfer type sequences.
In one embodiment, obtaining a resource transfer type sequence of resource transfer types based on account numbers for resource transfer includes: acquiring resource transfer data of each account for resource transfer in a historical time period; and for each account, sequencing the corresponding resource transfer data according to the time sequence of the resource transfer, and generating a resource transfer type sequence of each account according to the resource transfer type in each resource transfer data after sequencing.
The historical time period can be the past hour, the past day, or the past two days, and can be set according to actual requirements. For example, because the abnormal resource transfer behavior of malicious groups is highly concentrated and short-lived, the resource transfer data of each account in the past hour or the past day can be obtained every hour or every day. The resource transfer data of each account is then sorted by the time at which it was generated, and the resource transfer type sequence of each account is obtained from the sorted resource transfer data.
The resource transfer data may be payment flow records of an object, which may include the account, the resource transfer time, the resource transfer type, the transfer amount, and the like. For example, for the account USER, the resource transfer data generated over a period of time includes the following records:
2021-10-10 21:00:00 transfer 1000
2021-10-10 21:10:00 commercial payment 500
2021-10-10 22:10:00 red packet 20
2021-10-10 22:20:00 face-to-face transfer 5000
The resource transfer type sequence generated for the account is then: transfer, commercial payment, red packet, face-to-face transfer. Of course, the resource transfer type sequence of the account can also be generated uniformly in reverse chronological order, that is: face-to-face transfer, red packet, commercial payment, transfer. The data can also be divided into two resource transfer type sequences according to how concentrated the records are in time: (transfer, commercial payment) and (red packet, face-to-face transfer). The embodiments of the present application do not limit how the resource transfer type sequence is generated, as long as the resource transfer type sequences of all accounts are generated in a uniform manner. The embodiments of the present application mainly take generation in chronological order of the resource transfer data as an example.
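The sequence generation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the record layout, the function name build_sequences, and the 30-minute gap used to split by time concentration are all assumptions introduced here, since the source does not state a splitting rule.

```python
from datetime import datetime, timedelta

# Hypothetical flow records for one account: (timestamp, resource transfer type, amount).
records = [
    ("2021-10-10 22:10:00", "red packet", 20),
    ("2021-10-10 21:00:00", "transfer", 1000),
    ("2021-10-10 22:20:00", "face-to-face transfer", 5000),
    ("2021-10-10 21:10:00", "commercial payment", 500),
]

def build_sequences(records, max_gap=None):
    """Sort records by time and return a list of resource transfer type sequences.

    If max_gap is given, a new sequence starts whenever two consecutive
    transfers are further apart than max_gap (an assumed splitting rule).
    """
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), rtype) for ts, rtype, _ in records
    )
    sequences, previous_time = [], None
    for when, rtype in parsed:
        if previous_time is None or (max_gap is not None and when - previous_time > max_gap):
            sequences.append([])
        sequences[-1].append(rtype)
        previous_time = when
    return sequences

single = build_sequences(records)
split = build_sequences(records, max_gap=timedelta(minutes=30))
print(single)
print(split)
```

With the four example records, the first call yields one chronological sequence and the second call yields the two-sequence split mentioned in the text.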
Representing an account by the resource transfer type sequence formed from the resource transfer types it uses removes the features of the object itself. The number of words to be trained drops from the order of the number of objects to the order of the number of resource transfer types, which can greatly speed up model training.
In addition, it should be noted that the accounts processed in the embodiments of the present application may be the accounts of all objects in the payment platform, so that the resource transfer behavior of all objects can be processed and the groups engaged in malicious resource transfer can be mined. The processed accounts may also be the accounts related to a group session with high-frequency resource transfer behavior, including accounts of objects that joined the group session, accounts of objects that joined and later left it, and accounts of members still in it, so that a specific illegal group network can be identified in a targeted way. The processed accounts may also be the accounts that remain after objects without any resource transfer behavior are filtered out of all objects, which reduces the number of objects to process.
FIG. 3 is a schematic diagram of the resource transfer type sequences of each account in one embodiment. FIG. 3 shows the resource transfer type sequences of account U1, account U2, and account U3 over the same past historical period. The resource transfer type sequence of account U1 is DAB, the resource transfer type sequences of account U2 are BE and DEF, and the resource transfer type sequences of account U3 are ECB and BA. Each letter represents a resource transfer type, and each arrow represents the chronological order in which the resource transfer data of those types was generated.
Step 204, constructing a resource transfer type directed graph according to the resource transfer type sequence of each account, wherein nodes in the resource transfer type directed graph represent the resource transfer types, and directed edges among the nodes represent the change modes of the resource transfer types in the resource transfer.
The resource transfer type directed graph is constructed based on a resource transfer type sequence of each account. The nodes in the resource transfer type directed graph represent the resource transfer types, and the directed edges among the nodes represent the change modes of the resource transfer types when the account numbers are used for transferring the resources. For example, after the resource transfer type A is adopted for one time of resource transfer and the resource transfer type B is adopted for one time of resource transfer, the directed edge points to the resource transfer type B from the resource transfer type A. For another example, after the resource transfer type a is adopted for one time, the resource transfer type C is adopted for one time, and then the directed edge points to the resource transfer type C from the resource transfer type a.
In addition, each directed edge between nodes has an edge weight. The edge weight from node A to node B can be the number of times resource transfer type A is followed by resource transfer type B, that is, the AB co-occurrence count, over all resource transfer type sequences of the accounts being processed. Note that the co-occurrence counts of AB and BA are two different values. If the processed sequences also contain cases in which resource transfer type B is followed by resource transfer type A, the edge between node A and node B is bidirectional, meaning the resource transfer type can change in both directions, and the two directions carry the co-occurrence counts of AB and BA respectively.
In one embodiment, constructing a resource transfer type directed graph according to a resource transfer type sequence of each account includes: counting the co-occurrence times of two adjacent resource transfer types in a resource transfer type sequence; and constructing a resource transfer type directed graph by taking each resource transfer type as a node and taking the co-occurrence times of the two resource transfer types represented by the two nodes in a resource transfer type sequence as the edge weight of the directed edge between the two nodes.
For example, for the resource transfer type sequences shown in FIG. 3, the co-occurrence count of every two adjacent resource transfer types is counted. The resource transfer type sequence DAB of account U1 gives the directed edges D→A and A→B. The resource transfer type sequences BE and DEF of account U2 give the directed edges B→E, D→E, and E→F. The resource transfer type sequences ECB and BA of account U3 give the directed edges E→C, C→B, and B→A. The co-occurrence counts of DA, AB, BE, DE, EF, EC, CB, and BA are all 1, so each corresponding edge weight is α=1. The resulting resource transfer type directed graph is shown in FIG. 4. As FIG. 4 makes clear, the resource transfer type sequences of the processed accounts contain no change from resource transfer type D to resource transfer type F, so there is no directed edge between node D and node F.
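The counting step for the FIG. 3 example can be sketched as below. The dictionary layout and the function name build_directed_graph are illustrative assumptions; only the sequences and the expected edge weights come from the text.

```python
from collections import Counter

# Resource transfer type sequences of FIG. 3 (each letter is a resource transfer type).
account_sequences = {
    "U1": ["DAB"],
    "U2": ["BE", "DEF"],
    "U3": ["ECB", "BA"],
}

def build_directed_graph(account_sequences):
    """Return {(src, dst): co-occurrence count} over adjacent types in every sequence."""
    edge_weights = Counter()
    for sequences in account_sequences.values():
        for seq in sequences:
            # zip(seq, seq[1:]) pairs each type with the type that follows it.
            for src, dst in zip(seq, seq[1:]):
                edge_weights[(src, dst)] += 1
    return dict(edge_weights)

graph = build_directed_graph(account_sequences)
for (src, dst), weight in sorted(graph.items()):
    print(f"{src} -> {dst}: {weight}")
```

Because the key is the ordered pair (src, dst), AB and BA are counted separately, which matches the distinction the text draws between the two co-occurrence counts.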
Step 206, training a neural-network-based word vector model with the extended resource transfer type sequence obtained based on the resource transfer type directed graph, to obtain a word vector for each resource transfer type.
In the related art, words are usually built from objects and object feature sets, so the vocabulary is huge and model training is difficult. In the present application, the resource transfer types serve as the words, which reduces the vocabulary to only a few hundred words and greatly speeds up model training. Once the word vectors of the resource transfer types are obtained, an account is represented by the word vectors corresponding to its resource transfer type sequence, which expresses the attributes of the account well.
To obtain the word vectors of the resource transfer types, a training corpus is needed. Rather than using the original resource transfer type sequences directly as the training corpus for the word vectors, the present application constructs the resource transfer type directed graph from the original sequences, regenerates extended resource transfer type sequences from the directed graph, and uses those extended sequences to generate the word vectors of the resource transfer types. This fully mines the adjacency relationships between nodes in the directed graph, so that the word vectors better express the relationships between nodes.
The extended resource transfer type sequence obtained based on the resource transfer type directed graph is a uniform-length resource transfer type sequence. As shown in fig. 5, a schematic diagram of an extended resource transfer type sequence generated in one embodiment. Referring to fig. 5, the generated resource transfer type sequence is of uniform length, which has mined and expanded the original resource transfer type sequence. Therefore, even if the length of the resource transfer type sequence corresponding to each account is not uniform, even if the length of the resource transfer type sequence corresponding to some accounts is 1, the expanded resource transfer type sequence with uniform length can enable the word vector model to learn the ability of accurately expressing each resource transfer type, so that each account can obtain account vectors with uniform length, and subsequent calculation and processing are facilitated.
In one embodiment, when generating an extended resource transfer type sequence based on a resource transfer type directed graph, edge weights between two nodes in the resource transfer type directed graph need to be considered, and random walk is performed according to the size of the edge weights, so as to generate the extended resource transfer type sequence. The edge weight between two nodes in the resource transfer type directed graph can represent the mode of resource transfer type change in the resource transfer sequence corresponding to all the processed account numbers, and the larger the edge weight between the two nodes is, the more the co-occurrence times of the resource transfer types represented by the two nodes are represented, so that the more the co-occurrence times of the two resource transfer types in the extended resource transfer type sequence are supposed to be, and in addition, the potential resource transfer type change mode can be mined, so that the extended resource transfer type sequence can comprehensively consider the neighborhood characteristics and the network structure characteristics of the nodes.
Illustratively, the Node2Vec algorithm may be used to mine extended resource transfer type sequences from the resource transfer type directed graph. Node2Vec is an unsupervised graph feature learning algorithm that takes both the DFS (Depth-first Sampling) neighborhood and the BFS (Breadth-first Sampling) neighborhood into account. Its main feature is that a return parameter p, an in-out parameter q, the number of walks N started from each node, and the walk length step control whether the random walk tends to stay within the neighborhood of the current node (BFS) or to move deeper along node connections (DFS). The regenerated extended resource transfer type sequences can therefore take both the neighborhood features and the structural features of the directed graph nodes into account, so that the word vectors obtained by word embedding express the relationships between nodes well.
FIG. 6 is a schematic diagram of walking within the node neighborhood or deeper along node connections in one embodiment. Referring to FIG. 6, for the node v currently walked to, the next node may be a neighbor node x1 in its neighborhood or a node x2 reached by going deeper along a connection; these correspond to a walk that tends toward the node neighborhood (BFS) and a walk that tends toward connection depth (DFS), respectively.
FIG. 7 is a schematic diagram of generating a sequence by random walk over graph nodes in one embodiment. Before walking, the walk weight distribution of the nodes needs to be calculated from the return parameter p, the in-out parameter q, and the edge weights of the directed edges in the directed graph. Referring to FIG. 7, when the walk has moved from node t to the current node v, the walk weight distribution for moving from node v to the next node is adjusted by p and q. The adjustment rule is as follows:
If the next node is node t (that is, the walk returns to the previous node), the walk weight from the current node v to node t is the edge weight α1 of the directed edge from node v to node t divided by p. If the next node is a node x1 that is also a neighbor of node t, the walk weight from v to x1 is the edge weight α2 of the directed edge from v to x1. In the other cases, such as node x2 or node x3 in FIG. 7, the walk weight from v to x2 is the edge weight α3 of the directed edge from node v to node x2 divided by q, and the walk weight from v to x3 is the edge weight α4 of the directed edge from node v to node x3 divided by q.
After the walk weight distribution of the nodes is calculated, a random walk is started from each node of the graph according to the adjusted walk weight distribution, and the nodes along the walk are recorded. By controlling p and q, the probability of a BFS-style or DFS-style walk can thus be controlled. If p > 1, the walk tends not to go back, that is, the next node is unlikely to be the previously visited node t. If 0 < p < 1, the walk is more likely to return to the previous node, so it keeps moving among the nodes around the starting node. If q > 1, the walk tends to move to node x1, which is adjacent to node t, that is, it tends toward BFS; if 0 < q < 1, the walk tends to move to node x2 or x3, that is, it tends toward DFS.
Further, the length of each generated walk sequence can be controlled through the walk length step, and the number of generated walk sequences can be controlled through the number of walks N started from each node. For example, when N is 2, the graph has 50 nodes, and the walk length step is 5, 100 walk sequences of length 5 are obtained in the above manner.
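The adjustment rule can be written compactly as a piecewise walk weight. The notation below is introduced here for illustration: t is the previous node, v the current node, x a candidate next node, and the edge weight is assumed to be taken on the directed edge leaving the current node v (0 if no such edge exists).

```latex
\pi(v, x \mid t) =
\begin{cases}
  \alpha_{v \to x} / p, & x = t \quad \text{(return to the previous node)} \\
  \alpha_{v \to x},     & x \neq t \text{ and } x \text{ is a neighbor of } t \\
  \alpha_{v \to x} / q, & \text{otherwise}
\end{cases}
```

A small p enlarges the first case and pulls the walk back toward t, while a small q enlarges the last case and pushes the walk away from t, which is the BFS versus DFS trade-off described above.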
The above approach is applied to the resource transfer type directed graph to mine extended resource transfer type sequences.
In one embodiment, the step of obtaining an extended resource transfer type sequence based on the resource transfer type directed graph includes: taking each node in the resource transfer type directed graph as a starting node and walking from the starting node to a node adjacent to it; for the node currently walked to, determining each of its neighbor nodes; when a neighbor node is the node walked to in the previous step, obtaining the return walk probability from the current node to that neighbor node; when a neighbor node is a neighbor of the node walked to in the previous step, obtaining the edge weight of the directed edge pointing from the current node to that neighbor node; when a neighbor node is neither the previous node nor a neighbor of the previous node, obtaining the in-out walk probability from the current node to that neighbor node; walking from the current node to the neighbor node corresponding to the largest of the return walk probability, the edge weight, and the in-out walk probability; and recording the nodes passed through from the starting node until their number reaches the preset walk length, and obtaining the extended resource transfer type sequence from the recorded nodes.
Before that, the return walk probability and the in-out walk probability of walking to the next neighbor node may also be calculated for each node, specifically including: for each target node in the resource transfer type directed graph, determining its neighbor nodes; calculating the return walk probability from the target node to a neighbor node according to the edge weight of the directed edge pointing from the target node to that neighbor node and the preset return parameter; and calculating the in-out walk probability from the target node to a neighbor node according to the edge weight of the directed edge pointing from the target node to that neighbor node and the preset in-out parameter.
The return walk probability is the probability of walking back to a neighbor node after the walk has moved from that neighbor node to the target node. It can be obtained by dividing the edge weight of the directed edge pointing from the target node to the neighbor node by the preset return parameter p. The in-out walk probability is the probability that, after the walk has reached the target node from some node, the next neighbor node walked to is not that node. It can be obtained by dividing the edge weight of the directed edge pointing from the target node to the neighbor node by the preset in-out parameter q.
Take the resource transfer type directed graph shown in FIG. 4 as an example. Assume the input parameters are p=0.25, q=4, step=4, and N=2, and take resource transfer type B as the starting node. After walking from resource transfer type B to resource transfer type A, the neighbor nodes of A are D and B. The walk weight from A to D is α/q, that is, 0 (α is 0 because FIG. 4 has no directed edge from A to D), and the walk weight from A to B is α/p, that is, 4, so the walk continues from the current node A to node B. The neighbor nodes of B are A, E, and C. The walk weight from B to A is α/p, that is, 4; the walk weight from B to E is α/q, that is, 1/4; and the walk weight from B to C is α/q, that is, 0. The walk therefore continues from the current node B to node A. Four nodes have now been passed through, so this walk ends and the walk sequence BABA is obtained.
Starting again from resource transfer type B, after walking from B to resource transfer type E, the neighbor nodes of E include D, F, and C. The walk weight from E to D is α/q, that is, 0; the walk weight from E to F is α/q, that is, 1/4; and the walk weight from E to C is α, that is, 1, so the walk continues from the current node E to node C. The neighbor nodes of C are E and B. The walk weight from C to E is α/p, that is, 0, and the walk weight from C to B is α, that is, 1, so the walk continues from the current node C to node B. Four nodes have now been passed through, so this walk ends and the walk sequence BECB is obtained. In this way, the extended resource transfer type sequences BABA and BECB are obtained from node B. Walks are likewise started from the other nodes A, C, D, E, and F of the resource transfer type directed graph, so that a total of 12 extended resource transfer type sequences can be obtained.
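The two walks from node B can be reproduced with the short sketch below. It rests on assumptions not spelled out in the text: a "neighbor" is taken to be any node joined by an edge in either direction, the edge weight used is that of the directed edge leaving the current node (0 if none exists), and the first step of each walk is supplied by the caller. The function names are illustrative.

```python
# Directed edges of the FIG. 4 graph; every edge weight is 1.
EDGES = {("D", "A"): 1, ("A", "B"): 1, ("B", "E"): 1, ("D", "E"): 1,
         ("E", "F"): 1, ("E", "C"): 1, ("C", "B"): 1, ("B", "A"): 1}

def neighbors(node):
    """Nodes joined to `node` by an edge in either direction (assumed reading of 'neighbor')."""
    return sorted({b for a, b in EDGES if a == node} | {a for a, b in EDGES if b == node})

def walk_weight(prev, cur, nxt, p, q):
    """Walk weight of stepping cur -> nxt, given that the walk arrived from prev."""
    alpha = EDGES.get((cur, nxt), 0)  # 0 when no directed edge cur -> nxt exists
    if nxt == prev:
        return alpha / p              # return to the previous node
    if nxt in neighbors(prev):
        return alpha                  # stay in the neighborhood of prev
    return alpha / q                  # move away from prev

def greedy_walk(start, first_step, p, q, length):
    """Extend the walk start -> first_step by always taking the largest walk weight."""
    path = [start, first_step]
    while len(path) < length:
        prev, cur = path[-2], path[-1]
        path.append(max(neighbors(cur), key=lambda x: walk_weight(prev, cur, x, p, q)))
    return "".join(path)

print(greedy_walk("B", "A", p=0.25, q=4, length=4))  # BABA
print(greedy_walk("B", "E", p=0.25, q=4, length=4))  # BECB
```

Choosing the maximum weight follows the embodiment described in the text; the standard Node2Vec algorithm instead samples the next node at random in proportion to these weights.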
In this embodiment, a resource transfer type directed graph is constructed by using a resource transfer type sequence, and node characteristics and node relationship characteristics are comprehensively considered by using an extended resource transfer type sequence regenerated by the directed graph, so that word vectors of all nodes output by a word vector model obtained by training based on the extended resource transfer type sequence can be well expressed, the relationship between the nodes can be well expressed, and the accuracy of expressing the resource transfer type is improved.
After the extended resource transfer type sequences are obtained, the neural-network-based word vector model is trained with them, and the trained word vector model outputs the word vector corresponding to each resource transfer type. Word vectors are also known as distributed representations, word embeddings, or neural-network-based distributed representations. A word vector model learns them by modeling the context and the relationship between the context and the target word.
In one embodiment, the word vector model may be, for example, a Word2Vec model such as the CBOW model or the Skip-gram model. The CBOW model predicts the target word from its context: the context words are the input of the model, and the target word is the expected output used for training. The Skip-gram model predicts the context from the target word: a word is the input of the model, and its context words are the expected output used for training.
The input of the word vector model during model training is a word list vector coded by One-Hot, the output is the probability of each word in the word list, the weight from the input layer to the hidden layer is obtained by training the word vector model, and the input vector of each word can be mapped into the word vector.
FIG. 8 illustrates the CBOW model. The vocabulary includes V words. For a given training corpus, the vocabulary vectors of the C context words x1k, x2k, x3k, …, xCk of a target word w are input into the model and converted into C N-dimensional vectors through the weight matrix W_(V×N) between the input layer and the hidden layer. The C N-dimensional vectors are summed and averaged to obtain the hidden layer vector h. The hidden layer vector is then converted through the weight matrix W'_(N×V) between the hidden layer and the output layer and passed through an activation function to obtain a V-dimensional output vector y, in which each element represents the probability that the corresponding word in the vocabulary is the target word w. The weight matrices W_(V×N) and W'_(N×V) are updated with the goal of maximizing the probability in y that corresponds to the target word w. After training, the vocabulary vector of each word in the vocabulary is input into the model to obtain the word vector of that word.
FIG. 9 illustrates the Skip-gram model. The vocabulary includes V words. For a given training corpus, a target word w is used as the input, and the C context words of w in the corpus are used as the expected output for training. The vocabulary vector of the target word w (xk in the vocabulary) is input into the model and converted into an N-dimensional hidden layer vector through the weight matrix W_(V×N) between the input layer and the hidden layer. The hidden layer vector is then converted through the weight matrix W'_(N×V) between the hidden layer and the output layer into C V-dimensional output vectors yc, where each element of yc represents the probability that the corresponding word in the vocabulary is the context word at context position c of the target word w. The weight matrices W_(V×N) and W'_(N×V) are updated with the goal of maximizing the probabilities corresponding to the C context words. After training, the vocabulary vector of each word in the vocabulary is input into the model to obtain the word vector of that word.
Illustratively, the Skip-gram model is applied to the extended sequence of resource transfer types of the present application, training the word vector model.
In one embodiment, training a word vector model based on a neural network using an extended sequence of resource transfer types obtained based on a resource transfer type directed graph to obtain word vectors corresponding to respective resource transfer types, comprising: acquiring a vocabulary formed by each resource transfer type and a vocabulary vector of each resource transfer type; acquiring an extended resource transfer type sequence; inputting word list vectors of target resource transfer types in the extended resource transfer type sequence into a word vector model based on a neural network, and outputting the probability of each resource transfer type in the word list as a context resource transfer type of the target resource transfer type through the word vector model; the weight matrix of the word vector model is updated by taking the maximization of the probability corresponding to the resource transfer type adjacent to the target resource transfer type in the extended resource transfer type sequence as a target, so as to obtain a trained word vector model; and mapping the vocabulary vectors of each resource transfer type into corresponding word vectors through the weight matrix of the trained word vector model.
The vocabulary formed by the resource transfer types may include V resource transfer types. The vocabulary vector of each resource transfer type may be a V-dimensional One-Hot vector, and the word vector (embedding) of each resource transfer type is a high-dimensional vector that accurately expresses the resource transfer type. For each extended resource transfer type sequence, each resource transfer type in it may be taken in turn as the target resource transfer type. For example, suppose the extended resource transfer type sequence Seq is ABCEB and the context is one word before and one word after. Training may first take the first resource transfer type A as the target resource transfer type, whose context resource transfer types are {"/", "B"}, where "/" indicates that there is no preceding type. The next training step may take the second resource transfer type B as the target resource transfer type, whose context resource transfer types are {"A", "C"}, and so on.
When the second resource transfer type B is the target resource transfer type, the vocabulary vector of resource transfer type B is obtained and input into the word vector model, which outputs two V-dimensional output vectors that depend on the model parameters. Each element of the first V-dimensional vector represents the probability that the corresponding resource transfer type in the vocabulary is the type immediately preceding resource transfer type B, and each element of the second V-dimensional vector represents the probability that the corresponding resource transfer type is the type immediately following resource transfer type B. The model parameters are optimized with the goal of maximizing the probability of resource transfer type A in the first vector and the probability of resource transfer type C in the second vector, which yields the optimized weight matrix of the word vector model. In the same way, the word vector model continues to be optimized with the other resource transfer types in this extended sequence as targets and with the other extended resource transfer type sequences, until all extended resource transfer type sequences have been used for training. The vocabulary vector of each resource transfer type in the vocabulary is then input into the word vector model and mapped to the corresponding word vector.
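The pairing of each target resource transfer type with its context can be sketched as follows, using the example sequence with letters A, B, C, E, B and a context of one word on each side. The function name context_pairs and the window parameter are illustrative assumptions.

```python
def context_pairs(sequence, window=1):
    """Return (target, context) pairs for every position within `window` of the target."""
    pairs = []
    for i, target in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        # Positions at the sequence boundary simply have fewer context words.
        pairs.extend((target, sequence[j]) for j in range(lo, hi) if j != i)
    return pairs

pairs = context_pairs("ABCEB")
print(pairs)
```

The first target A has only a following context word, which corresponds to the "/" placeholder in the example, and the second target B is paired with A and C.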
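A minimal Skip-gram training loop in the spirit of the description is sketched below, using only the Python standard library. It is an illustration under assumptions, not the model of the embodiment: it uses a full softmax output shared across both context positions (as in the common Word2Vec formulation), plain stochastic gradient descent, and arbitrary hyperparameters (dim, epochs, lr, seed). The input walks BABA and BECB are the two sequences from the worked example.

```python
import math
import random

def train_skip_gram(sequences, dim=3, epochs=200, lr=0.1, seed=0):
    """Train a full-softmax skip-gram model (context window 1) with plain SGD."""
    rng = random.Random(seed)
    vocab = sorted({t for seq in sequences for t in seq})
    index = {t: i for i, t in enumerate(vocab)}
    V = len(vocab)
    # W maps a one-hot input to the hidden layer (V x dim);
    # W_out maps the hidden layer to output scores (dim x V).
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    W_out = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(dim)]
    pairs = [(index[s[i]], index[s[j]])
             for s in sequences for i in range(len(s))
             for j in (i - 1, i + 1) if 0 <= j < len(s)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for target, context in pairs:
            h = W[target][:]  # a one-hot input selects one row of W (copied)
            scores = [sum(h[k] * W_out[k][o] for k in range(dim)) for o in range(V)]
            peak = max(scores)  # subtract the maximum for a numerically stable softmax
            exps = [math.exp(s - peak) for s in scores]
            norm = sum(exps)
            probs = [e / norm for e in exps]
            total -= math.log(probs[context])
            # Gradient of the cross-entropy loss w.r.t. the scores: probs - one_hot(context).
            err = [probs[o] - (1.0 if o == context else 0.0) for o in range(V)]
            grad_h = [sum(err[o] * W_out[k][o] for o in range(V)) for k in range(dim)]
            for k in range(dim):
                for o in range(V):
                    W_out[k][o] -= lr * err[o] * h[k]
                W[target][k] -= lr * grad_h[k]
        losses.append(total / len(pairs))
    # The rows of W are the word vectors of the resource transfer types.
    return {t: W[index[t]] for t in vocab}, losses

vectors, losses = train_skip_gram(["BABA", "BECB"])
print(sorted(vectors), round(losses[0], 3), round(losses[-1], 3))
```

Reading the word vector of a type as a row of W mirrors the statement that the trained weight matrix maps each One-Hot vocabulary vector to its word vector. Production systems would typically rely on an established Word2Vec implementation with negative sampling or hierarchical softmax rather than this full softmax.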
Step 208, obtaining account vectors of the accounts according to the word vectors corresponding to the resource transfer types, and determining an account set formed by the accounts with similar resource transfer modes based on the account vectors.
The account vector is a vectorized representation of an account. In the present application, after the high-dimensional word vector corresponding to each resource transfer type is obtained, the account vector of each account can be obtained from the resource transfer type sequence of the resource transfers performed with that account. The account vector accurately expresses the change pattern of the resource transfer types used by the account. Account sets composed of accounts with similar change patterns of resource transfer types can therefore be determined based on the similarity between account vectors. The accounts in each account set are similar to one another in that the change patterns of their resource transfer types are similar.
In one embodiment, obtaining the account number vector of each account number according to the word vector corresponding to each resource transfer type includes: obtaining a plurality of word vectors corresponding to the resource transfer type sequence of the account according to each resource transfer type in the resource transfer type sequence of the account and the word vectors corresponding to each resource transfer type; and averaging the word vectors to obtain account vectors of the accounts.
Specifically, the computer device may represent each resource transfer type in the resource transfer type sequence of the account by its corresponding word vector to form a matrix. Each row of the matrix is the word vector of one resource transfer type in the sequence, and the number of rows equals the length of the sequence. Averaging each column of the matrix yields a vector of the same length as a word vector, namely the account vector.
In this embodiment, the word vectors of all resource transfer types in the resource transfer type sequence of an account are averaged to give the account vector of that account. Because the word vector of each resource transfer type is a high-dimensional vector, the average still expresses the attributes of the account well, and the account vector can be computed quickly and simply.
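The column-wise averaging described above can be sketched as follows. The function name and the two-dimensional word vectors are hypothetical values chosen so the arithmetic is easy to follow.

```python
def account_vector(type_sequence, word_vectors):
    """Average the word vectors of a resource transfer type sequence column by column."""
    rows = [word_vectors[t] for t in type_sequence]  # one row per resource transfer type
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


# Hypothetical word vectors for three resource transfer types.
word_vectors = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
vec = account_vector(["A", "B", "C", "C"], word_vectors)
print(vec)  # [0.75, 0.75]
```

A type that occurs several times in the sequence contributes one row per occurrence, so frequently used resource transfer types weigh more heavily in the account vector.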
In one embodiment, determining an account set composed of accounts with similar change patterns of the resource transfer types based on the account vector includes: clustering each account based on the similarity between account vectors to obtain a plurality of clusters; and obtaining an account number set formed by the account numbers with similar resource transfer modes according to the account numbers in each cluster.
Clustering is the process of grouping together accounts with similar resource transfer patterns. The computer device may cluster the accounts based on the similarity between account vectors, and the accounts grouped into the same cluster form an account set. The similarity between accounts may be represented by the cosine distance, Euclidean distance, Manhattan distance, Hamming distance, or the like between the corresponding account vectors.
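Three of the distance measures named above can be written down in a few lines. This is a sketch with hypothetical function names; the cosine distance is taken here as one minus the cosine similarity and assumes that neither vector is all zeros.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(manhattan_distance([0.0, 0.0], [3.0, 4.0]))  # 7.0
```

Cosine distance ignores vector length and compares direction only, which may be preferable when accounts differ mainly in how many transfers they made rather than in the mix of transfer types.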
To be able to detect unknown risk types, an unsupervised clustering approach, such as K-Means clustering, may be used. The K-Means algorithm is an unsupervised clustering algorithm whose flow is as follows:
1. Input the number k of clusters to be formed.
2. Randomly select k data points in the data set as the initial cluster center points.
3. Assign each data point in the data set to its nearest cluster center point, forming k clusters.
4. Calculate the mean of each cluster to obtain k new cluster center points.
5. Repeat steps 3 and 4 with the k new cluster center points until the cluster center points no longer change.
The distance between data points may be calculated using the cosine distance, Euclidean distance, Manhattan distance, Hamming distance, or the like.
Illustratively, the K-Means algorithm is applied in the embodiments of the present application to divide the accounts into a plurality of account sets. In one embodiment, clustering the accounts based on the similarity between account vectors to obtain a plurality of clusters includes: randomly selecting k accounts from the accounts and taking the account vectors of the selected k accounts as cluster centers, where k is a natural number greater than 1; assigning each account to the cluster of its nearest cluster center according to the distances from the account vector of the account to the k cluster centers; and calculating the mean of the account vectors included in each cluster to obtain k updated cluster centers, then returning to the step of assigning each account to the cluster of its nearest cluster center according to the distances from its account vector to the k cluster centers, until the k cluster centers are no longer updated.
In this embodiment, k is the preset number of account sets, and all accounts are divided into a plurality of account sets by taking the account vectors of the accounts as the clustering objects. Account sets composed of accounts with similar change patterns of resource transfer types are determined based on the account vectors, so that account sets of unknown types can be mined and unknown risks perceived.
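The clustering loop described above can be sketched in plain Python as follows. The function names, the fixed random seed, the iteration cap, and the choice of squared Euclidean distance are assumptions for illustration; any of the other distances mentioned in the text could be substituted.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_means(vectors, k, seed=0, max_iter=100):
    """Cluster account vectors; returns (labels, centers)."""
    rng = random.Random(seed)
    # Randomly pick k account vectors as the initial cluster centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = []
    for _ in range(max_iter):
        # Assign every account to its nearest cluster center.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # centers no longer change
        centers = new_centers
    return labels, centers


# Hypothetical account vectors forming two well-separated groups.
account_vectors = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]
labels, centers = k_means(account_vectors, k=2)
print(labels)
```

Accounts that share a label form one account set. Because the initial centers are drawn at random, results on less clearly separated data can depend on the seed.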
Step 210, identifying an account set with abnormal resource transfer from each account set according to the resource transfer type sequence of the accounts in each account set.
Specifically, after the account sets are obtained, the account sets with abnormal resource transfer are further identified according to the resource transfer type sequences included in each account set. It will be appreciated that even if the accounts in an account set are similar to one another, that is, their resource transfer type sequences and hence the change patterns of their resource transfer types are similar, the account set is not necessarily abnormal. The computer device may obtain the change pattern of the resource transfer types of an abnormal resource transfer, for example, sending a red packet, receiving a red packet, and then withdrawing to a bank card, and identify for each account set whether such an abnormal resource transfer pattern exists, thereby identifying the account sets in which abnormal resource transfer exists.
In one embodiment, identifying the account sets with abnormal resource transfer from the account sets according to the resource transfer type sequences of the accounts in each account set includes: for each account set, acquiring the resource transfer sequence of each account included in the account set; mining the resource transfer sequences of the accounts for frequent resource transfer patterns, to obtain the frequent resource transfer patterns of the accounts in the account set; and when the frequent resource transfer patterns of the account set include an abnormal resource transfer pattern, determining that the account set is an account set with abnormal resource transfer.
A frequent pattern is a pattern that occurs frequently in a data set. In the embodiments of the present application, a frequent resource transfer pattern is a subsequence that occurs frequently in the resource transfer type sequences of the accounts included in an account set. For example, assume that account U1, account U2, account U3, and account U4 belong to the same account set and that their resource transfer type sequences are ABC, BDCDEF, DABC, and DBEFC respectively. The subsequence BC appears in every sequence, so BC is a frequent resource transfer pattern of the accounts in the account set, that is, the pattern of first transferring resources using resource transfer type B and later transferring resources using resource transfer type C. Mining the frequent resource transfer patterns of the accounts in an account set helps to find the patterns that the accounts have in common when transferring resources and the association relationships between the accounts. When a frequent resource transfer pattern mined from an account set is an abnormal resource transfer pattern, the accounts in that account set exhibit abnormal resource transfer behavior.
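How often a candidate pattern occurs can be checked with a short sketch. It assumes that a pattern counts as present when its items appear in the sequence in order, not necessarily adjacent to one another, and that each resource transfer type is written as a single character; both function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, sequences):
    """Number of sequences that contain the pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


sequences = ["ABC", "BDCDEF", "DABC", "DBEFC"]
print(support("BC", sequences))  # 4
```

The pattern BC is found in BDCDEF and DBEFC even though other resource transfer types lie between B and C, which is the sense in which it appears in every sequence of the example.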
Further, to ensure accuracy of the mining of the abnormal account set, for the identified account set with abnormal resource transfer, the account in the account set may be further checked by other means, such as detecting whether the resource transfer value exceeds a preset threshold, or by manual inspection. Alternatively, the computer device may directly perform processing such as limiting the resource transfer behavior on the account in the account set.
In one embodiment, the computer device may employ the Apriori algorithm, the FP-Growth algorithm, the PrefixSpan algorithm, or the like to mine the frequent resource transfer patterns of the accounts.
Illustratively, the frequent resource transfer mode of each account is obtained by a PrefixSpan algorithm.
In one embodiment, mining the resource transfer sequences of the accounts for frequent resource transfer patterns, to obtain the frequent resource transfer patterns of the accounts in each account set, includes: counting the support of each resource transfer type according to the resource transfer sequences of the accounts; taking the resource transfer types whose support is greater than a preset threshold as the frequent 1-items of the account set; starting from i=1, recursively performing, for each frequent i-item, the steps of determining the projection sequences in the account set that have the frequent i-item as a prefix, determining the single items whose support in the projection sequences is greater than the preset threshold, combining the frequent i-item with each such single item to output the frequent (i+1)-items of the account set, and setting i=i+1, until the recursion end condition is met; and obtaining the frequent resource transfer patterns of the account set according to the output frequent items.
The prefix is a subsequence at the front of a sequence, and the remaining subsequence that follows the prefix is the projection sequence, also called the suffix. For example, for the resource transfer type sequence ABCBD, when the prefix is A the projection sequence is BCBD, and when the prefix is AB the projection sequence is CBD. When the account set includes a plurality of resource transfer type sequences, the projection sequences corresponding to the same prefix may include a plurality of resource transfer type sequences. The recursion end condition may be that the length of the frequent items has reached a threshold, that all resource transfer type sequences in the account set have been processed, and so on.
For example, account U1, account U2, account U3, and account U4 belong to the same account set, and their resource transfer type sequences are ABC, BDCDEF, DABC, and DBEFC respectively. First, the support of each resource transfer type is counted, giving A:2, B:4, C:4, D:3, E:2, F:2. Assuming a support threshold of 50%, the frequent 1-items are {B, C, D}. For each frequent 1-item, such as B, the corresponding projection sequences are determined to be C, DCDEF, C, and EFC, and the support of each single item in them is counted as C:3, D:2, E:2, F:3, so the frequent 2-items include {BC, BF}. The remaining frequent 1-items are processed in turn in the same way to obtain the other frequent 2-items. For each frequent 2-item beginning with B, such as BC, the corresponding projection sequence is determined to be DEF, in which every single item has a support below 50%. For BF, the corresponding projection sequence is determined to be C, whose support is below 50%, so the mining of frequent resource transfer patterns beginning with B ends. The remaining frequent 2-items are processed in turn in the same way, to obtain the frequent 3-items or to end execution when the recursion end condition is met.
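The recursion can be sketched as a compact PrefixSpan-style routine. This is an illustrative sketch rather than the implementation of the application: it assumes that each resource transfer type is a single character, that support is the number of projection sequences containing an item (each item counted once per sequence), and that an item is frequent only when its support is strictly greater than the threshold. Under this counting convention the intermediate counts need not coincide with those of any particular worked example. The names `prefix_span`, `min_support`, and `max_length` are hypothetical.

```python
def prefix_span(sequences, min_support, max_length=None):
    """Return {pattern: support} for patterns with support > min_support."""
    results = {}

    def mine(prefix, projected):
        # Recursion end condition: the pattern length has reached its limit.
        if max_length is not None and len(prefix) >= max_length:
            return
        # Count each single item once per projection sequence.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] <= min_support:
                continue
            pattern = prefix + item
            results[pattern] = counts[item]
            # Project: keep what follows the first occurrence of the item.
            next_projected = [
                s[s.index(item) + 1:] for s in projected if item in s
            ]
            mine(pattern, next_projected)

    mine("", list(sequences))
    return results


sequences = ["ABC", "BDCDEF", "DABC", "DBEFC"]
# A 50% threshold on four sequences means support must exceed 2.
patterns = prefix_span(sequences, min_support=2)
print(patterns)
```

An account set could then be flagged when any key of `patterns` matches a known abnormal resource transfer pattern.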
In this embodiment, after each account set is determined, an account set with abnormal resource transfer is further identified for each account set, so that the abnormal account set can be quickly judged, and compared with a mode of manually checking, analyzing and identifying the behavior of each account set in sequence, the identification efficiency is effectively improved.
In one embodiment, in order to improve the recursion efficiency, before starting recursion, all prefixes in the resource transfer type sequence of the account included in the account set and the projection sequence corresponding to each prefix are determined, so that only counting of each single item needs to be completed during recursion.
Fig. 10 is a schematic overall flow chart of the account set identification method in one embodiment. Referring to fig. 10, for all accounts to be processed, the corresponding resource transfer type sequences are first obtained from the corresponding resource transfer data. A resource transfer type directed graph is then generated from the original resource transfer type sequences, and extended resource transfer type sequences are obtained using the directed graph. A word vector model is trained with the extended resource transfer type sequences to obtain the word vector of each resource transfer type. The account vector of each account is obtained from the word vectors of the resource transfer types in the original resource transfer type sequence of the account, clusters are obtained by calculating the similarity of the account vectors, the frequent resource transfer patterns of each cluster are then computed, and the clusters with abnormal resource transfer are identified.
The account set identification method adopts an unsupervised identification approach to mine account sets from a large number of accounts, so that suspicious account sets are perceived and flagged in time and the network security of the platform is maintained. Specifically, each account is represented by the resource transfer type sequence formed by the resource transfer types of the resource transfers performed with the account. Because the features of the transfer objects are removed, the number of words to be trained is reduced from the order of the number of objects to the order of the number of resource transfer types, which can greatly increase the model training speed. The resource transfer type sequences are used to construct the resource transfer type directed graph, and the extended resource transfer type sequences regenerated from the directed graph take both node features and node relationship features into account, so the word vectors output by the word vector model trained on the extended sequences express the nodes well and the resource transfer types are expressed more accurately. The account vector of each account is obtained directly from the word vectors corresponding to the resource transfer types, so the account vector is computed quickly and simply and expresses the account well. Account sets composed of accounts with similar change patterns of resource transfer types are determined based on the account vectors, so that account sets of unknown types can be mined and unknown risks perceived.
After the account sets are determined, the account sets with abnormal resource transfer are further identified, so that abnormal account sets can be judged quickly; compared with manually checking, analyzing, and identifying the behavior of each account set in turn, the identification efficiency is effectively improved. With this efficient, cost-free, accurate, and effective identification process, groups performing abnormal resource transfer in the network can be found in time and network security is maintained.
As shown in fig. 11, in a specific embodiment, the account set identification method includes the following steps:
Step 1102, obtaining the resource transfer data of the resource transfers performed with each account in a historical time period;
Step 1104, for each account, sorting the corresponding resource transfer data in the chronological order of the resource transfers, and generating the resource transfer type sequence of the account according to the resource transfer type in each piece of sorted resource transfer data;
Step 1106, counting the number of co-occurrences of each two adjacent resource transfer types in the resource transfer type sequences;
Step 1108, constructing a resource transfer type directed graph by taking each resource transfer type as a node and taking the number of co-occurrences, in the resource transfer type sequences, of the two resource transfer types represented by two nodes as the edge weight of the directed edge between the two nodes;
Step 1110, taking each node in the resource transfer type directed graph in turn as a starting node and, starting from the starting node, walking to a node adjacent to the starting node;
Step 1112, for the node reached by the current walk step, determining each neighbor node of that node;
Step 1114, when a neighbor node is the node reached by the previous walk step, acquiring the return walk probability between the current node and the neighbor node;
Step 1116, when a neighbor node is a neighbor node of the node reached by the previous walk step, acquiring the edge weight of the directed edge pointing from the current node to the neighbor node;
Step 1118, when a neighbor node is neither the node reached by the previous walk step nor a neighbor node of that node, acquiring the in-out walk probability between the current node and the neighbor node;
Step 1120, walking from the current node to the neighbor node indicated by the largest of the return walk probability, the edge weight, and the in-out walk probability;
Step 1122, recording the nodes passed through from the starting node until the number of nodes passed through reaches a preset walk length, and obtaining an extended resource transfer type sequence from the nodes passed through;
Step 1124, obtaining a vocabulary composed of the resource transfer types and the vocabulary vector of each resource transfer type;
Step 1126, obtaining the extended resource transfer type sequences;
Step 1128, inputting the vocabulary vector of a target resource transfer type in an extended resource transfer type sequence into a neural-network-based word vector model, and outputting, through the word vector model, the probability that each resource transfer type in the vocabulary is a context resource transfer type of the target resource transfer type;
Step 1130, updating the weight matrix of the word vector model with the objective of maximizing the probabilities corresponding to the resource transfer types adjacent to the target resource transfer type in the extended resource transfer type sequence, to obtain a trained word vector model;
Step 1132, mapping the vocabulary vector of each resource transfer type to the corresponding word vector through the weight matrix of the trained word vector model, to obtain the word vector corresponding to each resource transfer type;
Step 1134, obtaining the plurality of word vectors corresponding to the resource transfer type sequence of each account according to the resource transfer types in that sequence, and averaging the plurality of word vectors to obtain the account vector of the account;
Step 1136, clustering the accounts based on the similarity between the account vectors to obtain a plurality of clusters, and obtaining, from the accounts in each cluster, an account set composed of accounts with similar resource transfer patterns;
Step 1138, for each account set, acquiring the resource transfer sequence of each account included in the account set, and mining the resource transfer sequences of the accounts for frequent resource transfer patterns, to obtain the frequent resource transfer patterns of the accounts in the account set;
Step 1140, when the frequent resource transfer patterns of an account set include an abnormal resource transfer pattern, determining that the account set is an account set with abnormal resource transfer.
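Steps 1110 to 1122 can be sketched as a deterministic walk over a weighted directed graph. Several points here are assumptions, not statements of the application: the return walk probability is taken as the edge weight divided by a return parameter `p`, and the in-out walk probability as the edge weight divided by an in-out parameter `q` (a convention borrowed from node2vec); the first step follows the heaviest outgoing edge; and "neighbor" means a node reachable by an outgoing edge. All names are hypothetical.

```python
def step_score(graph, prev, cur, nxt, p, q):
    """Score of walking from cur to nxt, given the previously visited node."""
    weight = graph[cur][nxt]
    if prev is None:
        return weight        # first step: no previous node yet (assumption)
    if nxt == prev:
        return weight / p    # return walk probability (assumed form)
    if nxt in graph.get(prev, {}):
        return weight        # neighbor of the previous node: plain edge weight
    return weight / q        # in-out walk probability (assumed form)


def extended_sequence(graph, start, walk_length, p=1.0, q=1.0):
    """Walk from start, always moving to the highest-scoring neighbor."""
    path, prev, cur = [start], None, start
    while len(path) < walk_length:
        neighbors = graph.get(cur, {})
        if not neighbors:
            break            # dead end: no outgoing edge
        nxt = max(neighbors, key=lambda n: step_score(graph, prev, cur, n, p, q))
        path.append(nxt)
        prev, cur = cur, nxt
    return path


# graph[u][v] is the co-occurrence count of type u followed by type v.
graph = {
    "transfer": {"commercial_payment": 1},
    "commercial_payment": {"red_packet": 1},
    "red_packet": {"face_to_face": 3},
    "face_to_face": {"red_packet": 1},
}
walk = extended_sequence(graph, "transfer", walk_length=5)
print(walk)
```

Choosing the maximum at each step follows the wording of step 1120; a sampling-based walk, as used in node2vec, would instead draw the next node at random in proportion to these scores.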
The account number set identification method of the present application is explained below with a specific example.
1. The payment data of each account in the same period is acquired and sorted by payment time, as shown in the following table:

| Account | Payment time | Payment type | Payment amount |
| --- | --- | --- | --- |
| USER1 | 2021-10-10 21:00:00 | Transfer of money | 1000 |
| USER1 | 2021-10-10 21:10:00 | Commercial payment | 500 |
| USER1 | 2021-10-10 22:10:00 | Red packet | 20 |
| USER1 | 2021-10-10 22:20:00 | Face-to-face transfer | 5000 |
| USER2 | 2021-10-10 21:20:00 | Red packet | 20 |
| USER2 | 2021-10-10 22:10:00 | Face-to-face transfer | 5000 |
| USER2 | 2021-10-10 22:30:00 | Red packet | 10 |
| USER2 | 2021-10-10 22:32:00 | Face-to-face transfer | |
2. For each account, the number of co-occurrences of the payment types of each two adjacent pieces of payment data is counted, as shown in the following table:

| Current scene | Next scene | Number of co-occurrences |
| --- | --- | --- |
| Transfer of money | Commercial payment | 1 |
| Commercial payment | Red packet | 1 |
| Red packet | Face-to-face transfer | 3 |
| Face-to-face transfer | Red packet | 1 |
3. A resource transfer type directed graph is constructed based on the edges embodied in the table above, as shown in fig. 12.
4. An extended resource transfer type sequence is obtained using the resource transfer type directed graph, and a word vector model is trained with the extended resource transfer type sequence to obtain the word vector of each resource transfer type, as shown in the following table:

| Payment behavior scene | Scene vector |
| --- | --- |
| Transfer of money | [-0.001,0.03,-0.007,0.34,0.5] |
| Red packet | [0.01,-0.13,0.45,-0.34,-0.1] |
| Commercial payment | [0.01,0.63,-0.15,0.1,0.3] |
| Face-to-face transfer | [0.01,-0.1,-0.04,0.46,-0.28] |
5. An account vector is generated. Each resource transfer type in the resource transfer type sequence of the account is replaced by its corresponding word vector to form a matrix in which each row is the word vector of one resource transfer type and the number of rows equals the length of the sequence; each column of the matrix is then averaged to obtain the account vector, as shown in the following table:

| Account | Account vector |
| --- | --- |
| USER1 | [-0.00025,0.0075,-0.00175,0.085,0.125] |
| USER2 | [0.005,-0.065,0.225,-0.17,-0.05] |
6. Similar accounts are mined to obtain account sets. The distances between accounts are calculated with a similarity formula, and similar accounts are then clustered to obtain the account sets. Examples are as follows:
7. A preset support threshold and a frequent-item length are set, and the frequent resource transfer patterns in each account set are found according to the resource transfer type sequences of the accounts in that account set.
8. The account sets whose frequent resource transfer patterns are abnormal are identified as account sets with abnormal resource transfer.
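The first three steps of this example, from payment records to the edge weights of the directed graph, can be reproduced with a short sketch. The record layout, the lowercase type labels, and the function names are assumptions for illustration; the timestamps and payment types are those of the payment data table, with the amounts omitted because they are not used.

```python
from collections import Counter, defaultdict

# (account, payment time, payment type), as in the payment data table.
records = [
    ("USER1", "2021-10-10 21:00:00", "transfer"),
    ("USER1", "2021-10-10 21:10:00", "commercial_payment"),
    ("USER1", "2021-10-10 22:10:00", "red_packet"),
    ("USER1", "2021-10-10 22:20:00", "face_to_face"),
    ("USER2", "2021-10-10 21:20:00", "red_packet"),
    ("USER2", "2021-10-10 22:10:00", "face_to_face"),
    ("USER2", "2021-10-10 22:30:00", "red_packet"),
    ("USER2", "2021-10-10 22:32:00", "face_to_face"),
]


def build_sequences(records):
    """Group records by account and order each group by payment time."""
    by_account = defaultdict(list)
    for account, time, payment_type in records:
        by_account[account].append((time, payment_type))
    # Timestamps in this fixed format sort correctly as strings.
    return {a: [t for _, t in sorted(rows)] for a, rows in by_account.items()}


def co_occurrence(sequences):
    """Count how often one payment type is immediately followed by another."""
    counts = Counter()
    for seq in sequences.values():
        counts.update(zip(seq, seq[1:]))
    return counts


sequences = build_sequences(records)
edge_weights = co_occurrence(sequences)
for (current, following), count in edge_weights.items():
    print(current, "->", following, count)
```

Each key of `edge_weights` is a directed edge of the resource transfer type graph and its count is the edge weight, matching the co-occurrence table above.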
In addition, it should be noted that, in order to implement real-time supervision on the account, the computer device may acquire the latest resource transfer data of the account at intervals, and implement the account set identification method of the present application by using the latest resource transfer data, that is, after re-acquiring the resource transfer type sequence, re-train the model to obtain word vectors of each resource transfer type, and then mine the abnormal account set based on the word vectors.
Fig. 13 is a schematic diagram of an account set obtained after clustering in a specific example. Referring to fig. 13, each row represents an account, each circle represents one resource transfer action of the account, and circles of different styles represent different resource transfer types. It can be seen that the accounts in the account set have similar change patterns of resource transfer types: immediately after sending or receiving a group red packet in a group session, the account transfers resources out or in by other means, which is an abnormal resource transfer pattern.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of the steps is not strictly limited to that order, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the present application also provide an account set identification device for implementing the account set identification method described above. The solution provided by the device is similar to that described for the method, so for the specific limitations in the embodiments of the account set identification device provided below, reference may be made to the limitations of the account set identification method above, which are not repeated here.
In one embodiment, as shown in fig. 14, there is provided an account set identification apparatus 1400, including: an obtaining module 1402, a construction module 1404, a word vector obtaining module 1406, an account set determining module 1408, and an identifying module 1410, wherein:
the obtaining module 1402 is configured to obtain a resource transfer type sequence formed by the resource transfer types of the resource transfers performed based on an account;
the construction module 1404 is configured to construct a resource transfer type directed graph according to the resource transfer type sequences of the respective accounts, where nodes in the resource transfer type directed graph represent resource transfer types, and directed edges between the nodes represent the change pattern of the resource transfer types in the resource transfers;
the word vector obtaining module 1406 is configured to train a neural-network-based word vector model with the extended resource transfer type sequences obtained based on the resource transfer type directed graph, to obtain the word vector corresponding to each resource transfer type;
the account set determining module 1408 is configured to obtain the account vector of each account according to the word vectors corresponding to the resource transfer types, and determine, based on the account vectors, account sets formed by accounts with similar change patterns of resource transfer types;
the identifying module 1410 is configured to identify, from the account sets, the account sets with abnormal resource transfer according to the resource transfer type sequences of the accounts in each account set.
In one embodiment, the obtaining module 1402 is further configured to obtain resource transfer data of each account for transferring resources in a historical period of time; and for each account, sequencing the corresponding resource transfer data according to the time sequence of the resource transfer, and generating a resource transfer type sequence of each account according to the resource transfer type in each resource transfer data after sequencing.
In one embodiment, the construction module 1404 is further configured to count the number of co-occurrences of each two adjacent resource transfer types in the resource transfer type sequences; and construct a resource transfer type directed graph by taking each resource transfer type as a node and taking the number of co-occurrences, in the resource transfer type sequences, of the two resource transfer types represented by two nodes as the edge weight of the directed edge between the two nodes.
In one embodiment, the word vector obtaining module 1406 is further configured to take each node in the resource transfer type directed graph in turn as a starting node and, starting from the starting node, walk to a node adjacent to the starting node; for the node reached by the current walk step, determine each neighbor node of that node; when a neighbor node is the node reached by the previous walk step, acquire the return walk probability between the current node and the neighbor node; when a neighbor node is a neighbor node of the node reached by the previous walk step, acquire the edge weight of the directed edge pointing from the current node to the neighbor node; when a neighbor node is neither the node reached by the previous walk step nor a neighbor node of that node, acquire the in-out walk probability between the current node and the neighbor node; walk from the current node to the neighbor node indicated by the largest of the return walk probability, the edge weight, and the in-out walk probability; and record the nodes passed through from the starting node until the number of nodes passed through reaches a preset walk length, and obtain an extended resource transfer type sequence from the nodes passed through.
In one embodiment, the word vector obtaining module 1406 is further configured to determine, for each target node in the resource transfer type directed graph, the corresponding neighbor nodes; calculate the return walk probability from the target node to a neighbor node according to the edge weight of the directed edge pointing from the target node to the neighbor node and a preset return parameter; and calculate the in-out walk probability from the target node to the neighbor node according to the edge weight of the directed edge pointing from the target node to the neighbor node and a preset in-out parameter.
In one embodiment, the word vector obtaining module 1406 is further configured to obtain a word list formed by each resource transfer type and a word list vector of each resource transfer type; acquiring an extended resource transfer type sequence; inputting word list vectors of target resource transfer types in the extended resource transfer type sequence into a word vector model based on a neural network, and outputting the probability of each resource transfer type in the word list as a context resource transfer type of the target resource transfer type through the word vector model; the weight matrix of the word vector model is updated by taking the maximization of the probability corresponding to the resource transfer type adjacent to the target resource transfer type in the extended resource transfer type sequence as a target, so as to obtain a trained word vector model; and mapping the vocabulary vectors of each resource transfer type into corresponding word vectors through the weight matrix of the trained word vector model.
In one embodiment, the account set determining module 1408 is further configured to: obtain, according to each resource transfer type in the resource transfer type sequence of an account and the word vector corresponding to each resource transfer type, the plurality of word vectors corresponding to the resource transfer type sequence of the account; and average the plurality of word vectors to obtain the account vector of the account.
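A minimal sketch of this averaging step, assuming word vectors are stored as plain lists of floats keyed by resource transfer type; the function name `account_vector` and the example type names are hypothetical.

```python
from typing import Dict, List, Sequence


def account_vector(type_sequence: Sequence[str],
                   word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the word vectors of every resource transfer type in the sequence."""
    if not type_sequence:
        raise ValueError("an account needs at least one resource transfer")
    vectors = [word_vectors[t] for t in type_sequence]  # repeated types count repeatedly
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]
```

Repeated types contribute once per occurrence, so an account that mostly performs one type of transfer ends up close to that type's word vector.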
In one embodiment, the account set determining module 1408 is further configured to: cluster the accounts based on the similarity between account vectors to obtain a plurality of clusters; and obtain, from the accounts in each cluster, an account set formed by accounts with similar resource transfer patterns.
In one embodiment, the account set determining module 1408 is further configured to: randomly select k accounts from the accounts and use the account vectors of the selected k accounts as cluster centers, where k is a natural number greater than 1; assign each account to the cluster of its nearest cluster center according to the distances from the account vector of the account to the k cluster centers; and calculate the mean of the account vectors in each cluster to obtain k updated cluster centers, then return to the step of assigning each account to the cluster of its nearest cluster center according to the distances from its account vector to the k cluster centers, and continue until the k cluster centers are no longer updated.
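The iteration above is the standard k-means procedure. A minimal sketch follows, assuming squared Euclidean distance (the text does not name the distance measure); the function names, the `seed` argument, and the `max_iter` safety cap are additions of this sketch.

```python
import random
from typing import Dict, List


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(account_vectors: Dict[str, List[float]], k: int,
           seed: int = 0, max_iter: int = 100) -> Dict[str, int]:
    """Return a cluster index for every account."""
    if not 1 < k <= len(account_vectors):
        raise ValueError("k must be greater than 1 and at most the number of accounts")
    rng = random.Random(seed)
    accounts = sorted(account_vectors)
    # Initial centers: the account vectors of k randomly selected accounts.
    centers = [account_vectors[a][:] for a in rng.sample(accounts, k)]
    assignment: Dict[str, int] = {}
    for _ in range(max_iter):
        # Assign each account to the cluster of its nearest center.
        assignment = {
            a: min(range(k),
                   key=lambda i: squared_distance(account_vectors[a], centers[i]))
            for a in accounts
        }
        # Recompute each center as the mean of its members.
        new_centers = []
        for i in range(k):
            members = [account_vectors[a] for a in accounts if assignment[a] == i]
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        if new_centers == centers:              # centers no longer updated: done
            break
        centers = new_centers
    return assignment
```

Accounts that share a cluster index form one candidate account set.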
In one embodiment, the identification module 1410 is further configured to: for each account set, acquire the resource transfer sequence of each account included in the account set; mine frequent resource transfer patterns from the resource transfer sequences of the accounts, to obtain the frequent resource transfer patterns with which the accounts in the account set transfer resources; and when the frequent resource transfer patterns of the account set include an abnormal resource transfer pattern, determine that the account set is an account set in which abnormal resource transfer exists.
In one embodiment, the identification module 1410 is further configured to: count the support of each resource transfer type according to the resource transfer sequences of the accounts; take each resource transfer type whose support is greater than a preset threshold as a frequent 1-item of the account set; starting from i = 1, recursively perform the following steps until a recursion termination condition is met: for each frequent i-item, determine the projected sequences in the account set that take the frequent i-item as a prefix, determine the single items in the projected sequences whose support is greater than the preset threshold, merge the frequent i-item with each such single item to output the frequent (i+1)-items of the account set, and set i = i + 1; and obtain the frequent resource transfer patterns of the account set from the output frequent items.
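The prefix-projection recursion above matches the general shape of the PrefixSpan algorithm. The sketch below assumes that each sequence element is a single resource transfer type, that support is the number of sequences containing the pattern as a subsequence, and that the recursion ends when no single item in the projected sequences exceeds the threshold. The names `prefixspan` and `has_abnormal_pattern` are hypothetical, as is the idea of supplying known abnormal patterns as a set of tuples.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def prefixspan(sequences: Iterable[Sequence[str]],
               threshold: int) -> List[Tuple[Pattern, int]]:
    """Return every pattern whose support is greater than `threshold`."""
    results: List[Tuple[Pattern, int]] = []

    def mine(prefix: List[str], projected: List[List[str]]) -> None:
        # Support of a single item = number of projected sequences containing it.
        support: Counter = Counter()
        for suffix in projected:
            support.update(set(suffix))
        for item, count in sorted(support.items()):
            if count <= threshold:
                continue
            pattern = prefix + [item]
            results.append((tuple(pattern), count))
            # Project: keep what follows the first occurrence of `item`.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine([], [list(s) for s in sequences])
    return results


def has_abnormal_pattern(frequent: List[Tuple[Pattern, int]],
                         abnormal_patterns: Set[Pattern]) -> bool:
    """Flag the account set if any frequent pattern is a known abnormal pattern."""
    return any(pattern in abnormal_patterns for pattern, _ in frequent)
```

Running `prefixspan` once per account set and passing the result to `has_abnormal_pattern` corresponds to the per-set check described in the preceding embodiment.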
The account set identification device described above uses unsupervised identification to mine account sets from a large number of accounts, so that suspicious account sets are detected and flagged promptly and the network security of the platform is maintained. Specifically, each account is represented by the resource transfer type sequence formed by the resource transfer types of its resource transfers. Because the features of the resource transfer objects are dropped, the number of words to be trained falls from the order of the number of objects to the order of the number of resource transfer types, which can greatly speed up model training. The resource transfer type sequences are used to construct a resource transfer type directed graph, and the extended resource transfer type sequences regenerated from the directed graph take both node features and node relationship features into account, so the word vectors output by the word vector model trained on the extended sequences represent the nodes well, improving the accuracy with which resource transfer types are expressed. The account vector of each account is obtained directly from the word vectors of its resource transfer types, which is fast and simple to compute and represents the account well. Account sets formed by accounts with similar change patterns of resource transfer types are then determined based on the account vectors, so that account sets of unknown types can be mined and unknown risks perceived.
After the account sets are determined, the account sets in which abnormal resource transfer exists are further identified among them, so that abnormal account sets can be judged quickly; compared with manually checking, analyzing, and identifying the behavior of each account set in turn, this effectively improves identification efficiency. With this efficient, cost-free, accurate, and effective identification process, groups performing abnormal resource transfers in the network can be detected in time, and network security is maintained.
The modules in the account set identification device 1400 described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in fig. 15. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores resource transfer data. The input/output interface of the computer device exchanges information between the processor and external devices. The communication interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements an account set identification method.
It will be appreciated by those skilled in the art that the structure shown in fig. 15 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application is applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the account set identification method provided in the embodiments of the present application.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of the account set identification method provided in the embodiments of the present application.
In one embodiment, a computer program product is provided, which includes a computer program that, when executed by a processor, implements the steps of the account set identification method provided in the embodiments of the present application.
It should be noted that the user information (including, but not limited to, user equipment information and user personal information) and the data (including, but not limited to, data for analysis, stored data, and presented data) referred to in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the computer program may carry out the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. The volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, or data processing logic units based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above embodiments represent only a few implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and all of these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (15)

1. An account set identification method, characterized in that the method comprises:
acquiring, for each account, a resource transfer type sequence formed by the resource transfer types of the resource transfers performed by the account;
constructing a resource transfer type directed graph according to the resource transfer type sequences of the accounts, wherein nodes in the resource transfer type directed graph represent resource transfer types, and directed edges between the nodes represent the change patterns of the resource transfer types in resource transfers;
training a neural-network-based word vector model using an extended resource transfer type sequence obtained based on the resource transfer type directed graph, to obtain a word vector corresponding to each resource transfer type;
obtaining an account vector of each account according to the word vectors corresponding to the resource transfer types, and determining, based on the account vectors, account sets formed by accounts with similar change patterns of resource transfer types; and
identifying, from the account sets, an account set in which abnormal resource transfer exists, according to the resource transfer type sequences of the accounts in each account set.
2. The method according to claim 1, wherein the acquiring, for each account, a resource transfer type sequence formed by the resource transfer types of the resource transfers performed by the account comprises:
acquiring resource transfer data of the resource transfers performed by each account in a historical time period; and
for each account, sorting the corresponding resource transfer data in chronological order of the resource transfers, and generating the resource transfer type sequence of the account from the resource transfer type in each item of sorted resource transfer data.
3. The method according to claim 1, wherein the constructing a resource transfer type directed graph according to the resource transfer type sequences of the accounts comprises:
counting the number of co-occurrences of each pair of adjacent resource transfer types in the resource transfer type sequences; and
constructing the resource transfer type directed graph by taking each resource transfer type as a node and taking the number of co-occurrences, in the resource transfer type sequences, of the two resource transfer types represented by two nodes as the edge weight of the directed edge between the two nodes.
4. The method of claim 1, wherein the step of obtaining the extended resource transfer type sequence based on the resource transfer type directed graph comprises:
taking each node in the resource transfer type directed graph as a starting node, and walking from the starting node to a node adjacent to the starting node;
for the node reached by the current step of the walk, determining each neighbor node of that node;
when a neighbor node is the node reached by the previous step, acquiring the return walk probability between the node reached by the current step and the neighbor node;
when a neighbor node is also a neighbor node of the node reached by the previous step, acquiring the edge weight of the directed edge pointing from the node reached by the current step to the neighbor node;
when a neighbor node is neither the node reached by the previous step nor a neighbor node of that node, acquiring the in-out walk probability between the node reached by the current step and the neighbor node;
walking from the node reached by the current step to the neighbor node indicated by the largest of the return walk probability, the edge weight, and the in-out walk probability; and
recording the nodes visited from the starting node until the number of visited nodes reaches a preset walk length, and obtaining the extended resource transfer type sequence from the visited nodes.
5. The method according to claim 4, wherein the method further comprises:
for each target node in the resource transfer type directed graph, determining its neighbor nodes;
calculating the return walk probability from the target node to a neighbor node according to the edge weight of the directed edge pointing from the target node to the neighbor node and a preset return parameter; and
calculating the in-out walk probability from the target node to the neighbor node according to the edge weight of the directed edge pointing from the target node to the neighbor node and a preset in-out parameter.
6. The method of claim 1, wherein the training a neural-network-based word vector model using an extended resource transfer type sequence obtained based on the resource transfer type directed graph, to obtain a word vector corresponding to each resource transfer type, comprises:
acquiring a vocabulary formed by the resource transfer types and a vocabulary vector of each resource transfer type;
acquiring the extended resource transfer type sequence;
inputting the vocabulary vector of a target resource transfer type in the extended resource transfer type sequence into the neural-network-based word vector model, and outputting, through the word vector model, the probability that each resource transfer type in the vocabulary is a context resource transfer type of the target resource transfer type;
updating the weight matrix of the word vector model with the objective of maximizing the probabilities corresponding to the resource transfer types adjacent to the target resource transfer type in the extended resource transfer type sequence, to obtain a trained word vector model; and
mapping the vocabulary vector of each resource transfer type to a corresponding word vector through the weight matrix of the trained word vector model.
7. The method according to claim 1, wherein the obtaining an account vector of each account according to the word vectors corresponding to the resource transfer types comprises:
obtaining, according to each resource transfer type in the resource transfer type sequence of the account and the word vector corresponding to each resource transfer type, a plurality of word vectors corresponding to the resource transfer type sequence of the account; and
averaging the plurality of word vectors to obtain the account vector of the account.
8. The method according to claim 1, wherein the determining, based on the account vectors, account sets formed by accounts with similar change patterns of resource transfer types comprises:
clustering the accounts based on the similarity between the account vectors to obtain a plurality of clusters; and
obtaining, from the accounts in each cluster, an account set formed by accounts with similar resource transfer patterns.
9. The method of claim 8, wherein the clustering the accounts based on the similarity between the account vectors to obtain a plurality of clusters comprises:
randomly selecting k accounts from the accounts, and taking the account vectors of the selected k accounts as cluster centers, wherein k is a natural number greater than 1;
assigning each account to the cluster of its nearest cluster center according to the distances from the account vector of the account to the k cluster centers; and
calculating the mean of the account vectors included in each cluster to obtain k updated cluster centers, returning to the step of assigning each account to the cluster of its nearest cluster center according to the distances from the account vector of the account to the k cluster centers, and continuing until the k cluster centers are no longer updated.
10. The method according to any one of claims 1 to 9, wherein the identifying, from the account sets, an account set in which abnormal resource transfer exists according to the resource transfer type sequences of the accounts in each account set comprises:
for each account set, acquiring the resource transfer sequence of each account included in the account set;
mining frequent resource transfer patterns from the resource transfer sequences of the accounts, to obtain the frequent resource transfer patterns with which the accounts in the account set transfer resources; and
when the frequent resource transfer patterns of the account set include an abnormal resource transfer pattern, determining that the account set is an account set in which abnormal resource transfer exists.
11. The method of claim 10, wherein the mining frequent resource transfer patterns from the resource transfer sequences of the accounts, to obtain the frequent resource transfer patterns with which the accounts in the account set transfer resources, comprises:
counting the support of each resource transfer type according to the resource transfer sequences of the accounts;
taking each resource transfer type whose support is greater than a preset threshold as a frequent 1-item of the account set;
starting from i = 1, recursively performing the following steps until a recursion termination condition is met: for each frequent i-item, determining the projected sequences in the account set that take the frequent i-item as a prefix, determining the single items in the projected sequences whose support is greater than the preset threshold, merging the frequent i-item with each such single item to output the frequent (i+1)-items of the account set, and setting i = i + 1; and
obtaining the frequent resource transfer patterns of the account set from the output frequent items.
12. An account set identification device, the device comprising:
an acquisition module, configured to acquire, for each account, a resource transfer type sequence formed by the resource transfer types of the resource transfers performed by the account;
a construction module, configured to construct a resource transfer type directed graph according to the resource transfer type sequences of the accounts, wherein nodes in the resource transfer type directed graph represent resource transfer types, and directed edges between the nodes represent the change patterns of the resource transfer types in resource transfers;
a word vector acquisition module, configured to train a neural-network-based word vector model using an extended resource transfer type sequence obtained based on the resource transfer type directed graph, to obtain a word vector corresponding to each resource transfer type;
an account set determining module, configured to obtain an account vector of each account according to the word vectors corresponding to the resource transfer types, and to determine, based on the account vectors, account sets formed by accounts with similar change patterns of resource transfer types; and
an identification module, configured to identify, from the account sets, an account set in which abnormal resource transfer exists, according to the resource transfer type sequences of the accounts in each account set.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 11.
CN202210056606.1A 2022-01-18 2022-01-18 Account set identification method, device, equipment, medium and computer program product Pending CN116502132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210056606.1A CN116502132A (en) 2022-01-18 2022-01-18 Account set identification method, device, equipment, medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210056606.1A CN116502132A (en) 2022-01-18 2022-01-18 Account set identification method, device, equipment, medium and computer program product

Publications (1)

Publication Number Publication Date
CN116502132A true CN116502132A (en) 2023-07-28

Family

ID=87323592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210056606.1A Pending CN116502132A (en) 2022-01-18 2022-01-18 Account set identification method, device, equipment, medium and computer program product

Country Status (1)

Country Link
CN (1) CN116502132A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236721A (en) * 2023-11-09 2023-12-15 湖南财信数字科技有限公司 Monitoring method, system, computer equipment and storage medium for enterprise abnormal behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination