CN113837874B - Data identification method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN113837874B
Authority
CN
China
Prior art keywords
transaction
node
nodes
candidate
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111381734.5A
Other languages
Chinese (zh)
Other versions
CN113837874A (en)
Inventor
郭翊麟
孙悦
蔡准
郭晓鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Trusfort Technology Co ltd
Original Assignee
Beijing Trusfort Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Trusfort Technology Co ltd filed Critical Beijing Trusfort Technology Co ltd
Priority to CN202111381734.5A priority Critical patent/CN113837874B/en
Publication of CN113837874A publication Critical patent/CN113837874A/en
Application granted granted Critical
Publication of CN113837874B publication Critical patent/CN113837874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02Banking, e.g. interest calculation or account maintenance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a data identification method, a data identification device, a storage medium and electronic equipment, wherein the method comprises the following steps: collecting a plurality of transaction records; establishing a transaction network according to the transaction records, wherein nodes in the transaction network are users, and an edge connecting two nodes in the transaction network represents that a transaction behavior exists between the users; determining a plurality of first candidate nodes meeting set conditions from the transaction network, wherein the first candidate nodes form a first set; dividing the transaction network to obtain a plurality of sub-networks, determining a plurality of second candidate nodes according to the first set and the sub-networks, and forming a second set from the plurality of second candidate nodes; clustering nodes in the transaction network to obtain a plurality of clusters, determining a plurality of third candidate nodes according to the first set and the clusters, and forming a third set from the plurality of third candidate nodes; and determining the intersection of the second set and the third set, and taking the union of the intersection and the first set as the target nodes. By adopting the identification method, the accuracy and the efficiency of data identification can be improved.

Description

Data identification method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data identification method and apparatus, a storage medium, and an electronic device.
Background
With the continuous development of the internet, electronic banking has become one of the main means by which banks compete in channels and marketing. While online electronic banking brings convenience to people, it also provides a new channel for illegal transactions by lawbreakers.
At present, abnormal transaction behaviors are identified mainly by business experts manually auditing users' transaction flow records, or by constructing expert rules from the characteristics of abnormal transactions. Because abnormal transaction behaviors are diverse and complex, identifying them directly with business experts' rules has two drawbacks: on the one hand, different experts apply different judgment standards, so the identification results are highly subjective and unstable; on the other hand, the volume of transaction behavior data is huge, so identification efficiency is low, and the approach is wholly unsuited to the large-scale records of current electronic banking. Moreover, identifying abnormal data solely through expert experience can result in a high false negative rate.
Disclosure of Invention
The invention provides a data identification method, a data identification device, a storage medium and electronic equipment, which can improve the accuracy and efficiency of data identification.
One aspect of the present invention provides a data identification method, including:
collecting a plurality of transaction records, wherein the transaction records are used for recording transaction behavior data among users;
establishing a transaction network according to the transaction record, wherein a node in the transaction network is the user, and an edge connecting two nodes in the transaction network represents that a transaction behavior exists between the users;
determining a plurality of first candidate nodes meeting set conditions from the transaction network, wherein the plurality of first candidate nodes form a first set;
segmenting the transaction network to obtain a plurality of sub-networks, determining a plurality of second candidate nodes according to the first set and the sub-networks, and forming a second set by the plurality of second candidate nodes;
clustering nodes in the transaction network to obtain a plurality of clusters, determining a plurality of third candidate nodes according to the first set and the clusters, and forming a third set by the plurality of third candidate nodes;
and determining the intersection of the second set and the third set, wherein the union of the intersection and the first set is a target node.
In an embodiment, the determining, from the transaction network, a plurality of first candidate nodes satisfying a set condition includes:
there are a plurality of set conditions, and the nodes meeting all of the set conditions are first candidate nodes.
In an embodiment, the partitioning the transaction network into a plurality of sub-networks includes:
a plurality of nodes with transaction behaviors are divided into a sub-network.
In an embodiment, the determining a plurality of second candidate nodes according to the first set and the sub-network comprises:
searching a node in the sub-network, which has a transaction behavior with the first candidate node, and then determining the node as a second candidate node;
if the sub-network does not have a node with a transaction behavior with the first candidate node, searching a node to be selected which has a common node with the first candidate node in the sub-network, and if the ratio of the number of the common nodes of the node to be selected and the first candidate node to the total number of the edges of the node to be selected exceeds a preset threshold, determining that the node to be selected is a second candidate node.
In an embodiment, the clustering nodes in the transaction network to obtain a plurality of clusters includes:
and clustering a plurality of nodes meeting the similarity requirement into a cluster according to the similarity calculation result of the nodes in the transaction network.
In an embodiment, the determining a plurality of third candidate nodes according to the first set and the cluster includes:
determining the cluster in which the first candidate node is positioned as a target cluster;
and a node in the target cluster, which has a transaction behavior with the first candidate node and has a set relationship with the first candidate node, is a third candidate node.
In an embodiment, the determining a plurality of third candidate nodes according to the first set and the cluster further includes:
determining the cluster without the first candidate node as a cluster to be identified;
and analyzing and filtering the central nodes of the cluster to be identified, wherein the central nodes meeting the preset conditions are also third candidate nodes.
Another aspect of the present invention provides an apparatus for recognizing data, the apparatus including:
an acquisition module, configured to collect a plurality of transaction records, wherein the transaction records are used for recording transaction behavior data among users;
the construction module is used for constructing a transaction network according to the transaction record, the nodes in the transaction network are the users, and edges connecting the two nodes in the transaction network represent that transaction behaviors exist between the users;
the first determination module is used for determining a plurality of first candidate nodes meeting set conditions from the transaction network, and the plurality of first candidate nodes form a first set;
the second determining module is used for segmenting the transaction network to obtain a plurality of sub-networks, determining a plurality of second candidate nodes according to the first set and the sub-networks, and forming a second set by the plurality of second candidate nodes;
a third determining module, configured to cluster nodes in the transaction network to obtain multiple clusters, determine multiple third candidate nodes according to the first set and the clusters, where the multiple third candidate nodes form a third set;
and the data processing module is used for determining the intersection of the second set and the third set, and the union of the intersection and the first set is a target node.
A further aspect of the invention provides a computer-readable storage medium having stored thereon a computer program for executing the identification method according to the invention.
Yet another aspect of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize the identification method.
In the above scheme of the present invention, a transaction network is constructed based on transaction records, a first set is formed by determining a plurality of first candidate nodes satisfying a set condition from the transaction network, a second set is formed by determining a plurality of second candidate nodes based on the first set and the transaction network, and a third set is formed by a plurality of third candidate nodes, and finally a target node is determined according to the first set, the second set and the third set.
Drawings
FIG. 1 shows a flow diagram of a method of identifying data;
FIG. 2 shows a schematic diagram of a transaction network;
FIG. 3 shows a schematic diagram of a sub-network;
FIG. 4 shows a schematic structural diagram of a data recognition apparatus.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a data identification method according to an embodiment of the present invention, where the method includes:
step S101, collecting a plurality of transaction records, wherein the transaction records are used for recording transaction behavior data between users.
The transaction record is used for recording transaction behavior data between users, and the transaction behavior data comprises identification information of the users, such as identification numbers, account numbers, card numbers and the like, and also comprises flow direction of the transaction data between the users, size of the transaction data, transaction time and the like. For example, in the financial field, the transaction record may be a fund transaction of an electronic bank, and records a transaction amount, a transaction time, and the like between two bank cards, wherein the flow direction of the transaction data is from a transfer-out bank card to a transfer-in bank card, and the size of the transaction data is the transaction amount; for example, in the internet field, the transaction record may be a traffic transaction in the internet market, the two users are a media party and an advertising party, the flow direction of the transaction data is from the media party to the advertising party, and the size of the transaction data is the size of the transaction traffic; in the general consumption field, the transaction record may be an article transaction between a consumer and a merchant, wherein the flow of transaction data flows from the merchant to the consumer, and the size of the transaction data is the value of the transaction article, and may be a specific transaction fund, or may be in other forms besides the fund. The transaction record may also be a transaction occurring between a consumer and a consumer or between a merchant and a merchant, and the invention is not limited to specific transaction records.
Taking the financial field as an example, the transaction behavior data includes transaction time, transaction institution, client number of the bank, client gender of the bank, client age of the bank, client occupation of the bank, client relative information of the bank, bank card number, bank card type, transaction amount, transaction type, balance, abstract, whether cross-border transaction is performed, transaction mode, usage, transaction channel, country of transaction occurrence place, administrative region of transaction occurrence place, country of transaction going to country, administrative region of transaction going to administrative region, whether the opposite side is client of the bank, country of the opposite side financial institution, region of the opposite side, bank card number of the opposite side, client number of the opposite side, type of bank card of the opposite side, certificate of the opposite side, whether the opposite side is offshore client, and the like. Under different application scenarios, the specific content of the transaction record is different, that is, the specific content included in the transaction behavior data is different.
Step S102, constructing a transaction network according to the transaction records, wherein the nodes in the transaction network are the users, and an edge connecting two nodes in the transaction network represents that a transaction behavior exists between the users.
The transaction network is constructed from the transaction behavior data recorded in the transaction records. The nodes of the transaction network are data that uniquely identify a user, such as identity numbers, account names and card numbers. The edges of the transaction network connect two nodes between which a transaction behavior exists: the arrow on an edge indicates the flow direction of the transaction data, the number of edges indicates the number of transaction behaviors that occurred between the two nodes, and the thickness of an edge indicates the size of the transaction data.
FIG. 2 is a schematic diagram of a transaction network constructed according to transaction records. Taking the financial field as an example, the nodes in the transaction network are bank account numbers that uniquely identify users, such as the bank card numbers involved in transactions. An edge connecting two nodes indicates that a transaction occurred between the two accounts; the arrow on the edge points from the transfer-out account to the transfer-in account; the number of edges indicates the number of transactions between the two accounts; and the thickness of an edge is proportional to the transaction amount between the two accounts.
There is a connecting edge between C1 and C2, indicating that a transaction has occurred between C1 and C2; the arrow on the edge points to C2, indicating that funds are transferred out of C1 and into C2. There are 2 edges between account 3 and account 5, indicating that 2 transactions occurred between account 3 and account 5: one is a transfer of funds from account 3 to account 5, and the other is a transfer of funds from account 5 to account 3. The edge from account 3 to account 5 is thicker, indicating that the amount transferred from account 3 to account 5 is greater than the amount transferred from account 5 to account 3.
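The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(payer, payee, amount)`, the helper name `build_network` and the amounts are hypothetical and do not come from the patent; the patent does not prescribe any particular data structure.

```python
from collections import defaultdict

# Hypothetical record layout: (transfer-out account, transfer-in account, amount).
# The amounts below are made up for illustration only.
records = [
    ("C1", "C2", 500.0),
    ("C3", "C5", 900.0),
    ("C5", "C3", 100.0),
]

def build_network(records):
    """Return a directed multigraph as {payer: {payee: [amount, ...]}}."""
    edges = defaultdict(lambda: defaultdict(list))
    for payer, payee, amount in records:
        # One list entry per transaction, i.e. one edge per transaction.
        edges[payer][payee].append(amount)
    return edges

network = build_network(records)

# Number of edges between C3 and C5, counting both directions.
edge_count = len(network["C3"]["C5"]) + len(network["C5"]["C3"])
```

Here the list length plays the role of the number of edges between two nodes, and the stored amounts play the role of edge thickness.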
Step S103, determining a plurality of first candidate nodes meeting set conditions from the transaction network, wherein the plurality of first candidate nodes form a first set.
In one example, determining a plurality of first candidate nodes from the transaction network that satisfy set conditions includes:
a plurality of set conditions are established, and the nodes meeting all of the set conditions are the first candidate nodes.
The set conditions may be a rule set established for different application scenarios or different business processes according to domain knowledge or expert experience. For example, in the financial field, abnormal transactions are mainly characterized by centralized transfer-in with decentralized transfer-out, decentralized transfer-in with centralized transfer-out, small-amount probing in particular time periods, funds moving in and out quickly, and the like. In order to identify accounts with abnormal transactions, the set conditions are formulated from these characteristics as follows:
R1: the cumulative number of transactions of the same account within 2 hours is more than 10;
R2: the number of channels involved in the same account's transfers out within 1 day is more than 2, or the number of channels involved in the same account's transfers in within 1 day is more than 2; and the ratio between the transfer-in amount and the transfer-out amount within 7 days is between 95% and 105%;
R3: the transfer times of the same account within 1 year are concentrated between 00:00 and 06:00, with a concentration of more than 20%;
R4: the ratio between the transfer-in amount and the transfer-out amount within 1 day is between 90% and 110%, the ratio between the transfer-in amount and the transfer-out amount within 1 year is between 90% and 110%, the ratio of the number of transfer-in transactions to the number of transfer-out transactions within 1 day is more than 3, and the ratio of the number of transfer-in transactions to the number of transfer-out transactions within 1 year is more than 3; or the concentration of transaction amounts at 10 yuan, 20 yuan, 50 yuan and 100 yuan within 1 year is more than 40%;
R5: the account's transactions within 1 year are concentrated within 3 months (concentration of 95%) and a test transaction of no more than 100 yuan was made 3 months before those transactions; or the account's transactions within 1 year are concentrated within 2 months (concentration of 95%) and a test transaction of no more than 100 yuan was made 2 months before those transactions; or the cumulative transfer-out amount within 1 year is no less than 95% and no more than 105%; or the time interval between two transactions of the same account is no less than 180 days.
Similarly, other set conditions can be formulated according to the characteristics of abnormal transactions, for example: the cumulative transaction amount of the same account within 1 year is more than 10 million yuan; the number of banks involved in the same account's transfers in and out within 1 year is more than 5; and the number of accounts involved in the same account's transfers in and out within 1 year is more than 50.
First candidate nodes of abnormal transactions are determined from the transaction network using the set conditions. Each node in the transaction network is checked against set condition R1; the nodes meeting R1 are then checked against R2, the nodes meeting R2 against R3, the nodes meeting R3 against R4, and the nodes meeting R4 against R5. The nodes meeting all of set conditions R1 to R5 are the first candidate nodes, and all first candidate nodes form the first set.
For example, the transaction records of 10000 users are collected, and the account numbers uniquely identifying the users are recorded in sequence as C1 to C10000, so that a transaction network comprising 10000 nodes can be constructed. The nodes in the transaction network are checked against set conditions R1 to R5, and 10 nodes meet R1 to R5 simultaneously, corresponding to account numbers C15, C220, C1048, C2657, C3783, C4796, C5521, C7006, C8290 and C9047 respectively. These 10 accounts are therefore first candidate nodes, and the first set composed of the first candidate nodes is {C15, C220, C1048, C2657, C3783, C4796, C5521, C7006, C8290, C9047}.
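The sequential filtering can be illustrated with a minimal Python sketch. It implements only a check in the spirit of R1 (more than 10 transactions within 2 hours) and chains rules one after another; the function names, the sliding-window approach, the account C16 and the timestamps are assumptions made for illustration, not details from the patent.

```python
from datetime import datetime, timedelta

def satisfies_r1(timestamps, window=timedelta(hours=2), limit=10):
    """True if more than `limit` transactions fall within any span of `window`."""
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window`.
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 > limit:
            return True
    return False

def first_candidates(nodes, rules):
    """Apply the rules one after another; only nodes passing every rule survive."""
    survivors = list(nodes)
    for rule in rules:
        survivors = [n for n in survivors if rule(n)]
    return set(survivors)

base = datetime(2021, 1, 1, 9, 0)
history = {
    # Hypothetical data: 12 transactions packed into 55 minutes.
    "C15": [base + timedelta(minutes=5 * i) for i in range(12)],
    # Hypothetical data: 12 transactions spaced 3 hours apart.
    "C16": [base + timedelta(hours=3 * i) for i in range(12)],
}
rules = [lambda n: satisfies_r1(history[n])]
first_set = first_candidates(history, rules)
```

Further rules corresponding to R2 to R5 would be appended to `rules` in the same way, so that each rule only examines the nodes that passed the previous ones.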
Step S104, segmenting the transaction network to obtain a plurality of sub-networks, determining a plurality of second candidate nodes according to the first set and the sub-networks, and forming a second set by the plurality of second candidate nodes.
In one example, partitioning the transaction network into a plurality of sub-networks includes:
a plurality of nodes having transaction behaviors with one another are divided into one sub-network; for example, a connected graph algorithm or a high-density sub-network division algorithm is used to divide the transaction network into a plurality of sub-networks.
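One possible reading of the connected graph approach is a connected-components search, sketched below in Python with a breadth-first traversal. The edge list and the function name `connected_subnetworks` are hypothetical, and edge direction is ignored here as a simplifying assumption; the patent does not specify the algorithm at this level of detail.

```python
from collections import defaultdict, deque

def connected_subnetworks(edges):
    """Split an undirected view of the transaction network into connected components."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for node in adjacency:
        if node in seen:
            continue
        # Breadth-first search collects every node reachable from `node`.
        component, queue = set(), deque([node])
        seen.add(node)
        while queue:
            current = queue.popleft()
            component.add(current)
            for neighbor in adjacency[current] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components

# Hypothetical edges: two groups of accounts with no transactions between them.
edges = [("C185", "C220"), ("C220", "C1340"), ("C1340", "C688"), ("C15", "C16")]
subnetworks = connected_subnetworks(edges)
```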
In one example, the determining a plurality of second candidate nodes from the first set and the subnetwork includes:
searching a node in the sub-network, which has a transaction behavior with the first candidate node, and then determining the node as a second candidate node;
if the sub-network does not have a node with a transaction behavior with the first candidate node, searching a node to be selected which has a common node with the first candidate node in the sub-network, and if the ratio of the number of the common nodes of the node to be selected and the first candidate node to the total number of the edges of the node to be selected exceeds a preset threshold, determining that the node to be selected is a second candidate node.
A node that has transacted with both the node to be selected and the first candidate node is a common node.
For a sub-network containing a first candidate node, the sub-network is marked before searching it for nodes that have transacted with the first candidate node. The marking may use a coloring algorithm, and the coloring algorithm may be a community discovery algorithm or a label propagation algorithm.
Fig. 3 is a schematic diagram of one of the subnetworks obtained by dividing the transaction network, and the subnetworks are described in detail by taking fig. 3 as an example, where a plurality of nodes in the subnetwork have transaction behaviors.
In FIG. 3, C220 is an account with known abnormal transactions, that is, a first candidate node determined according to the set conditions, and the nodes in the figure that transact directly with C220 are all second candidate nodes. For example, the funds of C220 mainly come from C185, so C185 is a second candidate node; the funds of C220 are dispersed and transferred out to five accounts, namely C1340, C2890, C3220, C4467 and C5793, so C1340, C2890, C3220, C4467 and C5793 are second candidate nodes.
In FIG. 3, C688 is a node that is not marked with an abnormal transaction label and has not transacted directly with C220, but C688 has transacted with three nodes, namely C1340, C4467 and C5793, so C1340, C4467 and C5793 are common transaction nodes of C220 and C688. Assume that the preset threshold is 0.5. Since 4 nodes have transacted with C688, and 3 of those 4 nodes have transacted with C220, the ratio of the number of common transaction nodes to the number of nodes transacting with C688 is 3/4, which exceeds 0.5, so C688 is also a second candidate node. The second set of second candidate nodes is therefore {C688, C1340, C2890, C3220, C4467, C5793}.
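The FIG. 3 example can be reproduced with a short Python sketch. Several points are assumptions: the ratio is computed over distinct counterparties rather than raw edge counts, the fourth counterparty of C688 is not named in the text and is called `CX` here, and the function name is hypothetical.

```python
from collections import defaultdict

def second_candidates(edges, first_node, threshold=0.5):
    """Direct counterparties of `first_node`, plus nodes whose share of
    counterparties in common with `first_node` exceeds `threshold`."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    direct = set(neighbors[first_node])
    result = set(direct)
    for node, adjacent in list(neighbors.items()):
        if node == first_node or node in direct:
            continue
        common = adjacent & direct  # counterparties shared with first_node
        if common and len(common) / len(adjacent) > threshold:
            result.add(node)
    return result

edges = [
    ("C185", "C220"),
    ("C220", "C1340"), ("C220", "C2890"), ("C220", "C3220"),
    ("C220", "C4467"), ("C220", "C5793"),
    ("C688", "C1340"), ("C688", "C4467"), ("C688", "C5793"),
    ("C688", "CX"),  # CX: hypothetical name for the unnamed fourth counterparty
]
second_set = second_candidates(edges, "C220")
```

Note that this sketch also returns C185, which the text identifies as a second candidate node, although the second set listed in the text does not include it.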
Step S105, clustering nodes in the transaction network to obtain a plurality of clusters, determining a plurality of third candidate nodes according to the first set and the clusters, and forming a third set by the plurality of third candidate nodes.
In one example, the clustering nodes in the transaction network into a plurality of clusters includes:
and clustering a plurality of nodes meeting the similarity requirement into a cluster according to the similarity calculation result of the nodes in the transaction network.
In the transaction network, graph node embedding learning is performed with a deep learning model on the contents of the transaction behavior data, such as age, gender, contact information, identity card number, home address and work unit, to obtain node characterization vectors; the deep learning model may be a Node2Vec model, a GraphSAGE model, an EGES model or the like. After the characterization vector of each node is obtained through graph embedding learning, the nodes in the transaction network are clustered using an unsupervised clustering method to obtain a plurality of clusters; the unsupervised clustering method may be the k-means algorithm or the like, and is not specifically limited here.
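The clustering step can be illustrated with a plain k-means sketch in Python. The embedding step itself is not shown: the two-dimensional vectors below are invented stand-ins for the characterization vectors a graph embedding model would produce, the account C8000 is hypothetical, and seeding the centroids with the first k vectors is a simplification chosen to keep the example deterministic.

```python
import math

def kmeans(vectors, k, iterations=10):
    """Plain k-means over {node: vector}; the first k vectors seed the centroids."""
    names = list(vectors)
    centroids = [list(vectors[n]) for n in names[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each node to its nearest centroid.
        for name in names:
            distances = [math.dist(vectors[name], c) for c in centroids]
            assignment[name] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [vectors[n] for n in names if assignment[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical 2-D characterization vectors, for illustration only.
embeddings = {
    "C220": (0.9, 0.8),
    "C7125": (0.1, 0.2),
    "C1340": (1.0, 0.9),
    "C3526": (0.8, 1.0),
    "C8000": (0.0, 0.1),
}
clusters = kmeans(embeddings, k=2)
```

In practice a library implementation with a more robust initialization would normally be preferred; the loop above only shows the assignment and update steps.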
In one example, determining a plurality of third candidate nodes from the first set and the cluster includes:
determining the cluster in which the first candidate node is positioned as a target cluster;
and a node in the target cluster, which has a transaction behavior with the first candidate node and has a set relationship with the first candidate node, is a third candidate node.
The set relationship refers to a relationship other than the transaction behavior between the users of the two nodes; whether a connection exists between the two users is further identified according to the contents of the transaction behavior data. For example, in the financial field, the set relationship may be a kinship relationship or a coworker relationship between the users of the two nodes, and whether the set relationship exists between the two users is determined according to data such as relative information and work unit in the transaction behavior data. In other fields, the set relationship may be a common-interest relationship between the two users.
For example, the account C220 is a first candidate node determined to satisfy the set conditions, and the cluster containing the account C220 is a target cluster. Assuming that C1340, C3526 and C4467 in the target cluster have each transacted with C220, and that a set relationship exists between each of C1340, C3526 and C4467 and C220, then C1340, C3526 and C4467 are determined to be third candidate nodes.
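A minimal Python sketch of this selection follows. The representation of "has transacted" and "has a set relationship" as sets of unordered pairs, the function name and the extra account C6001 are assumptions introduced for illustration.

```python
def third_candidates_in_cluster(cluster, first_node, transacted, related):
    """Nodes in the target cluster that both transacted with `first_node`
    and have a set relationship with it."""
    return {
        node
        for node in cluster
        if node != first_node
        and frozenset((node, first_node)) in transacted
        and frozenset((node, first_node)) in related
    }

# C6001 is a hypothetical account that transacted with C220 but has no set relationship.
target_cluster = {"C220", "C1340", "C3526", "C4467", "C6001"}
transacted = {
    frozenset(pair)
    for pair in [("C220", "C1340"), ("C220", "C3526"), ("C220", "C4467"), ("C220", "C6001")]
}
related = {
    frozenset(pair)
    for pair in [("C220", "C1340"), ("C220", "C3526"), ("C220", "C4467")]
}
third = third_candidates_in_cluster(target_cluster, "C220", transacted, related)
```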
In one example, the determining a plurality of third candidate nodes from the first set and the cluster further comprises:
determining the cluster without the first candidate node as a cluster to be identified;
and analyzing and filtering the central nodes of the cluster to be identified, wherein the central nodes meeting the preset conditions are third candidate nodes.
If no first candidate node exists in a cluster, the cluster is a cluster to be identified. The purpose of analyzing and filtering the clusters to be identified is to avoid missing nodes with abnormal transactions in those clusters. A plurality of preset conditions are set, and a central node satisfying any one of them is determined to be a third candidate node. The preset conditions in this embodiment are as follows:
(1) the proportion of the central node's transactions that occur between 0:00 and 6:00 to its total number of transactions is greater than a threshold; assuming the threshold is 0.7, if the proportion is greater than 0.7, the central node is judged to have abnormal transactions and is a third candidate node;
(2) the ratio of the funds transferred into and out of the central node is greater than a threshold; assuming the threshold is 0.8, if the ratio is greater than 0.8, the central node is judged to have abnormal transactions and is a third candidate node;
(3) the central node shows short-time, high-frequency transaction characteristics, that is, its transaction frequency ratio is greater than a threshold; assuming the threshold is 0.8, if the transaction frequency ratio is greater than 0.8, the central node is judged to have abnormal transactions and is a third candidate node;
(4) the fund source of the central node is inconsistent with the user's age and occupation; if so, the central node is judged to have abnormal transactions and is a third candidate node;
(5) the transaction channel of the central node shows frequent short-time, large-amount transactions made through third-party payment; if so, the central node is judged to have abnormal transactions and is a third candidate node.
Assuming that the total number of transactions of the central node C7125 in a cluster to be identified is 20, of which 15 occurred between 0:00 and 6:00, the proportion is 15/20 = 0.75, which is greater than 0.7. C7125 therefore satisfies preset condition (1) and is a third candidate node, so the third set composed of the third candidate nodes is { C1340, C3526, C4467, C7125 }.
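Preset condition (1) and the C7125 example can be checked with a short sketch. It assumes that transaction times are available as `datetime` values and that the window covers hours 0 through 5 inclusive (whether 6:00 itself counts is not stated in the source); the function names and the sample timestamps are hypothetical.

```python
from datetime import datetime

NIGHT_RATIO_THRESHOLD = 0.7  # example threshold given for condition (1)


def night_ratio(timestamps):
    """Share of transactions whose hour falls in [0, 6)."""
    if not timestamps:
        return 0.0
    night = sum(1 for t in timestamps if 0 <= t.hour < 6)
    return night / len(timestamps)


def meets_condition_1(timestamps, threshold=NIGHT_RATIO_THRESHOLD):
    """True when the night-time share strictly exceeds the threshold."""
    return night_ratio(timestamps) > threshold


# Hypothetical timestamps reproducing the example: 15 of 20 transactions at night.
c7125 = [datetime(2021, 11, 1, 2, i) for i in range(15)] + \
        [datetime(2021, 11, 1, 14, i) for i in range(5)]

print(night_ratio(c7125))        # 0.75
print(meets_condition_1(c7125))  # True
```

The other preset conditions could be expressed as further predicates of the same shape, with a central node becoming a third candidate node when any one of them returns true.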
S106, determining the intersection of the second set and the third set, wherein the nodes in the union of the intersection and the first set are the target nodes.
According to the above example, the first set is { C15, C220, C1048, C2657, C3783, C4796, C5521, C7006, C8290, C9047}, the second set is { C688, C1340, C2890, C3220, C4467, C5793}, and the third set is { C1340, C3526, C4467, C7125 }. The intersection of the second set and the third set is { C1340, C4467}, and the union of this intersection and the first set is { C15, C220, C1048, C1340, C2657, C3783, C4467, C4796, C5521, C7006, C8290, C9047}, so all nodes in the union are abnormal nodes, i.e. target nodes.
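The set arithmetic of step S106 can be reproduced directly with the example sets above. The function name `target_nodes` is a hypothetical label for illustration.

```python
def target_nodes(first, second, third):
    """Union of the first set with the intersection of the second and third sets."""
    return first | (second & third)


first = {"C15", "C220", "C1048", "C2657", "C3783",
         "C4796", "C5521", "C7006", "C8290", "C9047"}
second = {"C688", "C1340", "C2890", "C3220", "C4467", "C5793"}
third = {"C1340", "C3526", "C4467", "C7125"}

targets = target_nodes(first, second, third)
print(sorted(second & third))  # ['C1340', 'C4467']
print(len(targets))            # 12
```

Nodes such as C7125, which appear only in the third set, are not target nodes under this rule, because a node outside the first set must be confirmed by both the second and the third set.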
In the above scheme of the present invention, a transaction network is constructed from the transaction behavior data recorded in the transaction records; a plurality of first candidate nodes satisfying the set conditions are determined from the transaction network to form a first set; based on the first set and the transaction network, a plurality of second candidate nodes are determined to form a second set and a plurality of third candidate nodes are determined to form a third set; and finally the target nodes are determined from the first set, the second set and the third set. The invention fully considers the relationships between nodes and edges in the transaction network and identifies the second and third candidate nodes on the basis of the first candidate nodes. In the prior art, transaction data is checked only according to expert experience, so only the first candidate nodes can be identified. Compared with the prior art, this scheme reduces the rate of missed detections and thereby improves the accuracy of data identification. For example, in an electronic banking scenario, the identification method of this scheme can improve the accuracy of identifying abnormal transaction accounts. In scenarios with very large volumes of transaction behavior data, such as an electronic bank that generates a large amount of transaction data every day, manual examination that relies only on the experience of business experts is very inefficient. The invention identifies abnormal transaction nodes automatically, by traversing and analyzing the nodes and relying on the transaction behaviors between nodes in the transaction network, so the identification method also offers high identification efficiency.
Fig. 4 is a schematic diagram of an apparatus for identifying data according to an embodiment of the present invention, where the apparatus includes:
A collection module 201, configured to collect a plurality of transaction records, where the transaction records are used to record transaction behavior data between users.
A building module 202, configured to build a transaction network according to the transaction record, where a node in the transaction network is the user, and an edge connecting two nodes in the transaction network indicates that a transaction behavior exists between users.
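A minimal sketch of what the building module does, assuming each transaction record reduces to a (payer, payee) pair and that edges are treated as undirected; the record format and the name `build_transaction_network` are assumptions, not details given in the source.

```python
from collections import defaultdict


def build_transaction_network(records):
    """Build an undirected adjacency map: user -> set of users traded with."""
    graph = defaultdict(set)
    for payer, payee in records:
        if payer == payee:
            continue  # this sketch ignores self-transfers
        graph[payer].add(payee)
        graph[payee].add(payer)
    return dict(graph)


# Hypothetical records; repeated transactions collapse into one edge.
records = [("A", "B"), ("B", "C"), ("A", "B"), ("A", "A")]
network = build_transaction_network(records)
print(network["B"] == {"A", "C"})  # True
```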
The first determining module 203 is configured to determine a plurality of first candidate nodes from the transaction network, where the plurality of first candidate nodes meet the set conditions and form a first set.
In one example, the determining a plurality of first candidate nodes satisfying the set conditions from the transaction network includes:
there are multiple set conditions, and a node in the transaction network that meets all of the set conditions is a first candidate node. The set conditions may be a rule set established for different application scenarios or different business processes according to domain knowledge or expert experience.
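The "meets all set conditions" rule can be sketched as a conjunction of predicates. The source does not list concrete rules, so the two rules, the feature names, and the users below are invented purely for illustration.

```python
def first_candidates(nodes, rules):
    """Keep the nodes for which every rule in the rule set returns True."""
    return {name for name, features in nodes.items()
            if all(rule(features) for rule in rules)}


# Hypothetical rule set and node features (not from the source).
rules = [
    lambda f: f["daily_tx_count"] > 50,
    lambda f: f["max_amount"] > 10_000,
]
nodes = {
    "U1": {"daily_tx_count": 80, "max_amount": 20_000},
    "U2": {"daily_tx_count": 80, "max_amount": 500},
    "U3": {"daily_tx_count": 3, "max_amount": 20_000},
}

first_set = first_candidates(nodes, rules)
print(first_set)  # {'U1'}
```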
A second determining module 204, configured to segment the transaction network to obtain a plurality of subnetworks, determine a plurality of second candidate nodes according to the first set and the subnetworks, where the plurality of second candidate nodes form a second set.
In one example, the partitioning the transaction network into a plurality of sub-networks includes:
a plurality of nodes with transaction behaviors are divided into a sub-network.
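One plausible reading of this segmentation is a split into connected components, so that nodes linked by transaction behaviors end up in the same sub-network. The sketch below takes that reading as an assumption; the source does not name a specific algorithm, and `split_into_subnetworks` is a hypothetical name. The adjacency map is assumed to be symmetric.

```python
from collections import deque


def split_into_subnetworks(graph):
    """Return the connected components of an undirected adjacency map."""
    seen = set()
    subnetworks = []
    for start in graph:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first traversal from the start node
            node = queue.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        subnetworks.append(component)
    return subnetworks


graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
parts = split_into_subnetworks(graph)
print(len(parts))  # 2
```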
A third determining module 205, configured to cluster nodes in the transaction network to obtain a plurality of clusters, and determine a plurality of third candidate nodes according to the first set and the clusters, where the plurality of third candidate nodes form a third set.
In one example, the clustering nodes in the transaction network into a plurality of clusters includes:
clustering a plurality of nodes meeting the similarity requirement into one cluster according to the similarity calculation result of the nodes in the transaction network.
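The source does not specify the similarity measure or the clustering algorithm. As a hedged illustration only, the sketch below assumes Jaccard similarity over neighbor sets and a simple greedy grouping against each cluster's first member; the threshold 0.5 and all names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(graph, threshold=0.5):
    """Greedily group nodes whose neighbor sets are similar to a cluster seed."""
    clusters = []  # list of (seed node, member set)
    for node in sorted(graph):
        for seed, members in clusters:
            if jaccard(graph[node], graph[seed]) >= threshold:
                members.add(node)
                break
        else:
            clusters.append((node, {node}))  # no similar seed: start a new cluster
    return [members for _, members in clusters]


graph = {
    "A": {"X", "Y"}, "B": {"X", "Y"}, "C": {"Z"},
    "X": {"A", "B"}, "Y": {"A", "B"}, "Z": {"C"},
}
clusters = cluster_by_similarity(graph)
print(len(clusters))  # 4
```

Here A and B share the same counterparties and fall into one cluster, as do X and Y. Any other similarity calculation that satisfies the similarity requirement could be substituted.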
A data processing module 206, configured to determine an intersection of the second set and the third set, where a union of the intersection and the first set is a target node.
In one example, the second determining module 204 is specifically configured to:
searching the sub-network for a node that has a transaction behavior with the first candidate node, and determining that node to be a second candidate node;
if the sub-network contains no node that has a transaction behavior with the first candidate node, searching the sub-network for a node to be selected that shares a common node with the first candidate node, and if the ratio of the number of common nodes shared by the node to be selected and the first candidate node to the total number of edges of the node to be selected exceeds a preset threshold, determining that the node to be selected is a second candidate node.
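The two-stage rule above can be sketched as follows. The sketch assumes the transaction network is an adjacency map over the whole network and that a sub-network is passed in as a plain set of nodes; the function name, the threshold 0.5, and the sample graph are hypothetical.

```python
def second_candidates(graph, subnetwork, first_candidate, threshold=0.5):
    """Direct counterparties first; otherwise fall back to the common-node ratio."""
    neighbors = graph.get(first_candidate, set())
    direct = {n for n in subnetwork if n in neighbors}
    if direct:
        return direct
    result = set()
    for node in subnetwork:
        if node == first_candidate:
            continue
        edges = graph.get(node, set())
        if not edges:
            continue
        common = edges & neighbors  # nodes adjacent to both
        if common and len(common) / len(edges) > threshold:
            result.add(node)
    return result


graph = {
    "F": {"A", "B"}, "A": {"F", "P"}, "B": {"F", "P"},
    "P": {"A", "B", "Q"}, "Q": {"P"},
}
print(sorted(second_candidates(graph, {"F", "A", "B"}, "F")))  # ['A', 'B']
print(sorted(second_candidates(graph, {"P", "Q"}, "F")))       # ['P']
```

In the second call no node of the sub-network trades directly with F, so P qualifies through the ratio 2/3 (two common nodes out of three edges), while Q shares no common node with F.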
In an example, the third determining module 205 is specifically configured to:
determining the cluster in which the first candidate node is positioned as a target cluster;
and a node in the target cluster, which has a transaction behavior with the first candidate node and has a set relationship with the first candidate node, is a third candidate node.
A third determining module 205, further configured to:
determining the cluster without the first candidate node as a cluster to be identified;
and analyzing and filtering the central nodes of the cluster to be identified, wherein the central nodes meeting the preset conditions are third candidate nodes.
A further aspect of the invention provides a computer-readable storage medium having stored thereon a computer program for executing the identification method according to the invention.
Yet another aspect of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize the identification method.
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to the various embodiments of the present application described in the "exemplary methods" section of this specification, above.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (4)

1. A method for identifying data, the method comprising:
collecting a plurality of transaction records, wherein the transaction records are used for recording transaction behavior data among users;
establishing a transaction network according to the transaction record, wherein a node in the transaction network is the user, and an edge connecting two nodes in the transaction network represents that a transaction behavior exists between the users;
determining a plurality of first candidate nodes meeting set conditions from the transaction network, wherein the plurality of first candidate nodes form a first set, and the set conditions are rule sets established for different application scenes or different business processes according to domain knowledge or expert experience;
segmenting the transaction network into a plurality of sub-networks, determining a plurality of second candidate nodes according to the first set and the sub-networks, wherein the second candidate nodes form a second set, and the method comprises the following steps: dividing a plurality of nodes with transaction behaviors into a sub-network; searching a node in the sub-network, which has a transaction behavior with the first candidate node, and then determining the node as a second candidate node; if the sub-network does not have a node with a transaction behavior with the first candidate node, searching a node to be selected which has a common node with the first candidate node in the sub-network, and if the ratio of the number of the common nodes of the node to be selected and the first candidate node to the total number of the edges of the node to be selected exceeds a preset threshold, determining that the node to be selected is a second candidate node;
clustering nodes in the transaction network to obtain a plurality of clusters, determining a plurality of third candidate nodes according to the first set and the clusters, wherein the third candidate nodes form a third set, and the method comprises the following steps: clustering a plurality of nodes meeting the similarity requirement into a cluster according to the similarity calculation result of the nodes in the transaction network; determining the cluster in which the first candidate node is positioned as a target cluster; a node in the target cluster, which has a transaction behavior with the first candidate node and has a set relationship with the first candidate node, is a third candidate node; determining the cluster without the first candidate node as a cluster to be identified; analyzing and filtering the central nodes of the cluster to be identified, wherein the central nodes meeting preset conditions are third candidate nodes;
and determining the intersection of the second set and the third set, wherein the union of the intersection and the first set is a target node.
2. An apparatus for identifying data, the apparatus comprising:
an acquisition module, configured to acquire a plurality of transaction records, wherein the transaction records are used for recording transaction behavior data among users;
the construction module is used for constructing a transaction network according to the transaction record, the nodes in the transaction network are the users, and edges connecting the two nodes in the transaction network represent that transaction behaviors exist between the users;
the first determination module is used for determining a plurality of first candidate nodes meeting set conditions from the transaction network, wherein the plurality of first candidate nodes form a first set, and the set conditions are rule sets established for different application scenes or different business processes according to domain knowledge or expert experience;
a second determining module, configured to segment the transaction network to obtain a plurality of subnetworks, determine a plurality of second candidate nodes according to the first set and the subnetworks, where the plurality of second candidate nodes form a second set, and the second determining module includes: dividing a plurality of nodes with transaction behaviors into a sub-network; searching a node in the sub-network, which has a transaction behavior with the first candidate node, and then determining the node as a second candidate node; if the sub-network does not have a node with a transaction behavior with the first candidate node, searching a node to be selected which has a common node with the first candidate node in the sub-network, and if the ratio of the number of the common nodes of the node to be selected and the first candidate node to the total number of the edges of the node to be selected exceeds a preset threshold, determining that the node to be selected is a second candidate node;
a third determining module, configured to cluster nodes in the transaction network to obtain a plurality of clusters, and determine a plurality of third candidate nodes according to the first set and the clusters, where the plurality of third candidate nodes form a third set, where the third determining module includes: clustering a plurality of nodes meeting the similarity requirement into a cluster according to the similarity calculation result of the nodes in the transaction network; determining the cluster in which the first candidate node is positioned as a target cluster; a node in the target cluster, which has a transaction behavior with the first candidate node and has a set relationship with the first candidate node, is a third candidate node; determining the cluster without the first candidate node as a cluster to be identified; analyzing and filtering the central nodes of the cluster to be identified, wherein the central nodes meeting preset conditions are third candidate nodes;
and the data processing module is used for determining the intersection of the second set and the third set, and the union of the intersection and the first set is a target node.
3. A computer-readable storage medium, which stores a computer program for executing the identification method of claim 1.
4. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the identification method of claim 1.
CN202111381734.5A 2021-11-22 2021-11-22 Data identification method and device, storage medium and electronic equipment Active CN113837874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111381734.5A CN113837874B (en) 2021-11-22 2021-11-22 Data identification method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111381734.5A CN113837874B (en) 2021-11-22 2021-11-22 Data identification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113837874A CN113837874A (en) 2021-12-24
CN113837874B true CN113837874B (en) 2022-04-12

Family

ID=78971465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111381734.5A Active CN113837874B (en) 2021-11-22 2021-11-22 Data identification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113837874B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107785058A (en) * 2017-07-24 2018-03-09 平安科技(深圳)有限公司 Anti- fraud recognition methods, storage medium and the server for carrying safety brain
CN108228706A (en) * 2017-11-23 2018-06-29 中国银联股份有限公司 For identifying the method and apparatus of abnormal transaction corporations
CN110647590A (en) * 2019-09-23 2020-01-03 税友软件集团股份有限公司 Target community data identification method and related device
CN111445320A (en) * 2020-03-30 2020-07-24 深圳市华云中盛科技股份有限公司 Target community identification method and device, computer equipment and storage medium
WO2021189730A1 (en) * 2020-03-27 2021-09-30 深圳壹账通智能科技有限公司 Method, apparatus and device for detecting abnormal dense subgraph, and storage medium
CN113487427A (en) * 2021-04-20 2021-10-08 微梦创科网络科技(中国)有限公司 Transaction risk identification method, device and system

Also Published As

Publication number Publication date
CN113837874A (en) 2021-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant