CN113870021A - Data analysis method and device, storage medium and electronic equipment

Info

Publication number
CN113870021A
Authority
CN
China
Prior art keywords
network
nodes
transaction
behavior data
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111460485.9A
Other languages
Chinese (zh)
Other versions
CN113870021B (en
Inventor
郭翊麟
孙悦
蔡准
郭晓鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Trusfort Technology Co ltd
Original Assignee
Beijing Trusfort Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Trusfort Technology Co ltd
Priority to CN202111460485.9A
Publication of CN113870021A
Application granted
Publication of CN113870021B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data analysis method, a data analysis device, a storage medium and electronic equipment, wherein the method comprises the following steps: collecting transaction behavior data of a plurality of users; determining users meeting a first set condition according to the transaction behavior data; constructing a transaction network according to transaction behaviors among users; segmenting a transaction network to obtain a plurality of sub-networks; determining a sub-network where nodes meeting a first set condition are located as a target network, and if the number of the nodes of the target network is less than a threshold value, determining all the nodes in the target network to form a first target set; if the sub-networks do not have marked nodes, dividing the sub-networks to obtain a plurality of community networks; and if the nodes meeting the second set condition exist in the community network, determining that all the nodes in the community network form a second target set. By adopting the method, the analysis efficiency and the accuracy of the target result can be improved.

Description

Data analysis method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a method and an apparatus for analyzing data, a storage medium, and an electronic device.
Background
With the continuous development of the internet, electronic banking has become one of the main means by which banks compete in channels and marketing. Online electronic banking brings convenience to people, but it also provides a new channel for illegal transactions. Illegal transactions are increasingly carried out by organized, specialized groups, which makes them harder to analyze.
At present, the identification and analysis of illegal transactions rely mainly on manual review: business personnel analyze the transaction records of a given account, judge whether the account is abnormal according to certain rules and experience, and mark the accounts with abnormal transactions.
However, this approach can only determine whether a single account is abnormal; it cannot uncover other accounts whose behavior is similar to that account. In large-scale business scenarios, manual review is also slow. Methods that rely on manual review therefore suffer from low accuracy and low efficiency.
Disclosure of Invention
The invention provides a data analysis method, a data analysis device, a storage medium and electronic equipment, which aim to improve the data analysis efficiency and accuracy.
One aspect of the present invention provides a method for analyzing data, including:
collecting transaction behavior data of a plurality of users;
determining users meeting a first set condition according to the transaction behavior data;
constructing a transaction network according to transaction behaviors between the users, wherein a node of the transaction network is a user, and an edge connecting two nodes in the transaction network represents that transaction behavior exists between the two nodes;
segmenting the transaction network to obtain a plurality of sub-networks;
determining a sub-network with nodes meeting a first set condition as a target network, and if the number of the nodes of the target network is less than a threshold value, determining that all the nodes in the target network form a first target set;
dividing sub-networks without nodes meeting a first set condition to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
In an embodiment, after collecting transaction behavior data of a plurality of users, the method further comprises: cleansing the transaction behavior data, comprising:
processing the missing value of the transaction behavior data, and determining the missing proportion of the transaction behavior data;
if the missing proportion is larger than a proportion threshold value, deleting the transaction behavior data;
and if the missing proportion is smaller than a proportion threshold value, filling the transaction behavior data.
In an embodiment, the determining, according to the transaction behavior data, a user who meets a first set condition includes:
the first setting conditions are multiple, and the users with the transaction behavior data meeting all the first setting conditions are determined to be the users meeting the first setting conditions.
In an embodiment, the partitioning the transaction network into a plurality of sub-networks includes:
a plurality of nodes with transaction behaviors are divided into a sub-network.
In one embodiment, the method further comprises:
if the number of the nodes of the target network is larger than a threshold value, dividing the target network to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
In one embodiment, the second setting condition is multiple, and the user whose transaction behavior data satisfies at least one of the second setting conditions is determined to be the user who satisfies the second setting condition.
In an embodiment, the dividing the sub-network into a plurality of community networks includes:
and calculating the similarity among the nodes in the sub-network, and dividing the nodes with the similarity meeting a threshold value into a community network.
Another aspect of the present invention provides an apparatus for analyzing data, the apparatus comprising:
the acquisition module is used for acquiring transaction behavior data of a plurality of users;
the marking module is used for determining users meeting first set conditions according to the transaction behavior data;
the construction module is used for constructing a transaction network according to transaction behaviors among the users, wherein nodes of the transaction network are the users, and edges for connecting two nodes in the transaction network represent that the transaction behaviors exist between the two nodes;
the segmentation module is used for segmenting the transaction network to obtain a plurality of sub-networks;
the first determining module is used for determining that a sub-network with nodes meeting a first set condition exists as a target network, and if the number of the nodes of the target network is smaller than a threshold value, determining that all the nodes in the target network form a first target set;
the dividing module is used for dividing the sub-networks without the nodes meeting the first set condition to obtain a plurality of community networks;
and the second determining module is used for determining that all the nodes in the community network form a second target set if the nodes meeting a second set condition exist in the community network.
In an embodiment, the acquisition module is further configured to:
processing the missing value of the transaction behavior data, and determining the missing proportion of the transaction behavior data;
if the missing proportion is larger than a proportion threshold value, deleting the transaction behavior data;
and if the missing proportion is smaller than a proportion threshold value, filling the missing transaction behavior data.
In an embodiment, the first determining module is further configured to:
if the number of the nodes of the target network is larger than a threshold value, dividing the target network to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
Yet another aspect of the present invention provides a computer-readable storage medium storing a computer program for executing the analysis method according to the present invention.
Yet another aspect of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize the analysis method.
In the scheme of the invention, users meeting the first set condition are marked, which increases the number of marked users that exhibit the characteristics of abnormal transactions (for example, money laundering) and assists the subsequent analysis and mining of group results. A transaction network is constructed according to the transaction behavior data of the users and the transaction behaviors among the users, and a group result consistent with the characteristics of group crime, namely the target result, is mined from the transaction network by using the associations between its nodes. Because the transaction network is partitioned in progressively finer steps, and the networks produced by different partitioning results are processed in different ways, the scheme improves the accuracy of the target result to a certain extent and also improves the analysis efficiency.
Drawings
FIG. 1 is a flow chart illustrating a method for analyzing data according to an embodiment of the present invention;
FIG. 2 shows a schematic of the structure of a transaction network;
FIG. 3 shows a schematic of the structure of a subnetwork;
FIG. 4 shows a schematic of the structure of a community network divided by sub-networks;
FIG. 5 is a schematic structural diagram of a data analysis apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a data analysis method according to an embodiment of the present invention, where the method includes:
step S101, collecting transaction behavior data of a plurality of users.
The transaction behavior data comprises identification information of a user, two parties of a transaction, time of occurrence of the transaction behavior, flow direction of the transaction data, size of the transaction data, type of the transaction behavior, channel of the transaction behavior and the like, and under different scenes, the content of the transaction behavior data is different. For example, in the field of electronic banking, transaction data is funds, identification information of a user is an identity card number, an account name, a bank account number, a home address, a working unit, an age, a sex and the like, two parties of a transaction are two bank account numbers or two accounts, the flow direction of the transaction data is the flow direction of the funds, the funds are transferred from a transfer-out account number to a transfer-in account number, the size of the transaction data is the size of the transaction funds, the types of transaction behaviors are internet protocol payment, customer account transfer, consumption and the like, and a transaction channel is a specific transfer-out mode of the funds. For example, in the consumption field, the two transaction parties are a merchant and a consumer, the transaction data is an article, wherein the flow of the transaction data flows from the merchant to the consumer, and the size of the transaction data is the value of the transaction article, and may be a specific transaction fund or other forms besides the fund.
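As an illustration only, one record of transaction behavior data could be represented as follows. The class name `TransactionRecord` and its field names are hypothetical; they simply mirror the items listed above (two parties, time, flow direction, size, type and channel) and are not defined by the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TransactionRecord:
    """One transaction between two users (hypothetical field names)."""
    payer: str             # transfer-out party: where the transaction data flows from
    payee: str             # transfer-in party: where the transaction data flows to
    occurred_at: datetime  # time of occurrence of the transaction behavior
    amount: float          # size of the transaction data (e.g. funds)
    kind: str              # type of transaction behavior, e.g. "transfer"
    channel: str           # channel of the transaction behavior


record = TransactionRecord(
    payer="U100",
    payee="U200",
    occurred_at=datetime(2021, 6, 1, 2, 30),
    amount=500.0,
    kind="transfer",
    channel="mobile",
)
```

The flow direction is implied by the payer/payee order, so it does not need a separate field in this sketch.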
In one example, after collecting transaction behavior data of a plurality of users, cleaning the transaction behavior data comprises:
processing the missing value of the transaction behavior data, and determining the missing proportion of the transaction behavior data;
if the missing proportion is larger than a proportion threshold value, deleting the transaction behavior data;
and if the missing proportion is smaller than a proportion threshold value, filling the transaction behavior data.
The missing proportion of the data is determined by counting the missing fields in the transaction behavior data, with different fields given different weights. For example, the flow direction and the size of the transaction data are important indicators for judging whether the transaction behavior is normal, so these fields are given high weights. The identity card number, account name and bank account number in the user identification information are also important for the transaction behavior, so they are given high weights as well. By contrast, the home address, the work unit and the sex may not be truthful and are not necessary for determining whether the transaction behavior is abnormal, so they are given low weights. When the missing proportion is larger than the proportion threshold, important content such as the size of the transaction data or the bank account number is missing from the transaction behavior data, and the transaction behavior data can be deleted. When the missing proportion is smaller than the proportion threshold, only lower-weight data such as the user's sex or work unit is missing; in this case, the missing transaction behavior data can be filled, for example by looking up the original record.
By cleaning the collected transaction behavior data, the validity of the data can be improved, and invalid transaction behavior data is prevented from participating in the analysis and interfering with the result. Missing-value processing quickly determines how much of the data is missing, which improves both the efficiency and the effect of data cleaning.
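The cleaning step can be sketched as a weighted missing ratio followed by a delete-or-fill decision. This is a minimal sketch under stated assumptions: the field names, the weights in `FIELD_WEIGHTS`, the value of `RATIO_THRESHOLD`, and the `original_records` lookup used for filling are all hypothetical, and a ratio exactly equal to the threshold (a case the text does not address) is treated here as fillable.

```python
# Hypothetical weights: important fields get high weights, optional ones low weights.
FIELD_WEIGHTS = {
    "payer": 3.0, "payee": 3.0, "amount": 3.0, "occurred_at": 2.0,
    "home_address": 0.5, "work_unit": 0.5, "sex": 0.5,
}
RATIO_THRESHOLD = 0.2  # hypothetical proportion threshold


def missing_ratio(record, weights=FIELD_WEIGHTS):
    """Weighted share of fields that are missing (None or absent) in a record."""
    total = sum(weights.values())
    missing = sum(w for field, w in weights.items() if record.get(field) is None)
    return missing / total


def clean(records, original_records, threshold=RATIO_THRESHOLD):
    """Delete records missing too much; fill the rest from the original records."""
    cleaned = []
    for rec in records:
        if missing_ratio(rec) > threshold:
            continue  # important data is missing: delete the record
        source = original_records.get(rec["id"], {})
        # Fill only the missing fields; values already present are kept.
        cleaned.append({k: (source.get(k) if v is None else v) for k, v in rec.items()})
    return cleaned


records = [
    {"id": 1, "payer": "U1", "payee": "U2", "amount": 10.0, "occurred_at": "2021-06-01T02:30",
     "home_address": "x", "work_unit": None, "sex": None},
    {"id": 2, "payer": "U3", "payee": "U4", "amount": None, "occurred_at": "2021-06-01T03:00",
     "home_address": "y", "work_unit": "z", "sex": "M"},
]
original_records = {1: {"work_unit": "ACME", "sex": "F"}}
cleaned = clean(records, original_records)
```

In this example the first record misses only low-weight fields and is filled, while the second record misses the amount and is deleted.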
And step S102, determining users meeting first set conditions according to the transaction behavior data.
In one example, determining from the transaction behavior data a user that satisfies a first set of conditions includes:
there are a plurality of first set conditions, and users meeting all the first set conditions are marked.
The first setting condition is a rule set established for different application scenarios or different business process specifications according to domain knowledge or expert experience. For example, in the financial field, the abnormal transactions are mainly characterized by short-time high-frequency transactions, fast-in fast-out transactions, special transaction time and the like, and based on the characteristics of the abnormal transactions, the contents of the first setting conditions are as follows:
(a) the accumulated transaction times of the same account number are more than 10 within 2 hours;
(b) the number of channels related to transfers out of the same account is more than 2 within 1 day, or the number of channels related to transfers into the same account is more than 2 within 1 day; and the ratio between the transfer-in and transfer-out amounts within 7 days is between 95% and 105%;
(c) the transfer time points of the same account within 1 year are concentrated between 00:00 and 06:00, with a concentration of more than 20%;
(d) the ratio between the transfer-in and transfer-out amounts within 1 day is between 90% and 110%, the ratio between the transfer-in and transfer-out amounts within 1 year is between 90% and 110%, the ratio of the number of transfer-in transactions to the number of transfer-out transactions within 1 day is greater than 3, and the ratio of the number of transfer-in transactions to the number of transfer-out transactions within 1 year is greater than 3; or the concentration of transaction amounts at 10 yuan, 20 yuan, 50 yuan and 100 yuan within 1 year is more than 40%.
The first setting condition is not limited to the above (a) to (d), and may include:
(e) the number of accounts related to outgoing transfers of the same account within 1 year is greater than or equal to 50, and the number of banks related to outgoing transfers of the same account within 1 year is greater than or equal to 10;
(f) the amount credited to the same public account is greater than or equal to 400,000 yuan, all funds of the account are transferred out within 7 days after the credit, and the number of counterparty accounts of the same public account within 7 days after the credit is greater than or equal to 10;
(g) the accumulated credited amount within 1 year is greater than or equal to 10 million yuan, the number of bank line numbers of the transfer-out accounts related to credits of the same account within 1 year is greater than or equal to 5, and the number of transfer-out accounts related to credits of the same account within 1 year is greater than or equal to 50.
Depending on the characteristics of the abnormal transactions, the first set condition of the present invention may include content other than conditions (a) to (g); in the present embodiment, the first set condition consists of (a) to (d). All users are traversed, and for each user it is determined from the transaction behavior data whether all the first set conditions are met. If so, the user is marked as a target user, namely a user with abnormal transaction behavior. If the transaction behavior data of the user meets none or only some of the first set conditions, the user is not marked. The target users include both users screened by the first set condition and users with known risks. For example, if a business expert reviewed the transaction behavior data of user A before it was collected and found abnormal transaction behavior, user A is labeled as a target user.
For example, transaction behavior data of 10000 users are collected and marked as U1 to U10000, wherein U50, U880 and U3200 are users which are analyzed and determined by a service expert in an auditing process and have abnormal transaction behaviors. Users who satisfy all the first setting conditions (a) to (d) are U100, U1708, U2546, U3800, U4270, U5001, U5566, U6893, U7198, U7715, U8239, U8976; the target user thus labeled is { U50, U100, U880, U1708, U2546, U3200, U3800, U4270, U5001, U5566, U6893, U7198, U7715, U8239, U8976 }.
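The marking logic above (a user must meet all first set conditions, and users with known risks are added) can be sketched as follows. Only two rules are implemented here as stand-ins, loosely following conditions (a) and (c); the function names, the input format (a list of transaction timestamps per user, assumed to be already restricted to the relevant period) and the reading of 00:00-06:00 as the half-open interval [00:00, 06:00) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def rule_a(times, window=timedelta(hours=2), limit=10):
    """True if more than `limit` transactions fall within some `window`-long span."""
    times = sorted(times)
    start = 0
    for end, t in enumerate(times):
        while t - times[start] > window:
            start += 1  # shrink the sliding window from the left
        if end - start + 1 > limit:
            return True
    return False


def rule_c(times, limit=0.20):
    """True if more than `limit` of the transactions occur in [00:00, 06:00)."""
    if not times:
        return False
    night = sum(1 for t in times if t.hour < 6)
    return night / len(times) > limit


FIRST_CONDITIONS = [rule_a, rule_c]  # stand-ins for conditions (a) to (d)


def label_targets(times_by_user, known_risky):
    """Mark users meeting ALL first set conditions, plus users with known risks."""
    flagged = {
        user for user, times in times_by_user.items()
        if all(rule(times) for rule in FIRST_CONDITIONS)
    }
    return flagged | set(known_risky)
```

For instance, a user with 12 transactions within one hour during the night would meet both rules and be marked, whereas a user with the same burst at midday would meet only `rule_a` and would not be marked.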
Step S103, constructing a transaction network according to transaction behaviors between the users, wherein nodes of the transaction network are the users, and an edge connecting two nodes in the transaction network represents that transaction behavior exists between the two nodes.
The transaction network is constructed according to the transaction behavior data of the plurality of users. A node of the transaction network is data that uniquely identifies a user, such as an identity card number, account name or card number. An edge connects two nodes that have transaction behavior; the arrow of the edge indicates the flow direction of the transaction data, the number of edges indicates the number of transactions between the two nodes, and the thickness of the edge indicates the size of the transaction data.
Fig. 2 is a schematic diagram of a transaction network constructed according to transaction records, taking the financial field as an example, nodes in the transaction network are bank account numbers used for identifying uniqueness of a user, such as bank card numbers where transactions occur, edges connecting two nodes indicate that a transaction occurs between two account numbers, directions of arrows on the edges point from a roll-out account number to a roll-in account number, the number of the edges indicates the number of times of transactions between two account numbers, the thickness of the edges indicates the size of a transaction amount between two account numbers, and the thicker the edges indicate that the transaction amount is larger.
For example, transaction behavior data of 10000 users is collected, and the 10000 nodes of the transaction network are marked U1 to U10000. The edges of the transaction network are constructed according to the transaction behaviors among the users, with the arrow of each edge pointing from the transfer-out party to the transfer-in party. An edge between user U100 and user U200 indicates that a transaction occurred between them; the absence of an edge between user U100 and user U300 indicates that no transaction occurred between them.
By constructing the transaction network, the transaction behaviors among the users are displayed more clearly in the form of the transaction network, and the relationship between the users can be analyzed from the huge transaction network according to the relationship between the nodes.
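A minimal sketch of building such a network from transaction records is given below. It assumes each record is a `(payer, payee, amount)` tuple, and it aggregates parallel edges into a transaction count and a total amount per direction, which correspond to the "number of edges" and "thickness" described above. The function name `build_network` and this aggregated representation are illustrative choices, not taken from the source.

```python
from collections import defaultdict


def build_network(transactions):
    """Build a directed transaction network.

    Returns (nodes, edges), where edges maps (payer, payee) to the number of
    transactions in that direction and their total amount.
    """
    nodes = set()
    edges = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for payer, payee, amount in transactions:
        nodes.update((payer, payee))
        edge = edges[(payer, payee)]  # direction: transfer-out -> transfer-in
        edge["count"] += 1
        edge["amount"] += amount
    return nodes, dict(edges)


transactions = [
    ("U100", "U200", 50.0),
    ("U100", "U200", 70.0),
    ("U200", "U300", 10.0),
]
nodes, edges = build_network(transactions)
```

Here U100 and U200 are connected by an edge with two transactions, while no edge exists between U100 and U300.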
And step S104, segmenting the transaction network to obtain a plurality of sub-networks.
In one example, partitioning the transaction network into a plurality of sub-networks includes:
nodes that have transaction behaviors with one another are divided into the same sub-network; for example, the transaction network is divided into a plurality of sub-networks by a connected-graph algorithm or a high-density sub-network division algorithm.
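One way to read the connected-graph option is as a connected-components search that ignores edge direction. The sketch below makes that assumption and uses a breadth-first search; the function name `split_into_subnetworks` is hypothetical, and the high-density sub-network division alternative is not covered.

```python
from collections import defaultdict, deque


def split_into_subnetworks(nodes, edges):
    """Split a network into connected components, ignoring edge direction.

    nodes: iterable of node ids; edges: iterable of (u, v) pairs.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, subnetworks = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        subnetworks.append(component)
    return subnetworks


subnetworks = split_into_subnetworks(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
```

A node without any transaction, such as "F" here, ends up in a sub-network of its own.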
Step S105, determining that the sub-network with the nodes meeting the first set condition is a target network, and if the number of the nodes of the target network is less than a threshold value, determining that all the nodes in the target network form a first target set.
If a labeled node is present in a sub-network, the sub-network is determined to be a target network. For example, when one or more of the labeled users U50, U100, U880, U1708, U2546, U3200, U3800, U4270, U5001, U5566, U6893, U7198, U7715, U8239, U8976 are present in a sub-network, that sub-network is a target network. With the threshold set to 60, suppose a target network has 50 nodes, 5 of which are labeled; the 50 nodes of this target network then form the first target set, i.e. a group result with a risk of group crime.
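The selection in step S105 can be sketched as a filter over the sub-networks. The function name `first_target_sets` is hypothetical. Sub-networks that are not output as first target sets are returned separately so that they can be passed on to community division; treating a target network whose size equals the threshold as one of these is an assumption, since the text only covers "less than" and "larger than".

```python
def first_target_sets(subnetworks, labeled, threshold=60):
    """Return (first_target_sets, remaining_subnetworks).

    A sub-network containing at least one labeled node is a target network;
    if it has fewer than `threshold` nodes, all its nodes form a first target set.
    """
    targets, remaining = [], []
    for sub in subnetworks:
        is_target = bool(sub & labeled)
        if is_target and len(sub) < threshold:
            targets.append(sub)
        else:
            remaining.append(sub)  # to be divided into community networks
    return targets, remaining


subnetworks = [set(range(50)), set(range(100, 220)), set(range(300, 310))]
labeled = {3, 150}
targets, remaining = first_target_sets(subnetworks, labeled)
```

In this example the 50-node sub-network containing labeled node 3 becomes a first target set; the 120-node target network and the unlabeled 10-node sub-network are left for further division.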
And step S106, dividing the sub-networks without the nodes meeting the first set condition to obtain a plurality of community networks.
In one example, dividing the sub-network into a plurality of community networks includes:
and calculating the similarity among the nodes in the sub-network, and dividing the nodes with the similarity meeting a threshold value into a community network.
A community discovery algorithm is used to divide the sub-network into a plurality of communities, such that close transaction behaviors exist among the nodes placed in one community. The community discovery algorithm may be one of the Louvain algorithm, the label propagation algorithm, the Kernighan-Lin algorithm, the spectral bisection algorithm, the GN (Girvan-Newman) algorithm and the Newman algorithm. In this embodiment, the GN algorithm is used to divide the sub-network, with the following steps:
(1) calculating the edge betweenness of each edge in the sub-network;
(2) deleting the edge with the largest edge betweenness;
(3) recalculating the edge betweenness of the remaining edges in the sub-network;
(4) calculating the modularity of each community network; if the modularity meets a threshold value, ending the division; if the modularity does not meet the threshold, repeating steps (2) to (4) until it does.
As shown in fig. 4, a schematic diagram of a plurality of community networks obtained by dividing a sub-network is provided, and it is assumed that a sub-network is divided into four community networks by a community discovery algorithm, where transaction behaviors among the four community networks are sparse, and a close transaction behavior exists among nodes in each community network.
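The four GN steps can be sketched in plain Python as below. Several points are assumptions for illustration: the sub-network is treated as an undirected, unweighted graph without self-loops, given as a dict of neighbour sets; edge betweenness is computed with a Brandes-style breadth-first accumulation; and "the modularity meets a threshold" is read as the overall modularity Q of the current partition, measured against the original graph, reaching a hypothetical `q_threshold`.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness via BFS shortest-path counting (Brandes-style).

    Each node pair is visited from both ends, so scores are doubled;
    only their ranking matters here.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Connected components of an undirected graph."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def modularity(original_adj, communities):
    """Q = sum over communities of (inside_edges / m - (degree_sum / 2m)^2)."""
    m = sum(len(nbrs) for nbrs in original_adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        inside = sum(1 for u in comm for v in original_adj[u] if v in comm) / 2
        degree = sum(len(original_adj[u]) for u in comm)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(adj, q_threshold=0.3):
    """Remove highest-betweenness edges until modularity reaches q_threshold."""
    work = {u: set(vs) for u, vs in adj.items()}
    communities = components(work)
    while modularity(adj, communities) < q_threshold and any(work.values()):
        scores = edge_betweenness(work)   # steps (1) and (3)
        u, v = max(scores, key=scores.get)
        work[u].discard(v)                # step (2)
        work[v].discard(u)
        communities = components(work)    # step (4)
    return communities


# Two triangles joined by a single bridge edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
communities = girvan_newman(adj)
```

In this small example the bridge edge carries every shortest path between the two triangles, so it is removed first, and the division stops with the two triangles as community networks.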
Step S107, if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
The second set condition is set according to professional knowledge of the field or by business experts based on experience. Taking electronic bank fund transactions as an example, the second set condition is as follows:
(1) the transaction time of the node user is abnormal; for example, if the proportion of the node's transactions that occur between 00:00 and 06:00 to its total number of transactions is more than 0.6, the node has abnormal transaction behavior;
(2) the transaction behavior of the node user has the fast-in, fast-out characteristic; for example, if the node receives funds of over 10 million within 1 hour and the received funds are transferred to different accounts within 2 hours, the node has abnormal transaction behavior;
(3) the transaction behavior of the node user is inconsistent with daily consumption; for example, if the source of the node's funds is inconsistent with the user's age and occupation, the node has abnormal transaction behavior.
In addition, the second set condition may have other content; for example, based on the short-time, high-frequency characteristic, it may be set as more than 10 credits within one hour. The present invention does not particularly limit the content of the second set condition.
If a node in the community network meets condition (1), all the nodes in the community network form a second target set. For example, if the community network has 10 nodes, abnormal transaction behavior is considered to exist among these 10 nodes, i.e. there is a risk of group crime, and the 10 nodes are therefore output as a group result.
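Step S107 can be sketched as an "any node, any condition" check per community network. The per-node `profiles` dictionary, its keys, and the two predicate functions are hypothetical stand-ins for conditions (1) and (2); how a profile would be derived from the raw transaction data is not shown.

```python
def night_share_exceeds(profile, limit=0.6):
    """Stand-in for condition (1): share of 00:00-06:00 transactions above `limit`."""
    total = profile["total_count"]
    return total > 0 and profile["night_count"] / total > limit


def fast_in_fast_out(profile):
    """Stand-in for condition (2): precomputed fast-in, fast-out flag."""
    return profile["fast_in_fast_out"]


SECOND_CONDITIONS = [night_share_exceeds, fast_in_fast_out]


def second_target_sets(communities, profiles):
    """Keep every community in which at least one node meets at least one condition."""
    return [
        set(comm) for comm in communities
        if any(cond(profiles[node]) for node in comm for cond in SECOND_CONDITIONS)
    ]


normal = {"night_count": 0, "total_count": 10, "fast_in_fast_out": False}
profiles = {name: dict(normal) for name in "ABCDE"}
profiles["A"]["night_count"] = 7  # 70% of A's transactions occur at night

result = second_target_sets([{"A", "B", "C"}, {"D", "E"}], profiles)
```

Because node A meets the night-time condition, the whole community {A, B, C} is output as a second target set, while {D, E} is not.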
In one example, after a sub-network containing a marked node is determined to be the target network, the method further includes:
if the number of nodes in the target network is greater than a threshold, dividing the target network to obtain a plurality of community networks;
and if a node meeting a second set condition exists in a community network, determining that all nodes in that community network form a second target set.
For example, if the target network has 120 nodes and the threshold is 60, the number of nodes exceeds the threshold, so the set of nodes in the target network cannot be taken directly as a first target set and the target network needs to be divided further. A community discovery algorithm is used to divide the target network into a plurality of community networks such that the nodes within each community network transact closely with one another. The community discovery algorithm may be any one of the Louvain algorithm, the label propagation algorithm, the Kernighan-Lin algorithm, the spectral bisection algorithm, the GN (Girvan-Newman) algorithm, and the Newman algorithm. In this embodiment, the GN algorithm is used to divide the target network, and the specific steps include:
(1) calculating the edge betweenness of each edge in the target network;
(2) deleting the edge with the largest edge betweenness;
(3) recalculating the edge betweenness of the remaining edges in the target network;
(4) calculating the modularity of each community network; if the modularity meets the threshold, ending the division; if the modularity does not meet the threshold, repeating steps (2) to (4) until the modularity meets the threshold requirement.
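Steps (1) to (4) can be sketched as follows. This is an illustrative, standard-library-only sketch and not the patent's implementation. It assumes an unweighted, undirected graph without self-loops stored as a dictionary of neighbor sets, computes edge betweenness with a Brandes-style breadth-first search, and evaluates the overall modularity of the current partition against the original graph. The function names and the threshold value 0.3 are assumptions made for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2.0 for e, b in score.items()}  # each pair was counted twice


def components(adj):
    """Connected components, each returned as a set of nodes."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def modularity(orig_adj, comps):
    """Modularity of a partition, measured on the original graph."""
    m = sum(len(nbrs) for nbrs in orig_adj.values()) / 2
    q = 0.0
    for comp in comps:
        inner = sum(1 for u in comp for v in orig_adj[u] if v in comp) / 2
        degree = sum(len(orig_adj[u]) for u in comp)
        q += inner / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(adj, q_threshold):
    work = {u: set(vs) for u, vs in adj.items()}
    while any(work.values()):
        scores = edge_betweenness(work)              # steps (1) and (3)
        u, v = tuple(max(scores, key=scores.get))    # step (2)
        work[u].discard(v)
        work[v].discard(u)
        comps = components(work)
        if modularity(adj, comps) >= q_threshold:    # step (4)
            return comps
    return components(work)


# Hypothetical target network: two triangles joined by the bridge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = girvan_newman(graph, q_threshold=0.3)
```

On this toy graph the bridge has the largest edge betweenness, so it is removed first and the two triangles become separate community networks. Where a graph library is available, its built-in Girvan-Newman and modularity routines would normally be preferred over hand-written code.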
In one example, there are a plurality of second set conditions, and when a node in the community network satisfies any one of the second set conditions, it is determined that all nodes in the community network form the second target set. The second set conditions are defined from domain knowledge or from the experience of business experts. Taking electronic banking fund transactions as an example, the second set conditions may be as follows:
(1) the transaction times of the node user are abnormal; for example, if transactions made between 0:00 and 6:00 account for more than 0.6 of the node's total number of transactions, the node has abnormal transaction behavior;
(2) the transaction behavior of the node user shows a fast-in, fast-out pattern; for example, if more than 10 million in funds is transferred into the node within 1 hour and the transferred-in funds are transferred out to different accounts within 2 hours, the node has abnormal transaction behavior;
(3) the transaction behavior of the node user is inconsistent with the user's daily consumption; for example, if the source of the node's funds is inconsistent with the user's age and occupation, the node has abnormal transaction behavior.
In addition, the second set condition may be set to other content; for example, based on the short-time, high-frequency characteristic, more than 10 incoming fund transfers within one hour. The present invention does not specifically limit the content of the second set condition.
If a node meeting a second set condition exists in the community network, all nodes in the community network form a second target set.
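The "any one of the second set conditions" logic, together with the fast-in, fast-out example of condition (2), might be sketched as below. The transfer tuple layout `(time, source, destination, amount)`, the function names, and the reading of "different accounts" as at least two distinct payees are assumptions made for illustration. The sketch also assumes that the 2-hour outflow window starts when the 1-hour inflow window ends, which the text leaves open.

```python
from datetime import datetime, timedelta


def fast_in_fast_out(node, transfers, amount=10_000_000):
    """Condition (2) sketch: large inflow within 1 hour, then payouts to several accounts."""
    inflows = [t for t in transfers if t[2] == node]
    for start, _, _, _ in inflows:
        window_end = start + timedelta(hours=1)
        total_in = sum(a for ts, _, _, a in inflows if start <= ts <= window_end)
        if total_in <= amount:
            continue
        # Assumption: outflows are counted until 2 hours after the inflow window closes.
        out_deadline = window_end + timedelta(hours=2)
        payees = {dst for ts, src, dst, _ in transfers
                  if src == node and start <= ts <= out_deadline}
        if len(payees) >= 2:
            return True
    return False


def community_is_flagged(community, rules):
    """True if any node satisfies any one of the supplied rule functions."""
    return any(rule(node) for node in community for rule in rules)


# Hypothetical transfers: u1 receives 11 million within an hour and pays out to p and q.
transfers = [
    (datetime(2021, 12, 1, 9, 0), "x", "u1", 6_000_000),
    (datetime(2021, 12, 1, 9, 30), "y", "u1", 5_000_000),
    (datetime(2021, 12, 1, 10, 15), "u1", "p", 5_500_000),
    (datetime(2021, 12, 1, 11, 0), "u1", "q", 5_500_000),
]
rules = [lambda n: fast_in_fast_out(n, transfers)]
community = ["u1", "u2"]
second_target_set = set(community) if community_is_flagged(community, rules) else set()
```

Further rule functions, for instance one for the night-time ratio of condition (1), could be appended to `rules` without changing `community_is_flagged`.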
The first target set and the second target set are both group results with abnormal transaction behavior. Nodes in either set carry a risk of group crime, so the first target set and the second target set are the target results finally obtained by mining.
In the scheme of the invention, users meeting the first set condition are marked, which increases the number of marked users exhibiting money-laundering characteristics and assists the subsequent analysis and mining of group results. In existing schemes, users with abnormal transaction behavior are marked solely on the basis of business experts' experience, and few users are marked, so the final group results have low recall and, in turn, low accuracy. The invention constructs a transaction network from the users' transaction behavior data and the transaction behaviors between users, and uses the associations between nodes in the transaction network to analyze and mine group results, namely target results, that match the characteristics of group crime. The prior art relies only on business experts to analyze users. On the one hand, expert analysis is inefficient and unsuitable for scenarios with very large transaction volumes; on the other hand, business experts can only identify that a single user has abnormal transaction behavior and cannot tell whether that user has accomplices, which leaves considerable risk in electronic banking monitoring. With the scheme of the invention, the transaction network is divided in a progressively refined manner and the networks produced by different divisions are processed differently, which improves both the accuracy and the efficiency of analyzing the target results to a certain extent.
As shown in fig. 5, another aspect of the present invention provides an apparatus for analyzing data, the apparatus including:
an acquisition module 201, configured to acquire transaction behavior data of a plurality of users;
a marking module 202, configured to determine, from the transaction behavior data, users meeting a first set condition;
a building module 203, configured to build a transaction network according to transaction behaviors between the users, where each node of the transaction network is a user, and an edge connecting two nodes in the transaction network indicates that a transaction behavior exists between the two nodes;
a segmentation module 204, configured to segment the transaction network into a plurality of sub-networks;
a first determining module 205, configured to determine a sub-network containing a node that meets the first set condition as a target network, and, if the number of nodes in the target network is less than a threshold, determine that all nodes in the target network form a first target set;
a dividing module 206, configured to divide each sub-network containing no node that meets the first set condition to obtain a plurality of community networks;
a second determining module 207, configured to determine that all nodes in a community network form a second target set if a node meeting a second set condition exists in that community network.
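The flow through modules 203 to 205 can be sketched as follows. This is a hedged illustration only: it assumes that segmenting the transaction network means taking connected components, and that `marked` holds the users found by the marking module. The names `build_network`, `split_subnetworks`, and `route_subnetworks` are hypothetical. The text does not say how a target network whose size equals the threshold is handled; the sketch sends it on for further division.

```python
def build_network(transactions):
    """Undirected transaction network: users are nodes, transactions are edges."""
    adj = {}
    for a, b in transactions:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def split_subnetworks(adj):
    """Assumption: each connected component is treated as one sub-network."""
    seen, subnets = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        subnets.append(comp)
    return subnets


def route_subnetworks(subnets, marked, size_threshold):
    """Small target networks become first target sets; the rest go to community division."""
    first_target_sets, to_divide = [], []
    for nodes in subnets:
        if nodes & marked and len(nodes) < size_threshold:
            first_target_sets.append(nodes)
        else:
            # Unmarked sub-networks, and target networks that are not below the threshold.
            to_divide.append(nodes)
    return first_target_sets, to_divide


# Hypothetical data: user "a" was marked under the first set condition.
transactions = [("a", "b"), ("b", "c"), ("d", "e")]
subnets = split_subnetworks(build_network(transactions))
first, to_divide = route_subnetworks(subnets, marked={"a"}, size_threshold=60)
```

The sub-networks returned in `to_divide` would then be passed to a community discovery algorithm and checked against the second set condition.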
In one example, the acquisition module 201 is further configured to:
processing missing values in the transaction behavior data by determining the missing proportion of the transaction behavior data;
if the missing proportion is greater than a proportion threshold, deleting the transaction behavior data;
and if the missing proportion is less than the proportion threshold, filling in the missing transaction behavior data.
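The missing-value handling above can be sketched as follows. Several points are assumptions, since the text does not fix them: the missing proportion is computed per field, a field above the threshold is dropped, the remaining gaps are filled with the mean of the observed values, and a proportion exactly equal to the threshold is treated as fillable. The function name `clean`, the field names, and the threshold 0.5 are hypothetical.

```python
from statistics import mean


def clean(records, fields, ratio_threshold=0.5):
    """Drop fields that are mostly missing and fill the remaining gaps."""
    if not records:
        return []
    kept, fill = [], {}
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        missing_ratio = 1 - len(observed) / len(records)
        if missing_ratio > ratio_threshold:
            continue  # too sparse: delete this field
        kept.append(f)
        # Assumption: numeric fields are filled with the mean of observed values.
        fill[f] = mean(observed) if observed else None
    return [
        {f: (r[f] if r.get(f) is not None else fill[f]) for f in kept}
        for r in records
    ]


# Hypothetical records: "amount" is 25% missing, "channel_code" is 75% missing.
records = [
    {"amount": 100.0, "channel_code": None},
    {"amount": None, "channel_code": None},
    {"amount": 300.0, "channel_code": 2.0},
    {"amount": 200.0, "channel_code": None},
]
cleaned = clean(records, fields=["amount", "channel_code"])
```

A per-record variant, in which a whole transaction record is deleted when too many of its fields are missing, would follow the same pattern with the roles of rows and fields swapped.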
In one example, the first determining module 205 is further configured to:
if the number of the nodes of the target network is larger than a threshold value, dividing the target network to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
Yet another aspect of the present invention provides a computer-readable storage medium storing a computer program for executing the analysis method according to the present invention.
Yet another aspect of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
the processor is used for reading the executable instructions from the memory and executing the instructions to realize the analysis method.
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to the various embodiments of the present application described in the "exemplary methods" section of this specification, above.
The computer program product may include program code for performing the operations of embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A method of analyzing data, the method comprising:
collecting transaction behavior data of a plurality of users;
determining users meeting a first set condition according to the transaction behavior data;
establishing a trading network according to trading behaviors between the users, wherein a node of the trading network is the user, and an edge for connecting two nodes in the trading network represents that the trading behaviors exist between the two nodes;
segmenting the transaction network to obtain a plurality of sub-networks;
determining a sub-network with nodes meeting a first set condition as a target network, and if the number of the nodes of the target network is less than a threshold value, determining that all the nodes in the target network form a first target set;
dividing sub-networks without nodes meeting a first set condition to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
2. The analysis method of claim 1, wherein after collecting transaction behavior data of a plurality of users, the method further comprises: cleansing the transaction behavior data, comprising:
processing the missing value of the transaction behavior data, and determining the missing proportion of the transaction behavior data;
if the missing proportion is larger than a proportion threshold value, deleting the transaction behavior data;
and if the missing proportion is smaller than a proportion threshold value, filling missing values in the transaction behavior data.
3. The analysis method according to claim 1, wherein the determining, from the transaction behavior data, the user who satisfies a first set condition comprises:
the first setting conditions are multiple, and the users with the transaction behavior data meeting all the first setting conditions are determined to be the users meeting the first setting conditions.
4. The analysis method of claim 1, wherein the partitioning the transaction network into a plurality of sub-networks comprises:
a plurality of nodes with transaction behaviors are divided into a sub-network.
5. The analysis method according to claim 1 or 3, wherein the method further comprises:
if the number of the nodes of the target network is larger than a threshold value, dividing the target network to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
6. The analysis method according to claim 5, wherein the second setting condition is plural, and the user whose transaction behavior data satisfies at least one of the second setting conditions is determined to be the user who satisfies the second setting condition.
7. The analysis method of claim 1, wherein the dividing the sub-network into a plurality of community networks comprises:
and calculating the similarity among the nodes in the sub-network, and dividing the nodes with the similarity meeting a threshold value into a community network.
8. An apparatus for analyzing data, the apparatus comprising:
the acquisition module is used for acquiring transaction behavior data of a plurality of users;
the marking module is used for determining users meeting first set conditions according to the transaction behavior data;
the construction module is used for constructing a transaction network according to transaction behaviors among the users, wherein nodes of the transaction network are the users, and edges for connecting two nodes in the transaction network represent that the transaction behaviors exist between the two nodes;
the segmentation module is used for segmenting the transaction network to obtain a plurality of sub-networks;
the first determining module is used for determining that a sub-network with nodes meeting a first set condition exists as a target network, and if the number of the nodes of the target network is smaller than a threshold value, determining that all the nodes in the target network form a first target set;
the dividing module is used for dividing the sub-networks without the nodes meeting the first set condition to obtain a plurality of community networks;
and the second determining module is used for determining that all the nodes in the community network form a second target set if the nodes meeting a second set condition exist in the community network.
9. The apparatus of claim 8, wherein the acquisition module is further configured to:
processing the missing value of the transaction behavior data, and determining the missing proportion of the transaction behavior data;
if the missing proportion is larger than a proportion threshold value, deleting the transaction behavior data;
and if the missing proportion is smaller than a proportion threshold value, filling the transaction behavior data.
10. The apparatus of claim 8, wherein: the first determination module is further to:
if the number of the nodes of the target network is larger than a threshold value, dividing the target network to obtain a plurality of community networks;
and if the nodes meeting a second set condition exist in the community network, determining that all the nodes in the community network form a second target set.
11. A computer-readable storage medium, which stores a computer program for executing the analysis method according to any one of the preceding claims 1 to 7.
12. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the analysis method of any one of claims 1 to 7.
CN202111460485.9A 2021-12-03 2021-12-03 Data analysis method and device, storage medium and electronic equipment Active CN113870021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111460485.9A CN113870021B (en) 2021-12-03 2021-12-03 Data analysis method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111460485.9A CN113870021B (en) 2021-12-03 2021-12-03 Data analysis method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113870021A true CN113870021A (en) 2021-12-31
CN113870021B CN113870021B (en) 2022-03-08

Family

ID=78985586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111460485.9A Active CN113870021B (en) 2021-12-03 2021-12-03 Data analysis method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113870021B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200334779A1 (en) * 2018-05-04 2020-10-22 Alibaba Group Holding Limited Fraud gang identification method and device
CN109598509A (en) * 2018-10-17 2019-04-09 阿里巴巴集团控股有限公司 The recognition methods of risk clique and device
CN110209660A (en) * 2019-06-10 2019-09-06 北京阿尔山金融科技有限公司 Cheat clique's method for digging, device and electronic equipment
CN111259952A (en) * 2020-01-14 2020-06-09 中国平安财产保险股份有限公司 Abnormal user identification method and device, computer equipment and storage medium
CN111738817A (en) * 2020-05-15 2020-10-02 苏宁金融科技(南京)有限公司 Method and system for identifying risk community
CN111666501A (en) * 2020-06-30 2020-09-15 腾讯科技(深圳)有限公司 Abnormal community identification method and device, computer equipment and storage medium
CN112100452A (en) * 2020-09-17 2020-12-18 京东数字科技控股股份有限公司 Data processing method, device, equipment and computer readable storage medium
CN112380531A (en) * 2020-11-11 2021-02-19 平安科技(深圳)有限公司 Black product group partner identification method, device, equipment and storage medium
CN112926990A (en) * 2021-03-25 2021-06-08 支付宝(杭州)信息技术有限公司 Method and device for fraud identification
CN113641827A (en) * 2021-06-29 2021-11-12 武汉众智数字技术有限公司 Phishing network identification method and system based on knowledge graph

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051287A (en) * 2023-03-28 2023-05-02 北京芯盾时代科技有限公司 Data analysis method and device, electronic equipment and storage medium
CN116051287B (en) * 2023-03-28 2023-08-29 北京芯盾时代科技有限公司 Data analysis method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113870021B (en) 2022-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant