CN112347425B - Method and system for dense subgraph detection based on time sequence - Google Patents
- Publication number
- CN112347425B (application number CN202110026174.5A)
- Authority
- CN
- China
- Prior art keywords
- transaction
- time
- user
- matrix
- dimension
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Mathematical Physics (AREA)
- Databases & Information Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Accounting & Taxation (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Finance (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Algebra (AREA)
- Technology Law (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application relates to a method and a system for dense subgraph detection based on time series. The method comprises the following steps: constructing a three-dimensional transaction matrix, a time set and a user set; calculating a time-dimension transaction sum matrix and a user-dimension transaction sum matrix from the three-dimensional transaction matrix, and summing the transaction amounts to obtain a first summation value; averaging the first summation value over the time set and the user set to obtain an initial dense subgraph abnormal value; and iteratively calculating and updating the dense subgraph abnormal value through a greedy algorithm, recording the maximum dense subgraph abnormal value obtained, and outputting the time set and the user set corresponding to the maximum dense subgraph abnormal value as the detection result. The method addresses the low accuracy and long calculation time of abnormal dense subgraph detection for fund transactions in the prior art, and improves anomaly detection accuracy and calculation speed.
Description
Technical Field
The present application relates to the field of computers, and more particularly, to a method and system for dense subgraph detection based on time series.
Background
With the rapid development of information technology, people increasingly use electronic banking to carry out fund operations conveniently and quickly. At the same time, phishing, telephone fraud, short message fraud and other forms of fund fraud have emerged and are becoming more and more serious, so how to detect abnormal fund transactions of users of banks and other financial institutions has attracted growing attention.
In the related art, fund transaction anomaly detection algorithms such as Fraudar, Dspot and FlowScope consider only anomalies between users and commodities or between users. These algorithms cannot detect abnormal fund transactions and return results in real time, and they suffer from a large amount of calculation and long running time.
At present, no effective solution has been proposed in the related art for the low accuracy and long calculation time of abnormal dense subgraph detection for fund transactions.
Disclosure of Invention
The embodiment of the application provides a time-series-based dense subgraph detection method and a time-series-based dense subgraph detection system, which at least solve the problems of low accuracy and long calculation time consumption of abnormal fund transaction dense subgraph detection in the related technology.
In a first aspect, an embodiment of the present application provides a method for dense subgraph detection based on a time series, where the method includes:
constructing a three-dimensional transaction matrix, a time set and a user set, wherein the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and the elements of the three-dimensional transaction matrix comprise a transaction amount;
calculating a time-dimension transaction sum matrix and a user-dimension transaction sum matrix from the three-dimensional transaction matrix, and summing the transaction amounts in the elements to obtain a first summation value;
averaging the first summation value through the time set and the user set to obtain an initial dense subgraph abnormal value;
and iteratively calculating and updating the dense subgraph abnormal value through a greedy algorithm, recording the obtained maximum dense subgraph abnormal value, and outputting a time set and a user set corresponding to the maximum dense subgraph abnormal value as detection results.
In some of these embodiments, the transaction amounts in the elements are summed to obtain the first summation value as follows: the elements of the three-dimensional transaction matrix are summed over the transaction time set, the transfer user set and the payee user set, and the empirical anomaly score is added to the result.
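The first summation value described above can be written compactly as below. The symbols are notation assumed here for readability, since only the matrix name S is given in the text: T stands for the transaction time set, U for the transfer user set, V for the payee user set, α for the empirical anomaly score and f for the first summation value. The additive form of α is likewise an assumption, based on the description of a value being added to the sum.

```latex
f(T, U, V) \;=\; \sum_{t \in T} \sum_{u \in U} \sum_{v \in V} S_{t,u,v} \;+\; \alpha
```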
In some of these embodiments, the empirical anomaly score comprises:
the empirical anomaly score is related to the transaction time and the user, wherein the empirical anomaly score is greater than 0 when the transaction falls within a preset abnormal time period, and the empirical anomaly score is less than 0 when the user is a white list user.
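A minimal sketch of how such a score might be assembled is given below. The function name, its parameters and the numeric score values are hypothetical; the 0:00 to 6:00 risk window and the white list and black list cases follow the examples given later in the description, and the rule that each risky time and each listed user contributes one value is an assumption.

```python
def empirical_anomaly_score(transaction_hours, users, whitelist=(), blacklist=(),
                            risky_hours=range(0, 6), time_score=1.0,
                            whitelist_score=-1.0, blacklist_score=2.0):
    """Sum hypothetical score contributions for risky times and listed users."""
    score = 0.0
    for hour in transaction_hours:
        if hour in risky_hours:        # e.g. 1:00 lies in the 0:00-6:00 window
            score += time_score        # positive: higher risk
    for user in users:
        if user in whitelist:
            score += whitelist_score   # negative: trusted user lowers the score
        elif user in blacklist:
            score += blacklist_score   # positive and larger: known risky user
    return score


# Transactions at 10:00 and 1:00, with B1 on the white list.
example_score = empirical_anomaly_score([10, 1], ["A1", "A2", "A3", "B1"],
                                        whitelist={"B1"})
print(example_score)
```

Keeping the contributions as plain keyword arguments makes it easy to tune them from experience, which is how the description says the values are chosen.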
In some embodiments, the first summation value is averaged over the time set and the user set to obtain the initial dense subgraph abnormal value as follows: the first summation value is divided by the total number of elements in the transaction time set, the transfer user set and the payee user set.
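Written as a formula, the averaging step might look as follows. The symbols are assumed notation: f is the first summation value, T the transaction time set, U the transfer user set, V the payee user set and g the dense subgraph abnormal value. With the bank example of the detailed description (a first summation value of 29 and sets of 2, 4 and 4 elements) this gives 29 / 10 = 2.9.

```latex
g(T, U, V) \;=\; \frac{f(T, U, V)}{|T| + |U| + |V|}
```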
In some embodiments, the iteratively calculating and updating the dense subgraph outliers by a greedy algorithm comprises:
acquiring a minimum element in the time-dimension transaction sum matrix and the user-dimension transaction sum matrix to obtain a minimum value;
screening out the element corresponding to the minimum element from the time set or the user set to obtain a new time set or a new user set;
calculating to obtain a second summation value through the minimum value, and averaging the second summation value through the new time set and the user set to obtain an updated dense subgraph abnormal value;
and recalculating the three-dimensional transaction matrix, and recalculating the new time-dimension transaction sum matrix and the user-dimension transaction sum matrix through the new three-dimensional transaction matrix.
In some of these embodiments, after recalculating the transaction sum matrix for the new time dimension and the transaction sum matrix for the user dimension, the method includes:
judging whether an empty set exists in the transaction sum matrix of the new time dimension and the transaction sum matrix of the user dimension;
in the case where there is an empty set, the iterative computation terminates.
In some embodiments, after obtaining the transaction amount sum matrix in the time dimension and the transaction amount sum matrix in the user dimension, the method includes:
and constructing an N-ary tree for the transaction sum matrix of the time dimension and the transaction sum matrix of the user dimension.
In a second aspect, an embodiment of the present application provides a system for dense subgraph detection based on time series, where the system includes:
a construction module, used for constructing a three-dimensional transaction matrix, a time set and a user set, wherein the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and the elements of the three-dimensional transaction matrix comprise a transaction amount;
a calculation module, used for calculating a time-dimension transaction sum matrix and a user-dimension transaction sum matrix from the three-dimensional transaction matrix, summing the transaction amounts in the elements to obtain a first summation value,
and averaging the first summation value over the time set and the user set to obtain an initial dense subgraph abnormal value;
and the output module is used for iteratively calculating and updating the dense subgraph abnormal value through a greedy algorithm, recording the obtained maximum dense subgraph abnormal value, and outputting a time set and a user set corresponding to the maximum dense subgraph abnormal value as detection results.
In a third aspect, an embodiment of the present application provides an electronic apparatus, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the methods for dense subgraph detection based on time series.
In a fourth aspect, the present application provides a storage medium, where a computer program is stored, where the computer program is configured to execute any one of the methods for dense subgraph detection based on time series described above when running.
Compared with the related art, the time-series-based dense subgraph detection method provided by the embodiment of the application constructs a three-dimensional transaction matrix, a time set and a user set based on transaction data, wherein the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and the elements of the three-dimensional transaction matrix comprise a transaction amount; then calculates a time-dimension transaction sum matrix and a user-dimension transaction sum matrix from the three-dimensional transaction matrix, and sums the transaction amounts in the elements to obtain a first summation value; then averages the first summation value over the time set and the user set to obtain an initial dense subgraph abnormal value; and finally iteratively calculates and updates the dense subgraph abnormal value through a greedy algorithm, records the maximum dense subgraph abnormal value obtained, and outputs the time set and the user set corresponding to the maximum dense subgraph abnormal value as the detection result, thereby addressing the low detection accuracy and long calculation time of abnormal dense subgraph detection for fund transactions in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a time-series-based dense subgraph detection method according to an embodiment of the application;
FIG. 2 is a block diagram of a time-series-based dense subgraph detection system according to an embodiment of the application;
FIG. 3 is a schematic diagram of screening matrix elements according to an embodiment of the present application;
FIG. 4 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means two or more. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The time-series-based dense subgraph detection method can be applied to anti-fraud scenarios for fund transactions in banking, finance and similar fields. The specific implementation scheme is as follows: a three-dimensional transaction matrix, a time set and a user set are constructed based on transaction data, wherein the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and the elements of the three-dimensional transaction matrix comprise a transaction amount; then a time-dimension transaction sum matrix and a user-dimension transaction sum matrix are calculated from the three-dimensional transaction matrix, and the transaction amounts in the elements are summed to obtain a first summation value; then the first summation value is averaged over the constructed time set and user set to obtain an initial dense subgraph abnormal value; finally, the dense subgraph abnormal value is iteratively calculated and updated through a greedy algorithm, the maximum dense subgraph abnormal value is recorded, and the time set and the user set corresponding to the maximum dense subgraph abnormal value are output as the detection result, giving the dense group with abnormal fund transactions. This addresses the low detection accuracy and long calculation time of abnormal dense subgraph detection for fund transactions in the prior art. In addition, when selecting features the algorithm automatically segments the time dimension and screens out abnormal information, so that abnormal dimensions are detected more accurately and conveniently, the accuracy of the analysis result is improved and financial risk is reduced; the greedy algorithm processes large amounts of data with linear time complexity, which effectively improves the calculation speed.
The present embodiment provides a method for dense subgraph detection based on time series, and fig. 1 is a flowchart of a method for dense subgraph detection based on time series according to an embodiment of the present application, as shown in fig. 1, the flowchart includes the following steps:
step S101, a three-dimensional transaction matrix, a time set and a user set are constructed based on transaction data, wherein the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and the elements of the three-dimensional transaction matrix comprise a transaction amount. Optionally, the first dimension of the three-dimensional transaction matrix S is the transaction time, indicating when a user conducts a transaction; the second dimension is the transfer user and the third dimension is the payee user, and the transaction amount is the value of the corresponding element in the matrix. As shown in Table 1, transfer user 0 transfers 100,000 yuan (10w) to payee user 1, so the transaction amount 10 is entered at the corresponding position of the three-dimensional transaction matrix S. Optionally, the time set in this embodiment indicates which transaction times are contained in the current set, and its initial value contains all time values; for example, if there are two transaction times in total, 10:00 and 10:10, the time set initially contains these two times. The user set includes, but is not limited to, a transfer user set and a payee user set. The transfer user set indicates which transfer users are currently contained, and its initial value contains all users; for example, if the transfer users are 0, 1, 2 and 3, the transfer user set is initially {0, 1, 2, 3}. The payee user set indicates which payee users are currently contained, and its initial value contains all users; for example, if the payee users are 0, 1, 2 and 3, the payee user set is initially {0, 1, 2, 3}. Taking a bank transaction as an example, at 10:00 user 0 transfers 100,000, 90,000 and 80,000 yuan to users 1, 2 and 3 respectively, and at 10:10 user 2 transfers 20,000 yuan to user 3. As shown in Table 1 below, a three-dimensional transaction matrix S of dimension (2, 4, 4) is constructed, wherein the first dimension 2 represents the 2 transaction times 10:00 and 10:10, and the second and third dimensions 4 represent transfer users 0, 1, 2, 3 and payee users 0, 1, 2, 3 respectively; the matrix S, referred to as formula (1), holds the amounts of Table 1 (in units of ten thousand yuan) at the corresponding positions:
TABLE 1
Time | Transfer user | Payee user | Transfer amount (w = ten thousand yuan) |
10:00 | 0 | 1 | 10w |
10:00 | 0 | 2 | 9w |
10:00 | 0 | 3 | 8w |
10:10 | 2 | 3 | 2w |
In the prior art, fund transaction anomaly detection algorithms consider only anomalies between users and commodities or between users, cannot return detection results for abnormal fund transactions in real time, and require a large amount of calculation and a long running time. By contrast, this embodiment adds a time dimension: when features are selected, the time dimension is segmented automatically and abnormal information is screened out automatically, which improves the detection accuracy for abnormal groups;
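The construction in step S101 can be sketched as below using the four transfers of Table 1. This is a minimal illustration in plain Python; the function and variable names are hypothetical, times are encoded as indices (0 for 10:00, 1 for 10:10) and amounts are stored in units of ten thousand yuan.

```python
# Transfers from Table 1 as (time index, transfer user, payee user, amount).
TRANSFERS = [(0, 0, 1, 10), (0, 0, 2, 9), (0, 0, 3, 8), (1, 2, 3, 2)]


def build_transaction_matrix(transfers, n_times, n_users):
    """Return nested lists with S[t][u][v] = amount sent by u to v at time t."""
    S = [[[0] * n_users for _ in range(n_users)] for _ in range(n_times)]
    for t, sender, payee, amount in transfers:
        S[t][sender][payee] += amount
    return S


S = build_transaction_matrix(TRANSFERS, n_times=2, n_users=4)

# Initial sets contain every time index and every user.
time_set = set(range(2))
transfer_user_set = set(range(4))
payee_user_set = set(range(4))
```

Every element not touched by a transfer stays 0, so S has shape (2, 4, 4) with exactly four non-zero entries.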
step S102, a time-dimension transaction sum matrix and a user-dimension transaction sum matrix are calculated from the three-dimensional transaction matrix, and the transaction amounts in the three-dimensional transaction matrix are summed to obtain a first summation value. Optionally, the time-dimension transaction sum matrix in this embodiment gives the sum of the amounts involved at each transaction time; for example, the sums of the transaction amounts at transaction time 0 and transaction time 1 are 270,000 yuan (27) and 20,000 yuan (2) respectively. The user-dimension transaction sum matrix includes, but is not limited to, a transfer-user-dimension transaction sum matrix and a payee-user-dimension transaction sum matrix. The transfer-user-dimension transaction sum matrix gives the total amount transferred out by each transfer user, for example the totals transferred by transfer users 0, 1, 2 and 3 respectively; the payee-user-dimension transaction sum matrix gives the total amount received by each payee user, for example the totals received by payee users 0, 1, 2 and 3 respectively. The first summation value is the sum of the transaction amounts over the time dimension, the transfer user dimension and the payee user dimension of the three-dimensional transaction matrix, that is, the total transaction amount; for example, at transaction times 0 and 1, transfer user 0 transfers 100,000, 90,000 and 80,000 yuan to payee users 1, 2 and 3 respectively and transfer user 2 transfers 20,000 yuan to payee user 3, so the first summation value is 10 + 9 + 8 + 2 = 29;
and step S103, the first summation value is averaged over the time set and the user set to obtain an initial dense subgraph abnormal value. Optionally, in this embodiment, the first summation value obtained in step S102 is divided by the total number of elements in the time set and the user sets to obtain the initial dense subgraph abnormal value; for example, the time set, the transfer user set and the payee user set contain 2 + 4 + 4 = 10 elements in total, and the first summation value obtained in step S102 is averaged over these 10 elements to obtain the initial dense subgraph abnormal value;
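Steps S102 and S103 can be illustrated on the bank example as follows. The sketch assumes an empirical anomaly score of 0, so the first summation value is just the total amount; the helper names are hypothetical and amounts are in units of ten thousand yuan.

```python
# Rebuild the (2, 4, 4) example matrix: S[t][u][v] = amount from u to v at time t.
S = [[[0] * 4 for _ in range(4)] for _ in range(2)]
for t, u, v, amount in [(0, 0, 1, 10), (0, 0, 2, 9), (0, 0, 3, 8), (1, 2, 3, 2)]:
    S[t][u][v] = amount


def dimension_sums(S):
    """Return transaction sums per time, per transfer user and per payee user."""
    n_t, n_u, n_v = len(S), len(S[0]), len(S[0][0])
    by_time = [sum(S[t][u][v] for u in range(n_u) for v in range(n_v))
               for t in range(n_t)]
    by_sender = [sum(S[t][u][v] for t in range(n_t) for v in range(n_v))
                 for u in range(n_u)]
    by_payee = [sum(S[t][u][v] for t in range(n_t) for u in range(n_u))
                for v in range(n_v)]
    return by_time, by_sender, by_payee


by_time, by_sender, by_payee = dimension_sums(S)
first_sum = sum(by_time)                    # total amount, empirical score taken as 0
set_sizes = len(by_time) + len(by_sender) + len(by_payee)   # 2 + 4 + 4
initial_value = first_sum / set_sizes       # initial dense subgraph abnormal value

print(by_time, by_sender, by_payee, first_sum, initial_value)
```

Payee user 3 receives 8 from user 0 and 2 from user 2, which is why its total equals that of payee user 1.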
Step S104, the initial dense subgraph abnormal value is iteratively calculated and updated through a greedy algorithm, the maximum dense subgraph abnormal value is recorded, and the time set and the user set corresponding to the maximum dense subgraph abnormal value are output as the detection result. Optionally, in this embodiment each iteration of the greedy algorithm comprises the following: the minimum element in the time-dimension transaction sum matrix and the user-dimension transaction sum matrix is found to obtain a minimum value; then the element corresponding to the minimum element is screened out of the time set or the user set to obtain a new time set or a new user set; then a second summation value is calculated from the minimum value, and the second summation value is averaged over the new time set and user set to obtain an updated dense subgraph abnormal value; finally, the three-dimensional transaction matrix is reconstructed, and a new time-dimension transaction sum matrix and a new user-dimension transaction sum matrix are recalculated from the reconstructed matrix. These steps are iterated until one of the time-dimension transaction sum matrix and the user-dimension transaction sum matrix is an empty set. After the iterative computation is completed, the maximum dense subgraph abnormal value obtained during the iteration is recorded; the time set and the user set corresponding to it form the final abnormal dense group, in which high-risk abnormal operations exist;
in some embodiments, the specific iterative process for updating the initial dense subgraph abnormal value by the greedy algorithm is as follows. Taking the bank transaction as an example, the time-dimension transaction sum matrix, the transfer-user-dimension transaction sum matrix and the payee-user-dimension transaction sum matrix are obtained. Since several elements share the same minimum value 0, one of them is selected at random, here the second element 0 of the transfer-user-dimension transaction sum matrix, which is the total transaction amount transferred by transfer user 1 and corresponds to user 1 in the transfer user set; the minimum value 0 is thus obtained;
Then, the element corresponding to the minimum element 0, namely user 1, is screened out of the transfer user set to obtain a new transfer user set, while the time set and the payee user set remain unchanged;
finally, a second summation value is calculated from the minimum value, as shown in formula (5);
the second summation value is averaged over the time set, the payee user set and the new transfer user set to obtain the updated dense subgraph abnormal value, as shown in formula (6),
where the updated dense subgraph abnormal value is computed from the second summation value, the new transfer user set, the payee user set and the transaction time set;
after the updated dense subgraph abnormal value is calculated, and before the maximum dense subgraph abnormal value is recorded, the three-dimensional transaction matrix S is reconstructed by setting to 0 all rows of S that correspond to the screened-out element. The time-dimension, transfer-user-dimension and payee-user-dimension transaction sum matrices are then recalculated from the reconstructed matrix S, and the iterative calculation above is repeated with the new matrices until at least one of the three transaction sum matrices is empty;
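The single update described above can be traced numerically as follows. This sketch assumes that the second summation value is the previous sum minus the removed minimum value, which is a guess at formula (5), and that the empirical anomaly score is 0; variable names are hypothetical and amounts are in units of ten thousand yuan.

```python
# Example matrix from Table 1: S[t][u][v] = amount from u to v at time t.
S = [[[0] * 4 for _ in range(4)] for _ in range(2)]
for t, u, v, amount in [(0, 0, 1, 10), (0, 0, 2, 9), (0, 0, 3, 8), (1, 2, 3, 2)]:
    S[t][u][v] = amount

time_set, transfer_user_set, payee_user_set = {0, 1}, {0, 1, 2, 3}, {0, 1, 2, 3}
first_sum = 29
initial_value = first_sum / 10              # 2 + 4 + 4 elements

# Transfer user 1 sends nothing, so its total is one of the minimum elements.
removed_user = 1
min_value = sum(S[t][removed_user][v] for t in time_set for v in payee_user_set)

# Screen the user out of the transfer user set; the other sets are unchanged.
transfer_user_set = transfer_user_set - {removed_user}

# Assumed form of the second summation value: previous sum minus the minimum.
second_sum = first_sum - min_value
updated_value = second_sum / (len(time_set) + len(transfer_user_set)
                              + len(payee_user_set))

# Reconstruct S by zeroing every row that belongs to the removed user.
for t in range(len(S)):
    S[t][removed_user] = [0] * len(S[t][removed_user])

print(min_value, second_sum, updated_value, updated_value > initial_value)
```

Because the removed element contributes nothing to the total while the denominator shrinks from 10 to 9, the updated value exceeds the initial one, consistent with the comparison described in the text.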
After the iterative computation is completed, the maximum dense subgraph abnormal value obtained during the iteration is recorded. The maximum is determined by comparison: each time an updated dense subgraph abnormal value is obtained, it is compared with the value retained so far, and if the updated value is greater, the updated value is retained. For example, the initial dense subgraph abnormal value obtained in step S2 is compared with the new dense subgraph abnormal value obtained by updating in step S5; since the new value is greater than the initial value, the new value is recorded. The comparison is repeated in this way until the whole iterative computation is finished; the finally retained value is the maximum dense subgraph abnormal value, and the time set, the transfer user set and the payee user set corresponding to it are recorded. These three sets form the final abnormal dense group;
in this embodiment, the greedy algorithm iteratively screens out the minimum value among the time-dimension transaction sum matrix, the transfer-user-dimension transaction sum matrix and the payee-user-dimension transaction sum matrix and calculates an updated dense subgraph abnormal value, until one of the three matrices is an empty set; the iteration then ends, and the maximum dense subgraph abnormal value obtained during the iteration is recorded. Compared with the prior art, which solves for the subgraph by brute force, this embodiment screens out the minimum value among the three matrices with the greedy algorithm, deletes it and then solves for the subgraph, which markedly reduces the time complexity. The calculation time is greatly reduced, which is a major advantage when processing large amounts of data: calculation time and memory resources are saved and the calculation speed is increased. In addition, the iterative computation effectively screens out the abnormal dense group with high-risk operations, which helps banks and other financial institutions monitor abnormal users and abnormal operations and reduces financial risk.
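The whole iteration can be sketched as a small greedy peeling loop. This is an illustrative reading of the steps above, not the patented implementation: the function name is hypothetical, the empirical anomaly score is taken as 0, ties between equal minimum elements are broken by dimension and index order rather than at random, removed elements are excluded from the sets instead of being zeroed in S, and the sums are recomputed from scratch in each round for clarity rather than updated incrementally. The input matrix is assumed to be non-empty in every dimension.

```python
def greedy_dense_subgraph(S):
    """Peel the element with the smallest transaction sum until a set is empty.

    Returns (best_value, time_set, transfer_user_set, payee_user_set).
    """
    # Index sets per dimension: 0 = time, 1 = transfer user, 2 = payee user.
    sets = [set(range(len(S))), set(range(len(S[0]))), set(range(len(S[0][0])))]

    def dimension_sums():
        # Transaction sums of the remaining elements along each dimension.
        sums = [dict.fromkeys(s, 0) for s in sets]
        for t in sets[0]:
            for u in sets[1]:
                for v in sets[2]:
                    amount = S[t][u][v]
                    sums[0][t] += amount
                    sums[1][u] += amount
                    sums[2][v] += amount
        return sums

    def score(sums):
        # Total remaining amount divided by the number of remaining elements.
        return sum(sums[0].values()) / sum(len(s) for s in sets)

    sums = dimension_sums()
    best_value = score(sums)
    best_sets = [set(s) for s in sets]

    while True:
        # Smallest sum over all three dimensions; ties resolved by order.
        _, dim, key = min((value, d, k)
                          for d in range(3) for k, value in sums[d].items())
        sets[dim].discard(key)
        if not sets[dim]:
            break                       # one set is empty: iteration terminates
        sums = dimension_sums()
        value = score(sums)
        if value > best_value:          # keep the largest abnormal value seen
            best_value, best_sets = value, [set(s) for s in sets]

    return (best_value, *best_sets)


# Bank example from Table 1, amounts in units of ten thousand yuan.
S = [[[0] * 4 for _ in range(4)] for _ in range(2)]
for t, u, v, amount in [(0, 0, 1, 10), (0, 0, 2, 9), (0, 0, 3, 8), (1, 2, 3, 2)]:
    S[t][u][v] = amount

result = greedy_dense_subgraph(S)
print(result)
```

On the Table 1 data this sketch retains time index 0 (10:00), transfer user 0 and payee users 1, 2 and 3, with a value of 27 / 5 = 5.4, that is, the burst of three transfers made by user 0 at 10:00.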
Steps S101 to S104 address several problems of the prior art, in which fund transaction anomaly detection algorithms consider anomalies only between users and commodities or between users, cannot detect abnormal fund transactions and feed back results in real time, and require a large amount of computation and a long time. This embodiment detects abnormally dense subgraphs in the time dimension. First, a three-dimensional transaction matrix, a time set and a user set are constructed from the transaction data, where the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and its elements comprise the transaction amount. Next, a time-dimension transaction sum matrix and a user-dimension transaction sum matrix are calculated from the three-dimensional transaction matrix, and the transaction amounts in the elements are summed to obtain a first summation value. The first summation value is then averaged over the constructed time set and user set to obtain an initial dense subgraph abnormal value. Finally, the dense subgraph abnormal value is iteratively calculated and updated by a greedy algorithm, the maximum dense subgraph abnormal value obtained is recorded, and the time set and user set corresponding to it are output as the detection result, giving the dense group with abnormal fund transactions. This solves the prior-art problems of low detection accuracy and long computation time for dense subgraphs with abnormal fund transactions. When features are selected, the algorithm automatically segments the time dimension and screens out abnormal information, so anomalies are detected more accurately and conveniently, the accuracy of the analysis result improves, and financial risk is reduced. The greedy algorithm also processes large amounts of data with linear time complexity, which effectively improves the calculation speed.
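The construction and summation steps described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the sparse dictionary layout and the sample records are hypothetical, the sets are derived from the records themselves, and the empirical anomaly score is taken as zero.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical key layout: (transaction time, transfer user, payee user).
Key = Tuple[str, str, str]


def build_tensor(records: Iterable[Tuple[str, str, str, float]]) -> Dict[Key, float]:
    """Accumulate amounts into a sparse three-dimensional transaction matrix."""
    tensor: Dict[Key, float] = defaultdict(float)
    for time, payer, payee, amount in records:
        tensor[(time, payer, payee)] += amount
    return dict(tensor)


def dimension_sums(tensor: Dict[Key, float], axis: int) -> Dict[str, float]:
    """Transaction sum per index along one dimension (0=time, 1=payer, 2=payee)."""
    sums: Dict[str, float] = defaultdict(float)
    for key, amount in tensor.items():
        sums[key[axis]] += amount
    return dict(sums)


def initial_score(tensor: Dict[Key, float]) -> float:
    """First summation value divided by the total size of the three sets."""
    total = sum(tensor.values())
    set_sizes = sum(len({key[axis] for key in tensor}) for axis in range(3))
    return total / set_sizes


# Made-up sample records, for illustration only.
records = [
    ("10:00", "A1", "B1", 5.0),
    ("10:00", "A1", "B2", 3.0),
    ("01:00", "A2", "B1", 2.0),
]
tensor = build_tensor(records)
time_sums = dimension_sums(tensor, 0)   # {"10:00": 8.0, "01:00": 2.0}
payer_sums = dimension_sums(tensor, 1)  # {"A1": 8.0, "A2": 2.0}
payee_sums = dimension_sums(tensor, 2)  # {"B1": 7.0, "B2": 3.0}
score = initial_score(tensor)           # 10.0 / (2 + 2 + 2)
```

A sparse dictionary is used here only because it keeps the sketch short; a dense array indexed by time, transfer user and payee would serve equally well.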
In some embodiments, all elements in the three-dimensional transaction matrix are summed to obtain a first summation value, as shown in formula (2). The quantities in formula (2) are the set of transaction times, the set of transfer users, the set of payee users, the three-dimensional transaction matrix S, and the empirical anomaly score.
Optionally, the empirical anomaly score is related to the transaction time and the user: when the transaction is abnormal within a preset time, the empirical anomaly score is greater than 0, and when the user is a whitelisted user, the empirical anomaly score is less than 0. The specific values are set empirically. For example, if transactions between 0:00 and 6:00 carry a greater risk, the score for that time period is greater than 0. If the transactions span time periods, for example with transaction times of 10:00 and 1:00, then 1:00 falls within the higher-risk period from 0:00 to 6:00 and a user-defined score is added to the first summation value, whereas 10:00 does not fall within that period and nothing is added for it.
Some users are considered normal, such as whitelisted users. Because their risk is lower, their score can be customized to be negative. For example, if B1 is a whitelisted user among users {A1, A2, A3, B1}, a negative score is added when the first summation value is calculated. Conversely, a blacklisted user carries a higher risk, so the score is customized to be positive and can be set higher. For example, if B1 is a blacklisted user among users {A1, A2, A3, B1}, a positive score is added when the first summation value is calculated.
Further, if both whitelisted and blacklisted users exist among the users, for example B1 and B2 among users {A1, A2, A3, B1, B2, C1} are whitelisted users and C1 is a blacklisted user, the value added when calculating the first summation value is the sum of the scores of B1, B2 and C1. If the transaction time falls within the higher-risk period from 0:00 to 6:00 and the users also include whitelisted and blacklisted users, the time score and the user scores are all customized according to the above rules and summed, and the resulting score is added to the first summation value in formula (2).
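The scoring rules described above can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the function name, parameter names and default score values are hypothetical, and treating the risky window as hours 0 to 5 inclusive (that is, excluding 6:00 itself) is an assumption, since the text does not say whether the end point is included.

```python
from typing import Iterable, Set


def empirical_score(
    hours: Iterable[int],
    users: Iterable[str],
    whitelist: Set[str],
    blacklist: Set[str],
    night_score: float = 1.0,   # hypothetical positive score for a risky hour
    white_score: float = -1.0,  # hypothetical negative score for a whitelisted user
    black_score: float = 2.0,   # hypothetical, larger positive score for a blacklisted user
) -> float:
    """Sum the user-defined scores contributed by risky hours and listed users."""
    score = 0.0
    for hour in hours:
        # Assumed risky window: 0:00 up to, but not including, 6:00.
        if 0 <= hour < 6:
            score += night_score
    for user in users:
        if user in whitelist:
            score += white_score
        elif user in blacklist:
            score += black_score
    return score


# Transaction times 10:00 and 1:00; B1 and B2 whitelisted, C1 blacklisted.
example = empirical_score(
    hours=[10, 1],
    users=["A1", "A2", "A3", "B1", "B2", "C1"],
    whitelist={"B1", "B2"},
    blacklist={"C1"},
)
# example == 1.0 (night) - 1.0 - 1.0 (whitelist) + 2.0 (blacklist) == 1.0
```

The returned value is what would be added to the sum of the matrix elements when the first summation value is formed.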
Taking the bank transaction mentioned in steps S101 and S102 as an example, all elements in the three-dimensional transaction matrix S are summed to obtain the first summation value.
In some embodiments, the obtained first summation value is averaged over the time set and the user set to obtain a first dense subgraph abnormal value, as shown in formula (3): the first summation value is divided by the total number of elements in the time set, the transfer user set and the payee user set. The quantities in formula (3) are the set of transaction times, the set of transfer users, the set of payee users, and the first summation value.
Taking the bank transaction mentioned in steps S101 and S102 as an example, the first summation value is averaged over the total number of elements in the time set and the user sets to obtain the first dense subgraph abnormal value, as shown in formula (4). In formula (4), the numerator is the first summation value and the denominator is 10, the total number of elements: 2 in the time set, 4 in the transfer user set and 4 in the payee user set.
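Formulas (2) to (4) can be written out from the surrounding definitions as a hedged reconstruction. The symbols are assumptions, since the original notation is not available here: T, U and V denote the time set, the transfer user set and the payee user set, c the empirical anomaly score, f the first summation value and g the dense subgraph abnormal value. Treating c as a single additive term is likewise an assumption based on the statement that the score is added to the first summation value.

```latex
% Assumed notation: T, U, V, c, f and g are placeholders, not the source's own symbols.
f = \sum_{t \in T} \sum_{u \in U} \sum_{v \in V} S_{t,u,v} + c \tag{2}

g = \frac{f}{|T| + |U| + |V|} \tag{3}

% Bank example: |T| = 2, |U| = 4, |V| = 4.
g = \frac{f}{2 + 4 + 4} = \frac{f}{10} \tag{4}
```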
In some of these embodiments, after the time-dimension transaction sum matrix and the user-dimension transaction sum matrix are obtained, an N-ary tree is constructed for the time-dimension transaction sum matrix and the user-dimension transaction sum matrix. Optionally, each matrix element is represented by a node of the N-ary tree, and when other nodes are removed, the scores of the affected nodes are updated. Searching the matrices through such a tree finds the node corresponding to the minimum element more conveniently and quickly and improves calculation efficiency; alternatively, the minimum element in a matrix can be found by direct table lookup. Preferably, a binary tree is constructed for the time-dimension transaction sum matrix and a quadtree for the transfer-user-dimension transaction sum matrix. When an element is removed, for example the second element, whose value is 0, the scores of the corresponding nodes are updated to obtain the updated time-dimension, transfer-user-dimension and payee-dimension transaction sum matrices.
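The idea of keeping the transaction sums in a tree so that the minimum element is found quickly can be sketched with Python's standard `heapq` binary heap. This is a swapped-in technique chosen for brevity, not the N-ary tree of the embodiment, and the class name `MinTracker`, its methods and the sample values are hypothetical. Updated scores are handled by lazy invalidation: a new entry is pushed, and outdated entries are skipped when popped.

```python
import heapq
from typing import Dict, List, Tuple


class MinTracker:
    """Binary min-heap over transaction sums with lazy invalidation."""

    def __init__(self, sums: Dict[str, float]) -> None:
        self.current: Dict[str, float] = dict(sums)
        self.heap: List[Tuple[float, str]] = [(v, k) for k, v in sums.items()]
        heapq.heapify(self.heap)

    def update(self, key: str, value: float) -> None:
        """Record a new sum for `key`; the old heap entry becomes stale."""
        self.current[key] = value
        heapq.heappush(self.heap, (value, key))

    def pop_min(self) -> Tuple[str, float]:
        """Remove and return the key with the smallest up-to-date sum."""
        while self.heap:
            value, key = heapq.heappop(self.heap)
            # Skip entries that were superseded by update() or already removed.
            if self.current.get(key) == value:
                del self.current[key]
                return key, value
        raise KeyError("no elements left")


# Made-up transfer-user sums; user "1" has the smallest sum, 0.
tracker = MinTracker({"0": 27.0, "1": 0.0, "2": 4.0})
first = tracker.pop_min()    # ("1", 0.0)
tracker.update("2", 1.5)     # the sum of user "2" changed after a removal
second = tracker.pop_min()   # ("2", 1.5)
third = tracker.pop_min()    # ("0", 27.0); the stale (4.0, "2") entry is skipped
```

Each `pop_min` and `update` costs logarithmic time in the heap size, compared with a linear scan when the minimum is found by direct table lookup.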
It should be noted that the steps illustrated in the above-described flows or in the flow diagrams of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
The embodiment also provides a system for dense subgraph detection based on time series, which is used for implementing the above embodiments and preferred embodiments; what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a block diagram of a dense subgraph detection system based on time series according to an embodiment of the present application, and as shown in fig. 2, the system includes a construction module 21, a calculation module 22, and an output module 23:
the construction module 21 is configured to construct a three-dimensional transaction matrix, a time set and a user set based on transaction data, where the dimensions of the three-dimensional transaction matrix comprise transaction time and user, and the elements of the three-dimensional transaction matrix comprise the transaction amount;
the calculation module 22 is configured to calculate a time-dimension transaction sum matrix and a user-dimension transaction sum matrix according to the three-dimensional transaction matrix, sum the transaction amounts in the elements to obtain a first summation value, and average the first summation value over the time set and the user set to obtain a first dense subgraph abnormal value;
the output module 23 is configured to iteratively calculate and update the dense subgraph abnormal value through a greedy algorithm, record the maximum among the updated dense subgraph abnormal values, and output the time set and the user set corresponding to the maximum dense subgraph abnormal value as the detection result.
In the prior art, fund transaction anomaly detection algorithms consider anomalies only between users and commodities or between users, cannot detect abnormal fund transactions and feed back results in real time, and suffer from a large amount of computation and long run times. The system of this embodiment adds the time dimension: when features are selected, the time dimension is segmented automatically and abnormal information is screened out automatically, which improves the detection accuracy for abnormal groups. The calculation module 22 calculates a time-dimension transaction sum matrix and a user-dimension transaction sum matrix according to the three-dimensional transaction matrix, sums the transaction amounts in the elements to obtain a first summation value, and averages the first summation value over the time set and the user set to obtain a first dense subgraph abnormal value. Whereas the prior art solves the subgraph by brute force with a time complexity of [formula not shown], the output module 23 uses a greedy algorithm to select the minimum value among the time-dimension, transfer-user-dimension and payee-dimension transaction sum matrices, deletes it, and then solves the subgraph, with a computation time complexity of [formula not shown]. The time complexity is thereby reduced to linear, which greatly shortens the calculation time, is particularly advantageous for large amounts of data, and saves both computation time and memory. In addition, the iterative computation effectively screens out abnormally dense groups performing high-risk operations, which helps financial institutions such as banks monitor abnormal users and abnormal operations and reduces financial risk.
The present invention will be described in detail with reference to the following application scenarios.
The invention aims to provide a method and a system for dense subgraph detection based on a time sequence, and the flow steps of the technical scheme for dense subgraph detection based on the time sequence in the embodiment comprise:
S1, taking bank transactions as an example, as shown in Table 1, a three-dimensional transaction matrix S, a time set, a transfer user set and a payee user set are constructed;
S2, the transaction amounts are calculated according to the transaction time and the users in the three-dimensional transaction matrix to obtain the time-dimension transaction sum matrix, the transfer-user-dimension transaction sum matrix and the payee-dimension transaction sum matrix; all elements of the three-dimensional transaction matrix S are summed to obtain a first summation value; and the first summation value is averaged over the time set, the transfer user set and the payee user set to obtain the initial dense subgraph abnormal value;
S3, the smallest element among the time-dimension, transfer-user-dimension and payee-dimension transaction sum matrices is obtained; if several elements share the same smallest value, one of them is selected at random. In this example, the second element of the transfer-user-dimension transaction sum matrix, 0, is selected as the minimum value;
S4, because the second element of the transfer-user-dimension transaction sum matrix corresponds to transfer user 1, the second element of the transfer user set is screened out to obtain a new transfer user set, while the time set and the payee user set remain unchanged. Fig. 3 is a schematic diagram of screening matrix elements according to an embodiment of the present application. As shown in Fig. 3, the second element, 1, of the transfer user set is screened out, so all elements of the three-dimensional transaction matrix S that relate to transfer user 1 no longer exist and should be removed;
S5, a second summation value is recalculated using the minimum value, and the second summation value is averaged over the new transfer user set, the unchanged payee user set and the unchanged time set to obtain an updated dense subgraph abnormal value;
S6, the three-dimensional transaction matrix S is recalculated by setting to 0 all rows of S that belong to the screened-out transfer user, and the time-dimension, transfer-user-dimension and payee-dimension transaction sum matrices are then recalculated;
S7, it is judged whether any of the time-dimension, transfer-user-dimension and payee-dimension transaction sum matrices is an empty set; if not, execution continues from S3, and if so, S8 is executed;
S8, after the iteration ends, the maximum dense subgraph abnormal value obtained during the iterative computation is recorded. The maximum is determined by comparison: each time an updated dense subgraph abnormal value is obtained, it is compared with the previously recorded value, and if the updated value is greater, the updated value is retained. For example, the initial dense subgraph abnormal value obtained in step S2 is compared with the new dense subgraph abnormal value obtained in step S5; because the new value is greater than the initial value, the new value is recorded. The comparison is repeated until the iterative computation of S3 to S7 ends, and the value finally retained is the maximum dense subgraph abnormal value, that is, the final output risk score. The transfer user set, the payee user set and the time set corresponding to the maximum dense subgraph abnormal value form the abnormal group that is finally output.
The result in the above example is that transfer user 0 transferring to payee users 1, 2 and 3 at time 0, i.e., 10:00, forms an abnormally dense group with a high risk score of 5.4.
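Steps S1 to S8 can be put together in a short Python sketch. It is a simplified illustration under several assumptions rather than the embodiment itself: the function name and the sample data are hypothetical, the empirical anomaly score is taken as zero, ties are broken deterministically instead of at random, the second summation value is obtained by subtracting the selected minimum from the running total, and the transaction sums are recomputed in every round instead of being maintained in a tree.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[int, int, int]  # (time index, transfer user, payee user)


def greedy_dense_subgraph(tensor: Dict[Key, float]) -> Tuple[float, List[Set[int]]]:
    """Greedy peeling following steps S1 to S8 (empirical score omitted)."""
    if not tensor:
        raise ValueError("the transaction matrix must not be empty")
    entries = dict(tensor)
    # S1: time set, transfer user set and payee user set.
    sets: List[Set[int]] = [{key[axis] for key in entries} for axis in range(3)]
    # S2: first summation value and initial dense subgraph abnormal value.
    total = sum(entries.values())
    best_score = total / sum(len(s) for s in sets)
    best_sets = [set(s) for s in sets]

    while all(sets):
        # Transaction sum for every remaining index of each dimension.
        sums: List[Dict[int, float]] = [{i: 0.0 for i in s} for s in sets]
        for key, amount in entries.items():
            for axis in range(3):
                sums[axis][key[axis]] += amount
        # S3: smallest element; ties resolved by tuple order (an assumption).
        value, axis, index = min(
            (v, a, i) for a in range(3) for i, v in sums[a].items()
        )
        # S4: screen the corresponding element out of its set.
        sets[axis].discard(index)
        # S6: remove every matrix element that involves the screened-out index.
        entries = {k: v for k, v in entries.items() if k[axis] != index}
        # S7: stop as soon as one of the sets is empty.
        if not all(sets):
            break
        # S5: second summation value and updated abnormal value.
        total -= value
        score = total / sum(len(s) for s in sets)
        # S8: keep the maximum abnormal value and its sets.
        if score > best_score:
            best_score, best_sets = score, [set(s) for s in sets]
    return best_score, best_sets


# Made-up data: user 0 pays users 0 and 1 at time 0; one small transfer at time 1.
sample = {(0, 0, 0): 10.0, (0, 0, 1): 10.0, (1, 1, 1): 1.0}
result = greedy_dense_subgraph(sample)
# result == (5.0, [{0}, {0}, {0, 1}])
```

In the sample, the initial value is 21/6 = 3.5; removing time 1 gives 20/5 = 4.0, removing transfer user 1 gives 20/4 = 5.0, and every later removal lowers the value, so the group at time 0 is reported.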
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In addition, in combination with the method for dense subgraph detection based on time series in the above embodiments, an embodiment of the present application may provide a storage medium. The storage medium stores a computer program which, when executed by a processor, implements any one of the methods of time-series-based dense subgraph detection in the above embodiments.
In one embodiment, Fig. 4 is a schematic diagram of the internal structure of an electronic device according to an embodiment of the present application. As shown in Fig. 4, an electronic device is provided, which may be a server. The electronic device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for connecting to and communicating with an external terminal through a network. The computer program is executed by the processor to implement a method of dense subgraph detection based on time series.
Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (9)
1. A method of dense subgraph detection based on time series, the method comprising:
constructing a three-dimensional transaction matrix, a time set and a user set, wherein the dimensionality of the three-dimensional transaction matrix comprises the following steps: transaction time and user, the elements of the three-dimensional transaction matrix comprising: a transaction amount;
calculating to obtain a time-dimension transaction sum matrix and a user-dimension transaction sum matrix according to the three-dimensional transaction matrix, and summing the transaction amounts in the elements to obtain a first summation value according to a formula [formula not shown],
wherein the formula is defined in terms of the set of transaction times, the set of transfer users, the set of payee users, the three-dimensional transaction matrix S, and the empirical anomaly score;
averaging the first summation value through the time set and the user set to obtain an initial dense subgraph abnormal value;
and iteratively calculating and updating the dense subgraph abnormal value through a greedy algorithm, recording the obtained maximum dense subgraph abnormal value, and outputting a time set and a user set corresponding to the maximum dense subgraph abnormal value as detection results.
2. The method of claim 1, wherein the empirical anomaly score comprises:
the empirical anomaly score is related to the transaction time and the user, wherein when the transaction is abnormal within a preset time, the empirical anomaly score is greater than 0, and when the user is a whitelisted user, the empirical anomaly score is less than 0.
3. The method of claim 1, wherein the first summation value is averaged through the time set and the user set to obtain the initial dense subgraph abnormal value according to the following formula: [formula not shown]
4. The method of claim 1, wherein the iteratively calculating and updating the dense subgraph abnormal value through a greedy algorithm comprises:
acquiring a minimum element in the time-dimension transaction sum matrix and the user-dimension transaction sum matrix to obtain a minimum value;
screening out the corresponding elements of the minimum elements in the time set or the user set to obtain a new time set or a new user set;
calculating to obtain a second summation value through the minimum value, and averaging the second summation value through the new time set and the user set to obtain an updated dense subgraph abnormal value;
and recalculating the three-dimensional transaction matrix, and recalculating the new time-dimension transaction sum matrix and the user-dimension transaction sum matrix through the new three-dimensional transaction matrix.
5. The method of claim 4, wherein after recalculating the transaction sum matrix for the new time dimension and the transaction sum matrix for the user dimension, the method comprises:
judging whether an empty set exists in the transaction sum matrix of the new time dimension and the transaction sum matrix of the user dimension;
in the case where there is an empty set, the iterative computation terminates.
6. The method of claim 1, wherein after obtaining the transaction sum matrix in the time dimension and the transaction sum matrix in the user dimension, the method comprises:
and constructing an N-ary tree of the transaction sum matrix of the time dimension and the transaction sum matrix of the user dimension.
7. A system for dense subgraph detection based on time series, the system comprising:
the construction module is used for constructing a three-dimensional transaction matrix, a time set and a user set, wherein the dimensionality of the three-dimensional transaction matrix comprises the following steps: transaction time and user, the elements of the three-dimensional transaction matrix comprising: a transaction amount;
a calculation module for calculating to obtain a time-dimension transaction sum matrix and a user-dimension transaction sum matrix according to the three-dimensional transaction matrix, and summing the transaction amounts in the elements to obtain a first summation value according to a formula [formula not shown],
wherein the formula is defined in terms of the set of transaction times, the set of transfer users, the set of payee users, the three-dimensional transaction matrix S, and the empirical anomaly score,
averaging the first summation value through the time set and the user set to obtain an initial dense subgraph abnormal value;
and the output module is used for iteratively calculating and updating the dense subgraph abnormal value through a greedy algorithm, recording the obtained maximum dense subgraph abnormal value, and outputting a time set and a user set corresponding to the maximum dense subgraph abnormal value as detection results.
8. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method for time-series based dense subgraph detection according to any one of claims 1 to 6.
9. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method for time-series based dense subgraph detection according to any one of claims 1 to 6 when running.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110026174.5A CN112347425B (en) | 2021-01-08 | 2021-01-08 | Method and system for dense subgraph detection based on time sequence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110026174.5A CN112347425B (en) | 2021-01-08 | 2021-01-08 | Method and system for dense subgraph detection based on time sequence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112347425A (en) | 2021-02-09 |
CN112347425B (en) | 2021-05-28 |
Family
ID=74427882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110026174.5A Active CN112347425B (en) | 2021-01-08 | 2021-01-08 | Method and system for dense subgraph detection based on time sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112347425B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113935574B (en) * | 2021-09-07 | 2023-09-29 | 中金支付有限公司 | Abnormal transaction monitoring method, device, computer equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100813008B1 (en) * | 2006-12-06 | 2008-03-13 | 한국전자통신연구원 | Apparatus and method for predicting gene modules using gene expression data and transcription factor binding information |
CN107194184B (en) * | 2017-05-31 | 2020-11-17 | 成都数联易康科技有限公司 | Method and system for detecting abnormality of people in hospital based on time sequence similarity analysis |
EP3701532A1 (en) * | 2017-10-27 | 2020-09-02 | King Abdullah University Of Science And Technology | A graph-based constant-column biclustering device and method for mining growth phenotype data |
CN108090836A (en) * | 2018-01-30 | 2018-05-29 | 南京信息工程大学 | Based on the equity investment method for weighting intensive connection convolutional neural networks deep learning |
US11159568B2 (en) * | 2018-06-21 | 2021-10-26 | Microsoft Technology Licensing, Llc | Account management using account activity usage restrictions |
Also Published As
Publication number | Publication date |
---|---|
CN112347425A (en) | 2021-02-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20210918; Address after: Room 209, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province; Patentee after: TONGDUN TECHNOLOGY Co.,Ltd.; Address before: Room 704, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province; Patentee before: TONGDUN HOLDINGS Co.,Ltd. |