CN115062163A - Abnormal organization identification method, abnormal organization identification device, electronic device and medium

Abnormal organization identification method, abnormal organization identification device, electronic device and medium

Info

Publication number
CN115062163A
Authority
CN
China
Prior art keywords
abnormal
transaction
nodes
degree
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210732268.9A
Other languages
Chinese (zh)
Inventor
李伟玲
黄龙华
罗鹏飞
黄文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210732268.9A
Publication of CN115062163A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02Banking, e.g. interest calculation or account maintenance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method, a device, an electronic device and a medium for identifying abnormal organizations, which can be used in the technical field of artificial intelligence. The method for identifying an abnormal organization comprises the following steps: constructing a knowledge graph according to acquired transaction data of a historical time period, wherein the transaction data comprises account information and transaction information between accounts, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information; dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities; determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information, wherein the number of communities in which the nodes of an abnormal transaction link are located is less than or equal to 2, and the abnormal account information is acquired based on a preset rule; acquiring data characteristics of each node of the abnormal transaction link, and calculating the average abnormality degree of each community according to the data characteristics; and determining a community whose average abnormality degree satisfies a set threshold as an abnormal organization.

Description

Abnormal organization identification method, abnormal organization identification device, electronic device and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, an electronic device, and a medium for identifying an abnormal organization.
Background
With the development of network technology, illegal transactions in which identities are disguised by technical means differ little from the normal transactions of ordinary customers; they are highly concealed and difficult to discover. In the prior art, one method summarizes several sets of rules from historical data on illegal behavior and identifies illegal behavior according to those rules. Another method relies on specific account information provided by a financial institution and manually screens the transactions of a transaction network by analyzing fund-flow relationships.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, an electronic device, and a computer-readable storage medium for identifying an abnormal organization comprehensively, accurately, efficiently, and saving resources.
One aspect of the present disclosure provides a method for identifying an abnormal organization, including: constructing a knowledge graph according to acquired transaction data of a historical time period, wherein the transaction data comprises account information and transaction information between accounts, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information; dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities, wherein m is an integer greater than or equal to 1; determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information, wherein the number of communities in which the nodes of an abnormal transaction link are located is less than or equal to 2, and the abnormal account information is obtained based on a preset rule; acquiring data characteristics of each node of the abnormal transaction link, and calculating the average abnormality degree of each community according to the data characteristics; and determining a community whose average abnormality degree satisfies a set threshold as an abnormal organization.
According to the method for identifying an abnormal organization of the embodiments of the present disclosure, abnormal transaction links are determined from the m communities on the basis of the knowledge graph, the data characteristics of each node of the abnormal transaction links are obtained, the average abnormality degree of each community is calculated according to the data characteristics, and communities whose average abnormality degree satisfies a set threshold are determined to be abnormal organizations. In this way, abnormal transaction links can be readily mined from massive transaction data and abnormal organizations can then be identified. The identification method of the present disclosure has comprehensive coverage and can efficiently and accurately find highly concealed abnormal organizations. Because the number of communities in which the nodes of an abnormal transaction link are located is less than or equal to 2, that is, an abnormal transaction link is not allowed to extend beyond two communities, association with irrelevant groups can be avoided, which further improves the accuracy of identifying abnormal organizations while saving computing resources.
In some embodiments, the dividing the nodes of the knowledge-graph according to the association degree of the nodes to generate m communities includes: determining each node in the knowledge-graph as a group; traversing each group, and determining the intimacy between the group and each group having an edge relationship with the group; when the intimacy degree meets the intimacy degree threshold value, merging the group and the group with the edge relation according to the intimacy degree; taking the merged group as a new group, repeatedly executing the traversal of each group, and determining the intimacy between the group and each group with edge relation; when the intimacy degree does not meet the intimacy degree threshold value, stopping merging the group and the group with the edge relation; and when the intimacy between every two groups does not meet the intimacy threshold, taking the current m groups as m communities.
In some embodiments, merging the group and the group having an edge relationship with the group according to the intimacy comprises: sorting the intimacy degrees by numerical value; and merging, according to the sorting result, the two groups whose intimacy degree ranks first or last.
In some embodiments, determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information comprises: determining a first abnormal node in the knowledge graph according to the preset rule; determining, according to the first abnormal node, the directed connected link in the knowledge graph on which the first abnormal node is located, wherein the directed connected link is formed by nodes connected through directed edges; and taking at least part of the link formed by nodes of the directed connected link that are located in two adjacent communities as an abnormal transaction link, wherein the abnormal transaction link includes the first abnormal node.
In some embodiments, obtaining the data characteristics of each node of the abnormal transaction link includes: determining a node attribute for each node of the abnormal transaction link; and acquiring the data characteristics according to the node attributes.
In some embodiments, the calculating an average degree of abnormality for each community from the data features includes: constructing a feature vector according to the data features; determining a point abnormality degree according to the feature vector; and calculating the average abnormality degree of each community according to the point abnormality degree.
In some embodiments, determining the point abnormality degree according to the feature vector comprises: calculating a Euclidean distance between the feature vector and a preset standard vector, wherein the Euclidean distance is used to measure the similarity between the feature vector and the standard vector; and determining the point abnormality degree based on the Euclidean distance, wherein the point abnormality degree is proportional to the similarity.
In some embodiments, the calculating an average degree of abnormality for each community based on the point degree of abnormality comprises: calculating an average value of the point abnormality degrees of the nodes in the abnormal transaction links included in each community; and taking the average value as the average abnormality degree of the community.
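The scoring steps summarized above can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state: the standard vector is taken to describe a known abnormal pattern, similarity is mapped from the Euclidean distance as 1 / (1 + distance), and the threshold is "satisfied" when the average is greater than or equal to it. The function names and sample data are hypothetical.

```python
import math


def point_abnormality(feature, standard):
    """Point abnormality degree of one node (assumed mapping from distance)."""
    distance = math.dist(feature, standard)  # Euclidean distance
    # Assumption: a smaller distance means higher similarity to the standard
    # vector, and the abnormality degree grows with that similarity.
    return 1.0 / (1.0 + distance)


def average_abnormality(link_nodes, community_of, features, standard):
    """Average the point abnormality degrees of link nodes per community."""
    totals, counts = {}, {}
    for node in link_nodes:
        community = community_of[node]
        score = point_abnormality(features[node], standard)
        totals[community] = totals.get(community, 0.0) + score
        counts[community] = counts.get(community, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


# Hypothetical data: three link nodes spread over two communities.
features = {"x": (0.0, 0.0), "y": (3.0, 4.0), "z": (0.0, 0.0)}
community_of = {"x": "c1", "y": "c1", "z": "c2"}
averages = average_abnormality(["x", "y", "z"], community_of, features, (0.0, 0.0))

threshold = 0.8  # hypothetical set threshold
abnormal = sorted(c for c, avg in averages.items() if avg >= threshold)
```

Only nodes that lie on an abnormal transaction link contribute to a community's average, which matches the averaging step described above.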
Another aspect of the present disclosure provides an apparatus for identifying an abnormal organization, including: a construction module for constructing a knowledge graph according to acquired transaction data of a historical time period, wherein the transaction data comprises account information and transaction information between accounts, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information; a generation module for dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities, wherein m is an integer greater than or equal to 1; a first determination module for determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information, wherein the number of communities in which the nodes of an abnormal transaction link are located is less than or equal to 2, and the abnormal account information is obtained based on a preset rule; a calculation module for acquiring the data characteristics of each node of the abnormal transaction link and calculating the average abnormality degree of each community according to the data characteristics; and a second determination module for determining a community whose average abnormality degree satisfies a set threshold as an abnormal organization.
Another aspect of the present disclosure provides an electronic device comprising one or more processors and one or more memories, wherein the memories are configured to store executable instructions that, when executed by the processors, implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the method and apparatus may be applied, in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of identifying an abnormal organization according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow diagram for partitioning nodes of a knowledge-graph according to their degree of association to generate m communities, according to an embodiment of the present disclosure;
FIG. 4 schematically shows a schematic diagram of a knowledge-graph according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart for merging the group with a group having an edge relationship with it according to intimacy, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow diagram for determining abnormal transaction links from m communities of a knowledge graph based on abnormal account information, according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart for obtaining data characteristics of each node of an abnormal transaction link in accordance with an embodiment of the disclosure;
FIG. 8 schematically illustrates a flow chart for calculating an average degree of abnormality for each community based on data characteristics, according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow chart for determining a point abnormality degree based on feature vectors according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow chart for calculating an average degree of abnormality for each community based on a point degree of abnormality, according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates an account funds flow diagram according to an embodiment of the present disclosure;
FIG. 12 schematically illustrates a flow chart of a method of identifying an abnormal organization according to an embodiment of the present disclosure;
FIG. 13 schematically shows a schematic diagram of two accounts and an edge attribute, in accordance with an embodiment of the present disclosure;
FIG. 14 schematically shows a schematic diagram of a knowledge-graph according to an embodiment of the disclosure;
FIG. 15 schematically illustrates a block diagram of an apparatus for identifying an abnormal organization according to an embodiment of the present disclosure;
FIG. 16 schematically shows a block diagram of a generation module according to an embodiment of the present disclosure;
FIG. 17 schematically shows a block diagram of a merging unit according to an embodiment of the present disclosure;
FIG. 18 schematically illustrates a block diagram of a first determination module, according to an embodiment of the disclosure;
FIG. 19 schematically illustrates a block diagram of a computing module, according to an embodiment of the disclosure;
FIG. 20 schematically illustrates a block diagram of a computing module, according to an embodiment of the disclosure;
FIG. 21 schematically shows a block diagram of an eighth determining unit according to an embodiment of the present disclosure;
FIG. 22 schematically shows a block diagram of a computing unit, according to an embodiment of the present disclosure;
FIG. 23 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
In the technical solution of the present disclosure, the acquisition, collection, storage, use, processing, transmission, provision, disclosure and application of data, including the personal information of the users involved, all comply with relevant laws and regulations, are subject to necessary security measures, and do not violate public order and good customs.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features.
With the development of network technology, illegal transactions in which identities are disguised by technical means differ little from the normal transactions of ordinary customers; they are highly concealed and difficult to discover. In the prior art, one method summarizes several sets of rules from historical data on illegal behavior and identifies illegal behavior according to those rules. Another method relies on specific account information provided by a financial institution and manually screens the transactions of a transaction network by analyzing fund-flow relationships.
However, when illegal behavior is identified according to rules, the rules struggle to characterize customers, cover group behavior incompletely, and cannot find highly concealed group members. Manual fund-flow analysis, for its part, suffers from complex transaction relationships, huge data volumes, frequent errors and omissions, and low efficiency.
Embodiments of the present disclosure provide a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for identifying an abnormal organization. The method for identifying an abnormal organization comprises the following steps: constructing a knowledge graph according to acquired transaction data of a historical time period, wherein the transaction data comprises account information and transaction information between accounts, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information; dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities, wherein m is an integer greater than or equal to 1; determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information, wherein the number of communities in which the nodes of an abnormal transaction link are located is less than or equal to 2, and the abnormal account information is acquired based on a preset rule; acquiring data characteristics of each node of the abnormal transaction link, and calculating the average abnormality degree of each community according to the data characteristics; and determining a community whose average abnormality degree satisfies a set threshold as an abnormal organization.
It should be noted that the method, the apparatus, the electronic device, the computer-readable storage medium, and the computer program product for identifying an abnormal organization according to the present disclosure may be used in the field of artificial intelligence technology, and may also be used in fields other than artificial intelligence technology, such as the financial field; the field of application is not limited herein.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which the method, apparatus, electronic device, computer-readable storage medium, and computer program product for identifying an abnormal organization may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the method for identifying an abnormal organization provided by the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the apparatus for identifying an abnormal organization provided by the embodiments of the present disclosure may generally be disposed in the server 105. The method for identifying an abnormal organization provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus for identifying an abnormal organization provided by the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The method for identifying an abnormal organization according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 10 based on the scenario described in fig. 1.
Fig. 2 schematically shows a flow chart of a method of identifying an abnormal organization according to an embodiment of the present disclosure.
As shown in fig. 2, the method of identifying an abnormal organization according to this embodiment includes operations S210 to S250.
In operation S210, a knowledge graph is constructed according to the acquired transaction data of the historical time period, the transaction data including account information and transaction information between the account information, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information.
It may be understood that transaction data for a historical period may be obtained from the business data system. The transaction data includes account information and transaction information: the account information may include an account name, and the transaction information may include a transaction initiator account name and a transaction recipient account name. The nodes of the knowledge graph may be constructed according to account names, and the edges of the knowledge graph may be constructed according to the relationship between the transaction initiator account name and the transaction recipient account name in the transaction information.
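As a minimal sketch of this construction step, the following Python code builds a directed graph from transaction records. The record layout `(initiator, recipient, amount)`, the use of the summed amount as an edge weight, and all names are assumptions made for illustration; the text above only specifies that nodes come from account names and edges from initiator-recipient relationships.

```python
from collections import defaultdict


def build_transaction_graph(transactions):
    """Build a directed graph from (initiator, recipient, amount) records.

    Nodes are account names. Each directed edge initiator -> recipient
    accumulates the transferred amount as an assumed edge weight.
    """
    nodes = set()
    edges = defaultdict(float)  # (initiator, recipient) -> total amount
    for initiator, recipient, amount in transactions:
        nodes.add(initiator)
        nodes.add(recipient)
        edges[(initiator, recipient)] += amount
    return nodes, dict(edges)


# Hypothetical transaction data for a historical period.
transactions = [
    ("acct_a", "acct_b", 100.0),
    ("acct_a", "acct_b", 50.0),
    ("acct_b", "acct_c", 120.0),
]
nodes, edges = build_transaction_graph(transactions)
```

Repeated transfers between the same pair of accounts collapse into one weighted edge here; keeping one edge per transaction would be an equally plausible reading.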
In operation S220, the nodes of the knowledge graph are divided according to the association degree of the nodes to generate m communities, where m is an integer greater than or equal to 1.
As one possible implementation, as shown in fig. 3, operation S220 of dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities includes operations S221 to S226.
In operation S221, each node in the knowledge-graph is determined as a group, whereby the knowledge-graph may be initialized.
In operation S222, each group is traversed to determine the intimacy between the group and each group having an edge relationship with it. Referring to fig. 4, for example, there are 12 nodes in the knowledge graph: a, b, c, d, e, f, g, h, i, j, k and l. When the knowledge graph is initialized, each of these nodes is taken as a group. Taking group a as an example, the groups having an edge relationship with group a are determined to be b, c and d, so the intimacy between a and b, between a and c, and between a and d can each be calculated.
In operation S223, when the intimacy degree satisfies the intimacy degree threshold, the group and the group having an edge relationship therewith are merged according to the intimacy degree, wherein the intimacy degree threshold is a standard threshold set as required.
In some specific examples, as shown in fig. 5, operation S223 merges the group and the group having an edge relationship therewith according to the affinity, including operation S2231 and operation S2232.
In operation S2231, the intimacy degrees are sorted by numerical value.
In operation S2232, the two groups whose intimacy ranks first or last are merged according to the sorting result. The intimacy degrees may be sorted in ascending or descending order. When they are sorted in ascending order, the two groups whose intimacy ranks last are merged; when they are sorted in descending order, the two groups whose intimacy ranks first are merged.
For example, suppose operation S222 yields an intimacy of K1 between a and b, K2 between a and c, and K3 between a and d. When the ascending order is K1 < K2 < K3, groups a and d, corresponding to K3 which ranks last, are merged; when the descending order is K3 > K2 > K1, groups a and d, corresponding to K3 which also ranks first, are merged. Operations S2231 and S2232 thus make it convenient to merge the group with a group having an edge relationship with it according to the intimacy.
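The selection in operations S2231 and S2232 amounts to picking the neighbouring group with the highest intimacy, whichever sort direction is used. A minimal Python sketch follows; the function name and the numeric values standing in for K1, K2 and K3 are hypothetical.

```python
def pick_merge_partner(intimacies, ascending=True):
    """Return the neighbouring group with the highest intimacy.

    `intimacies` maps each neighbouring group's name to its intimacy
    with the group currently being traversed.
    """
    ranked = sorted(intimacies.items(), key=lambda kv: kv[1],
                    reverse=not ascending)
    # Ascending order: the best candidate is last.
    # Descending order: the best candidate is first.
    return ranked[-1][0] if ascending else ranked[0][0]


# Group a and its neighbours b, c, d with K1 < K2 < K3 (made-up values).
intimacies = {"b": 0.1, "c": 0.2, "d": 0.3}
partner_ascending = pick_merge_partner(intimacies, ascending=True)
partner_descending = pick_merge_partner(intimacies, ascending=False)
```

Both calls select group d, matching the example in which a and d are merged under either ordering.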
In operation S224, the merged group is regarded as a new group, and the traversal of each group is repeatedly performed to determine the affinity between the group and each group having an edge relationship therewith.
In operation S225, when the intimacy does not satisfy the intimacy threshold, merging the group and the group having an edge relationship with it is stopped. For example, the intimacy threshold may be set to a positive number; when the intimacy is negative, it does not satisfy the intimacy threshold, and merging of the group with the group having an edge relationship with it is stopped.
In operation S226, when the intimacy degree between every two groups does not satisfy the intimacy degree threshold, the current m groups are regarded as m communities. It can be understood that, when every two groups in all the groups of the knowledge graph cannot be merged, the generation of the communities is completed, and at this time, the current m groups may be regarded as m communities.
The intimacy degree can be represented by Q, and the intimacy degree value can be obtained by the formula (1).
$$Q = \frac{k_{i,in}}{2m} - \frac{\Sigma_{tot}\, k_i}{2m^2} \qquad (1)$$
Where m represents the number of edges between the group being traversed and the other groups, $k_{i,in}$ represents the sum of the weights of the edges from the group being traversed that are incident on the target group, $\Sigma_{tot}$ represents the total weight of the edges incident on the target group, and $k_i$ represents the total weight of the edges of the group being traversed.
Thus, the nodes of the knowledge graph can be divided according to the association degree of the nodes to generate m communities through operations S221 to S226.
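Operations S221 to S226 can be sketched as a greedy merging loop. The sketch below rests on several assumptions that are not taken from the disclosure: the intimacy degree is computed as the modularity gain of merging two groups in its standard pairwise form, e_ab/m - d_a*d_b/(2m^2); the graph is treated as undirected; and the names `merge_groups`, `between`, `degree` and `intimacy` are hypothetical.

```python
from itertools import combinations


def merge_groups(nodes, edges, threshold=0.0):
    """Greedily merge groups while the best intimacy degree exceeds `threshold`.

    `edges` maps an undirected pair (u, v) to its weight (assumed layout).
    """
    m = sum(edges.values())
    groups = [frozenset([n]) for n in nodes]  # S221: every node is a group

    def between(g1, g2):
        # Total weight of the edges joining the two groups.
        return sum(w for (u, v), w in edges.items()
                   if (u in g1 and v in g2) or (u in g2 and v in g1))

    def degree(g):
        # Total weight incident on the group (internal edges count twice).
        return sum(w * ((u in g) + (v in g)) for (u, v), w in edges.items())

    def intimacy(g1, g2):
        # Assumed intimacy degree: modularity gain of merging g1 and g2.
        return between(g1, g2) / m - degree(g1) * degree(g2) / (2 * m * m)

    while True:
        # S222: only groups with an edge relationship are candidates.
        candidates = [(intimacy(g1, g2), g1, g2)
                      for g1, g2 in combinations(groups, 2)
                      if between(g1, g2) > 0]
        if not candidates:
            break
        best, g1, g2 = max(candidates, key=lambda c: c[0])
        if best <= threshold:  # S225: threshold no longer satisfied
            break
        # S223 and S224: merge the best pair and treat it as a new group.
        groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
    return groups  # S226: the remaining groups are the communities


# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (4, 5): 1, (4, 6): 1, (5, 6): 1, (3, 4): 1}
communities = merge_groups(range(1, 7), edges)
```

On this toy graph the loop stops with the two triangles as separate communities, because merging them across the single bridging edge would lower the modularity.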
In operation S230, an abnormal transaction link is determined from the m communities of the knowledge graph according to abnormal account information, wherein the nodes of the abnormal transaction link are located in at most 2 communities, and the abnormal account information is obtained based on a preset rule.
It can be understood that members engaged in improper financial activities are linked through their funds and transfer funds frequently. The fund flow consists of three parts: upstream, midstream and downstream. The upstream part mainly absorbs funds: funds are transferred in dispersedly across a plurality of accounts and, within a short period, transferred out dispersedly or in large amounts, leaving essentially no balance, so the fund-pooling characteristic is evident. The midstream part mainly relays the upstream funds: its intermediary character is strong, funds move in and out quickly with many inflows and outflows, and the transactions are concentrated in time. The downstream part mainly splits the funds received from the midstream into many small amounts.
Accordingly, each of these fund-flow patterns can serve as a preset rule for judging account information to be abnormal account information. Under the upstream rule (absorbed funds are transferred in dispersedly across a plurality of accounts and, within a short period, transferred out dispersedly or in large amounts, leaving essentially no balance, with an evident fund-pooling characteristic), an account matching the abnormal account information is taken as an upstream node on an abnormal transaction link. Under the midstream rule (upstream funds are relayed, the intermediary character is strong, funds move in and out quickly with many inflows and outflows, and the transactions are concentrated in time), a matching account is taken as a midstream node on an abnormal transaction link. Under the downstream rule (funds passing through the midstream are split into many small amounts), a matching account is taken as a downstream node on an abnormal transaction link.
As a possible implementation manner, as shown in fig. 6, operation S230 determines abnormal transaction links from m communities of the knowledge graph according to the abnormal account information, including operation S231 to operation S233.
In operation S231, a first abnormal node in the knowledge graph is determined according to a preset rule. It can be understood that abnormal account information may be determined according to any of the following preset rules, and an account matching the abnormal account information is taken as a first abnormal node of the knowledge graph: the upstream rule, under which absorbed funds are transferred in dispersedly across a plurality of accounts and, within a short period, transferred out dispersedly or in large amounts, leaving essentially no balance, with an evident fund-pooling characteristic; the midstream rule, under which upstream funds are relayed, the intermediary character is strong, funds move in and out quickly with many inflows and outflows, and the transactions are concentrated in time; or the downstream rule, under which funds passing through the midstream are split into many small amounts.
In operation S232, the directional connected links on which the first abnormal node is located in the knowledge graph are determined according to the first abnormal node, where a directional connected link is a link formed by nodes connected through directed edges. With reference to fig. 4, assume that node a is determined to be the first abnormal node through operation S231. Node a then lies on 6 directional connected links formed by directed edges: a-d-f-e; a-d-f-g-l-k-i; a-d-f-g-l-k-j; a-d-f-g-l-i; a-b; and a-c-i-h-j.
In operation S233, for each directional connected link, at least the part of the link formed by nodes located in two adjacent communities is taken as an abnormal transaction link, and the abnormal transaction link includes the first abnormal node. In the directional connected link a-d-f-e, nodes a and d are located in community A, and nodes f and e are located in community B. Community A and community B have an edge relationship and are therefore two adjacent communities, so the link a-d-f-e can be taken as an abnormal transaction link.
In the directional connected link a-d-f-g-l-k-i, nodes a and d are located in community A, nodes f and g in community B, and nodes l, k and i in community C. The first abnormal node a is located in community A, and community A and community B have an edge relationship and are therefore two adjacent communities. Crossing into community C is not allowed, so the link a-d-f-g can be taken as an abnormal transaction link. Abnormal transaction links are determined in the same way for the directional connected links a-d-f-g-l-k-j, a-d-f-g-l-i, a-b and a-c-i-h-j, and the details are not repeated here.
The method for determining the abnormal transaction link without crossing the community can prevent the association of invalid groups, improve the accuracy of identifying abnormal organizations and save computing resources.
Determining abnormal transaction links from the m communities of the knowledge-graph according to the abnormal account information may be facilitated through operations S231 to S233.
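The rule that an abnormal transaction link may span at most two communities can be sketched as a truncation of each directional connected link. The community memberships below follow the fig. 4 example as described in the text; the function name `truncate_link` and the dictionary layout are hypothetical.

```python
def truncate_link(link, community_of, max_communities=2):
    """Keep the prefix of `link` whose nodes lie in at most `max_communities`."""
    seen = []   # communities entered so far, in order of first visit
    kept = []   # nodes retained for the abnormal transaction link
    for node in link:
        community = community_of[node]
        if community not in seen:
            if len(seen) == max_communities:
                break  # entering a further community is not allowed
            seen.append(community)
        kept.append(node)
    return kept


# Community memberships taken from the fig. 4 example in the text.
community_of = {
    "a": "A", "b": "A", "c": "A", "d": "A",
    "e": "B", "f": "B", "g": "B",
    "h": "C", "i": "C", "j": "C", "k": "C", "l": "C",
}
# The six directional connected links containing the first abnormal node a.
links = ["adfe", "adfglki", "adfglkj", "adfgli", "ab", "acihj"]
abnormal_links = sorted({"-".join(truncate_link(l, community_of)) for l in links})
```

The three links that pass through node g all collapse to a-d-f-g, so four distinct abnormal transaction links remain.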
In operation S240, data characteristics of each node of the abnormal transaction link are acquired, and an average abnormality degree of each community is calculated according to the data characteristics.
As a possible implementation manner, as shown in fig. 7, operation S240 obtains data characteristics of each node of the abnormal transaction link, including operation S241 and operation S242.
In operation S241, a node attribute of each node of the abnormal transaction link is determined. It can be understood that the transaction data includes account information and transaction information. As a possible implementation manner, the account information may include at least one of a card number, an account opener, an account opening agent, an account opening time, an account opening outlet, whether the account was opened in a different region, whether online banking is enabled, and an account opening amount; the transaction information may include at least one of a transaction initiator card number, a transaction amount, a transaction time, a transaction recipient card number, a transaction mode, and a transaction address.
At least one of the card number, account opener, account opening agent, account opening time, account opening outlet, whether the account was opened in a different region, whether online banking is enabled, and account opening amount in the account information can be used as a node attribute, and at least one of the transaction initiator card number, transaction amount, transaction time, transaction recipient card number, transaction mode, and transaction address in the transaction information can be used as an edge attribute. Because the node attributes are determined only after the abnormal transaction link has been determined, account information needs to be acquired only for the nodes on the abnormal transaction link rather than for all nodes in the knowledge graph, which saves computing resources and speeds up data processing.
In operation S242, data characteristics are acquired according to the node attributes. For example, the node attributes and edge attributes can be analyzed to obtain data characteristics such as whether funds are transferred in dispersedly, whether funds are transferred out dispersedly, whether funds move in and out quickly, whether the account enters a dormant period after opening, whether the account is frequently used across regions, whether the account makes cross-bank transactions, whether the account opening time, place and outlet of a plurality of bank cards are concentrated, and whether the account was opened in a different region. Of course, the data characteristics are not limited to these; they are given here by way of example only and should not be construed as limiting the present disclosure. Operations S241 and S242 make it straightforward to obtain the data characteristics of each node of the abnormal transaction link.
As a possible implementation manner, as shown in fig. 8, operation S240 calculates an average abnormality degree for each community according to the data characteristics, including operations S243 to S245.
In operation S243, a feature vector is constructed according to the data characteristics. Taking as an example the data characteristics of whether funds are transferred in dispersedly, whether funds are transferred out dispersedly, whether funds move in and out quickly, whether the account enters a dormant period after opening, whether the account is frequently used across regions, whether the account makes cross-bank transactions, whether the account opening time and the account opening place of a plurality of bank cards are concentrated, and whether the account was opened in a different region, the data characteristics are expressed in a structured form: a characteristic is expressed as 1 if its condition is satisfied and as 0 otherwise.
Assume that the data characteristics of node a are: funds transferred in dispersedly, funds transferred out dispersedly, funds moving in and out quickly, no dormant period after account opening, no frequent cross-region use, no cross-bank transactions, account opening time, place and outlet of a plurality of bank cards not concentrated, and no account opening in a different region. The conditions for dispersed transfer in, dispersed transfer out and fast in and out are satisfied and are expressed as 1; the remaining conditions are not satisfied and are expressed as 0. The feature vector of node a is therefore θ = (1, 1, 1, 0, 0, 0, 0, 0, 0).
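The construction of the binary feature vector can be sketched as follows. The feature names are hypothetical, and treating the concentration of account opening time and of account opening place as two separate components is an assumption made here so that the vector has the nine components shown for node a.

```python
# Hypothetical feature names, in a fixed order; the text only describes the
# features in prose, so the exact split into nine components is an assumption.
FEATURES = [
    "dispersed_transfer_in",
    "dispersed_transfer_out",
    "fast_in_fast_out",
    "dormant_after_opening",
    "frequent_cross_region",
    "cross_bank_transactions",
    "opening_time_concentrated",
    "opening_place_concentrated",
    "opened_in_different_region",
]


def build_feature_vector(satisfied):
    """Return 1 for each feature whose condition is satisfied, otherwise 0."""
    return [1 if name in satisfied else 0 for name in FEATURES]


# Node a satisfies only the three fund-movement conditions.
theta_a = build_feature_vector(
    {"dispersed_transfer_in", "dispersed_transfer_out", "fast_in_fast_out"}
)
```

Keeping the feature order in one shared list ensures that every account's vector is comparable component by component with the standard vector.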
In operation S244, a point abnormality degree is determined based on the feature vector.
As one way of accomplishing this, as shown in fig. 9, operation S244 determines a point abnormality degree from the feature vectors, including operation S2441 and operation S2442.
In operation S2441, the Euclidean distance between the feature vector and a preset standard vector is calculated; the Euclidean distance measures the similarity between the feature vector and the standard vector. It can be understood that the vector constructed when all of the above data characteristics satisfy their conditions may be taken as the standard vector, giving the standard vector β = (1, 1, 1, 1, 1, 1, 1, 1, 1), which has the same number of components as the feature vector θ. The Euclidean distance is denoted by D and can be calculated by formula (2).
$$D = \sqrt{\sum_{i}\left(\beta_i - \theta_i\right)^2} \qquad (2)$$
Where i represents the number of nodes in the abnormal transaction link.
In operation S2442, a point abnormality degree is determined based on the euclidean distance, wherein the point abnormality degree is proportional to the similarity degree. Assuming that the scale factor is set to c, the point abnormality degree may be cD. The determination of the degree of point abnormality from the feature vector can be easily achieved by operations S2441 and S2442.
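Operations S2441 and S2442 can be sketched with the standard library. The sketch assumes nine binary features, as in the example of node a, and a scale factor c of 1; the function name `point_abnormality` is hypothetical.

```python
import math


def point_abnormality(theta, beta, c=1.0):
    """Return c * D, where D is the Euclidean distance between theta and beta."""
    distance = math.dist(theta, beta)  # square root of the summed squared differences
    return c * distance


beta = [1] * 9                        # standard vector: every condition satisfied
theta_a = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # feature vector of node a from the example
score_a = point_abnormality(theta_a, beta)
```

Node a differs from the standard vector in six components, so D is the square root of 6; a vector identical to the standard vector gives D = 0.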
In operation S245, an average degree of abnormality for each community is calculated based on the point degree of abnormality.
As one way of achieving this, as shown in fig. 10, operation S245 calculates an average degree of abnormality for each community based on the point degree of abnormality, including operation S2451 and operation S2452.
In operation S2451, the average value of the point abnormality degrees of the nodes on abnormal transaction links included in each community is calculated.
In operation S2452, the average value is used as the average abnormality degree of the community.
It can be understood that each community may include a plurality of nodes on abnormal transaction links. Referring to fig. 4 and the abnormal transaction links determined in operation S233 (a-d-f-e; a-d-f-g; a-b; and a-c-i-h-j), the nodes of community A on abnormal transaction links are a, b, c and d, so the average abnormality degree of community A is the average of the point abnormality degrees of nodes a, b, c and d; the nodes of community B on abnormal transaction links are f, e and g, so the average abnormality degree of community B is the average of the point abnormality degrees of nodes f, e and g; and the nodes of community C on abnormal transaction links are i, h and j, so the average abnormality degree of community C is the average of the point abnormality degrees of nodes i, h and j.
The calculation of the average abnormality degree of each community according to the point abnormality degree can be facilitated by operations S2451 and S2452. The calculation of the average degree of abnormality of each community according to the data characteristics can be facilitated by operations S243 to S245.
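Operations S2451 and S2452 amount to a grouped mean. In the sketch below the community memberships follow the fig. 4 example, while the point abnormality values and the function name `community_averages` are hypothetical.

```python
from statistics import mean


def community_averages(point_scores, community_of):
    """Average the point abnormality degrees of the link nodes in each community."""
    by_community = {}
    for node, score in point_scores.items():
        by_community.setdefault(community_of[node], []).append(score)
    return {community: mean(scores) for community, scores in by_community.items()}


# Nodes on the abnormal transaction links of the fig. 4 example.
community_of = {"a": "A", "b": "A", "c": "A", "d": "A",
                "f": "B", "e": "B", "g": "B",
                "i": "C", "h": "C", "j": "C"}
# Hypothetical point abnormality degrees, for illustration only.
point_scores = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 2.0,
                "f": 1.0, "e": 2.0, "g": 3.0,
                "i": 0.5, "h": 0.5, "j": 0.5}
averages = community_averages(point_scores, community_of)
```

Only nodes that lie on an abnormal transaction link are passed in, so nodes such as l and k that were cut off by the two-community limit do not affect the average of community C.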
In operation S250, a community, of which the average abnormality degree satisfies a set threshold, is determined as an abnormal organization.
According to the abnormal organization identification method of the embodiments of the present disclosure, an abnormal transaction link is determined from the m communities based on the knowledge graph, the data characteristics of each node of the abnormal transaction link are acquired, the average abnormality degree of each community is calculated according to the data characteristics, and a community whose average abnormality degree satisfies a set threshold is determined to be an abnormal organization. Abnormal transaction links can thus be mined from massive transaction data and abnormal organizations identified from them. The identification method of the present disclosure has comprehensive coverage and can find highly concealed abnormal organizations efficiently and accurately. Because the nodes of an abnormal transaction link are located in at most 2 communities, that is, an abnormal transaction link may extend from the community of the first abnormal node into one adjacent community but no further, association with invalid groups is prevented, which further improves the accuracy of identifying abnormal organizations while saving computing resources.
The method of identifying an abnormal organization according to an embodiment of the present disclosure is described in detail below with reference to fig. 11 to 14. It is to be understood that the following description is illustrative only and is not intended to limit the present disclosure in any way.
The present disclosure provides a knowledge graph-based abnormal organization identification method, which is suitable for discovering abnormal fund transfers in bank transactions. Abnormal accounts are associated with one another and linked through their funds; the relationships among the account groups are complex and transfers are frequent, so abnormal organizations can be identified accurately by exploiting the known fund flow. The fund flow consists of 3 parts, upstream, midstream and downstream, as shown in fig. 11. The upstream part mainly absorbs funds: funds are transferred in dispersedly across a plurality of accounts and, within a short period, transferred out dispersedly or in large amounts, leaving essentially no balance, so the fund-pooling characteristic is evident. The midstream part mainly relays the upstream funds: its intermediary character is strong, funds move in and out quickly with many inflows and outflows, and the transactions are concentrated in time. The downstream part mainly splits the funds received from the midstream into many small amounts.
Based on the above idea, the present disclosure first finds individual abnormal accounts and then finds abnormal organizations through the fund chain flow. First, transaction record data within a time window is preprocessed to construct a fund flow knowledge graph. Then the accounts are divided into closely related groups by the Louvain algorithm. Next, abnormal accounts are found according to historical rules and marked as key suspicious nodes, and sets of closely related upstream and downstream abnormal accounts are found by a transaction group discovery algorithm. Finally, whether each abnormal account set is abnormal is judged according to its suspicious degree.
The knowledge graph-based abnormal organization identification method comprises the following steps:
(1) Constructing a fund flow knowledge graph: after the bank transaction records are processed, a knowledge graph is constructed according to the transaction relationships and the fund flow between accounts and is used for analyzing the fund flow.
(2) Constructing transaction groups: the accounts are divided into groups by the Louvain community discovery algorithm, and account groups with close association relationships are found according to the fund flow. The Louvain community discovery algorithm is a modularity-based community discovery algorithm that can discover hierarchical community structures, such that nodes within a community are closely connected while connections between communities are as sparse as possible. Modularity is a measure for evaluating the quality of a community partition of a network.
(3) Abnormal transaction link identification: abnormal nodes are found through historical rules, and the associated upstream, midstream and downstream account nodes are found in the knowledge graph to form a complete transaction link. To avoid associating all transaction groups, at most one group boundary may be crossed when a transaction link is constructed.
(4) Abnormal account identification: the abnormal account sets are analyzed, the suspicious degree of each account in a group is calculated, the average suspicious degree of each group is computed, and the groups are selected in descending order of suspicious degree.
Fig. 12 is a flowchart of the method for identifying an abnormal organization.
1. Data acquisition: bank transaction records within a certain time window are obtained and processed into a structured form. The data includes the account name, card number, transaction amount, transaction time, counterparty account name, counterparty card number, transaction mode, IP address, MAC address and transaction outlet, and is recorded as the original data, as shown in table 1.
TABLE 1
Figure BDA0003711557820000161
2. Data cleaning: dirty data is cleaned, and invalid transaction records, namely those with incomplete information or failed transfers, are filtered out.
3. Constructing a fund flow direction knowledge graph:
1) The transaction data is imported into a graph database, where the transaction account and the counterparty account are nodes, the transaction relationship is an edge, and the fund flow gives the direction of the edge.
2) The edge attributes of the graph are constructed. Because a plurality of transaction records may exist between two accounts, a feature array consisting of the number of transactions, transaction amount, time, mode, outlet, IP address and MAC address is constructed as the edge attribute. Fig. 13 shows two accounts d and f and the attribute of the edge between them.
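The cleaning of step 2 and the graph construction of step 3 can be sketched in memory as follows. The disclosure uses a graph database; plain dictionaries are substituted here for illustration. The record field names (`card_no`, `peer_card_no`, `amount`, `time`, `success`), the sample records, and the assumption that each record describes a transfer from the transaction account to the counterparty account are all hypothetical.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("card_no", "peer_card_no", "amount", "time")  # assumed names


def build_fund_flow_graph(records):
    """Return (nodes, edges); edges aggregate all transfers between two accounts."""
    nodes = set()
    edges = defaultdict(lambda: {"count": 0, "amounts": [], "times": []})
    for record in records:
        incomplete = any(record.get(f) in (None, "") for f in REQUIRED_FIELDS)
        if incomplete or not record.get("success", False):
            continue  # data cleaning: drop incomplete or failed transfers
        src, dst = record["card_no"], record["peer_card_no"]
        nodes.update((src, dst))
        edge = edges[(src, dst)]  # directed edge following the fund flow
        edge["count"] += 1
        edge["amounts"].append(record["amount"])
        edge["times"].append(record["time"])
    return nodes, dict(edges)


# Hypothetical records: two valid transfers from d to f, one incomplete, one failed.
records = [
    {"card_no": "d", "peer_card_no": "f", "amount": 100.0, "time": "2022-01-01T10:00", "success": True},
    {"card_no": "d", "peer_card_no": "f", "amount": 250.0, "time": "2022-01-01T10:05", "success": True},
    {"card_no": "d", "peer_card_no": "", "amount": 80.0, "time": "2022-01-02T09:00", "success": True},
    {"card_no": "f", "peer_card_no": "d", "amount": 60.0, "time": "2022-01-02T11:00", "success": False},
]
nodes, edges = build_fund_flow_graph(records)
```

Aggregating repeated transfers into one edge attribute mirrors the feature array of fig. 13; further fields such as mode, outlet, IP address and MAC address could be collected in the same way.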
4. Constructing transaction groups: after the knowledge graph is constructed, as shown in fig. 14, the Louvain algorithm is used to divide the nodes in the graph into different groups according to their degree of association, which greatly helps in identifying abnormal organizations. The group division steps are as follows:
1) Initialization: each node in fig. 14 is regarded as an independent group, so the number of groups equals the number of nodes, and all edges are regarded as having the same weight.
2) Node transfer among groups begins. For each node i, an attempt is made to assign node i in turn to the group of each of its neighbor nodes, and the change in modularity before and after the assignment is calculated by the following formula:
$$\Delta Q = \frac{k_{i,in}}{2m} - \frac{\Sigma_{tot}\, k_i}{2m^2}$$
where m denotes the number of edges in the network, $k_{i,in}$ denotes the sum of the weights of the edges from node i incident on group C, $\Sigma_{tot}$ denotes the total weight of the edges incident on group C, and $k_i$ denotes the total weight of the edges incident on node i.
3) Step 2) is repeated, and the evaluation of node transfers among groups continues until the group to which each node belongs no longer changes, at which point node transfer among the groups is complete.
4) The graph is reconstructed: all nodes in the same group are collapsed into one new node, the weights of the edges between nodes within the group become the weight of a self-loop on the new node, and the weights of the edges between groups become the weights of the edges between the new nodes.
5) The process is repeated from step 2) until the modularity of the whole graph no longer changes. The modularity is calculated by the following formula:
$$Q = \sum_{C}\left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2\right]$$
where $\Sigma_{in}$ represents the sum of the weights of the edges within group C, and $\Sigma_{tot}$ represents the sum of the weights of all edges connected to the nodes in group C. The constructed groups are shown in fig. 14: each circle forms a group, and associations may exist between groups.
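The modularity of a given partition can be computed directly from these two sums. The sketch below assumes an undirected graph and the common convention in which each edge inside a group contributes twice to the in-group sum (once per endpoint), so that a single group covering the whole graph has modularity 0; the function name `modularity` and the toy graph are hypothetical.

```python
def modularity(edges, group_of):
    """Sum over groups of sigma_in / (2m) - (sigma_tot / (2m)) ** 2.

    `edges` maps an undirected pair (u, v) to its weight (assumed layout).
    """
    m = sum(edges.values())
    sigma_in = {}   # weight inside each group, each edge counted per endpoint
    sigma_tot = {}  # weight of all edge ends attached to nodes of each group
    for (u, v), w in edges.items():
        gu, gv = group_of[u], group_of[v]
        sigma_tot[gu] = sigma_tot.get(gu, 0) + w
        sigma_tot[gv] = sigma_tot.get(gv, 0) + w
        if gu == gv:
            sigma_in[gu] = sigma_in.get(gu, 0) + 2 * w
    return sum(sigma_in.get(g, 0) / (2 * m) - (sigma_tot[g] / (2 * m)) ** 2
               for g in sigma_tot)


# Hypothetical graph: two triangles joined by the single edge (3, 4).
edges = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (4, 5): 1, (4, 6): 1, (5, 6): 1, (3, 4): 1}
two_groups = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
one_group = {n: "X" for n in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives a modularity of 5/14, whereas placing every node in one group gives 0, which is why step 5) stops once no regrouping raises the value further.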
5. Abnormal transaction link identification:
1) Abnormal account information is found in the original data according to the historical rules.
2) The corresponding nodes are found in the knowledge graph through the abnormal transaction records and abnormal transaction accounts and are marked as key abnormal nodes.
3) Starting from the abnormal nodes, the associated upstream, midstream and downstream account nodes are found in the knowledge graph, and the transaction chain nodes are constructed. The specific steps of the transaction chain node discovery algorithm are as follows:
a. All nodes are initialized, and each node is treated as a separate transaction chain.
b. Starting from an abnormal node, its inflow nodes and outflow nodes are found; whether the abnormal node is an upstream node is judged by rule according to the characteristics of upstream funds, and if so it is marked as an upstream node. Any risk-free account encountered is removed, and the remaining inflow and outflow nodes are added to the transaction chain.
c. Operation b is repeated for each new transaction chain node. If the node is an upstream node, its inflow nodes are not traced and only its outflow nodes are added, until no further outflow node of a new node can be found.
d. If the upstream and downstream nodes directly associated with the abnormal node are not in the same group, the algorithm allows only one group boundary to be crossed. For example, if abnormal node a is in group A, nodes b and c are in group B, node d is in group D, and the transaction relationship a-b-c-d exists, then the transaction link of a is a-b-c; crossing into group D is not allowed, so as to prevent association with invalid groups.
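The downstream half of the transaction chain discovery, together with the group limit of step d, can be sketched as a depth-first traversal. This is a simplified illustration: it follows outflow edges only and omits the upstream rule check and the removal of risk-free accounts. The function name `trace_links` and the adjacency layout are hypothetical.

```python
def trace_links(start, out_edges, group_of, max_groups=2):
    """Follow outflow edges from `start`, never entering more than `max_groups` groups."""
    links = []

    def walk(path, groups):
        extended = False
        for nxt in out_edges.get(path[-1], []):
            if nxt in path:
                continue  # avoid cycling through the same account
            group = group_of[nxt]
            if group not in groups and len(groups) == max_groups:
                continue  # step d: a further group may not be entered
            extended = True
            walk(path + [nxt], groups | {group})
        if not extended and len(path) > 1:
            links.append(path)  # the path cannot be extended any further

    walk([start], {group_of[start]})
    return links


# Example from step d: a is in group A, b and c are in group B, d is in group D.
out_edges = {"a": ["b"], "b": ["c"], "c": ["d"]}
group_of = {"a": "A", "b": "B", "c": "B", "d": "D"}
links_of_a = trace_links("a", out_edges, group_of)
```

For the a-b-c-d relationship the traversal stops at c, giving the transaction link a-b-c described in the text.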
6. Acquiring account data:
1) For the nodes on the abnormal transaction links found in step 5, the account opening data of each account is acquired, including the account opener, the account opening agent, the account opening time, the account opening outlet, whether the account was opened in a different region, whether online banking is enabled, and the account opening amount.
2) The account opening data is attached to the nodes of the knowledge graph as a label of each account node, which facilitates data analysis.
7. Fraud account identification:
1) Based on the discovered transaction link nodes and the node attributes, edges and edge attributes in the knowledge graph, the relevant data characteristics are analyzed, including:
a. Account transaction behavior characteristics: whether funds are transferred in dispersedly, whether funds are transferred out dispersedly, and whether funds move in and out quickly.
b. Account behavior characteristics: whether the account enters a dormant period after opening, and whether the account frequently makes cross-region or cross-bank transactions.
c. Correlation characteristics among accounts: whether the account opening time, place and outlet of a plurality of bank cards are concentrated, and whether the accounts were opened in a different region.
2) Calculating the abnormality degree of fraud accounts:
a. The 9 features above are selected for each account to form a feature vector expressed in a structured form: a feature is expressed as 1 if its condition is satisfied and as 0 otherwise. The standard feature vector is set to β = (1, 1, 1, 1, 1, 1, 1, 1, 1). For example, if the behavior data of an abnormal account a over a certain period shows the abnormal behaviors of dispersed transfer in, dispersed transfer out, and fast in and out of funds, its feature vector is θ = (1, 1, 1, 0, 0, 0, 0, 0, 0). The abnormality degree of account a is the Euclidean distance between the vectors β and θ, calculated as follows:
$$D = \sqrt{\sum_{i}\left(\beta_i - \theta_i\right)^2}$$
where i represents the number of nodes in the abnormal transaction link.
b. The similarity between the feature vector of each account and the standard feature vector is calculated by the Euclidean distance formula; the higher the similarity, the higher the abnormality degree of the node. The average abnormality degree of the nodes in each group is then calculated.
c. The top N groups are selected in descending order of abnormality degree, and finally the abnormal groups are displayed visually through the knowledge graph.
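The selection in step c can be sketched as a sort over the per-group averages. The average values and the function name `top_n_groups` are hypothetical.

```python
def top_n_groups(average_scores, n):
    """Return the n groups with the highest average abnormality degree."""
    return sorted(average_scores, key=average_scores.get, reverse=True)[:n]


# Hypothetical average abnormality degrees per group.
average_scores = {"A": 1.5, "B": 2.0, "C": 0.5}
most_abnormal = top_n_groups(average_scores, n=2)
```

The returned group identifiers can then be highlighted when the knowledge graph is displayed.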
According to this identification method, abnormal team behavior can be identified from concealed behaviors through the flow of the fund chain. Compared with detecting isolated individual violations, mining abnormal organizations uncovers the whole abnormal chain more comprehensively, which effectively strengthens the ability to identify unknown risks in financial activities and improves the risk control capability of a financial institution. In addition, the complex transaction relationships are displayed visually through the knowledge graph, so that staff can perform risk analysis more intuitively and work more efficiently.
Based on the above abnormal organization identification method, the present disclosure also provides an abnormal organization identification device 10, which is described in detail below with reference to fig. 15 to 22.
Fig. 15 schematically shows a structural block diagram of the abnormal organization identification device 10 according to an embodiment of the present disclosure.
The abnormal organization identification device 10 includes a construction module 1, a generation module 2, a first determination module 3, a calculation module 4, and a second determination module 5.
The construction module 1 is configured to perform operation S210: constructing a knowledge graph according to acquired transaction data of a historical time period, where the transaction data includes account information and transaction information between accounts, the nodes of the knowledge graph are constructed according to the account information, and the edges between the nodes are constructed according to the transaction information.
The generation module 2 is configured to perform operation S220: dividing the nodes of the knowledge graph according to their degree of association to generate m communities, where m is an integer greater than or equal to 1.
The first determination module 3 is configured to perform operation S230: determining an abnormal transaction link from the m communities of the knowledge graph according to abnormal account information, where the nodes of the abnormal transaction link are located in at most 2 communities, and the abnormal account information is obtained based on a preset rule.
The calculation module 4 is configured to perform operation S240: acquiring the data characteristics of each node of the abnormal transaction link, and calculating the average abnormality degree of each community according to the data characteristics.
The second determination module 5 is configured to perform operation S250: determining a community whose average abnormality degree satisfies the set threshold to be an abnormal organization.
Fig. 16 schematically shows a block diagram of the structure of the generation module 2 according to an embodiment of the present disclosure. The generation module 2 comprises a first determination unit 21, a second determination unit 22, a merging unit 23, a repeat execution unit 24, a termination unit 25 and a third determination unit 26.
The first determination unit 21 is configured to take each node in the knowledge graph as a group.
The second determination unit 22 is configured to traverse each group and determine the intimacy degree between the group and each group having an edge relationship with it.
The merging unit 23 is configured to merge the group with a group having an edge relationship with it according to the intimacy degree when the intimacy degree satisfies the intimacy degree threshold.
The repeat execution unit 24 is configured to regard the merged group as a new group and to repeat the traversal of each group to determine the intimacy degree between the group and each group having an edge relationship with it.
The termination unit 25 is configured to stop merging the group with the group having an edge relationship with it when the intimacy degree does not satisfy the intimacy degree threshold.
The third determination unit 26 is configured to take the current m groups as m communities when the intimacy degree between every two groups does not satisfy the intimacy degree threshold.
Fig. 17 schematically shows a block diagram of the merging unit 23 according to an embodiment of the present disclosure. The merging unit 23 comprises a sorting element 231 and a merging element 232.
A sorting element 231, the sorting element 231 being configured to sort the intimacy degrees by magnitude.
A merging element 232, the merging element 232 being configured to merge, according to the sorting result, the two groups whose intimacy degree ranks first or last.
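The group merging performed by the generating module can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the disclosed implementation: the intimacy degree is taken here to be the summed weight of the edges linking two groups (a hypothetical definition), edge weights are assumed positive, and the pair ranked first after sorting is the one merged. The names `intimacy` and `generate_communities` are invented for the example.

```python
def intimacy(group_a, group_b, weights):
    """Hypothetical intimacy degree: total weight of edges linking two groups."""
    total = 0.0
    for (u, v), w in weights.items():
        if (u in group_a and v in group_b) or (u in group_b and v in group_a):
            total += w
    return total


def generate_communities(nodes, weights, threshold):
    # Every node starts as its own group.
    groups = [frozenset([n]) for n in sorted(nodes)]
    while True:
        scored = []
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                score = intimacy(groups[i], groups[j], weights)
                if score > 0:  # the two groups share at least one edge
                    scored.append((score, i, j))
        if not scored:
            return groups
        # Sort by intimacy degree and take the pair ranked first.
        scored.sort(key=lambda item: item[0], reverse=True)
        best, i, j = scored[0]
        if best < threshold:
            # No remaining pair meets the threshold: groups are the communities.
            return groups
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]


nodes = {"A", "B", "C", "D", "E"}
weights = {("A", "B"): 5.0, ("B", "C"): 4.0, ("C", "D"): 1.0, ("D", "E"): 6.0}
communities = generate_communities(nodes, weights, threshold=3.0)
```

With these sample weights the weak C to D edge (1.0) stays below the threshold, so the loop ends with two communities, {A, B, C} and {D, E}.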
Fig. 18 schematically shows a block diagram of the first determining module 3 according to an embodiment of the present disclosure. The first determining module 3 includes a fourth determining unit 31, a fifth determining unit 32, and a sixth determining unit 33.
A fourth determining unit 31, the fourth determining unit 31 being configured to determine a first abnormal node in the knowledge graph according to the preset rule.
A fifth determining unit 32, the fifth determining unit 32 being configured to determine, according to the first abnormal node, the directed connected link in the knowledge graph where the first abnormal node is located, wherein the directed connected link is a link formed by nodes connected through directed edges.
A sixth determining unit 33, the sixth determining unit 33 being configured to take, as the abnormal transaction link, at least the part of the directed connected link that is formed by nodes located in two adjacent communities, wherein the abnormal transaction link includes the first abnormal node.
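One possible reading of how a directed connected link is cut down to an abnormal transaction link is sketched below. It assumes the directed link is already available as an ordered list of nodes, and it extends the kept segment downstream from the first abnormal node before extending upstream; that order, the function name `trim_link`, and the mapping `community_of` are assumptions for illustration only.

```python
def trim_link(path, community_of, abnormal_node, max_communities=2):
    """Keep the contiguous part of `path` around `abnormal_node` whose nodes
    lie in at most `max_communities` communities.

    `path` is an ordered list of nodes along a directed link. The segment is
    extended downstream first, then upstream (an illustrative choice).
    """
    start = end = path.index(abnormal_node)
    seen = {community_of[abnormal_node]}
    # Extend downstream along the directed link.
    while end + 1 < len(path):
        candidate = community_of[path[end + 1]]
        if len(seen | {candidate}) > max_communities:
            break
        end += 1
        seen.add(candidate)
    # Then extend upstream.
    while start > 0:
        candidate = community_of[path[start - 1]]
        if len(seen | {candidate}) > max_communities:
            break
        start -= 1
        seen.add(candidate)
    return path[start:end + 1]


path = ["A", "B", "C", "D", "E"]
community_of = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
link = trim_link(path, community_of, abnormal_node="C")
```

In this sample the kept segment is C, D, E: it contains the abnormal node C and touches only communities 1 and 2, so nodes A and B in community 0 are left out.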
Fig. 19 schematically shows a block diagram of the calculating module 4 according to an embodiment of the present disclosure. The calculating module 4 includes a seventh determining unit 41 and an acquiring unit 42.
A seventh determining unit 41, the seventh determining unit 41 being configured to determine a node attribute of each node of the abnormal transaction link.
An acquiring unit 42, the acquiring unit 42 being configured to acquire the data characteristics according to the node attributes.
Fig. 20 schematically shows a block diagram of the calculating module 4 according to an embodiment of the present disclosure. The calculating module 4 comprises a constructing unit 43, an eighth determining unit 44 and a calculating unit 45.
A constructing unit 43, the constructing unit 43 being configured to construct a feature vector according to the data characteristics.
An eighth determining unit 44, the eighth determining unit 44 being configured to determine the point abnormality degree based on the feature vector.
A calculating unit 45, the calculating unit 45 being configured to calculate the average abnormality degree of each community according to the point abnormality degree.
Fig. 21 schematically shows a block diagram of the eighth determining unit 44 according to an embodiment of the present disclosure. The eighth determining unit 44 comprises a first calculating element 441 and a first determining element 442.
A first calculating element 441, the first calculating element 441 being configured to calculate a Euclidean distance between the feature vector and a preset standard vector, wherein the Euclidean distance is used to measure the similarity between the feature vector and the standard vector.
A first determining element 442, the first determining element 442 being configured to determine the point abnormality degree based on the Euclidean distance, wherein the point abnormality degree is proportional to the similarity.
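A small Python sketch of the distance-based point abnormality degree follows. The disclosure does not state how the similarity is derived from the Euclidean distance, so the sketch makes two assumptions: the preset standard vector describes a known abnormal profile, and the similarity is taken as 1 / (1 + d), a hypothetical monotone mapping in which a smaller distance d gives a higher similarity. The point abnormality degree is then set equal to that similarity.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def point_abnormality(feature_vector, standard_vector):
    """Point abnormality degree, proportional to the similarity.

    Assumption: similarity = 1 / (1 + distance), so a node whose features
    match the standard (abnormal) profile exactly scores 1.0.
    """
    distance = euclidean_distance(feature_vector, standard_vector)
    similarity = 1.0 / (1.0 + distance)
    return similarity


identical = point_abnormality([2.0, 1.0], [2.0, 1.0])
distant = point_abnormality([3.0, 4.0], [0.0, 0.0])
```

Any other decreasing function of the distance could be substituted for 1 / (1 + d) without changing the structure of the calculation.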
Fig. 22 schematically shows a block diagram of the calculating unit 45 according to an embodiment of the present disclosure. The calculating unit 45 comprises a second calculating element 451 and a second determining element 452.
A second calculating element 451, the second calculating element 451 being configured to calculate the average value of the point abnormality degrees of the nodes in the abnormal transaction links included in each community.
A second determining element 452, the second determining element 452 being configured to take the average value as the average abnormality degree of the community.
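The averaging step, together with the threshold check of operation S250, can be sketched as follows. The sketch assumes that "meeting the set threshold" means greater than or equal to it; the function names, the sample scores, and the threshold value are hypothetical.

```python
from collections import defaultdict


def average_abnormality(point_scores, community_of):
    """Average the point abnormality degrees of link nodes per community."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, score in point_scores.items():
        community = community_of[node]
        sums[community] += score
        counts[community] += 1
    return {community: sums[community] / counts[community] for community in sums}


def abnormal_organizations(averages, threshold):
    """Communities whose average abnormality degree meets the threshold
    (interpreted here as >=, which is an assumption)."""
    return sorted(c for c, avg in averages.items() if avg >= threshold)


# Point abnormality degrees of the nodes on an abnormal transaction link.
point_scores = {"C": 0.75, "D": 0.25, "E": 0.125}
community_of = {"C": 1, "D": 1, "E": 2}
averages = average_abnormality(point_scores, community_of)
flagged = abnormal_organizations(averages, threshold=0.4)
```

Only nodes that lie on abnormal transaction links contribute to a community's average, which is why `point_scores` holds link nodes rather than every account in the graph.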
According to the apparatus 10 for identifying an abnormal organization of the embodiment of the present disclosure, abnormal transaction links are determined from the m communities of the knowledge graph, the data characteristics of each node of the abnormal transaction links are acquired, the average abnormality degree of each community is calculated according to the data characteristics, and a community whose average abnormality degree meets the set threshold is determined as an abnormal organization. In this way, abnormal transaction links can be readily mined from massive transaction data and abnormal organizations can then be identified. The identification method of the present disclosure provides comprehensive coverage and can efficiently and accurately discover highly concealed abnormal organizations. Because the nodes in an abnormal transaction link are located in no more than 2 communities, that is, the determination of an abnormal transaction link does not extend across more than two communities, association with irrelevant groups can be avoided, which further improves the accuracy of identifying abnormal organizations while saving computing resources.
In addition, according to the embodiment of the present disclosure, any plurality of the building module 1, the generating module 2, the first determining module 3, the calculating module 4, and the second determining module 5 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module.
According to an embodiment of the present disclosure, at least one of the building module 1, the generating module 2, the first determining module 3, the calculating module 4, and the second determining module 5 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or in a suitable combination of any of them.
Alternatively, at least one of the building module 1, the generating module 2, the first determining module 3, the calculating module 4 and the second determining module 5 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
Fig. 23 schematically shows a block diagram of an electronic device adapted to implement the above method according to an embodiment of the present disclosure.
As shown in fig. 23, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage portion 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. The program code is for causing a computer system to perform the methods of the embodiments of the disclosure when the computer program product is run on the computer system.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The above described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or associated in various ways, even if such combinations or associations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or associated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (11)

1. A method for identifying an abnormal organization, comprising:
constructing a knowledge graph according to acquired transaction data of a historical time period, wherein the transaction data comprises account information and transaction information between accounts, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information;
dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities, wherein m is an integer greater than or equal to 1;
determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information, wherein the nodes in an abnormal transaction link are located in no more than 2 communities, and the abnormal account information is acquired based on a preset rule;
acquiring data characteristics of each node of the abnormal transaction link, and calculating the average abnormality degree of each community according to the data characteristics; and
determining a community whose average abnormality degree meets a set threshold as an abnormal organization.
2. The method of claim 1, wherein the dividing the nodes of the knowledge graph according to the association degree of the nodes to generate m communities comprises:
determining each node in the knowledge graph as a group;
traversing each group, and determining the intimacy degree between the group and each group having an edge relationship with the group;
when the intimacy degree meets an intimacy degree threshold, merging the group and the group having an edge relationship with the group according to the intimacy degree;
taking the merged group as a new group, and repeating the traversing of each group and the determining of the intimacy degree between the group and each group having an edge relationship with the group;
when the intimacy degree does not meet the intimacy degree threshold, stopping merging the group and the group having an edge relationship with the group; and
when the intimacy degree between every two groups does not meet the intimacy degree threshold, taking the current m groups as the m communities.
3. The method of claim 2, wherein the merging the group and the group having an edge relationship with the group according to the intimacy degree comprises:
sorting the intimacy degrees by magnitude; and
merging, according to the sorting result, the two groups whose intimacy degree ranks first or last.
4. The method of claim 1, wherein the determining abnormal transaction links from the m communities of the knowledge graph according to abnormal account information comprises:
determining a first abnormal node in the knowledge graph according to the preset rule;
determining, according to the first abnormal node, the directed connected link in the knowledge graph where the first abnormal node is located, wherein the directed connected link is formed by nodes connected through directed edges; and
taking, as the abnormal transaction link, at least the part of the directed connected link that is formed by nodes located in two adjacent communities, wherein the abnormal transaction link comprises the first abnormal node.
5. The method of claim 1, wherein the acquiring data characteristics of each node of the abnormal transaction link comprises:
determining a node attribute of each node of the abnormal transaction link; and
acquiring the data characteristics according to the node attributes.
6. The method according to any one of claims 1 to 5, wherein the calculating an average degree of abnormality for each community according to the data characteristics comprises:
constructing a feature vector according to the data features;
determining a point abnormality degree according to the feature vector; and
calculating the average abnormality degree of each community according to the point abnormality degree.
7. The method of claim 6, wherein the determining a point abnormality degree according to the feature vector comprises:
calculating a Euclidean distance between the feature vector and a preset standard vector, wherein the Euclidean distance is used for measuring the similarity between the feature vector and the standard vector; and
determining the point abnormality degree based on the Euclidean distance, wherein the point abnormality degree is proportional to the similarity.
8. The method of claim 6, wherein calculating an average degree of abnormality for each community based on the point degree of abnormality comprises:
calculating an average value of the point abnormality degrees of the nodes in the abnormal transaction links included in each community; and
taking the average value as the average abnormality degree of the community.
9. An apparatus for identifying an abnormal organization, comprising:
a building module, configured to construct a knowledge graph according to acquired transaction data of a historical time period, wherein the transaction data comprises account information and transaction information between accounts, nodes of the knowledge graph are constructed according to the account information, and edges between the nodes are constructed according to the transaction information;
a generating module, configured to divide the nodes of the knowledge graph according to the association degree of the nodes to generate m communities, wherein m is an integer greater than or equal to 1;
a first determining module, configured to determine abnormal transaction links from the m communities of the knowledge graph according to abnormal account information, wherein the nodes in an abnormal transaction link are located in no more than 2 communities, and the abnormal account information is obtained based on a preset rule;
a calculating module, configured to acquire the data characteristics of each node of the abnormal transaction link and calculate the average abnormality degree of each community according to the data characteristics; and
a second determining module, configured to determine a community whose average abnormality degree meets a set threshold as an abnormal organization.
10. An electronic device, comprising:
one or more processors;
one or more memories for storing executable instructions that, when executed by the one or more processors, implement the method of any one of claims 1 to 8.
11. A computer-readable storage medium having stored thereon executable instructions that when executed by a processor implement a method according to any one of claims 1 to 8.
CN202210732268.9A 2022-06-24 2022-06-24 Abnormal tissue identification method, abnormal tissue identification device, electronic device and medium Pending CN115062163A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210732268.9A CN115062163A (en) 2022-06-24 2022-06-24 Abnormal tissue identification method, abnormal tissue identification device, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210732268.9A CN115062163A (en) 2022-06-24 2022-06-24 Abnormal tissue identification method, abnormal tissue identification device, electronic device and medium

Publications (1)

Publication Number Publication Date
CN115062163A true CN115062163A (en) 2022-09-16

Family

ID=83203134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210732268.9A Pending CN115062163A (en) 2022-06-24 2022-06-24 Abnormal tissue identification method, abnormal tissue identification device, electronic device and medium

Country Status (1)

Country Link
CN (1) CN115062163A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024093960A1 (en) * 2022-11-01 2024-05-10 马上消费金融股份有限公司 Verification method and verification apparatus for abnormal transaction coping strategy
CN117435963A (en) * 2023-12-19 2024-01-23 百融云创科技股份有限公司 Digital asset fraud group determination method, device, electronic equipment and storage medium
CN117435963B (en) * 2023-12-19 2024-04-12 百融云创科技股份有限公司 Digital asset fraud group determination method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20210174440A1 (en) Providing virtual markers based upon network connectivity
Shi et al. Toward a better measure of business proximity
CN109087079B (en) Digital currency transaction information analysis method
US20190095992A1 (en) Method and system to facilitate decentralized money services software as a service
US20220083531A1 (en) Abnormal event analysis
CN115062163A (en) Abnormal tissue identification method, abnormal tissue identification device, electronic device and medium
Goldsmith et al. Analyzing hack subnetworks in the bitcoin transaction graph
CN110135978B (en) User financial risk assessment method and device, electronic equipment and readable medium
CN110148053B (en) User credit line evaluation method and device, electronic equipment and readable medium
CN111199474A (en) Risk prediction method and device based on network diagram data of two parties and electronic equipment
CN110807129B (en) Method and device for generating multi-layer user relation graph set and electronic equipment
CN113159934A (en) Method and system for predicting passenger flow of network, electronic equipment and storage medium
CN114638695A (en) Credit evaluation method, device, equipment and medium
CN113538137A (en) Capital flow monitoring method and device based on double-spectrum fusion calculation
Fujiwara et al. Money flow network among firms’ accounts in a regional bank of Japan
CN111798304A (en) Risk loan determination method and device, electronic equipment and storage medium
US20230139364A1 (en) Generating user interfaces comprising dynamic base limit value user interface elements determined from a base limit value model
CN113191681A (en) Site selection method and device for network points, electronic equipment and readable storage medium
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
WO2020214187A1 (en) Identifying and quantifying sentiment and promotion bias in social and content networks
Shatnawi et al. Big data analytics tools and applications: survey
CN114723548A (en) Data processing method, apparatus, device, medium, and program product
CN114154752A (en) Enterprise risk prediction method, device, electronic equipment, medium and program product
Rosenquist et al. On the Dark Side of the Coin: Characterizing Bitcoin Use for Illicit Activities
Garin et al. Machine learning in classifying bitcoin addresses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination