CN116167865A - Community discovery-based bill abnormal customer identification method, system, terminal equipment and storage medium - Google Patents

Info

Publication number
CN116167865A
Authority
CN
China
Prior art keywords
risk
community
bill
node
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211549331.1A
Other languages
Chinese (zh)
Inventor
沈洋
王珂瑶
赵秀丽
徐慧琴
石宁
徐迎田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN202211549331.1A priority Critical patent/CN116167865A/en
Publication of CN116167865A publication Critical patent/CN116167865A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a system, a terminal device and a storage medium for identifying abnormal bill clients based on community discovery, and relates to the field of computer systems. According to feedback from business personnel at each branch, the issued list is highly accurate and greatly reduces the time, labor, and material costs of on-site investigation. The approach has been well received and widely adopted by the branches, which take concrete measures, such as suspending business or strengthening risk monitoring, against the abnormal clients confirmed in the list.

Description

Community discovery-based bill abnormal customer identification method, system, terminal equipment and storage medium
Technical Field
The invention relates to the field of computer systems, in particular to a method, a system, terminal equipment and a storage medium for identifying abnormal bill clients based on community discovery.
Background
The bill market is an important component of China's financial market and an important settlement and financing market for the real economy. In recent years, as macroeconomic growth has slowed and economic development has entered a new normal, bill business, a traditional asset business of commercial banks, has played a growing role in serving the real economy. As small and medium-sized banks and non-bank financial institutions have become widely involved in bill business, and private bill intermediaries in particular have entered the market, bill market participants have gradually diversified. These institutions often differ considerably in risk appetite, internal risk control, management system construction, and staffing, which brings new risk factors to the bill market. In particular, the penetration of private bill intermediaries into the bill market increases the likelihood of moral hazard and transaction fraud, and the risk spreads to other market participants along complex transaction chains, making bill business risk harder to prevent and control. At present, bill business conducted by abnormal clients such as bill intermediaries and shell companies has become an important factor affecting the compliance risk management of commercial banks. Analysis shows that the profit models and main features of bill intermediaries and shell companies are as follows:
1. Profit models
(1) Earning the buy-sell spread: first, a company is registered and established, and its actual controller directs its daily operations. Second, the intermediary collects bill source information in the bill market, continuously expands its bill sources, finds bill-holding enterprises, negotiates a discount price with them, and acquires the bills through endorsement transfer, paying the enterprises the agreed price. Third, it forges trade background materials, including value-added tax invoices, and discounts the bills at a partner bank through the company it controls.
(2) Earning intermediary service fees: by establishing a "cooperative relationship" with individual banks, the intermediary provides a channel for buying and selling bills, uses the bank channel to offer quotation and matching services to bill buyers and sellers, illegally helps enterprises obtain working capital, and charges an intermediary service fee.
2. Main features
(1) Bill sources must be expanded continuously and are not constrained by the authenticity of the trade background, so bill volume is large (the number of bills, the bill amounts, and the number of source enterprises seeking discounting are all large).
(2) To avoid tying up the fund pool, bills are discounted or transferred by endorsement immediately after they are received, so the average number of bills held is low.
(3) To capture market share, the business does not "stop", and transactions are relatively frequent.
(4) The most typical feature that distinguishes them from other suspicious transactions: a large number of "integer-like" bill transactions.
The prior art identifies bill intermediaries and shell companies mainly through traditional manual examination, which consumes manpower and material resources and performs poorly in practice. Both client admission checks and subsequent business re-examination rely on experience.
Some prior art analyzes existing bill business risk at the business level and notes that shell companies and bill intermediaries are risks that commercial banks must guard against. However, it only offers management suggestions at the business operation level and still relies on traditional manual examination; it does not use the inherent big-data advantage of electronic bills to free up manpower for higher-value work.
Other prior art uses the Louvain algorithm to search transaction data for clusters involved in criminal cases, mainly for economic crime investigation. Although it uses the Louvain algorithm, its application field is economic cases, and it implements only a search function.
The bill business departments of commercial banks need to identify abnormal clients among all of the bank's existing bill clients. The main purpose is to screen out shell companies and bill intermediaries, discover fraud and compliance risks in bill business in time, and flag and prevent those risks. Doing so improves client quality, reduces the negative impact on bank bill business of abnormal clients mixed in among real enterprises, ensures that bank funds reach real enterprises with real needs, and improves the capacity of finance to serve the real economy.
Commercial banks still identify abnormal bill clients through experience-based judgment, subjective speculation, and on-site manual examination. The growing number of clients in the electronic bill era poses a great challenge to these traditional methods, which consume large amounts of time, manpower, and material resources. It has therefore become urgent to use the inherent big-data advantage of the electronic bill era and apply model algorithms to the off-site investigation of abnormal bill clients.
Disclosure of Invention
The embodiment of the invention provides a method, a system, terminal equipment and a storage medium for identifying abnormal bill clients based on community discovery.
A method for identifying abnormal bill clients based on community discovery specifically comprises the following steps:
step 1, designing the node-edge relationships of bill clients: analyzing the risk indicators related to bill clients and designing the nodes and edges of the bill client graph;
step 2, processing the node and edge data: processing the node-edge design of step 1 into three node files, one attribute file, and two edge files;
step 3, building the bill client knowledge graph: building the graph in Neo4j from the node, attribute, and edge files processed in step 2, with different node types marked in different colors;
step 4, dividing communities with the Louvain algorithm, which is a modularity-based community discovery algorithm;
step 5, dividing community groups according to community risk probability: setting three risk probability thresholds $p_{low}$, $p_{mid}$, $p_{high}$, wherein communities with a risk probability less than or equal to $p_{low}$ form the low risk probability group, communities with a risk probability greater than $p_{low}$ and less than or equal to $p_{mid}$ form the medium risk probability group, and communities with a risk probability greater than $p_{mid}$ and less than or equal to $p_{high}$ form the high risk probability group;
step 6, calculating the risk score of each bill client: taking the community risk level obtained in step 5 as one dimension of the client risk evaluation and combining it with the bill client's other risk indicators and their corresponding weights to obtain the client's comprehensive risk score;
step 7, calculating the risk level of each bill client: converting each client's risk score into a score from 0 to 100 according to a mapping relationship and dividing the scores into 10 risk levels $level_1, level_2, \ldots, level_{10}$;
step 8, generating the abnormal client list: selecting the clients whose risk scores exceed a limit value, that is, the clients above the set risk threshold $level_{risk}$, to form the abnormal client list;
step 9, issuing the abnormal client list to the operating institutions for verification and confirmation;
and step 10, calculating the model's identification performance from the feedback of the operating institutions.
Further: the nodes in step 1 are of three types, namely bill client nodes, associated person nodes, and reconciliation IP nodes, wherein the associated person nodes include natural persons related to the bill client and its bill business, such as legal representatives, shareholders, senior managers, actual controllers, and client managers.
Further: the calculation formula of the modularity is as follows:
Figure BDA0003981469340000051
where $m$ is the number of connections (edges) in the network; $v$ and $w$ are any two nodes in the network, with $A_{vw} = 1$ when there is a connection between them and $A_{vw} = 0$ otherwise; $k_w$ represents the degree of node $w$; and $\delta(c_v, c_w)$ indicates whether nodes $v$ and $w$ are in the same community, with $\delta(c_v, c_w) = 1$ if so and $\delta(c_v, c_w) = 0$ otherwise;
The simplified form is:
Figure BDA0003981469340000052
where $\Sigma_{in}$ is the number of edges in community $c$, and $\Sigma_{tot}$ is the sum of the degrees of the nodes in community $c$;
the calculation formula of the modularity increment is as follows:
Figure BDA0003981469340000053
Figure BDA0003981469340000054
where $\Sigma_{in}$ is the number of edges in community $c$; $\Sigma_{tot}$ is the sum of the degrees of the nodes in community $c$; $k_i$ is the degree of node $i$; and $k_{i,in}$ is the number of connections between node $i$ and the nodes in community $c$.
Further: the Louvain algorithm is divided into three phases, namely:
step 1, each node is first assigned to its own community, so that a network of $n$ nodes has $n$ communities, and the modularity $Q_0$ at this point is calculated; node $i$ is then removed from its current community $c_i$ and placed in the community of node $j$, the modularity $Q_1$ at this point is calculated, and the modularity gain $\Delta Q = Q_1 - Q_0$ is computed; if $\Delta Q > 0$, node $i$ is moved into the community of $j$, otherwise it is not;
step 2, each community obtained in step 1 is aggregated into a single node, and the whole network is rebuilt;
and step 3, the iteration stops automatically when the modularity no longer increases.
Further: the risk probability calculation formula of the community is as follows:
Figure BDA0003981469340000061
where $c_i$ is community $i$, $n_{risk}$ is the number of clients in community $c_i$ that have been marked as abnormal clients, and $n_{norisk}$ is the number of clients in community $c_i$ that have not been marked as abnormal clients.
Further: the calculation formula of the comprehensive risk score is as follows:
Figure BDA0003981469340000062
where $r_0$ is the community risk level and $w_0$ is its weight; $r_1, r_2, \ldots, r_k$ are the other risk indicators of interest in bill business, and $w_1, w_2, \ldots, w_k$ are the weights corresponding to these risk indicators.
Further: the node attributes in step 1 are of two types, namely bill client attributes and bill risk indicators.
Further: the edges in step 1 are of two types: one points from an associated person node to a bill client node, and the other points from a reconciliation IP node to a bill client node.
Further: the system comprises a data acquisition module, a data processing module, an algorithm module, a logic module and a display module;
the data acquisition module is used for acquiring bill client information;
the algorithm module obtains the corresponding indicator information by applying the corresponding algorithm to the acquired bill client information;
the logic module is used for performing logical judgment on the indicator information and for screening and rejecting it;
the display module is used for displaying the judged indicator information to the operating institution.
Further: the terminal device may include a processor, a storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the terminal device runs, the processor communicates with the storage medium through the bus and executes the machine-readable instructions to perform the steps of the bill abnormal client identification method described above.
Further: a storage medium storing a computer program which, when executed by a processor, performs the steps of the method described above.
The invention has the following beneficial effects: a relationship graph of bill clients is established, the bill client network is divided into communities with the Louvain algorithm, and the risk probability of each community is calculated. The risk score and risk level of each client are then calculated by combining the other risk indicators of bill clients, and the clients with the highest risk scores and risk levels are placed, in ranked order, on an abnormal client investigation list, which is issued to the operating institutions for manual verification. According to feedback from business personnel at each branch, the list is highly accurate and greatly reduces the time, labor, and material costs of on-site investigation. The approach has been well received and widely adopted by the branches, which take concrete measures, such as suspending business or strengthening risk monitoring, against the abnormal clients confirmed in the list.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows a schematic flow chart of the method of the present invention;
FIG. 2 shows a schematic diagram of the composition of the device of the present invention;
FIG. 3 shows a schematic diagram of the composition of the terminal device of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present invention, and it should be understood that the drawings in the present invention are for the purpose of illustration and description only and are not intended to limit the scope of the present invention. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present invention. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments of the invention are only some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that the term "comprising" will be used in embodiments of the invention to indicate the presence of the features stated hereafter, but not to exclude the addition of other features. It should also be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. In the description of the present invention, it should also be noted that the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Figure 1 shows a flow chart of the steps of the method of the invention.
The method for identifying abnormal bill clients based on community discovery specifically comprises the following steps:
Step 1, designing the node-edge relationships of bill clients: the risk indicators related to bill clients are analyzed, and the nodes and edges of the bill client graph are designed. The nodes are of three types, namely bill client nodes, associated person nodes, and reconciliation IP nodes, where the associated person nodes include natural persons related to the bill client and its bill business, such as legal representatives, shareholders, senior managers, actual controllers, and client managers. The node attributes are of two types, namely bill client attributes and bill risk indicators. Bill client attributes include the client name, the client number, the branch and the institution the client belongs to, the client's secondary industry, and so on; bill risk indicators include the endorsement out-of-limit count, the single-day maximum endorsement out-of-limit count, whether the client is on the abnormal list, whether the client is in a sensitive industry, and so on. The edges are of two types: one points from an associated person node to a bill client node, and the other points from a reconciliation IP node to a bill client node; both types of edge represent a subordinate relationship.
Step 2, processing the node and edge data: the node-edge design of step 1 is processed into three node files, one attribute file, and two edge files.
Step 3, building the bill client knowledge graph: the graph is built in Neo4j from the node, attribute, and edge files processed in step 2, and different node types are marked in different colors.
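As a rough illustration of step 2, the sketch below serializes a toy node-edge design into three node files, one attribute file, and two edge files, held as in-memory CSV text. The file names, column names, and sample records are all hypothetical; the source does not specify the file layout or how the files are loaded into Neo4j.

```python
import csv
import io

# Hypothetical sample data; every identifier and value here is made up.
clients = [{"id": "C1", "name": "Client A"}, {"id": "C2", "name": "Client B"}]
persons = [{"id": "P1", "role": "shareholder"}]
ips = [{"id": "IP1", "address": "10.0.0.1"}]
attributes = [
    {"client_id": "C1", "sensitive_industry": 0, "on_abnormal_list": 1},
    {"client_id": "C2", "sensitive_industry": 0, "on_abnormal_list": 0},
]
# Both edge types point to a bill client node.
person_edges = [("P1", "C1"), ("P1", "C2")]
ip_edges = [("IP1", "C1"), ("IP1", "C2")]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def edges_to_csv(edges):
    """Serialize (source, target) pairs to CSV text."""
    rows = [{"source": s, "target": t} for s, t in edges]
    return rows_to_csv(rows, ["source", "target"])


# Three node files, one attribute file, two edge files.
files = {
    "client_nodes.csv": rows_to_csv(clients, ["id", "name"]),
    "person_nodes.csv": rows_to_csv(persons, ["id", "role"]),
    "ip_nodes.csv": rows_to_csv(ips, ["id", "address"]),
    "client_attributes.csv": rows_to_csv(
        attributes, ["client_id", "sensitive_industry", "on_abnormal_list"]
    ),
    "person_client_edges.csv": edges_to_csv(person_edges),
    "ip_client_edges.csv": edges_to_csv(ip_edges),
}
```

Keeping attributes in a separate file keyed by client number mirrors the "one attribute file" of step 2; a real pipeline would write these strings to disk and import them into the graph database.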
Step 4, dividing communities with the Louvain algorithm. The Louvain algorithm comes from the article "Fast unfolding of communities in large networks" published by Vincent D. Blondel et al. and is a modularity-based community discovery algorithm. Modularity is a quantitative index that measures the quality of a community division. If a community division algorithm places densely connected nodes in the same community and leaves the connections between communities sparse, the resulting network modularity is larger; a division with larger modularity is therefore better.
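The claim that a better division yields a larger modularity can be checked on a toy graph. The sketch below assumes the standard Newman modularity for an undirected, unweighted graph (an assumption, since the source shows its formula only as an image); the graph and both partitions are invented for illustration.

```python
from itertools import product


def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph.

    edges: list of (v, w) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    adjacent = set()
    degree = {v: 0 for v in community}
    for v, w in edges:
        adjacent.add((v, w))
        adjacent.add((w, v))
        degree[v] += 1
        degree[w] += 1
    q = 0.0
    for v, w in product(community, repeat=2):
        if community[v] != community[w]:
            continue  # delta(c_v, c_w) = 0, so the pair contributes nothing
        a_vw = 1.0 if (v, w) in adjacent else 0.0
        q += a_vw - degree[v] * degree[w] / (2.0 * m)
    return q / (2.0 * m)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
good = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}  # one triangle each
bad = {0: "a", 1: "b", 2: "a", 3: "b", 4: "a", 5: "b"}   # triangles split up

q_good = modularity(edges, good)
q_bad = modularity(edges, bad)
```

The partition that keeps each triangle together scores higher than the one that cuts across them, which is the behavior the paragraph describes.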
The calculation formula of the modularity is as follows:
Figure BDA0003981469340000101
where $m$ is the number of connections (edges) in the network; $v$ and $w$ are any two nodes in the network, with $A_{vw} = 1$ when there is a connection between them and $A_{vw} = 0$ otherwise; $k_w$ represents the degree of node $w$; and $\delta(c_v, c_w)$ indicates whether nodes $v$ and $w$ are in the same community, with $\delta(c_v, c_w) = 1$ if so and $\delta(c_v, c_w) = 0$ otherwise;
The simplified form is:
Figure BDA0003981469340000102
where $\Sigma_{in}$ is the number of edges in community $c$, and $\Sigma_{tot}$ is the sum of the degrees of the nodes in community $c$;
the calculation formula of the modularity increment is as follows:
Figure BDA0003981469340000111
Figure BDA0003981469340000112
where $\Sigma_{in}$ is the number of edges in community $c$; $\Sigma_{tot}$ is the sum of the degrees of the nodes in community $c$; $k_i$ is the degree of node $i$; and $k_{i,in}$ is the number of connections between node $i$ and the nodes in community $c$.
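The formula images are not reproduced in this text. For reference, the standard forms from the Louvain literature that match the symbols defined above are sketched below; they are a likely reconstruction, not a transcription of the patent's figures. To keep the three forms consistent with one another, $\Sigma_{in}$ is assumed here to count each internal edge from both endpoints (twice the number of internal edges), and the gain is for moving an isolated node $i$ into community $c$. Some presentations write $k_{i,in}$ instead of $2k_{i,in}$ under a different counting convention.

```latex
% Modularity over all node pairs
Q = \frac{1}{2m} \sum_{v,w} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w)

% Simplified form, summed over communities c
Q = \sum_{c} \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} \right]

% Gain from moving an isolated node i into community c
\Delta Q = \left[ \frac{\Sigma_{in} + 2 k_{i,in}}{2m}
           - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right]
         - \left[ \frac{\Sigma_{in}}{2m}
           - \left( \frac{\Sigma_{tot}}{2m} \right)^{2}
           - \left( \frac{k_i}{2m} \right)^{2} \right]

% Which simplifies to
\Delta Q = \frac{1}{2m} \left( 2 k_{i,in} - \frac{\Sigma_{tot}\, k_i}{m} \right)
```

The last line follows by expanding the squares in the line before it: the $\Sigma_{in}$, $\Sigma_{tot}^2$, and $k_i^2$ terms cancel, leaving only the edge term and the cross term.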
The Louvain algorithm is divided into three phases, namely:
Step 1, each node is first assigned to its own community, so that a network of $n$ nodes has $n$ communities, and the modularity $Q_0$ at this point is calculated. Node $i$ is then removed from its current community $c_i$ and placed in the community of node $j$, the modularity $Q_1$ at this point is calculated, and the modularity gain $\Delta Q = Q_1 - Q_0$ is computed. If $\Delta Q > 0$, node $i$ is moved into the community of $j$; otherwise it is not.
Step 2, each community obtained in step 1 is aggregated into a single node, and the whole network is rebuilt.
Step 3, the iteration stops automatically when the modularity no longer increases.
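The node-moving phase described above can be sketched in plain Python. This is a simplified illustration, not the patent's implementation: it recomputes the full modularity for every trial move instead of using the incremental gain, it picks the neighbor community with the largest positive gain, it assumes an undirected graph without self-loops, and it omits the aggregation phase. The function names and the toy graph are hypothetical.

```python
def modularity(adj, comm):
    """Q as a sum over communities of [in/2m - (tot/2m)^2].

    adj: dict node -> list of neighbours (undirected, no self-loops).
    `in` counts each internal edge from both of its endpoints.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    inside, total = {}, {}
    for v, nbrs in adj.items():
        c = comm[v]
        total[c] = total.get(c, 0) + len(nbrs)
        inside[c] = inside.get(c, 0) + sum(1 for w in nbrs if comm[w] == c)
    return sum(inside[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj):
    """Greedy node moves, repeated until no move increases modularity."""
    comm = {v: v for v in adj}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for i in sorted(adj):
            original = comm[i]
            q0 = modularity(adj, comm)
            best_gain, best_c = 0.0, original
            for j in sorted(adj[i]):  # try the community of each neighbour j
                if comm[j] == original:
                    continue
                comm[i] = comm[j]
                gain = modularity(adj, comm) - q0  # delta Q = Q1 - Q0
                if gain > best_gain + 1e-12:
                    best_gain, best_c = gain, comm[j]
            comm[i] = best_c  # stays put when no trial move had positive gain
            if best_c != original:
                improved = True
    return comm


# Two triangles joined by the bridge edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
communities = local_moving(adj)
```

On this toy graph the loop settles on one community per triangle. A full Louvain run would then collapse each community into a single node and repeat the procedure on the smaller graph.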
The risk probability of each community is then calculated. Let $c_i$ be community $i$, $n_{risk}$ the number of clients in community $c_i$ that have been marked as abnormal clients, and $n_{norisk}$ the number of clients in community $c_i$ that have not been marked as abnormal clients.
The risk probability calculation formula of the community is as follows:
Figure BDA0003981469340000113
Step 5, dividing community groups according to community risk probability: three risk probability thresholds $p_{low}$, $p_{mid}$, $p_{high}$ are set. Communities with a risk probability less than or equal to $p_{low}$ form the low risk probability group; communities with a risk probability greater than $p_{low}$ and less than or equal to $p_{mid}$ form the medium risk probability group; and communities with a risk probability greater than $p_{mid}$ and less than or equal to $p_{high}$ form the high risk probability group.
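A minimal sketch of the community risk probability and the three-threshold grouping follows. The probability is assumed to be the share of already-marked abnormal clients, $n_{risk} / (n_{risk} + n_{norisk})$, which is the natural reading of the definitions but is not confirmed by the text because the formula appears only as an image. The threshold values in the example are made up, and the function names are hypothetical.

```python
def community_risk_probability(n_risk, n_norisk):
    """Assumed form: share of the community's clients already marked abnormal."""
    total = n_risk + n_norisk
    if total == 0:
        raise ValueError("community has no clients")
    return n_risk / total


def risk_group(p, p_low, p_mid, p_high):
    """Map a community risk probability to a group using three thresholds."""
    if not (0.0 <= p_low < p_mid < p_high):
        raise ValueError("thresholds must satisfy 0 <= p_low < p_mid < p_high")
    if p <= p_low:
        return "low"
    if p <= p_mid:
        return "medium"
    if p <= p_high:
        return "high"
    # The source does not say how probabilities above p_high are handled.
    return "above p_high (unspecified)"


# Hypothetical thresholds and community counts.
p = community_risk_probability(n_risk=2, n_norisk=8)
group = risk_group(p, p_low=0.1, p_mid=0.3, p_high=1.0)
```

Setting `p_high` to 1.0, as in the example, is one way to make the three groups cover every possible probability.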
Step 6, calculating the risk score of each bill client: the community risk level obtained in step 5 is taken as one dimension of the client risk evaluation and combined with the bill client's other risk indicators and their corresponding weights to obtain the client's comprehensive risk score. The calculation formula of the comprehensive risk score is as follows:
Figure BDA0003981469340000121
where $r_0$ is the community risk level and $w_0$ is its weight; $r_1, r_2, \ldots, r_k$ are the other risk indicators of interest in bill business, and $w_1, w_2, \ldots, w_k$ are the weights corresponding to these risk indicators.
Step 7, calculating the risk level of each bill client: each client's risk score is converted into a score from 0 to 100 according to a mapping relationship, and the scores are divided into 10 risk levels $level_1, level_2, \ldots, level_{10}$. The higher the score, the higher the risk level and the greater the probability that the client is an abnormal client; conversely, the lower the score, the lower the risk level and the smaller that probability.
Step 8, generating the abnormal client list: the clients whose risk scores exceed a limit value, that is, the clients above the set risk threshold $level_{risk}$, are selected to form the abnormal client list.
Step 9, issuing the abnormal client list to the operating institutions for verification and confirmation.
Step 10, calculating the model's identification performance from the feedback of the operating institutions.
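Steps 6 to 8 can be sketched as below under several stated assumptions: the comprehensive score is taken to be a plain weighted sum of $r_0, \ldots, r_k$, the 0-100 conversion is a min-max mapping, the 10 levels are equal-width bins, and the list contains clients whose level is above `level_risk`. None of these choices is confirmed by the source, which gives the formula only as an image and does not describe the mapping; all names and sample values are hypothetical.

```python
import math


def composite_score(r, w):
    """Assumed weighted sum; r[0] is the community risk level, r[1:] the rest."""
    if len(r) != len(w):
        raise ValueError("r and w must have the same length")
    return sum(ri * wi for ri, wi in zip(r, w))


def to_percent(score, lo, hi):
    """Assumed min-max mapping of a raw score onto the range 0-100."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(score, lo), hi)
    return 100.0 * (clipped - lo) / (hi - lo)


def risk_level(pct):
    """Assumed equal-width bins: (0, 10] -> 1, ..., (90, 100] -> 10; 0 -> 1."""
    return min(10, max(1, math.ceil(pct / 10.0)))


def abnormal_list(levels, level_risk):
    """Clients whose level is above the threshold, highest level first."""
    flagged = [c for c, lv in levels.items() if lv > level_risk]
    return sorted(flagged, key=lambda c: (-levels[c], c))


# Hypothetical inputs: community level in {1, 2, 3}, two 0/1 indicators.
weights = [0.5, 0.25, 0.25]
indicators = {"C1": [3, 1, 0], "C2": [1, 0, 0], "C3": [2, 1, 1]}
max_raw = composite_score([3, 1, 1], weights)  # largest possible raw score

levels = {
    c: risk_level(to_percent(composite_score(r, weights), 0.0, max_raw))
    for c, r in indicators.items()
}
flagged = abnormal_list(levels, level_risk=7)
```

With these made-up inputs the three clients land on levels 9, 3, and 8, so a threshold of 7 flags C1 and C3.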
Suppose the issued abnormal client lists contain $N$ clients in total, of which $M$ are confirmed as hits on verification, and the operating institutions have found $K$ abnormal clients in total, of which $S$ appear in the abnormal client lists issued by the model. The model's accuracy rate and recall rate are then calculated as follows:
Figure BDA0003981469340000131
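A small sketch of the evaluation in step 10 follows. It assumes the accuracy rate is $M / N$ (confirmed hits over listed clients) and the recall rate is $S / K$ (abnormal clients caught by the list over all abnormal clients found); this reading follows from the definitions above but is not confirmed by the text, since the formula appears only as an image. The counts in the example are made up.

```python
def accuracy_and_recall(n_listed, m_hits, k_found, s_found_in_list):
    """Return (accuracy rate, recall rate) under the assumed M/N and S/K forms."""
    if n_listed <= 0 or k_found <= 0:
        raise ValueError("N and K must be positive")
    if m_hits > n_listed or s_found_in_list > k_found:
        raise ValueError("a count cannot exceed the total it is part of")
    accuracy = m_hits / n_listed
    recall = s_found_in_list / k_found
    return accuracy, recall


# Hypothetical feedback: 200 listed, 150 confirmed; 180 found overall, 135 on the list.
accuracy, recall = accuracy_and_recall(200, 150, 180, 135)
```

Tracking both figures matters because a short, conservative list can score well on accuracy while missing many of the abnormal clients the institutions find by other means.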
As shown in FIG. 2, the system corresponding to the method comprises a data acquisition module, a data processing module, an algorithm module, a logic module, and a display module;
the data acquisition module is used for acquiring bill client information;
the algorithm module obtains the corresponding indicator information by applying the corresponding algorithm to the acquired bill client information;
the logic module is used for performing logical judgment on the indicator information and for screening and rejecting it;
the display module is used for displaying the judged indicator information to the operating institution.
As shown in FIG. 3, the terminal device 6 may include a processor 601, a storage medium 602, and a bus 603. The storage medium 602 stores machine-readable instructions executable by the processor 601. When the terminal device runs, the processor 601 communicates with the storage medium 602 via the bus 603 and executes the machine-readable instructions to perform the steps of the bill abnormal client identification method described in the foregoing embodiments. The specific implementation and technical effects are similar and are not repeated here.
For ease of illustration, only one processor is described in the above terminal device. It should be noted, however, that in some embodiments, the terminal device of the present invention may also include multiple processors, and thus, the steps performed by one processor described in the present invention may also be performed jointly by multiple processors or separately.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily appreciate variations or alternatives within the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (9)

1. A method for identifying abnormal bill clients based on community discovery is characterized by comprising the following steps:
step 1, designing point-edge relation of bill clients, wherein the step is to analyze risk indexes related to the bill clients and design nodes and edges of bill close client patterns;
processing node and edge data, wherein the step is to process the point-edge design scheme in the step 1 into three node files, one attribute file and two edge files;
step 3, building a bill client knowledge graph, wherein the bill client knowledge graph is built in Neo4j by using the nodes, the attributes and the edge files processed in the step 2, and different nodes are marked by using different colors;
step 4, dividing communities by using a Louvain algorithm, wherein the Louvain algorithm is an algorithm for community discovery based on modularity;
step 5, dividing community groups according to the community risk probability, wherein three risk probability level thresholds p_low, p_mid and p_high are set: a community whose risk probability is less than or equal to p_low belongs to the low-risk-probability community group, a community whose risk probability is greater than p_low and less than or equal to p_mid belongs to the medium-risk-probability community group, and a community whose risk probability is greater than p_mid and less than or equal to p_high belongs to the high-risk-probability community group;
step 6, calculating the risk score of each bill client, wherein the community risk level obtained in step 5 is taken as one dimension of the client risk evaluation and is combined with the other risk indexes of the bill client and their corresponding weights to obtain the comprehensive risk score of the client;
step 7, calculating the risk level of each bill client, wherein the risk score of each bill client is converted into a score of 0-100 according to a mapping relation and divided into 10 risk levels level_1, level_2, …, level_10;
step 8, selecting clients with risk scores higher than a limit value to generate an abnormal client list, wherein the clients whose risk scores are higher than a set risk threshold level_risk form the abnormal client list;
step 9, issuing an abnormal client list to the operation institution for verification and confirmation;
and step 10, calculating a model identification effect according to the feedback result of the operation institution.
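Steps 5, 7 and 8 above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the default threshold values, the equal-width mapping from a 0-100 score to ten levels, and the comparison of a client's level (rather than its raw score) against level_risk are not fixed by the claim, and all function names are hypothetical.

```python
import math


def community_group(p, p_low=0.2, p_mid=0.5, p_high=1.0):
    """Step 5: map a community risk probability to a group label.

    The default thresholds are placeholders, not values from the claim.
    """
    if p <= p_low:
        return "low"
    if p <= p_mid:
        return "medium"
    if p <= p_high:
        return "high"
    return None  # the claim does not say how values above p_high are handled


def risk_level(score):
    """Step 7: map a 0-100 score to level 1..10, assuming equal-width bins."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    # Level k covers (10*(k-1), 10*k]; a score of exactly 0 falls in level 1.
    return max(1, math.ceil(score / 10))


def abnormal_clients(scores, level_risk=8):
    """Step 8: list clients whose level exceeds the assumed threshold."""
    return sorted(c for c, s in scores.items() if risk_level(s) > level_risk)
```

For example, `abnormal_clients({"A": 95, "B": 40, "C": 85})` keeps the clients whose scores map to levels above 8.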
2. The method of claim 1, wherein the nodes in step 1 are of three types, namely bill client nodes, correspondent nodes and reconciliation IP nodes.
3. The method of claim 1, wherein the modularity is calculated as:
Figure FDA0003981469330000021
where m is the number of connections in the network; v and w are any two nodes in the network, with A_vw = 1 when there is a connection between them and A_vw = 0 otherwise; k_w is the degree of node w; δ(c_v, c_w) indicates whether nodes v and w are in the same community: δ(c_v, c_w) = 1 if so, and δ(c_v, c_w) = 0 otherwise;
The calculation formula of the modularity increment is as follows:
Figure FDA0003981469330000022
Figure FDA0003981469330000023
wherein Σ_in is the number of edges within community c; Σ_tot is the sum of the degrees of the nodes within community c; k_i is the degree of node i; and k_{i,in} is the number of connections between node i and the nodes within community c.
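The formula images referenced in claim 3 are not reproduced in this text. As a hedged reading aid, the standard modularity definition that matches the symbols above, and the gain obtained from it when an isolated node i joins community c, can be written as below. The derivation assumes each edge is counted once in Σ_in and k_{i,in}; the source figures may use a different counting convention.

```latex
% Modularity, with the symbols defined in claim 3
Q = \frac{1}{2m} \sum_{v,w} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w)

% Equivalent per-community form (sum over communities c)
Q = \sum_{c} \left[ \frac{\Sigma_{in}}{m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} \right]

% Gain when an isolated node i is moved into community c
\Delta Q
  = \left[ \frac{\Sigma_{in} + k_{i,in}}{m}
           - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right]
  - \left[ \frac{\Sigma_{in}}{m}
           - \left( \frac{\Sigma_{tot}}{2m} \right)^{2}
           - \left( \frac{k_i}{2m} \right)^{2} \right]
  = \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}\, k_i}{2m^{2}}
```

The Σ_in terms cancel, so the gain depends only on how many connections node i has into c and on the degrees involved.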
4. The method according to claim 1, characterized in that the Louvain algorithm is divided into three stages, respectively:
stage 1, first, each node is assigned to its own community c, so that a network with n nodes has n communities, and the modularity Q_0 at this time is calculated; then node i is removed from the community c_i in which it is located, node i and node j are placed in one community, and the modularity Q_1 at this time is calculated; the modularity gain ΔQ = Q_1 - Q_0 is calculated; if ΔQ > 0, node i is assigned to the community where j is located, otherwise it is not;
stage 2, each community obtained in stage 1 is aggregated into a single node, and the whole network is reconstructed;
stage 3, when the modularity no longer increases, the iteration stops automatically.
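Stage 1 of the procedure above can be illustrated with a small pure-Python sketch. It is not the patented implementation: it assumes a simple undirected, unweighted graph without self-loops, recomputes the modularity from scratch for every trial move (simple but slow), and omits the aggregation of stage 2. The function names are hypothetical.

```python
def modularity(edges, community):
    """Q = sum over communities of [internal_edges/m - (total_degree/2m)^2]."""
    m = len(edges)
    internal, total = {}, {}
    for v, w in edges:
        # Each edge adds one to the degree total of both endpoint communities.
        total[community[v]] = total.get(community[v], 0) + 1
        total[community[w]] = total.get(community[w], 0) + 1
        if community[v] == community[w]:
            internal[community[v]] = internal.get(community[v], 0) + 1
    return sum(internal.get(c, 0) / m - (total[c] / (2 * m)) ** 2 for c in total)


def local_moving(edges):
    """Stage 1: move nodes between communities while the modularity increases."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for v, w in edges:
        neighbours[v].add(w)
        neighbours[w].add(v)
    community = {n: n for n in nodes}  # n nodes start as n communities
    improved = True
    while improved:  # stop once a full pass yields no positive gain
        improved = False
        for i in nodes:
            q0 = modularity(edges, community)
            best_gain, best_c = 0.0, community[i]
            for j in sorted(neighbours[i]):
                if community[j] == community[i]:
                    continue
                trial = dict(community)
                trial[i] = community[j]  # try placing i in j's community
                gain = modularity(edges, trial) - q0  # delta Q = Q_1 - Q_0
                if gain > best_gain + 1e-12:
                    best_gain, best_c = gain, community[j]
            if best_c != community[i]:
                community[i] = best_c
                improved = True
    return community
```

On two triangles joined by a single edge, for instance `[(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]`, the sketch places each triangle in its own community.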
5. The method of claim 1, wherein the risk probability calculation formula for the community is as follows:
Figure FDA0003981469330000031
wherein c_i is community i, n_risk is the number of clients in community c_i that have been marked as abnormal clients, and n_norisk is the number of clients in community c_i that have not been marked as abnormal clients.
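The formula image for claim 5 is not reproduced in this text. A plausible reading of the definitions, assumed here rather than confirmed by the source, is the share of already-flagged clients in each community, n_risk / (n_risk + n_norisk). A minimal Python sketch with hypothetical names:

```python
from collections import Counter


def community_risk_probabilities(membership, flagged):
    """Return {community: n_risk / (n_risk + n_norisk)}.

    membership maps each client to its community; flagged is the set of
    clients already marked as abnormal. The ratio is an assumed reading
    of the claim, not a formula taken from the source.
    """
    size = Counter(membership.values())
    risky = Counter(c for client, c in membership.items() if client in flagged)
    return {c: risky[c] / size[c] for c in size}
```

Because every community in `membership` has at least one client, the denominator is never zero.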
6. The method of claim 1, wherein the composite risk score is calculated as follows:
Figure FDA0003981469330000032
wherein r_0 is the community risk level and w_0 is its weight; r_1, r_2, …, r_k are the other risk indexes of interest in the bill business, and w_1, w_2, …, w_k are the weights corresponding to these risk indexes.
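The formula image for claim 6 is likewise not reproduced here. The definitions suggest a weighted sum, w_0·r_0 + w_1·r_1 + … + w_k·r_k; that form is an assumption (the source may, for example, normalise by the sum of the weights). A minimal sketch with a hypothetical function name:

```python
def composite_risk_score(r0, w0, indicators, weights):
    """Weighted sum w0*r0 + sum(w_j * r_j), an assumed reading of claim 6."""
    if len(indicators) != len(weights):
        raise ValueError("each risk index needs exactly one weight")
    return w0 * r0 + sum(w * r for w, r in zip(weights, indicators))
```

Keeping the community term separate from the other indexes mirrors the claim, where the community risk level is one dimension among several.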
7. A system for identifying abnormal bill clients based on community discovery, characterized by comprising a data acquisition module, a data processing module, an algorithm module, a logic module and a display module;
the data acquisition module is used for acquiring bill client information;
the algorithm module is used for obtaining corresponding index information by applying a matching algorithm to the acquired bill client information;
the logic module is used for carrying out logic judgment and screening and rejection on the index information;
the display module is used for displaying the index information after the judgment to the management institution.
8. A terminal device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the terminal device is operating, and the processor executing the machine-readable instructions to perform the steps of the method of any of claims 1 to 6.
9. A storage medium, characterized in that the storage medium has a computer program stored thereon,
and the computer program, when executed by a processor, performs the steps of the method according to any of claims 1 to 6.
CN202211549331.1A 2022-12-05 2022-12-05 Community discovery-based bill abnormal customer identification method, system, terminal equipment and storage medium Pending CN116167865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211549331.1A CN116167865A (en) 2022-12-05 2022-12-05 Community discovery-based bill abnormal customer identification method, system, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116167865A true CN116167865A (en) 2023-05-26

Family

ID=86410095

Country Status (1)

Country Link
CN (1) CN116167865A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination