CN113989018A - Risk management method, risk management device, electronic equipment and medium - Google Patents

Risk management method, risk management device, electronic equipment and medium

Info

Publication number
CN113989018A
Authority
CN
China
Prior art keywords
data
graph
knowledge
enterprise
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111244065.7A
Other languages
Chinese (zh)
Inventor
张珺珺
苏宗国
陈道斌
金阳
孟岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202111244065.7A
Publication of CN113989018A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G06Q40/06: Asset management; Financial planning or analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a risk management method, a risk management device, an electronic device and a medium. The risk management method and the risk management device can be used in the technical field of big data. The risk management method comprises the following steps: acquiring enterprise data, wherein the enterprise data comprises parties and event relations between the parties, the event relations comprise m categories of data, each category of data comprises n pieces of event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1; constructing a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relations are edges of the knowledge graph, and the event information is used for explaining the corresponding edge; calculating, based on the knowledge graph constructed for each category of data, a reduction weight of the n event-information edges between two parties; calculating, based on the reduction weights, an aggregation weight $\omega_{vu}$ of the m category-data edges between the two parties; and performing risk management on the enterprise according to the aggregation weight.

Description

Risk management method, risk management device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a risk management method, apparatus, electronic device, and medium.
Background
At present, the market for small and micro enterprise financing in China has great potential. According to data from the relevant departments, the number of market entities in China exceeds one hundred million, of which individual industrial and commercial households exceed seven million, leaving ample room for business. As an important support for economic development and social stability, small and micro enterprises play an indispensable role in promoting the orderly flow of talent, maintaining market vitality, and promoting technological innovation.
Compared with large and medium-sized enterprises, small and micro enterprises remain in a weak position in market competition, and financing problems make it difficult to guarantee the stability and continuity of their operations. Bank loans are an important means of enterprise financing. To relieve the operating pressure on small and micro enterprises and meet their strong demand for financing, banks have gradually expanded their small and micro loan business and introduced a variety of credit products. The accompanying problem is that the lower repayment capacity of small and micro enterprises may lead to a high rate of overdue loan repayment.
Traditional financial risk control systems rely mainly on expert rules and on the indicators in the Basel Accords. Small and micro enterprises are inherently weak at providing information about themselves, and the opaque, internalized, and asymmetric nature of that information makes it difficult for banks to assess the real credit risk of these customers, so credit products for small and micro enterprises are harder to manage than those for large enterprises. If an undifferentiated risk control model is used, the risk of most small and micro enterprises cannot be reasonably predicted, so a new data analysis and processing method is needed to solve this problem in the small and micro credit business.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, an electronic device, and a computer-readable storage medium for risk management based on a knowledge graph, which have good risk prediction effect and are convenient to use.
One aspect of the present disclosure provides a method for risk management based on a knowledge graph, comprising: acquiring enterprise data, wherein the enterprise data comprises parties and event relations between the parties, the event relations comprise m categories of data, each category of data comprises n pieces of event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1; constructing a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relations are edges of the knowledge graph, and the event information is used for explaining the corresponding edge; calculating, based on the knowledge graph constructed for each category of data, a reduction weight of the n event-information edges between two parties; calculating, according to the reduction weights, an aggregation weight $\omega_{vu}$ of the m category-data edges between the two parties; and performing risk management on the enterprise according to the aggregation weight.
According to the risk management method based on the knowledge graph, a knowledge graph is established and the aggregation weights of its edges are calculated, so that risk management can be performed on an enterprise. For example, by means of label propagation over the knowledge graph, the influence that a risk arising for one customer during the life of a loan has on related customers, such as its upstream and downstream counterparties, can be taken into account, which improves the ability to provide inclusive financial services to small and micro enterprises. Because the event information is used as nodes of the knowledge graph to explain the corresponding edges, the edges do not need to carry attributes, which reduces the redundancy of the edges and makes the knowledge graph respond faster when it is used.
In some embodiments, the reduction weight is calculated as follows:
[Formula for the reduction weight: image BDA0003318923340000021, not reproduced in the text]
wherein v represents the head node of the two parties, u represents the tail node of the two parties, E(v, u) represents the set of n event-information edges whose head node is v and whose tail node is u, E(v) represents the set of directed edges whose head node is v, and t represents the current time. The remaining four symbols appear only as images in the text (BDA0003318923340000022, BDA0003318923340000031, BDA0003318923340000032 and BDA0003318923340000033) and represent, in order: the initial weight of the edge l' at the time its corresponding event occurred, the temporal weight of the edge l', the initial weight of the edge l at the time its corresponding event occurred, and the temporal weight of the edge l; and
the aggregation weight is calculated as follows:
[Formula for the aggregation weight $\omega_{vu}$: image BDA0003318923340000034, not reproduced in the text]
wherein R(v, u) represents the set of m category-data edges whose head node is v and whose tail node is u, and $C_r$ represents the constant coefficient corresponding to each category of data.
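Because the two formulas survive only as image references, the following Python sketch shows one plausible reading of the symbol definitions above, not the formulas of the disclosure. It assumes that the reduction weight is the time-decayed weight of the v-to-u edges divided by the time-decayed weight of all edges leaving v within one category, that the temporal weight is an exponential decay, and that the aggregation weight is the sum of the per-category reduction weights scaled by $C_r$. The names `EventEdge`, `temporal_weight`, `reduction_weight`, `aggregation_weight` and the `decay` parameter are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EventEdge:
    head: str           # head node v
    tail: str           # tail node u
    category: str       # category r, e.g. "fund_flow"
    init_weight: float  # initial weight at the event occurrence time
    event_time: float   # event occurrence time, same unit as `now`


def temporal_weight(now, event_time, decay=0.1):
    # ASSUMPTION: exponential decay; the text only says a temporal weight exists.
    return math.exp(-decay * (now - event_time))


def reduction_weight(edges, v, u, category, now):
    """Decayed weight of the v->u edges as a share of all edges leaving v (one category)."""
    outgoing = [e for e in edges if e.head == v and e.category == category]
    total = sum(e.init_weight * temporal_weight(now, e.event_time) for e in outgoing)
    if total == 0:
        return 0.0
    pair = sum(e.init_weight * temporal_weight(now, e.event_time)
               for e in outgoing if e.tail == u)
    return pair / total


def aggregation_weight(edges, v, u, coeffs, now):
    """Sum of per-category reduction weights, each scaled by its constant C_r."""
    categories = {e.category for e in edges if e.head == v and e.tail == u}
    return sum(coeffs[c] * reduction_weight(edges, v, u, c, now) for c in categories)
```

Under these assumptions an older event contributes less than a recent one of the same initial weight, and the reduction weights of all edges leaving one node within a category sum to 1.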
In some embodiments, the m categories of data comprise at least one of fund flow data, investment data, guarantee data and person-enterprise association data.
In some embodiments, the n pieces of event information include event information under the same category data for n time periods.
In some embodiments, the method further comprises, before building the knowledge graph from the enterprise data, cleansing the enterprise data, wherein cleansing the enterprise data comprises one of: data deduplication, completion of missing features in the data, and processing of anomalous features in the data.
In some embodiments, building the knowledge graph from the enterprise data comprises: constructing a schema layer from the parties and the m categories of data, wherein the schema layer comprises nodes established from the parties and from the m categories, and edges between the nodes established from the events in each category of data; and importing the data in the category data into the corresponding schema layer.
In some embodiments, the category data is structured data, and importing the data in the category data into the corresponding schema layer comprises: converting the category data into Resource Description Framework (RDF) data; and importing the RDF data into the corresponding schema layer.
In some embodiments, performing risk management on the enterprise according to the aggregation weight comprises: performing label propagation on the nodes of the knowledge graph according to the aggregation weight; performing community division according to the label propagation result to obtain the community sizes; calculating the degree centrality of the nodes of the knowledge graph according to the community sizes; and performing risk prediction by taking the degree centrality as an input of a risk prediction model.
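As a rough illustration of the cleansing step, the sketch below applies all three named operations in sequence, whereas the text requires only one of them. The record layout (`head`, `tail`, `category`, `time`, `amount`), the median fill and the `amount_cap` clipping bound are assumptions chosen for the example.

```python
from statistics import median


def clean_records(records, amount_cap=1e9):
    """Deduplicate records, fill missing amounts, and clip anomalous amounts."""
    seen, unique = set(), []
    for r in records:
        key = (r["head"], r["tail"], r["category"], r["time"], r.get("amount"))
        if key not in seen:          # data deduplication
            seen.add(key)
            unique.append(dict(r))   # copy so the input is not mutated
    # Median of the plausible amounts is used to complete missing values (assumption).
    plausible = [r["amount"] for r in unique
                 if r.get("amount") is not None and 0 <= r["amount"] <= amount_cap]
    fill = median(plausible) if plausible else 0.0
    for r in unique:
        if r.get("amount") is None:  # feature completion
            r["amount"] = fill
        # Anomalous feature processing: clip to the range [0, amount_cap].
        r["amount"] = min(max(r["amount"], 0.0), amount_cap)
    return unique
```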
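A minimal sketch of the conversion from a structured row into RDF-style triples follows. It uses plain Python tuples rather than an RDF library, and the `ex:` prefix, the predicate names and the choice to link each event node to its head and tail parties are all assumptions made for illustration.

```python
def iri(kind, name):
    # Hypothetical IRI scheme under an assumed "ex:" prefix.
    return f"ex:{kind}/{str(name).replace(' ', '_')}"


def row_to_triples(row, event_id):
    """Turn one structured row into (subject, predicate, object) triples.

    The parties become nodes, the category becomes the edge between them,
    and the event information becomes its own node that describes the edge.
    """
    head, tail = iri("party", row["head"]), iri("party", row["tail"])
    event = iri("event", event_id)
    relation = f"ex:{row['category']}"
    triples = [
        (head, relation, tail),
        (event, "rdf:type", "ex:Event"),
        (event, "ex:category", relation),
        (event, "ex:head", head),
        (event, "ex:tail", tail),
    ]
    # Remaining columns (amount, time, ...) become literal properties of the event node.
    for key, value in row.items():
        if key not in ("head", "tail", "category"):
            triples.append((event, f"ex:{key}", f'"{value}"'))
    return triples
```

Keeping the event details on a separate node mirrors the statement that edges need not carry attributes.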
In some embodiments, said risk managing the enterprise according to the aggregation weight comprises: performing label propagation on the nodes of the knowledge graph according to the aggregation weight; and predicting the risk according to the label propagation result.
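The label-propagation embodiments above can be sketched as follows. This is an illustrative, deterministic variant: it treats the aggregation weights as undirected for propagation, breaks ties by label name, and normalizes each node's within-community degree by the community size minus one. Those choices, and the function names, are assumptions; the risk prediction model that would consume the centrality is not shown.

```python
from collections import defaultdict


def label_propagation(weights, max_iter=20):
    """weights: {(v, u): aggregation weight}. Returns {node: community label}."""
    nbrs = defaultdict(dict)
    for (v, u), w in weights.items():
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
    labels = {n: n for n in nbrs}        # every node starts with its own label
    for _ in range(max_iter):
        changed = False
        for n in sorted(nbrs):           # fixed order keeps the result deterministic
            score = defaultdict(float)
            for m, w in nbrs[n].items():
                score[labels[m]] += w
            # Heaviest label wins; ties go to the smallest label name.
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[n]:
                labels[n], changed = best, True
        if not changed:
            break
    return labels


def community_sizes(labels):
    sizes = defaultdict(int)
    for lab in labels.values():
        sizes[lab] += 1
    return dict(sizes)


def community_degree_centrality(weights, labels):
    """Within-community degree divided by (community size - 1)."""
    sizes = community_sizes(labels)
    inside = defaultdict(set)
    for v, u in weights:
        if labels[v] == labels[u]:
            inside[v].add(u)
            inside[u].add(v)
    return {n: (len(inside[n]) / (sizes[lab] - 1) if sizes[lab] > 1 else 0.0)
            for n, lab in labels.items()}
```

The resulting centrality values would then be passed, possibly with other features, to whatever risk prediction model is in use.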
In some embodiments, the method of knowledge-graph-based risk management further comprises visualizing the knowledge graph, wherein visualizing the knowledge graph comprises at least one of: node retrieval based on the knowledge graph, subgraph walk based on the knowledge graph, path exploration based on the knowledge graph, and self-loop exploration based on the knowledge graph.
In some embodiments, the node retrieval comprises: in response to an input node name, displaying a node related to the node name and the associated information of the node; the subgraph walk comprises: in response to a manual click on an event relation, displaying the node corresponding to the clicked event relation and all edges emanating from that node; the path exploration comprises: acquiring and displaying a path relation between two nodes; and the self-loop exploration comprises: acquiring and displaying the nodes that have a closed-loop path relation.
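The four exploration operations can be approximated on an adjacency-list graph as in the sketch below. The graph layout (`{node: [(relation, target), ...]}`), the function names, and the use of substring matching and breadth-first search are assumptions for illustration; rendering is left out.

```python
from collections import deque


def find_nodes(graph, name):
    """Node retrieval: nodes whose name contains the query (case-insensitive)."""
    return sorted(n for n in graph if name.lower() in n.lower())


def out_edges(graph, node):
    """Subgraph walk: all edges emanating from the selected node."""
    return [(node, rel, dst) for rel, dst in graph.get(node, [])]


def find_path(graph, src, dst):
    """Path exploration: one shortest directed path from src to dst, or None."""
    queue, parent = deque([src]), {src: None}
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for _, nxt in graph.get(cur, []):
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None


def nodes_on_cycles(graph):
    """Self-loop exploration: nodes that can reach themselves along directed edges."""
    def reaches_itself(start):
        stack, seen = [dst for _, dst in graph.get(start, [])], set()
        while stack:
            cur = stack.pop()
            if cur == start:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(dst for _, dst in graph.get(cur, []))
        return False
    return sorted(n for n in graph if reaches_itself(n))
```

Closed loops are of interest in this setting because, for instance, two enterprises that guarantee each other form a cycle in the guarantee relation.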
Another aspect of the present disclosure provides a knowledge-graph-based risk management apparatus comprising: an acquisition module, configured to acquire enterprise data, wherein the enterprise data comprises parties and event relations between the parties, the event relations comprise m categories of data, each category of data comprises n pieces of event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1; a construction module, configured to construct a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relations are edges of the knowledge graph, and the event information is used for explaining the corresponding edge; a first calculation module, configured to calculate, based on the knowledge graph constructed for each category of data, a reduction weight of the n event-information edges between two parties; a second calculation module, configured to calculate, according to the reduction weights, an aggregation weight $\omega_{vu}$ of the m category-data edges between the two parties; and a management module, configured to perform risk management on the enterprise according to the aggregation weight.
Another aspect of the present disclosure provides an electronic device comprising one or more processors and one or more memories, wherein the memories are configured to store executable instructions that, when executed by the processors, implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the method described above.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the method and apparatus may be applied, in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a knowledge-graph based risk management method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a schematic diagram of a knowledge-graph according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram for building a knowledge-graph from enterprise data, in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram for importing data in category data into a corresponding schema layer according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a flow diagram for risk management of an enterprise according to aggregation weights, according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow diagram for risk management of an enterprise according to aggregation weights, according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow diagram for visualizing a knowledge-graph according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow diagram for visualizing a knowledge-graph according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow diagram for visualizing a knowledge-graph according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates a flow diagram for visualizing a knowledge-graph according to an embodiment of the present disclosure;
FIG. 12 schematically illustrates a block diagram of a knowledge-graph based risk management apparatus according to an embodiment of the present disclosure;
FIG. 13 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure. In the technical solution of the disclosure, the acquisition, storage, and use of the personal information of the users involved comply with the relevant laws and regulations, the necessary security measures are taken, and public order and good customs are not violated.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features.
At present, most existing loan-duration management models for small and micro enterprises still have limitations. They model only the data features of the small and micro enterprise customer itself, whereas most data in the financial industry involves association relationships among multiple customers, so the available information is not fully used. They also cannot account for the influence that a risk arising for one customer during the life of a loan has on related customers, such as its upstream and downstream counterparties. This ignores an important angle of loan-duration management and limits the ability to provide inclusive financial services to small and micro enterprises.
Embodiments of the present disclosure provide a risk management method based on a knowledge graph, a risk management apparatus, an electronic device, a computer-readable storage medium, and a computer program. The risk management method based on the knowledge graph comprises the following steps: acquiring enterprise data, wherein the enterprise data comprises parties and event relations between the parties, the event relations comprise m categories of data, each category of data comprises n pieces of event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1; constructing a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relations are edges of the knowledge graph, and the event information is used for explaining the corresponding edge; calculating, based on the knowledge graph constructed for each category of data, a reduction weight of the n event-information edges between two parties; calculating, based on the reduction weights, an aggregation weight $\omega_{vu}$ of the m category-data edges between the two parties; and performing risk management on the enterprise according to the aggregation weight.
It should be noted that the knowledge-graph-based risk management method, risk management apparatus, electronic device, computer-readable storage medium, and computer program of the present disclosure may be used in the field of big data, and may also be used in any field other than the field of big data, such as the financial field, and the field of the present disclosure is not limited herein.
FIG. 1 schematically illustrates an exemplary system architecture 100 to which the knowledge-graph-based risk management method, risk management apparatus, electronic device, computer-readable storage medium, and computer program may be applied, according to embodiments of the disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied, provided to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the risk management method based on knowledge graph provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the risk management devices provided by embodiments of the present disclosure may be generally disposed in the server 105. The risk management method based on knowledge graph provided by the embodiments of the present disclosure may also be performed by a server or a cluster of servers different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the risk management apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The knowledge-graph-based risk management method according to the embodiment of the present disclosure will be described in detail with reference to fig. 2 to 11 based on the scenario described in fig. 1.
FIG. 2 schematically shows a flow diagram of a knowledge-graph based risk management method according to an embodiment of the present disclosure.
As shown in fig. 2, the method for risk management based on a knowledge graph of this embodiment includes operations S210 to S250.
In operation S210, enterprise data is acquired, wherein the enterprise data includes parties and event relations between the parties, the event relations include m categories of data, and each category of data includes n pieces of event information, where m is an integer greater than or equal to 1 and n is an integer greater than or equal to 1.
It should be noted that the parties may all be enterprises, may all be individuals, or may be a mix of enterprises and individuals. Accordingly, an event relation may exist between two enterprises, between two individuals, or between an enterprise and an individual.
As a possible implementation, the m categories of data included in the event relations comprise at least one of fund flow data, investment data, guarantee data, and person-enterprise association data.
Fund flow data may be understood as fund flow transaction data between parties, that is, between enterprises, between individuals, or between an enterprise and an individual.
Investment data may be understood as investment relationship data between parties, that is, an investment relationship between enterprises, between individuals, or between an enterprise and an individual.
Guarantee data may be understood as guarantee relationship data between parties, that is, a guarantee relationship between enterprises, between individuals, or between an enterprise and an individual.
Person-enterprise association data may be understood as association data between an enterprise and individuals, such as the enterprise's legal representative, second responsible person, legal insurance beneficiary, financial supervisor, shareholders, general manager, company contact, directors, and other responsible persons.
As a possible implementation manner, the n event information pieces include event information pieces under the same category data of n time periods. For example, under the category data of the capital flow data, an event occurring in each time period is classified as one event information, further, for example, setting the time period as one quarter, there may be 4 time periods in one year, and there are 4 event information under the capital flow data between the parties. Each event information may include, but is not limited to, the amount of funds that flow out of the party, the amount of funds that are transacted, the time of the transaction, and the amount of funds that flow into the party.
As another example, under the category data of investment data, the events occurring in each time period are classified as one piece of event information. For example, if the time period is set to one quarter, a year contains 4 time periods, and there are 4 pieces of event information under the investment data between the parties. Each piece of event information may include, but is not limited to, the investing party, the investment amount, the time of the investment, and the invested party.
As another example, under the category data of guarantee data, the events occurring in each time period are classified as one piece of event information. For example, if the time period is set to one quarter, a year contains 4 time periods, and there are 4 pieces of event information under the guarantee data between the parties. Each piece of event information may include, but is not limited to, the guarantor, the guarantee amount, the guarantee time, and the guaranteed party.
As another example, under the category data of person-enterprise association data, the events occurring in each time period are classified as one piece of event information: the association relationships established in each time period between an enterprise and its legal representative, second responsible person, insurance legal beneficiary, financial supervisor, shareholders, general manager, unit contact, president, and other responsible persons are each one piece of event information. For example, if the time period is set to one quarter, a year contains 4 time periods, and the enterprise has 4 pieces of event information with each of its legal representative, second responsible person, insurance legal beneficiary, financial supervisor, shareholders, general manager, unit contact, president, and other responsible persons. Each piece of event information may include, but is not limited to, the enterprise, the position held by the individual, the start time of the position, and the end time of the position.
Of course, the time period is not limited to quarters; it may also be a year, a month, a day, or any other time interval, and is not specifically limited here.
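The grouping of events into time periods described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the record fields (`source`, `target`, `amount`, `time`), the helper names, and the choice to merge the events of one period by summing their amounts and counting them are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def quarter_key(d: date) -> str:
    """Return a label such as '2021Q3' for the quarter that contains d."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def group_events_by_period(events, period_key=quarter_key):
    """Merge raw events into one piece of event information per
    (source party, target party, time period).

    Assumption: events in the same period are merged by summing amounts.
    """
    grouped = defaultdict(lambda: {"amount": 0.0, "count": 0})
    for ev in events:
        key = (ev["source"], ev["target"], period_key(ev["time"]))
        grouped[key]["amount"] += ev["amount"]
        grouped[key]["count"] += 1
    return dict(grouped)


# Hypothetical investment events between party A and party B.
events = [
    {"source": "A", "target": "B", "amount": 100.0, "time": date(2021, 1, 15)},
    {"source": "A", "target": "B", "amount": 50.0, "time": date(2021, 3, 2)},
    {"source": "A", "target": "B", "amount": 70.0, "time": date(2021, 5, 20)},
]
info = group_events_by_period(events)
```

Passing a different `period_key` (for example one that returns the year and month) changes the granularity without touching the grouping logic.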
In operation S220, a knowledge graph is constructed according to the enterprise data, wherein the party and the event information are nodes of the knowledge graph, the event relationship is an edge of the knowledge graph, and the event information is used to explain the corresponding edge.
For example, in conjunction with FIG. 3, a fund flow transaction relationship and an investment relationship exist between enterprise A and enterprise B: enterprise A is the fund outflow party, the transaction amount is 50,000, the transaction time is $t_1$, and enterprise B is the fund inflow party; enterprise A is the investing party, the investment amount is 1,000,000, the transaction time is $t_2$, and enterprise B is the invested party.
A guarantee relationship exists between enterprise A and enterprise C: enterprise A is the guarantor, the guarantee amount is 500,000, the guarantee time is $t_3$, and enterprise C is the guaranteed party. A person-enterprise association relationship exists between enterprise A and individual a: individual a is a shareholder of enterprise A, the start time of the position is $t_4$, and the end time of the position is $t_5$. A guarantee relationship exists between enterprise D and enterprise B: enterprise B is the guarantor, the guarantee amount is 400,000, the guarantee time is $t_6$, and enterprise D is the guaranteed party.
A fund flow transaction exists between enterprise D and individual a: enterprise D is the fund outflow party, the transaction amount is 70,000, the transaction time is $t_7$, and individual a is the fund inflow party. A person-enterprise association relationship exists between enterprise C and individual b: individual b is the financial supervisor of enterprise C, the start time of the position is $t_8$, and the end time of the position is $t_9$. A person-enterprise association relationship exists between enterprise C and individual c: individual c is the director of enterprise C, the start time of the position is $t_{10}$, and the end time of the position is $t_{11}$.
Based on the above enterprise data, enterprise A, enterprise B, enterprise C, enterprise D, individual a, individual b, and individual c can be used as nodes of the knowledge graph, and the nodes are connected by edges according to the event relationships. The following pieces of event information also serve as nodes of the knowledge graph and are used to explain the corresponding edges: "enterprise A is the fund outflow party, the transaction amount is 50,000, the transaction time is $t_1$, and enterprise B is the fund inflow party"; "enterprise A is the investing party, the investment amount is 1,000,000, the transaction time is $t_2$, and enterprise B is the invested party"; "enterprise A is the guarantor, the guarantee amount is 500,000, the guarantee time is $t_3$, and enterprise C is the guaranteed party"; "individual a is a shareholder of enterprise A, the start time of the position is $t_4$, and the end time of the position is $t_5$"; "enterprise B is the guarantor, the guarantee amount is 400,000, the guarantee time is $t_6$, and enterprise D is the guaranteed party"; "enterprise D is the fund outflow party, the transaction amount is 70,000, the transaction time is $t_7$, and individual a is the fund inflow party"; "individual b is the financial supervisor of enterprise C, the start time of the position is $t_8$, and the end time of the position is $t_9$"; and "individual c is the director of enterprise C, the start time of the position is $t_{10}$, and the end time of the position is $t_{11}$".
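The structure described in this example, in which parties and event information are both nodes and the edges themselves carry no attributes, might be represented as in the following sketch. The class name `KnowledgeGraph`, its fields, the event identifiers `e1` to `e8`, and the choice of edge direction for person-enterprise associations (individual to enterprise) are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    parties: set = field(default_factory=set)    # enterprise / individual nodes
    events: dict = field(default_factory=dict)   # event information nodes
    edges: list = field(default_factory=list)    # (head, event_id, tail), no attributes

    def add_event(self, event_id, relation, head, tail, **info):
        """Register both parties, store the details on an event node,
        and connect head -> tail with an attribute-free edge."""
        self.parties.update((head, tail))
        self.events[event_id] = {"relation": relation, **info}
        self.edges.append((head, event_id, tail))

    def explain(self, head, tail):
        """Return the event information nodes that explain the edges head -> tail."""
        return [self.events[e] for h, e, t in self.edges if (h, t) == (head, tail)]


kg = KnowledgeGraph()
kg.add_event("e1", "fund_flow", "enterprise A", "enterprise B", amount=50_000, time="t1")
kg.add_event("e2", "investment", "enterprise A", "enterprise B", amount=1_000_000, time="t2")
kg.add_event("e3", "guarantee", "enterprise A", "enterprise C", amount=500_000, time="t3")
kg.add_event("e4", "person_enterprise", "individual a", "enterprise A",
             position="shareholder", start="t4", end="t5")
kg.add_event("e5", "guarantee", "enterprise B", "enterprise D", amount=400_000, time="t6")
kg.add_event("e6", "fund_flow", "enterprise D", "individual a", amount=70_000, time="t7")
kg.add_event("e7", "person_enterprise", "individual b", "enterprise C",
             position="financial supervisor", start="t8", end="t9")
kg.add_event("e8", "person_enterprise", "individual c", "enterprise C",
             position="director", start="t10", end="t11")
```

Calling `kg.explain("enterprise A", "enterprise B")` returns the two event nodes (fund flow and investment) that explain the two edges between those parties.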
As a possible implementation manner, as shown in FIG. 4, the operation S220 of building a knowledge graph according to enterprise data includes operations S221-S222.
In operation S221, a schema layer is constructed according to the party and the m category data, wherein the schema layer includes nodes established according to the party and m categories of the m category data, and edges between the nodes established according to events in each category data.
In operation S222, data in the category data is imported into the corresponding schema layer.
The schema layer may be understood as the structural layer of the knowledge graph. After the structure of the knowledge graph is built through operation S221, the specific data in the category data may be imported into the corresponding schema layer, so that the knowledge graph can be conveniently built through operation S221 and operation S222.
Further, as shown in FIG. 5, the category data is structured data, and the operation S222 of importing the data in the category data into the corresponding schema layer includes operations S2221 to S2222.
In operation S2221, the category data is converted into resource description framework data.
In operation S2222, the resource description framework data is imported into the corresponding schema layer.
Through operations S2221 to S2222, the data in the category data can be conveniently imported into the corresponding schema layer, so that the knowledge graph can be conveniently constructed.
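The conversion of structured category data into resource description framework data can be illustrated with a small sketch that serializes rows as N-Triples lines. The namespace `http://example.org/kg/`, the row fields, and the predicate naming (`<category>_out`, `<category>_in`) are hypothetical; the source does not specify the vocabulary it uses.

```python
from urllib.parse import quote

BASE = "http://example.org/kg/"  # hypothetical namespace, not from the source


def uri(kind, local):
    """Build an N-Triples IRI term under the hypothetical namespace."""
    return f"<{BASE}{kind}/{quote(str(local))}>"


def literal(value):
    """Build a plain N-Triples string literal with basic escaping."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row):
    """Turn one structured row into triples: party -> event -> party,
    plus one triple per available event attribute."""
    event = uri("event", row["event_id"])
    triples = [
        (uri("party", row["head"]), uri("rel", row["category"] + "_out"), event),
        (event, uri("rel", row["category"] + "_in"), uri("party", row["tail"])),
    ]
    for key in ("amount", "time"):
        if row.get(key) is not None:
            triples.append((event, uri("attr", key), literal(row[key])))
    return triples


def to_ntriples(rows):
    """Serialize all rows, one triple per line."""
    lines = []
    for row in rows:
        for s, p, o in row_to_triples(row):
            lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)


rows = [{"event_id": "e1", "category": "fund_flow", "head": "A",
         "tail": "B", "amount": 50000, "time": "t1"}]
ntriples = to_ntriples(rows)
```

The resulting text could then be loaded into an RDF store; which store and which import mechanism are used is outside what the source states.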
In operation S230, based on the knowledge graph, a reduction weight of the n event information edges between two parties is calculated for each category data (the symbol for the reduction weight appears only as formula image BDA0003318923340000141).
As an implementable manner, the reduction weight is calculated as follows:
[Formula image BDA0003318923340000142]
where v represents the head node of the two parties; u represents the tail node of the two parties; E(v, u) represents the set of the n event information edges with head node v and tail node u; E(v) represents the set of directed edges with head node v; t represents the current time; image BDA0003318923340000143 denotes the initial weight of an edge l' at the occurrence time of the corresponding event; image BDA0003318923340000144 denotes the temporal weight of the edge l'; image BDA0003318923340000145 denotes the initial weight of an edge l at the occurrence time of the corresponding event; and image BDA0003318923340000146 denotes the temporal weight of the edge l.
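The formula for the reduction weight is present in this text only as an image reference, so its exact form cannot be recovered here. One form that is consistent with the listed definitions, offered purely as an assumption, is a ratio in which the numerator sums over the edges from v to u and the denominator sums over all directed edges leaving v. The symbols $\omega^{r}_{vu}$, $w_l$, $t_l$, the decay function $g$, and the rate $\lambda$ are introduced for illustration and are not taken from the source; in particular, combining the initial weight and the temporal weight by multiplication is an assumption.

```latex
\omega^{r}_{vu}
  = \frac{\sum_{l \in E(v,u)} w_{l}\, g(t - t_{l})}
         {\sum_{l' \in E(v)} w_{l'}\, g(t - t_{l'})},
\qquad
g(\Delta) = e^{-\lambda \Delta}, \quad \lambda > 0
```

Under this assumed form, $w_l$ plays the role of the initial weight of edge $l$ at its event occurrence time $t_l$, $g(t - t_l)$ plays the role of its temporal weight at the current time $t$, and the weights of all edges leaving a node v sum to 1 across its tail nodes.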
In operation S240, an aggregation weight $\omega_{vu}$ of the m category data edges between the two parties is calculated according to the reduction weights. As a practical manner, the aggregation weight is calculated as follows:
[Formula image BDA0003318923340000147]
where R(v, u) represents the set of the m category data edges with head node v and tail node u, and $C_r$ represents the constant coefficient corresponding to each category data.
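As a rough sketch of operations S230 and S240, the code below first reduces the multiple event edges between two nodes to one weight per category and then aggregates the per-category weights with constant coefficients. The exact formulas are images in the source, so everything specific here is an assumption: the exponential time decay, the multiplication of initial weight by temporal weight, the normalization over edges of the same category leaving the head node, and the coefficient values.

```python
import math
from collections import defaultdict


def temporal_weight(event_time, now, decay=0.1):
    """Assumed temporal weight: exponential decay with the age of the event."""
    return math.exp(-decay * (now - event_time))


def reduction_weights(edges, now, decay=0.1):
    """Reduce the event edges of ONE category to one weight per (head, tail).

    edges: list of (head, tail, initial_weight, event_time).
    Each pair's weight is normalized by the total over all edges leaving head.
    """
    pair_sum = defaultdict(float)
    head_sum = defaultdict(float)
    for head, tail, w0, t_event in edges:
        contribution = w0 * temporal_weight(t_event, now, decay)
        pair_sum[(head, tail)] += contribution
        head_sum[head] += contribution
    return {
        pair: total / head_sum[pair[0]]
        for pair, total in pair_sum.items()
        if head_sum[pair[0]] > 0
    }


def aggregation_weights(edges_by_category, coefficients, now, decay=0.1):
    """Combine per-category reduced weights with constant coefficients."""
    total = defaultdict(float)
    for category, edges in edges_by_category.items():
        for pair, w in reduction_weights(edges, now, decay).items():
            total[pair] += coefficients[category] * w
    return dict(total)


# Hypothetical data: all events happen at time 10 and "now" is 10.
edges_by_category = {
    "fund_flow": [("A", "B", 1.0, 10), ("A", "B", 1.0, 10), ("A", "C", 2.0, 10)],
    "guarantee": [("A", "C", 1.0, 10)],
}
coefficients = {"fund_flow": 0.4, "guarantee": 0.6}  # hypothetical values
weights = aggregation_weights(edges_by_category, coefficients, now=10)
```

The result maps each ordered pair of parties to a single weight, so the directed multigraph becomes a simple weighted directed graph that later steps can consume.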
In operation S250, risk management is performed on the enterprise according to the aggregation weight.
As one possible implementation manner, as shown in fig. 6, the operation S250 of risk management on the enterprise according to the aggregation weight includes operations S251 to S254.
In operation S251, label propagation is performed on the nodes of the knowledge graph according to the aggregation weights. For example, a target node may be set as the center, with a plurality of neighbor nodes connected to it; the edge connecting the target node to each neighbor node has a corresponding aggregation weight, and which neighbor node's label is propagated to the target node may be determined according to the aggregation weights.
As a further example, the target node is A, which has a neighbor node B, a neighbor node C, and a neighbor node D; the label of the target node A is label 1, the labels of the neighbor nodes B and C are label 2, and the label of the neighbor node D is label 3. Whether label 2 or label 3 is propagated to the target node A is determined according to the aggregation weights: if label 2 is propagated, the new label of the target node A is label 2; if label 3 is propagated, the new label of the target node A is label 3.
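One common way to decide which label reaches the target node is to sum the edge weights per neighbor label and take the label with the largest total. The sketch below applies that rule to the example above; the rule itself, the treatment of edges as undirected when looking up neighbors, the tie-breaking by label order, and the weight values are assumptions, since the source only says the decision is made according to the aggregation weights.

```python
from collections import defaultdict


def propagate_label(target, labels, weights):
    """Return the new label of target: the neighbor label whose connecting
    edges have the largest total aggregation weight.

    labels:  {node: label}
    weights: {(head, tail): aggregation weight}
    """
    score = defaultdict(float)
    for (head, tail), w in weights.items():
        if head == target:
            score[labels[tail]] += w
        elif tail == target:
            score[labels[head]] += w
    if not score:
        return labels[target]  # isolated node keeps its label
    # Sorting first makes ties resolve deterministically by label order.
    return max(sorted(score), key=score.get)


labels = {"A": "label 1", "B": "label 2", "C": "label 2", "D": "label 3"}
weights = {("A", "B"): 0.2, ("A", "C"): 0.3, ("D", "A"): 0.4}  # hypothetical
new_label = propagate_label("A", labels, weights)
```

With these hypothetical weights, label 2 collects 0.5 in total against 0.4 for label 3, so node A takes label 2 even though the single heaviest edge leads to node D.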
In operation S252, community division is performed according to the tag propagation result, and a community scale is obtained. It can be understood that after the propagation of the labels is performed on the knowledge graph, new labels of the nodes can be obtained, and the nodes can be divided into communities according to the new labels, for example, the node with the label 1 is divided into the community 1, the node with the label 2 is divided into the community 2, the node with the label 3 is divided into the community 3, and the number of the nodes and the edges of each community can be understood as the scale of the community.
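The community division step can be sketched as grouping nodes by their final label and counting nodes and edges per group. The function name and the decision to count only edges whose two endpoints lie in the same community are assumptions for illustration.

```python
from collections import defaultdict


def divide_communities(labels, edges):
    """Group nodes by label and report the scale of each community.

    labels: {node: label} after label propagation
    edges:  iterable of (head, tail)
    Returns (members, scale) where scale[label] holds node and edge counts.
    """
    members = defaultdict(set)
    for node, label in labels.items():
        members[label].add(node)
    scale = {}
    for label, nodes in members.items():
        internal = sum(1 for u, v in edges if u in nodes and v in nodes)
        scale[label] = {"nodes": len(nodes), "edges": internal}
    return dict(members), scale


labels_after = {"A": "label 2", "B": "label 2", "C": "label 2", "D": "label 3"}
graph_edges = [("A", "B"), ("A", "C"), ("D", "A")]
members, scale = divide_communities(labels_after, graph_edges)
```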
In operation S253, according to the community scale, the centrality of the nodes of the knowledge-graph is calculated.
In operation S254, risk prediction is performed using the centrality as an input to a risk prediction model. Therefore, the risk of each node can be conveniently predicted by the risk prediction model by inputting the centrality into the risk prediction model, for example, which node is an overdue client, which node is a non-overdue client and which node is an upcoming overdue client.
As another possible implementation manner, as shown in fig. 7, the operation S250 of performing risk management on the enterprise according to the aggregation weight includes operations S255 to S256.
In operation S255, label propagation is performed on the nodes of the knowledge graph according to the aggregation weights. For example, a target node may be set as the center, with a plurality of neighbor nodes connected to it; the edge connecting the target node to each neighbor node has a corresponding aggregation weight, and which neighbor node's label is propagated to the target node may be determined according to the aggregation weights.
As a further example, the target node is A, which has a neighbor node B, a neighbor node C, and a neighbor node D; the label of the target node A is non-risk client, the labels of the neighbor nodes B and C are non-risk client, and the label of the neighbor node D is risk client. Whether the risk client label or the non-risk client label is propagated to the target node A is determined according to the aggregation weights: if the risk client label is propagated, the new label of the target node A is risk client; if the non-risk client label is propagated, the new label of the target node A is non-risk client.
In operation S256, risk prediction is performed according to the result of tag propagation. It can be understood that after the label propagation is performed on the knowledge graph, a new label of each node can be obtained, and a prediction result of whether the node is a risk client or a non-risk client can be obtained according to the new label.
According to the knowledge graph-based risk management method, a knowledge graph is established and the aggregation weights of its edges are calculated, so that risk management can be performed on an enterprise. For example, through label propagation on the knowledge graph, the influence of a client's risk during the loan term on related clients, such as its upstream and downstream clients, can be taken into account, thereby improving the inclusive financial service capability for small and micro enterprises. In the present disclosure, the event information is used as nodes of the knowledge graph to explain the corresponding edges, so the edges do not need to carry attributes; this reduces the redundancy of the edges and makes the knowledge graph respond faster in use.
In some embodiments of the present disclosure, in conjunction with fig. 2, before the operation S220 of building a knowledge graph from enterprise data, an operation S001 is further included.
In operation S001, the enterprise data is cleaned, wherein cleaning the enterprise data comprises one of: data deduplication, completion of missing features in the data, and processing of abnormal features in the data. In some specific examples, data deduplication may be understood as removing duplicate records from the data. For example, if the enterprise data contains both the record "enterprise A is the guarantor, the guarantee amount is 2,000,000, and enterprise B is the guaranteed party" and the record "enterprise B is the guaranteed party and receives a guarantee of 2,000,000 from enterprise A", the two records are duplicate data, and one of them may be deleted in the data deduplication operation.
In some specific examples, completion of missing features may be understood as follows: when a feature is missing from the data, the missing feature may be filled in. For example, if the enterprise data contains the record "enterprise A is the guarantor, the guarantee amount is null, and enterprise B is the guaranteed party", the guarantee amount is a missing feature and may be filled in.
In some specific examples, processing of abnormal features may be understood as follows: when a feature in the data is abnormal, the abnormal feature may be replaced or deleted. For example, if the enterprise data contains the record "enterprise A is the guarantor, the guarantee amount is a negative number, and enterprise B is the guaranteed party", the guarantee amount is an abnormal feature and may be replaced or deleted.
Cleaning the enterprise data yields more standardized enterprise data as the basis for building the knowledge graph, so that the resulting knowledge graph is more accurate.
In some embodiments of the present disclosure, as shown in fig. 2, the method of knowledge-graph-based risk management further includes operation S260.
In operation S260, the knowledge graph is visualized, wherein, as shown in FIGS. 8 to 11, the operation S260 of visualizing the knowledge graph includes at least one of operations S261 to S264.
In operation S261, node retrieval is performed based on the knowledge graph. In some examples, node retrieval may include: in response to an input node name, presenting the node related to the node name and the associated information of that node. For example, with the target node A as the input, all neighbor nodes connected to the target node A, the neighbor nodes of those neighbor nodes, the edges between the target node A and each neighbor node, the edges between the neighbor nodes and their own neighbor nodes, and the event information nodes explaining the corresponding edges may be presented. The position and importance of the target node A in the whole knowledge graph can thus be understood.
In operation S262, a subgraph walk is performed based on the knowledge graph. In some examples, the subgraph walk may include: in response to an operation of manually clicking an event relationship, displaying the node corresponding to the clicked event relationship and all edges issued from that node. It can be understood that when a user wants to view the nodes and edges related to an event relationship, the user can click the event relationship on the knowledge graph, and the node corresponding to the clicked event relationship and all edges issued from that node are then displayed.
In operation S263, path exploration is performed based on the knowledge graph. In some examples, path exploration includes: acquiring and displaying the path relationship between two nodes. For example, the path relationship between two nodes can be acquired and displayed by clicking the two nodes on the knowledge graph.
In operation S264, self-loop exploration is performed based on the knowledge graph. In some examples, self-loop exploration includes: acquiring and displaying the nodes that have a closed-loop path relationship. For example, node A invests in node B, node B guarantees node C, node D is a manager of node C, and the funds of node D flow into node A, so a closed loop from node A back to node A exists in the knowledge graph, and the path relationship and all nodes on the path from node A back to node A can be acquired and displayed through the visualized knowledge graph.
The usage of the knowledge graph is diversified through operations S261 to S264, and the usability and the practicability of the knowledge graph can be increased.
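Node retrieval and path exploration both reduce to standard graph traversals. The sketch below uses breadth-first search for each; the adjacency-list representation, the function names, and the sample graph are assumptions for illustration, and the source does not state which algorithms its platform uses.

```python
from collections import deque


def neighbours_within(adj, start, max_hops):
    """Node retrieval: return {node: hop count} for every node reachable
    from start within max_hops edges (start itself has hop count 0)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


def shortest_path(adj, source, target):
    """Path exploration: return one shortest path as a list of nodes,
    or None when no path exists."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical undirected graph stored as symmetric adjacency lists.
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
two_hop = neighbours_within(adj, "A", 2)
path_c_e = shortest_path(adj, "C", "E")
```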
The knowledge-graph based risk management method according to an embodiment of the present disclosure is described in detail below. It is to be understood that the following description is illustrative only and is not intended to be in any way limiting of the present disclosure.
According to the embodiment of the disclosure, the method mainly comprises the steps of knowledge graph construction and knowledge graph application, wherein the knowledge graph construction mainly comprises knowledge acquisition, knowledge modeling, knowledge extraction, knowledge storage and knowledge visualization.
Specifically, knowledge acquisition screens out, from the bank's internal data, the structured data that meets the requirements of knowledge graph construction; knowledge modeling builds an ontology model from general domain knowledge and the experience of domain experts, and uses it as the schema layer of the knowledge graph; knowledge extraction converts the structured data from the knowledge acquisition step into RDF format data (resource description framework data) according to the structure of the ontology model, thereby completing the construction of the data layer of the knowledge graph; knowledge storage stores the constructed knowledge graph in a graph database oriented to RDF format data; and knowledge visualization accesses the graph database through an HTTP service to visualize the knowledge graph and present it to credit business personnel.
In knowledge acquisition, the target clients of the present disclosure are small and micro enterprises that have successfully used the business quick loan product, and only samples whose loan maturity date (due date) falls between May 2018 and April 2019 are taken. The length of the sliding time period is set to 12 months, that is, the observation period of each sample is 12 months, so the 12 months before the due date of each small and micro enterprise are taken as its time window.
Small and micro enterprises have few and sparse connections with one another, so the risk propagation paths among them cannot be comprehensively computed from these enterprises alone; in addition, in actual business, small and micro enterprises need to be placed in the whole financial business network for modeling, rather than attending only to the associations among small and micro enterprises themselves. Therefore, taking the target small and micro enterprise clients as the center and the fund flow relationship, the investment relationship, the guarantee relationship, and the person-enterprise relationship as the paths, the associated clients that are not small and micro enterprise loan clients are expanded. The specific expansion rules are as follows.
(1) Fund flow relationship: taking the target client as the center, all fund flow transaction data of the target client in the corresponding time window are taken. Because the transaction volume is huge, the transaction data of each small and micro enterprise are summarized by month, that is, the transaction amounts with the same counterparty are summed to generate a new transaction record.
(2) Investment relationship: taking the target client as the center, all investment data of the target client in the corresponding time window are taken.
(3) Guarantee relationship: taking the target client as the center, all guarantee circle data of the target client in the corresponding time window are taken, and each guarantee circle is split, that is, a new guarantee data record is generated for the target client and each other client in the guarantee circle.
(4) Person-enterprise relationship: taking the target client as the center, all person-enterprise association data of the target client in the corresponding time window are taken, including the second responsible person, legal representative, insurance legal beneficiary, financial supervisor, shareholders, general manager, unit contact, president, and other responsible persons of the target client. For the associated individual clients, all fund flow transaction data in the corresponding time window are also taken and processed in the same way as the fund flow relationship data.
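Two of the expansion rules above are simple data transformations: the monthly summarization of fund flow transactions in rule (1) and the splitting of a guarantee circle into pairwise records in rule (3). The following sketch illustrates both; the tuple layout, field names, ISO date strings, and the choice to keep the two transfer directions as separate records are assumptions for the example.

```python
from collections import defaultdict


def summarise_monthly(transactions):
    """Sum transaction amounts per (payer, payee, month).

    transactions: iterable of (payer, payee, 'YYYY-MM-DD', amount).
    Returns one new record per counterparty pair and month.
    """
    totals = defaultdict(float)
    for payer, payee, day, amount in transactions:
        totals[(payer, payee, day[:7])] += amount
    return [
        {"payer": p, "payee": q, "month": m, "amount": a}
        for (p, q, m), a in sorted(totals.items())
    ]


def split_guarantee_circle(target, circle_members, circle_id):
    """Generate one guarantee record for the target client and each
    other client in the same guarantee circle."""
    return [
        {"circle_id": circle_id, "target": target, "other": other}
        for other in sorted(circle_members)
        if other != target
    ]


transactions = [
    ("A", "B", "2018-06-03", 100.0),
    ("A", "B", "2018-06-20", 50.0),
    ("A", "B", "2018-07-01", 10.0),
]
monthly = summarise_monthly(transactions)
pairs = split_guarantee_circle("A", {"A", "B", "C"}, "g1")
```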
After the client expansion is completed, the data needs to be cleaned. The cleaning is adjusted to the characteristics of each kind of data and mainly includes record deduplication, filling of missing features, exception handling, and the like.
(1) Record deduplication: in the original records of the fund flow transaction data, if both parties to a transaction are customers of the bank, the same transaction is recorded twice, and the two records differ only in the debit/credit direction. The debit/credit direction of all records therefore needs to be unified, and the duplicate records removed.
(2) Feature filling: the reasons for missing features are analyzed first, and the handling of missing features is determined according to the reason. There are basically three handling options: deleting samples, deleting feature variables, and filling missing values. Specifically, customers that are not customers of the bank have no ID primary key in the data; because the number of related samples is large, deletion is not suitable, and the missing values are filled by using the MD5-hashed customer account number in place of the ID primary key.
(3) Exception handling: abnormal values are identified first by combining business knowledge with the results of exploratory data analysis, and second by methods commonly used in statistics; the abnormal values are then processed to eliminate their influence on model training. For example, the transaction amount in the fund flow transaction data should be a positive number, so fund flow transaction data with a negative transaction amount are deleted.
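The three cleaning steps above can be combined in a single pass over the transaction records, as in the following sketch. The record layout (`txn_no`, `own_id`, `own_account`, `other_id`, `other_account`, `direction`, `amount`) and the deduplication key are hypothetical; only the use of an MD5 digest of the account number as a substitute ID and the deletion of negative amounts come from the text.

```python
import hashlib


def fill_id(customer_id, account):
    """Feature filling: use the MD5 digest of the account number
    when the ID primary key is missing."""
    if customer_id:
        return customer_id
    return hashlib.md5(account.encode("utf-8")).hexdigest()


def clean_transactions(records):
    """Apply exception handling, feature filling, direction unification,
    and record deduplication to fund flow transaction records."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["amount"] < 0:  # exception handling: drop negative amounts
            continue
        own = fill_id(rec.get("own_id"), rec["own_account"])
        other = fill_id(rec.get("other_id"), rec["other_account"])
        # Unify direction: always express the record as payer -> payee.
        if rec["direction"] == "out":
            payer, payee = own, other
        else:
            payer, payee = other, own
        key = (rec["txn_no"], payer, payee, rec["amount"])
        if key in seen:  # the same transaction seen from the other side
            continue
        seen.add(key)
        cleaned.append({"txn_no": rec["txn_no"], "payer": payer,
                        "payee": payee, "amount": rec["amount"]})
    return cleaned


records = [
    {"txn_no": "t1", "own_id": "A", "own_account": "acc-a", "other_id": "B",
     "other_account": "acc-b", "direction": "out", "amount": 100.0},
    {"txn_no": "t1", "own_id": "B", "own_account": "acc-b", "other_id": "A",
     "other_account": "acc-a", "direction": "in", "amount": 100.0},
    {"txn_no": "t2", "own_id": "A", "own_account": "acc-a", "other_id": None,
     "other_account": "acc-x", "direction": "out", "amount": 30.0},
    {"txn_no": "t3", "own_id": "A", "own_account": "acc-a", "other_id": "B",
     "other_account": "acc-b", "direction": "out", "amount": -5.0},
]
cleaned = clean_transactions(records)
```

In this hypothetical input, the first two records describe the same transfer from both sides and collapse into one, the third receives a hashed substitute ID for its counterparty, and the fourth is dropped.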
In knowledge modeling, a knowledge graph is logically divided into a data layer and a schema layer. The data layer contains a large amount of fact information, represented as triples such as (entity, relation, entity) or (entity, attribute, attribute value); these data are stored in a graph database and form a large-scale entity relationship network, which constitutes the knowledge graph. The schema layer is the core of the knowledge graph; it is built on top of the data layer and stores refined knowledge. An ontology model is typically used to manage the schema layer, that is, the ontology model's support for axioms, rules, and constraints is used to regulate the connections between entities, relationships, and objects such as the types and attributes of entities.
The present disclosure constructs the financial knowledge graph of small and micro enterprises in a top-down manner: an ontology, also called a data schema, is first defined for the knowledge graph, and the data layer of the knowledge graph is then generated. In defining the ontology, the process starts from the top-level concepts and refines them step by step to form a well-structured hierarchy; after the ontology is defined, entity data are added to the concepts of the ontology one by one, and the knowledge graph is generated automatically from the underlying entity data according to the association relationships among the concepts of the ontology.
In the present disclosure, relevant knowledge in the financial field is mined in depth, the data set from the knowledge acquisition stage is examined as a whole, and a financial ontology model of small and micro enterprises is constructed by analyzing the semantic associations between the concepts and attributes in the field. The ontology model comprises five types of entity nodes, namely enterprise/individual customers, person-enterprise association events, fund transaction events, guarantee events, and investment events, and four types of association relationships, namely enterprise-individual, fund inflow-outflow, guarantor-guaranteed, and investor-invested. Each type of entity node has the following attributes.
(1) Enterprise/individual customer: business ID primary key, business name, and business category.
(2) Person-enterprise association event: the position of the individual, the start time of the position, and the end time of the position.
(3) Fund transaction event: the transaction event ID primary key, the transaction event amount, and the transaction event time.
(4) Guarantee event: the guarantee circle ID primary key and the guarantee circle establishment time.
(5) An investment event: an investment event ID primary key, an investment event amount, an investment event start time, and an investment event end time.
The present disclosure instantiates the association relationships between different customers as entity nodes rather than simply as relationship edges, for example by instantiating a fund transaction event node instead of a fund transaction edge. The reason is that, in the financial knowledge graph of small and micro enterprises, the relationship edges carry a large amount of information, such as the transaction amount and the transaction time. Credit personnel in actual business scenarios pay close attention to such attribute information, and representing an association relationship by a single edge in the graph would lose a great deal of key information. Therefore, the four types of association relationship events are instantiated in the financial knowledge graph of small and micro enterprises, so that the attributes of the association relationships can be displayed more comprehensively and directly.
At present, the data layer of a knowledge graph generally uses the RDF (Resource Description Framework) model to represent data. RDF is a standard data model formulated by the World Wide Web Consortium (W3C) for representing and exchanging machine-understandable information; it uses Web identifiers (URIs) to identify resources, and attributes and attribute values to describe resources. In an RDF graph, each resource has an HTTP URI as its unique address. An RDF graph is defined as a finite set of triples (s, p, o); each triple represents a statement of fact, where s is the subject, p is the predicate, and o is the object. The triple (s, p, o) indicates that s has a connection p with o, or that s has an attribute p whose value is o. A triple in RDF is sometimes also called a statement, and in a knowledge graph it is also referred to as a piece of knowledge.
Mapping a relational database to RDF is one way to extract knowledge from relational databases: database table names are mapped directly to classes in RDF, fields are mapped to attributes of the classes, and the relationships between different classes can be derived from the tables that represent relationships.
In knowledge storage, a core problem in managing knowledge graph data in RDF format is how to store RDF data sets effectively and answer SPARQL queries quickly. In general, there are two quite different approaches. The first is to use an existing mature database management system (for example, a relational database system) to store the knowledge graph data, convert SPARQL queries over the RDF knowledge graph into queries for that system, such as SQL queries for a relational database, and answer the queries with existing relational database products or related technologies. The second is to directly develop a native knowledge graph data storage and query system for RDF knowledge graph data (native RDF, that is, a graph database system), which is optimized from the bottom layer of the database system with the characteristics of RDF knowledge graph management in mind.
The present disclosure chooses the native knowledge graph storage and management scheme oriented to RDF graphs, because a traditional relational database struggles when a large number of relationships need to be described; it is better suited to situations with many entities but relatively simple relationships between them. For the small and micro enterprise scenario, in which the relationships between entities are very complex, data often needs to be recorded on the relationships, and most operations on the data involve relationships, an RDF graph database is the more reasonable choice. It not only improves runtime performance, but also greatly improves system development efficiency and reduces maintenance costs.
In knowledge visualization, a knowledge graph platform of the small and micro enterprise can be set up, a man-machine interaction interface is provided for credit personnel, and the credit personnel is assisted to carry out dynamic management on the life cycle of the small and micro enterprise customer. The platform can realize the visualization of the knowledge graph of the small micro enterprise, and four main functions of node retrieval, subgraph wandering, path exploration and self-loop exploration based on the graph, and are described in detail as follows.
(1) And node retrieval: the node retrieval is the most basic and most core function of the knowledge graph of the small micro enterprise, and for a given target node name, the platform can realize quick response and display the target node and related information in a visual interface. The visual interface displays the attribute information of the target node and the neighbor nodes within a certain hop count on the map, namely the associated entities of the target entity. Meanwhile, the knowledge graph can realize deep association query operation, so credit workers can obtain the structure of the whole relationship network where the target node is located, such as a complete upstream and downstream enterprise chain where the target node is located, a complete guarantee circle where the target node is located, and the like, and therefore the position and the importance of the target node in the whole relationship network are known.
(2) And (3) sub-graph wandering: in the knowledge graph of the small micro enterprise, the number and the types of edges of each node are different, and correspondingly, the number and the types of neighbor nodes are different greatly, so that a certain fixed mode does not exist to completely describe the characteristics of the nodes, and a credit worker can acquire node association information in a targeted manner by a heuristic walking method on a node neighborhood subgraph. The subgraph walking mode is derived from a random walking algorithm based on a graph, the random walking is a method for extracting structural features of the graph, in brief, the random walking algorithm constructs a plurality of random walkers (random walker), each random walker is initialized from a certain node, and then in each step of random walking, a certain adjacent node of the current node is randomly accessed.
The greatest difference between the subgraph walking on the knowledge graph of the small micro-enterprise and the random walking is that the nodes visited next step can be selected in a manual mode instead of a random mode, for example, the guarantee relationship among enterprises is important but sparse data in the characteristics of the small micro-enterprise, and for a target small micro-enterprise node, if the target small micro-enterprise node has a plurality of continuous edges of the guarantee relationship on the knowledge graph, a credit worker can mainly select to walk along the continuous edges of the guarantee relationship; if the nodes do not have the guarantee relationship, other relationships can be selected for sub-graph walking, and comprehensive characteristic images of different enterprises are obtained in a targeted mode.
(3) Path exploration: Given two target node names, the small and micro enterprise knowledge graph platform can explore whether a path exists between the two nodes on the graph. Credit staff can check whether a small and micro enterprise customer has a path to a leading or listed enterprise in its industry, or to an enterprise in its industry that has defaulted on a loan, which provides a reference for risk assessment. In addition, two enterprises may have no direct association, that is, no edge connects their nodes in the graph, yet they may be related in various ways through intermediate nodes, for example by competition, shareholding control, or financing relationships. By analyzing the paths between two enterprises, a loan officer can obtain these indirect relationships.
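Path exploration between two named nodes can be sketched as a breadth-first search that records the relation on each hop, so that the indirect relationship can be read off the result. The sketch assumes paths follow edge direction, which the text does not specify; `find_path`, the edge tuples, and the company names are hypothetical.

```python
from collections import deque


def find_path(edges, source, target):
    """Return the shortest list of (src, relation, dst) hops, or None if unreachable."""
    outgoing = {}
    for src, relation, dst in edges:
        outgoing.setdefault(src, []).append((relation, dst))
    previous = {source: None}  # node -> (parent, relation) used to reach it
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for relation, dst in outgoing.get(node, ()):
            if dst not in previous:
                previous[dst] = (node, relation)
                queue.append(dst)
    if target not in previous:
        return None
    hops = []
    node = target
    while previous[node] is not None:
        parent, relation = previous[node]
        hops.append((parent, relation, node))
        node = parent
    return hops[::-1]


# Hypothetical data: SmallCo reaches ListedCo through an intermediate supplier.
edges = [
    ("SmallCo", "fund_flow", "Supplier"),
    ("Supplier", "investment", "ListedCo"),
    ("Other", "guarantee", "SmallCo"),
]
route = find_path(edges, "SmallCo", "ListedCo")
```

If direction should be ignored, each edge could be added to `outgoing` in both directions before searching.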
(4) Self-loop exploration: Given a target node name, the small and micro enterprise knowledge graph platform can retrieve whether the node lies on one or more self-loop paths in the graph and, if so, present the loop or loops to credit staff. A self-loop on the small and micro enterprise financial knowledge graph indicates that the customer may carry a risk of circular loan transactions, that is, a loan issued to the customer flows through several other customers and is then transferred back to the customer. Credit staff need to give such cases separate attention and analysis.
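Self-loop exploration amounts to finding directed cycles that start and end at the target node. The depth-first sketch below is one way to do this under assumptions: the name `find_cycles`, the adjacency-dictionary layout, and the `max_length` cap on the number of distinct nodes in a loop are hypothetical and are not taken from the disclosure.

```python
def find_cycles(adjacency, target, max_length=6):
    """Return simple directed cycles that start and end at target."""
    cycles = []

    def extend(path):
        for neighbor in adjacency.get(path[-1], ()):
            if neighbor == target:
                cycles.append(path + [target])  # closed a loop back to target
            elif neighbor not in path and len(path) < max_length:
                extend(path + [neighbor])

    extend([target])
    return cycles


# Hypothetical fund flows: A pays B, B pays C and D, C pays A.
adjacency = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": []}
loops = find_cycles(adjacency, "A")
```

Here the loop A, B, C, A corresponds to funds leaving customer A and returning to it through two other customers.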
The application of the knowledge graph mainly comprises edge weight modeling, community division, graph feature calculation, and risk propagation prediction.
In edge weight modeling, the constructed small and micro enterprise financial knowledge graph, with node attributes ignored, can be regarded as a directed multigraph, that is, a directed simple graph in which more than one edge may connect two nodes. In this graph, fund flow relationships and investment relationships are directed edges: the direction of a fund flow edge gives the direction in which funds flow out and in, and the direction of an investment edge distinguishes the investor from the invested party. There may also be several relationships between two customers; for instance, two customers may have fund flow transactions in different months, giving several fund flow edges between them. The knowledge graph is therefore a directed multigraph. To model the graph features of nodes, the multigraph must first be reduced to a simple graph, after which network feature indicators can be calculated directly.
Based on the constructed knowledge graph G, assume that the set of association relationships contained in the graph is R1, R2, R3, ..., Rn. G is divided by relationship into several relationship graphs G_r1, G_r2, G_r3, ..., G_rn. Each relationship graph G_ri contains only the edges of relationship ri from the knowledge graph. In G_ri, all edges between any two connected nodes have the same type, but their times and other attributes may differ. A weight is calculated for each edge from its attribute values, and the weights of all edges between the two nodes are then summed and reduced to a single edge.
Specifically, for a relationship graph G_ri, if node v in [Figure BDA0003318923340000251], the reduced new edge weight is

[Figure BDA0003318923340000252]

where [Figure BDA0003318923340000253] denotes the graph G_ri; v represents the head node and u the tail node of the two parties; E(v, u) represents the set of n event information edges with head node v and tail node u; E(v) represents the set of directed edges with head node v; [Figure BDA0003318923340000254] represents the initial weight of edge l' at the time the corresponding event occurred; t represents the current time; [Figure BDA0003318923340000255] represents the temporal weight of edge l'; [Figure BDA0003318923340000256] represents the initial weight of edge l at the time the corresponding event occurred; and [Figure BDA0003318923340000257] represents the temporal weight of edge l. The initial weight in a fund flow relationship is the transaction amount attribute of the transaction event, the initial weight in an investment relationship is the investment amount attribute of the investment event, and the initial weight in every other relationship is 1. The longer the span between the creation time t_l of an edge and the current time t, the smaller its temporal weight and the lower the importance of the edge.
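The reduction of parallel edges can be sketched in Python. The formula itself survives here only as an image placeholder, so the sketch rests on assumptions: that the reduced weight of (v, u) is the sum of initial weight times temporal weight over E(v, u), divided by the same sum over E(v); that the temporal weight decays exponentially with the span t - t_l; and that edges arrive as `(head, tail, initial_weight, created)` tuples with times in months. The `decay` parameter and all names are hypothetical.

```python
import math
from collections import defaultdict


def temporal_weight(now, created, decay=0.1):
    """Assumed decay form: shrinks as the span between now and created grows."""
    return math.exp(-decay * (now - created))


def reduce_multi_edges(edges, now, decay=0.1):
    """Collapse parallel edges of one relationship graph into one weight per (head, tail)."""
    pair_total = defaultdict(float)  # time-weighted sum over E(v, u)
    head_total = defaultdict(float)  # time-weighted sum over E(v)
    for head, tail, initial, created in edges:
        contribution = initial * temporal_weight(now, created, decay)
        pair_total[(head, tail)] += contribution
        head_total[head] += contribution
    return {
        pair: total / head_total[pair[0]]
        for pair, total in pair_total.items()
        if head_total[pair[0]] > 0  # skip heads whose edges all weigh zero
    }


# Hypothetical fund flow edges: (head, tail, transaction amount, month created).
edges = [
    ("v", "u", 100.0, 1),
    ("v", "u", 50.0, 11),
    ("v", "w", 80.0, 6),
]
reduced = reduce_multi_edges(edges, now=12)
```

Under this assumed normalization the reduced weights of all edges leaving one head node sum to 1, and an older transaction counts for less than a recent one of the same amount.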
After the multiple edges in each relationship graph G_ri have been aggregated, all relationship graphs are merged to obtain a new association relationship graph. For a node v in this graph, if node u belongs to N_out(v), the new edge weight is

[Figure BDA0003318923340000261]

where N_out(v) is the set of out-neighbor nodes of node v in the new association relationship graph, R(v, u) represents the set of m category data edges with head node v and tail node u, and C_r represents the constant coefficient corresponding to each category of data.
Finally, a directed simple graph is obtained in which any two adjacent nodes are joined by at most one edge in each direction; the larger the weight of an edge, the closer the association between its two nodes.
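Merging the per-relationship graphs can be sketched as a coefficient-weighted sum. The aggregation formula is an image placeholder in this text, so the sketch assumes the aggregated weight of (v, u) is the sum over the relationship categories in R(v, u) of C_r times the reduced weight in that category. The function name, the dictionary layout, and the coefficient values are hypothetical.

```python
from collections import defaultdict


def aggregate_relations(reduced_by_relation, coefficients):
    """Merge per-relationship reduced weights into one weight per (head, tail)."""
    merged = defaultdict(float)
    for relation, weights in reduced_by_relation.items():
        for pair, weight in weights.items():
            merged[pair] += coefficients[relation] * weight
    return dict(merged)


# Hypothetical reduced weights for two relationship graphs.
reduced_by_relation = {
    "fund_flow": {("v", "u"): 0.5, ("v", "w"): 0.5},
    "guarantee": {("v", "u"): 1.0},
}
coefficients = {"fund_flow": 1.0, "guarantee": 2.0}  # illustrative C_r values
merged = aggregate_relations(reduced_by_relation, coefficients)
```

The result has at most one weight per ordered pair of nodes, which is the directed simple graph described above.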
In community division, because the graph contains a large number of nodes, node graph features calculated over the whole graph differ little from one another and contain many repeated values, so the results cannot be used as features in a model. A label propagation algorithm is therefore applied to divide the new graph into communities based on the edge weights, and the graph features of each node are calculated within its community subgraph.
The basic process of community division by label propagation is as follows. First, each node in the graph is assigned a distinct community label. The labels are then propagated over the knowledge graph: at each propagation step, every node updates its label according to the labels of its neighbor nodes, specifically by selecting the label whose neighbors have the largest total edge weight. As propagation proceeds, closely connected sets of nodes reach a consensus and the labels on the nodes stop changing. Because the algorithm chooses propagation paths according to the edge weights between nodes, that is, how tightly they are connected, enterprise entities with close associations end up in the same community.
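The label propagation steps above can be sketched as follows. This is a simplified, deterministic variant for illustration: it ignores edge direction when measuring closeness, visits nodes in sorted order, and breaks ties toward the smallest label, none of which is specified in the text. The function name and the toy weights are hypothetical.

```python
from collections import defaultdict


def label_propagation(weights, max_rounds=20):
    """weights: {(v, u): weight}. Returns {node: community label}."""
    neighbors = defaultdict(dict)
    for (v, u), w in weights.items():
        # Assumption: direction is ignored when measuring closeness.
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
    labels = {node: node for node in neighbors}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):
            score = defaultdict(float)
            for other, w in neighbors[node].items():
                score[labels[other]] += w
            # Pick the label with the largest total weight; ties go to the smallest label.
            best = max(sorted(score), key=lambda label: score[label])
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # consensus reached: no label changed in a full pass
    return labels


# Two tightly linked triangles joined by one weak edge (c, d).
weights = {
    ("a", "b"): 5.0, ("b", "c"): 5.0, ("a", "c"): 5.0,
    ("d", "e"): 5.0, ("e", "f"): 5.0, ("d", "f"): 5.0,
    ("c", "d"): 1.0,
}
communities = label_propagation(weights)
```

On this toy graph the weak edge between c and d is outweighed by the edges inside each triangle, so the two triangles end up as separate communities.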
To make full use of the data, the method defines the node features calculated within a community as local feature information. Each community is then abstracted as a single node, and all edges between two communities are weighted and summed to form the edge between the two community nodes, which yields a community association network. The features of the community nodes in this network serve as the global feature information of the original customer nodes in each community. Considering global and local features together makes full use of limited data.
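Building the community association network can be sketched by collapsing each community to one node and summing the weights of the edges that cross between communities. The sketch assumes a plain sum of the existing edge weights with no further weighting; `community_network` and the sample labels are hypothetical.

```python
from collections import defaultdict


def community_network(weights, labels):
    """Sum the weights of all edges that cross from one community to another."""
    between = defaultdict(float)
    for (v, u), w in weights.items():
        source, destination = labels[v], labels[u]
        if source != destination:  # edges inside a community are dropped
            between[(source, destination)] += w
    return dict(between)


# Hypothetical node-level weights and community labels.
weights = {
    ("a", "b"): 5.0,
    ("c", "d"): 1.0,
    ("b", "d"): 2.0,
    ("e", "a"): 3.0,
}
labels = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
network = community_network(weights, labels)
```

Features computed on `network` would then be shared by every customer node in the corresponding community as its global feature information.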
In graph feature calculation, six indicators are selected to calculate features in the local and global dimensions: degree centrality, eigenvector centrality, betweenness centrality, closeness centrality, HITS value, and PageRank value. A logistic regression model is built by combining these graph features with traditional variable features to predict the overdue risk of small and micro customer loans, and in the end 12 variables are entered into the model.
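The pairing of local and global features can be illustrated with one of the six indicators, degree centrality. The sketch below uses the common unweighted definition (number of distinct neighbors divided by n - 1) and treats edges as undirected; whether the disclosure uses a weighted or directed variant is not stated, and the function names and sample data are hypothetical.

```python
def degree_centrality(edges, nodes):
    """Unweighted degree centrality: distinct neighbors divided by (n - 1)."""
    neighbors = {node: set() for node in nodes}
    for v, u in edges:
        if v in neighbors and u in neighbors and v != u:
            neighbors[v].add(u)
            neighbors[u].add(v)
    denominator = max(len(neighbors) - 1, 1)
    return {node: len(adjacent) / denominator for node, adjacent in neighbors.items()}


def local_and_global_degree(edges, labels):
    """Return {node: (local centrality in its community, centrality of its community)}."""
    members = {}
    for node, label in labels.items():
        members.setdefault(label, set()).add(node)
    local = {}
    for group in members.values():
        local.update(degree_centrality(edges, group))  # community subgraph only
    community_edges = {
        (labels[v], labels[u]) for v, u in edges if labels[v] != labels[u]
    }
    community_score = degree_centrality(community_edges, set(members))
    return {node: (local[node], community_score[labels[node]]) for node in labels}


# Hypothetical edges and community labels.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
labels = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
features = local_and_global_degree(edges, labels)
```

Each node thus receives a (local, global) pair for the indicator; repeating this for the other indicators gives the graph features that are combined with traditional variables in the model.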
In risk propagation prediction, in addition to calculating graph features, the method applies a label propagation algorithm to classify and predict default nodes. The label propagation algorithm assumes that the label of each node is the same as the labels of most of its neighbors. That is, when a small and micro enterprise customer is not itself overdue but several closely associated neighbor customers are, the risk may propagate to the customer that is not overdue. The label propagation algorithm can predict this transmission of risk and give early warning so that it can be avoided.
First, a label is added to each node to indicate whether the customer's loan is overdue; the labels are then propagated iteratively over the graph to update the node labels. After a certain number of iterations, the nodes whose labels have changed are counted. Some nodes with unknown labels are found to be propagated as risk nodes; unknown labels represent customers outside the bank who hold no loans with it, and their propagation as risk nodes provides a reference for later loan admission. A few customers' labels are propagated from not overdue to overdue, which indicates that these customers are prone to risk and are key targets for attention during management over the life of the loan.
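The iterative propagation of overdue labels can be sketched as a weighted neighbor vote. Several choices here are assumptions rather than details from the text: labels are the strings "overdue" and "normal" with `None` for unknown, a node that is already overdue stays overdue, ties resolve to "normal", edge direction is ignored, and updates within a round are synchronous. The function name and the sample graph are hypothetical.

```python
from collections import defaultdict


def propagate_risk(weights, labels, rounds=5):
    """labels: {node: 'overdue' | 'normal' | None}. Returns the updated labels."""
    neighbors = defaultdict(dict)
    for (v, u), w in weights.items():
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
    current = dict(labels)
    for _ in range(rounds):
        updated = dict(current)
        for node, label in current.items():
            if label == "overdue":
                continue  # assumption: an overdue label is never revised
            votes = {"normal": 0.0, "overdue": 0.0}
            for other, w in neighbors[node].items():
                if current.get(other) is not None:
                    votes[current[other]] += w
            if votes["overdue"] > votes["normal"]:
                updated[node] = "overdue"
            elif votes["normal"] > 0.0:
                updated[node] = "normal"
        if updated == current:
            break  # labels are stable
        current = updated
    return current


# Hypothetical graph: x is closely tied to overdue customers a and b;
# p is an outside customer with no known label.
weights = {
    ("a", "x"): 3.0, ("b", "x"): 3.0, ("x", "c"): 1.0,
    ("c", "d"): 5.0, ("x", "p"): 2.0,
}
labels = {"a": "overdue", "b": "overdue", "x": "normal",
          "c": "normal", "d": "normal", "p": None}
result = propagate_risk(weights, labels)
```

Comparing `result` with `labels` shows which customers changed: here x moves from normal to overdue, and the unknown customer p is then flagged through its tie to x, while c stays normal because its link to d outweighs its link to x.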
Based on the knowledge-graph-based risk management method described above, the present disclosure also provides a knowledge-graph-based risk management device 10. The risk management device 10 will be described in detail below with reference to Fig. 12.
Fig. 12 schematically shows a block diagram of the structure of the knowledge-graph based risk management device 10 according to an embodiment of the present disclosure.
The knowledge-graph-based risk management device 10 comprises an obtaining module 1, a construction module 2, a first calculation module 3, a second calculation module 4, and a management module 5.
The obtaining module 1 is configured to perform operation S210: acquiring enterprise data, wherein the enterprise data comprises parties and an event relation between the parties, the event relation comprises m category data, each category data comprises n event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1.
The construction module 2 is configured to perform operation S220: constructing a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relation is an edge of the knowledge graph, and the event information is used to describe the corresponding edge.
The first calculation module 3 is configured to perform operation S230: calculating, based on the knowledge graph constructed for each category data, the reduction weight [Figure BDA0003318923340000281] of the n event information edges between two parties.
The second calculation module 4 is configured to perform operation S240: calculating the aggregation weight ω_vu of the m category data edges between the two parties according to the reduction weights.
The management module 5 is configured to perform operation S250: performing risk management on enterprises according to the aggregation weight.
Since the risk management device 10 is configured based on a risk management method, the beneficial effects of the risk management device 10 are the same as those of the risk management method, and are not described herein again.
In addition, according to embodiments of the present disclosure, any number of the obtaining module 1, the construction module 2, the first calculation module 3, the second calculation module 4, and the management module 5 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module.
According to an embodiment of the present disclosure, at least one of the obtaining module 1, the construction module 2, the first calculation module 3, the second calculation module 4, and the management module 5 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC); may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit; or may be implemented by any one of software, hardware, and firmware, or by an appropriate combination of them.
Alternatively, at least one of the obtaining module 1, the construction module 2, the first calculation module 3, the second calculation module 4, and the management module 5 may be at least partially implemented as a computer program module which, when executed, performs the corresponding function.
FIG. 13 schematically illustrates a block diagram of an electronic device adapted to implement a knowledge-graph based risk management method according to an embodiment of the present disclosure.
As shown in fig. 13, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read from it can be installed into the storage portion 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. The program code is for causing a computer system to perform the methods of the embodiments of the disclosure when the computer program product is run on the computer system.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (14)

1. A knowledge-graph-based risk management method, comprising:
acquiring enterprise data, wherein the enterprise data comprises parties and an event relation between the parties, the event relation comprises m category data, each category data comprises n event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1;
constructing a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relation is an edge of the knowledge graph, and the event information is used to describe the corresponding edge;
calculating, based on the knowledge graph constructed for each of the category data, a reduction weight [Figure FDA0003318923330000011] of the n event information edges between two of the parties;
calculating an aggregation weight ω_vu of the m category data edges between the two parties according to the reduction weight; and
performing risk management on enterprises according to the aggregation weight.
2. The method of knowledge-graph-based risk management according to claim 1, wherein the reduction weight is calculated as follows:
[Figure FDA0003318923330000012]
wherein v represents the head node of two of the parties, u represents the tail node of the two parties, E(v, u) represents the set of n event information edges with head node v and tail node u, E(v) represents the set of directed edges with head node v, [Figure FDA0003318923330000013] represents the initial weight of edge l' at the time the corresponding event occurred, t represents the current time, [Figure FDA0003318923330000014] represents the temporal weight of edge l', [Figure FDA0003318923330000015] represents the initial weight of edge l at the time the corresponding event occurred, and [Figure FDA0003318923330000016] represents the temporal weight of edge l; and
the aggregation weight is calculated as follows:
[Figure FDA0003318923330000017]
wherein R(v, u) represents the set of m category data edges with head node v and tail node u, and C_r represents the constant coefficient corresponding to each of the category data.
3. The method of knowledge-graph-based risk management according to claim 1, wherein the m category data includes at least one of fund flow data, investment data, guarantee data, and corporate-related data.
4. The method of knowledge-graph-based risk management according to claim 1, wherein the n event information comprises event information under the same category data for n time periods.
5. The method of knowledge-graph based risk management according to claim 1, further comprising, prior to said building a knowledge-graph from the enterprise data:
cleansing the enterprise data, wherein cleansing the enterprise data comprises one of: data deduplication, completion of missing features in the data, and processing of anomalous features in the data.
6. The method of knowledge-graph based risk management according to claim 1, wherein said building a knowledge-graph from the enterprise data comprises:
constructing a schema layer from the parties and the m category data, wherein the schema layer comprises nodes established from the parties and the m categories of the m category data, and edges between the nodes established from events in each of the category data; and
importing the data in the category data into the corresponding schema layer.
7. The method of knowledge-graph-based risk management according to claim 6, wherein the category data is structured data, and importing the data in the category data into the corresponding schema layer comprises:
converting the category data into Resource Description Framework (RDF) data; and
importing the RDF data into the corresponding schema layer.
8. The method of knowledge-graph-based risk management according to claim 1, wherein said risk managing an enterprise according to the aggregation weight comprises:
performing label propagation on the nodes of the knowledge graph according to the aggregation weight;
carrying out community division according to the label propagation result to obtain the community scale;
calculating the degree centrality of the nodes of the knowledge graph according to the community scale; and
performing risk prediction using the degree centrality as an input of a risk prediction model.
9. The method of knowledge-graph-based risk management according to claim 1, wherein said risk managing an enterprise according to the aggregation weight comprises:
performing label propagation on the nodes of the knowledge graph according to the aggregation weight; and
predicting risk according to the label propagation result.
10. The method of knowledge-graph-based risk management according to any one of claims 1-9, further comprising visualizing the knowledge-graph, wherein the visualizing the knowledge-graph comprises: at least one of node retrieval based on the knowledge graph, sub-graph walking based on the knowledge graph, path exploration based on the knowledge graph, and self-loop exploration based on the knowledge graph.
11. The knowledge-graph-based risk management method of claim 10, wherein:
the node retrieval comprises: in response to an input node name, displaying a node related to the node name and the associated information of the node;
the subgraph walking comprises: in response to an operation of manually clicking an event relation, displaying the node corresponding to the clicked event relation and all edges leaving that node;
the path exploration comprises: acquiring and displaying a path relation between two nodes; and
the self-loop exploration comprises: acquiring and displaying nodes that have a closed-loop path relation.
12. A knowledge-graph-based risk management device, comprising:
an acquisition module, configured to acquire enterprise data, wherein the enterprise data comprises parties and an event relation between the parties, the event relation comprises m category data, each category data comprises n event information, m is an integer greater than or equal to 1, and n is an integer greater than or equal to 1;
a construction module, configured to construct a knowledge graph according to the enterprise data, wherein the parties and the event information are nodes of the knowledge graph, the event relation is an edge of the knowledge graph, and the event information is used to describe the corresponding edge;
a first calculation module, configured to calculate, based on the knowledge graph constructed for each of the category data, a reduction weight [Figure FDA0003318923330000041] of the n event information edges between two of the parties;
a second calculation module, configured to calculate an aggregation weight ω_vu of the m category data edges between the two parties according to the reduction weight; and
a management module, configured to perform risk management on enterprises according to the aggregation weight.
13. An electronic device, comprising:
one or more processors;
one or more memories for storing executable instructions that, when executed by the processor, implement the method of any of claims 1-11.
14. A computer-readable storage medium having stored thereon executable instructions that when executed by a processor implement a method according to any one of claims 1 to 11.
CN202111244065.7A 2021-10-25 2021-10-25 Risk management method, risk management device, electronic equipment and medium Pending CN113989018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111244065.7A CN113989018A (en) 2021-10-25 2021-10-25 Risk management method, risk management device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111244065.7A CN113989018A (en) 2021-10-25 2021-10-25 Risk management method, risk management device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN113989018A true CN113989018A (en) 2022-01-28

Family

ID=79741250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111244065.7A Pending CN113989018A (en) 2021-10-25 2021-10-25 Risk management method, risk management device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN113989018A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565256A (en) * 2022-02-23 2022-05-31 江苏小微云链金融科技有限公司 Enterprise cluster type supply chain risk control method and system based on block chain
CN114565256B (en) * 2022-02-23 2023-10-31 江苏小微云链金融科技有限公司 Enterprise cluster type supply chain risk management and control method and system based on block chain
CN114817681A (en) * 2022-04-28 2022-07-29 北京辰行科技有限公司 Financial risk control system based on big data analysis and management equipment thereof
CN114817681B (en) * 2022-04-28 2023-04-07 广州市华商小额贷款股份有限公司 Financial risk control system based on big data analysis and management equipment thereof
CN117710113A (en) * 2023-11-17 2024-03-15 中国人寿保险股份有限公司山东省分公司 Abnormal insurance application behavior identification method and system based on legal person business knowledge graph
CN118014446A (en) * 2024-04-09 2024-05-10 广东瑞和通数据科技有限公司 Enterprise technology innovation comprehensive index analysis method, storage medium and computer equipment

Similar Documents

Publication Publication Date Title
US10878358B2 (en) Techniques for semantic business policy composition
US20230046324A1 (en) Systems and Methods for Organizing and Finding Data
US8756191B2 (en) Massively scalable reasoning architecture
CN113989018A (en) Risk management method, risk management device, electronic equipment and medium
CN112927082A (en) Credit risk prediction method, apparatus, device, medium, and program product
US20030220860A1 (en) Knowledge discovery through an analytic learning cycle
US11934415B2 (en) Computer-based systems for dynamic data discovery and methods thereof
US10255364B2 (en) Analyzing a query and provisioning data to analytics
US10529017B1 (en) Automated business plan underwriting for financial institutions
US20190311271A1 (en) Document analyzer
US11620538B1 (en) Machine learning integration for a dynamically scaling matching and prioritization engine
CN111046237A (en) User behavior data processing method and device, electronic equipment and readable medium
Khatri Managerial work in the realm of the digital universe: The role of the data triad
CN111177653B (en) Credit evaluation method and device
US20130097604A1 (en) Information integration flow freshness cost
Navdeep et al. Role of big data analytics in analyzing e-Governance projects
Vafopoulos et al. Insights in global public spending
Liu et al. Female employment data analysis based on decision tree algorithm and association rule analysis method
Abdullah et al. Mapping crowdfunding research on the web of science database: A bibliometric analysis approach
CN113516553A (en) Credit risk early warning method and device
Spada et al. WHAT USERS WANT: A NATURAL LANGUAGE PROCESSING APPROACH TO DISCOVER USERS' NEEDS FROM ONLINE REVIEWS
CN116680408A (en) Abnormal fund supervision method, system and equipment based on knowledge graph
Khan et al. Data Analysis through Information Visualisation for eGovernments & eBusinesses
Tagkoulis Knowledge Graphs Tools and Applications
CN114090752A (en) Problem thread mining method, device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination