CN111222681A - Data processing method, device, equipment and storage medium for enterprise bankruptcy risk prediction - Google Patents


Publication number
CN111222681A
Authority
CN
China
Legal status
Pending
Application number
CN201911075577.8A
Other languages
Chinese (zh)
Inventor
宋仲伟
Current Assignee
Quantum Shuju Beijing Technology Co ltd
Original Assignee
Quantum Shuju Beijing Technology Co ltd
Application filed by Quantum Shuju Beijing Technology Co ltd
Priority to CN201911075577.8A
Publication of CN111222681A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The application discloses a data processing method, apparatus, device and storage medium for enterprise bankruptcy risk prediction. The method comprises: establishing an enterprise relationship graph from collected enterprise data; training a prediction model that takes as input at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph, where the prediction model is a graph-neural-network-based enterprise classifier that uses whether an enterprise has gone bankrupt as the data label; and predicting the bankruptcy risk of an enterprise from received enterprise information using the trained prediction model. The method and apparatus address the poor performance of enterprise bankruptcy risk prediction. Constructing the enterprise relationship graph quantifies the association information among enterprises, and the proposed graph neural network model makes fuller use of multidimensional edge features, which can improve the accuracy of graph node classification.

Description

Data processing method, device, equipment and storage medium for enterprise bankruptcy risk prediction
Technical Field
The application relates to the field of data processing, and in particular to a data processing method, apparatus, device and storage medium for enterprise bankruptcy risk prediction.
Background
Traditional enterprise risk prediction evaluates only the information of a single enterprise, such as its profile, its legal representative and senior management, its registered capital and capital composition, and its intellectual property. This information is analyzed manually or with machine learning methods such as SVM and XGBoost to produce an enterprise credit rating.
The inventor found that such machine learning methods take only the basic information of a single enterprise as input. They cannot capture relationships among enterprises, such as shared senior executives and shareholders or external investment and cooperation, and therefore lose important reference information about enterprise risk. As a result, the accuracy of enterprise risk prediction is insufficient.
No effective solution has yet been proposed for the poor performance of enterprise bankruptcy risk prediction in the related art.
Disclosure of Invention
The main purpose of the application is to provide a data processing method, apparatus, device and storage medium for enterprise bankruptcy risk prediction, so as to solve the problem of poor enterprise bankruptcy risk prediction performance.
In order to achieve the above object, according to one aspect of the present application, a data processing method for enterprise bankruptcy risk prediction is provided.
The data processing method for enterprise bankruptcy risk prediction according to the application comprises: establishing an enterprise relationship graph from collected enterprise data; training a prediction model that takes as input at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph, where the prediction model is a graph-neural-network-based enterprise classifier and whether an enterprise has gone bankrupt is used as the data label; and predicting the bankruptcy risk of an enterprise from received enterprise information using the trained prediction model.
Further, training the prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input includes:
for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted $X \in \mathbb{R}^{N \times F}$; for the node features of a graph $G = (V, E)$:
$X_{ij}$ denotes the j-th feature of node i (row $X_i$ is the feature vector of node i),
where N is the number of nodes and F is the feature dimension of each node.
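As an illustration of the N × F layout, the sketch below turns per-enterprise attribute records into the matrix X, with row i holding node i and X[i][j] its j-th feature. The attribute names in `FEATURES` and the default of 0.0 for a missing attribute are assumptions for illustration only; the application does not list the actual fields.

```python
from typing import Dict, List, Tuple

# Hypothetical attribute names; the application does not specify the fields.
FEATURES = ["registered_capital", "num_patents", "years_active"]


def build_node_features(
    companies: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[List[float]]]:
    """Return (node_ids, X) where X is an N x F list of lists.

    Row i holds the attributes of node i, so X[i][j] is the j-th feature
    of node i. Missing attributes default to 0.0 (an assumption).
    """
    node_ids = sorted(companies)  # fixes the node index i for each enterprise
    X = [[float(companies[n].get(f, 0.0)) for f in FEATURES] for n in node_ids]
    return node_ids, X


companies = {
    "A": {"registered_capital": 500.0, "num_patents": 3, "years_active": 10},
    "B": {"registered_capital": 50.0, "years_active": 2},
}
ids, X = build_node_features(companies)
print(ids, X)  # ['A', 'B'] [[500.0, 3.0, 10.0], [50.0, 0.0, 2.0]]
```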
Further, training the prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input includes:
for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted X; for the edge features of a graph $G = (V, E)$:
$E_{ij}$ denotes the edge feature vector between node i and node j,
$E_{ijp}$ denotes the p-th element of the P-dimensional vector $E_{ij}$,
$E \in \mathbb{R}^{N \times N \times P}$ is the edge feature tensor of the graph, and $E_{ij} = \mathbf{0}$ when there is no link between nodes i and j.
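The edge feature tensor can be held as nested lists indexed E[i][j][p], with unlinked pairs left as the all-zero vector. In this sketch the P edge channels named in `EDGE_TYPES` are hypothetical, and edges are assumed to be undirected so that E[i][j] equals E[j][i]; the application fixes neither choice.

```python
from typing import Dict, List, Tuple

# Hypothetical edge channels (P = 3); the application does not name them.
EDGE_TYPES = ["shareholding", "shared_executive", "investment"]


def build_edge_tensor(
    node_ids: List[str],
    edges: Dict[Tuple[str, str], Dict[str, float]],
) -> List[List[List[float]]]:
    """Return E as nested lists of shape N x N x P.

    E[i][j][p] is the p-th edge feature between nodes i and j. Pairs with no
    link keep the all-zero vector, i.e. E_ij = 0. Edges are treated as
    undirected (an assumption), so E[i][j] == E[j][i].
    """
    index = {n: k for k, n in enumerate(node_ids)}
    N, P = len(node_ids), len(EDGE_TYPES)
    E = [[[0.0] * P for _ in range(N)] for _ in range(N)]
    for (u, v), feats in edges.items():
        i, j = index[u], index[v]
        vec = [float(feats.get(t, 0.0)) for t in EDGE_TYPES]
        E[i][j] = vec
        E[j][i] = list(vec)  # separate copy for the reverse direction
    return E


E = build_edge_tensor(["A", "B", "C"], {("A", "B"): {"shareholding": 0.3, "investment": 1.0}})
print(E[0][1], E[1][0], E[0][2])  # [0.3, 0.0, 1.0] [0.3, 0.0, 1.0] [0.0, 0.0, 0.0]
```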
Further, predicting the enterprise risk from received enterprise information using the trained prediction model includes:
obtaining, from the received enterprise information, low-dimensional latent feature representations of the nodes of the enterprise relationship graph by a network embedding method, and using these representations as features for a graph-based classification task.
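The application does not say which network embedding method it uses. As one hedged stand-in, the sketch below computes a spectral embedding: each node becomes a row of the dominant (largest-magnitude) `dim` eigenvectors of the self-loop-augmented, symmetrically normalized adjacency $D^{-1/2}(A+I)D^{-1/2}$, found by subspace iteration. Function names and parameters are illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2, adding self-loops."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def orthonormalize_columns(q: Matrix) -> Matrix:
    """Classical Gram-Schmidt on the columns of an n x d matrix."""
    n, d = len(q), len(q[0])
    cols: List[List[float]] = []
    for j in range(d):
        v = [q[i][j] for i in range(n)]
        for c in cols:
            dot = sum(x * y for x, y in zip(v, c))
            v = [x - dot * y for x, y in zip(v, c)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        cols.append([x / norm for x in v])
    return [[cols[j][i] for j in range(d)] for i in range(n)]


def spectral_embedding(adj: Matrix, dim: int, iters: int = 100, seed: int = 0) -> Matrix:
    """Row v of the result is the dim-dimensional embedding of node v."""
    rng = random.Random(seed)
    a_hat = normalized_adjacency(adj)
    q = orthonormalize_columns([[rng.gauss(0, 1) for _ in range(dim)] for _ in adj])
    for _ in range(iters):  # subspace iteration: multiply, then re-orthonormalise
        q = orthonormalize_columns(matmul(a_hat, q))
    return q


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
two_triangles = [[1 if (i < 3) == (j < 3) and i != j else 0 for j in range(6)] for i in range(6)]
Z = spectral_embedding(two_triangles, dim=2)
```

Nodes in the same triangle receive identical embeddings, and the two triangles are separated, which is the property a downstream node classifier would rely on.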
Further, predicting the enterprise risk from received enterprise information using the trained prediction model includes:
fusing the node attribute information and node association information of enterprises in the enterprise relationship graph, mapping the high-dimensional sparse matrix of the graph to low-dimensional dense vectors, and training a prediction model for node classification.
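One common way to fuse a node's own attributes with its association information is a mean-aggregation layer in the style of GraphSAGE: concatenate $x_v$ with the mean of the neighbours' features and project through a weight matrix to a smaller dimension. This is a swapped-in technique, not the application's own model; the weights below are fixed by hand for illustration, whereas in practice they would be learned.

```python
from typing import Dict, List


def sage_mean_layer(X: List[List[float]], neighbors: Dict[int, List[int]],
                    W: List[List[float]]) -> List[List[float]]:
    """One layer: h_v = ReLU(W @ concat(x_v, mean_{u in N(v)} x_u)).

    X: N x F node attributes (may be sparse, e.g. one-hot);
    neighbors: adjacency lists; W: d x 2F weights with d < 2F.
    Nodes without neighbours aggregate a zero vector (an assumption).
    """
    F = len(X[0])
    H = []
    for v, x_v in enumerate(X):
        nbrs = neighbors.get(v, [])
        agg = [sum(X[u][f] for u in nbrs) / len(nbrs) if nbrs else 0.0 for f in range(F)]
        z = x_v + agg  # list concatenation: own attributes then neighbour mean
        H.append([max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in W])
    return H


X = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = [[1, 2, 3, 4, 0, 0, 0, 0],   # reads the node's own attributes
     [0, 0, 0, 0, 1, 2, 3, 4]]   # reads the aggregated neighbour attributes
H = sage_mean_layer(X, neighbors, W)
print(H)  # [[1.0, 2.5], [2.0, 1.0], [3.0, 1.0]]
```

The 4-dimensional sparse one-hot inputs become 2-dimensional dense vectors that mix each node's own data with its neighbourhood.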
In order to achieve the above object, according to another aspect of the present application, a data processing apparatus for enterprise bankruptcy risk prediction is provided.
The data processing apparatus for enterprise bankruptcy risk prediction according to the application comprises: an enterprise relationship graph module, configured to establish an enterprise relationship graph from collected enterprise data; a model training module, configured to train a prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input, where the prediction model is a graph-neural-network-based enterprise classifier and whether an enterprise has gone bankrupt is used as the data label; and a risk prediction module, configured to predict the bankruptcy risk of an enterprise from received enterprise information using the trained prediction model.
Further, the model training module is configured such that:
for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted $X \in \mathbb{R}^{N \times F}$; for the node features of a graph $G = (V, E)$:
$X_{ij}$ denotes the j-th feature of node i,
where N is the number of nodes and F is the feature dimension of each node.
Further, the model training module is configured such that:
for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted X; for the edge features of a graph $G = (V, E)$:
$E_{ij}$ denotes the edge feature vector between node i and node j,
$E_{ijp}$ denotes the p-th element of the P-dimensional vector $E_{ij}$,
$E \in \mathbb{R}^{N \times N \times P}$ is the edge feature tensor of the graph, and $E_{ij} = \mathbf{0}$ when there is no link between nodes i and j.
In order to achieve the above object, according to yet another aspect of the present application, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the data processing method for enterprise bankruptcy risk prediction.
In order to achieve the above object, according to yet another aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements the steps of the data processing method for enterprise bankruptcy risk prediction.
With the data processing method, apparatus, device and storage medium for enterprise bankruptcy risk prediction provided by the application, an enterprise relationship graph is established from collected enterprise data, and a prediction model is trained with at least the enterprise node attributes and enterprise node relationships contained in the graph as input. The trained model then predicts enterprise bankruptcy risk from received enterprise information. This improves the ability to anticipate enterprise operating risk and solves the technical problem of poor enterprise bankruptcy risk prediction performance.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a schematic flow chart of a data processing method for enterprise bankruptcy risk prediction according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a data processing apparatus for enterprise bankruptcy risk prediction according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an apparatus according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an implementation principle according to an embodiment of the present application;
FIG. 5 is a schematic diagram of business relationships according to an embodiment of the present application;
FIG. 6 is an architecture diagram of a graph neural network model for enterprise risk according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
It should be noted that the embodiments of the present application and the features in them may be combined with each other where they do not conflict. The present application is described in detail below through embodiments with reference to the accompanying drawings.
As shown in fig. 1, the method includes steps S101 to S103 as follows:
Step S101: establish an enterprise relationship graph from the collected enterprise data.
In the data collection and preprocessing stage, the system collects enterprise data and builds the enterprise relationship graph from the enterprise information.
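A minimal sketch of turning collected records into company-to-company links follows. It assumes, hypothetically, that the records have already been extracted into (person, company) shareholder or executive pairs and (investor, investee) investment pairs; two companies are linked when they share a person or when one invests in the other, and links are treated as undirected.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_relationship_graph(
    holdings: List[Tuple[str, str]],     # hypothetical (person, company) pairs
    investments: List[Tuple[str, str]],  # hypothetical (investor, investee) pairs
) -> Dict[str, Set[str]]:
    """Return undirected company-to-company adjacency sets."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    by_person: Dict[str, Set[str]] = defaultdict(set)
    for person, company in holdings:
        by_person[person].add(company)
        graph.setdefault(company, set())  # every company becomes a node
    # Companies sharing a shareholder or executive are linked pairwise.
    for companies in by_person.values():
        for a, b in combinations(sorted(companies), 2):
            graph[a].add(b)
            graph[b].add(a)
    # External investment links.
    for investor, investee in investments:
        graph[investor].add(investee)
        graph[investee].add(investor)
    return dict(graph)


g = build_relationship_graph(
    holdings=[("Zhang", "A"), ("Zhang", "B"), ("Li", "C")],
    investments=[("C", "D")],
)
print(g)  # {'A': {'B'}, 'B': {'A'}, 'C': {'D'}, 'D': {'C'}}
```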
Specifically, the relationships between knowledge graph entities and the entity attributes can be defined through ontology modeling, and the knowledge graph is constructed using algorithms such as Chinese word segmentation, entity extraction, attribute extraction, entity alignment, entity disambiguation, semantic understanding, knowledge inference, knowledge fusion and semantic matching.
Alternatively, for semi-structured and unstructured data, knowledge representations can be obtained through knowledge extraction (entity extraction, relationship extraction and attribute extraction); for structured data, they can be obtained by direct import. Once the knowledge representations are obtained, entity alignment is performed and nodes representing the same entity are merged.
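Merging nodes that represent the same entity can be sketched with a union-find structure. The matching rule here, a case-folded name with spaces and punctuation removed plus an explicit alias list, is a hypothetical simplification; real entity alignment and disambiguation would use richer signals.

```python
import re
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # deterministic root choice


def normalize(name: str) -> str:
    """Hypothetical matching key: case-folded name without spaces or punctuation."""
    return re.sub(r"[\W_]+", "", name.casefold())


def align_entities(mentions: Iterable[str], aliases: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map every mention to one canonical mention of the same entity."""
    uf = UnionFind()
    first_by_key: Dict[str, str] = {}
    for m in mentions:
        key = normalize(m)
        if key in first_by_key:
            uf.union(first_by_key[key], m)  # same key -> same entity
        else:
            first_by_key[key] = m
            uf.find(m)  # register the mention as its own node
    for a, b in aliases:
        uf.union(a, b)
    return {m: uf.find(m) for m in uf.parent}


merged = align_entities(
    ["Acme Co., Ltd.", "ACME Co Ltd", "Beta Inc"],
    aliases=[("Beta Inc", "Beta Incorporated")],
)
```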
Step S102: train a prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input;
the prediction model is a graph-neural-network-based enterprise classifier, and whether an enterprise has gone bankrupt is used as the data label.
The prediction model is trained with the enterprise node attributes and enterprise node relationships of the enterprise relationship graph as input.
Specifically, based on the node and structure representations of the enterprise relationship graph, the enterprise node attributes and the relationship structure between nodes are vectorized and used as training input for the subsequent model.
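Vectorizing node attributes usually means encoding categorical fields and scaling numeric ones. The sketch below assumes one hypothetical categorical field (one-hot encoded over the observed values) and min-max scaled numeric fields, concatenated per node; the field names and the choice of encoding are illustrative, not taken from the application.

```python
from typing import Any, Dict, List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    return [1.0 if value == v else 0.0 for v in vocabulary]


def min_max(column: List[float]) -> List[float]:
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(c - lo) / span if span else 0.0 for c in column]


def vectorize(records: List[Dict[str, Any]], categorical: str,
              numeric: List[str]) -> List[List[float]]:
    """One-hot encode one categorical field, min-max scale numeric fields,
    and concatenate them into one feature vector per node."""
    vocab = sorted({str(r[categorical]) for r in records})
    scaled = {f: min_max([float(r[f]) for r in records]) for f in numeric}
    return [one_hot(str(r[categorical]), vocab) + [scaled[f][i] for f in numeric]
            for i, r in enumerate(records)]


records = [
    {"industry": "retail", "capital": 100},
    {"industry": "mining", "capital": 300},
    {"industry": "retail", "capital": 200},
]
print(vectorize(records, "industry", ["capital"]))
# [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.5]]
```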
Step S103: predict the bankruptcy risk of an enterprise from the received enterprise information using the trained prediction model.
For received, unlabeled enterprise information, the prediction model trained in the preceding steps predicts the probability of the enterprise's bankruptcy risk.
Specifically, an enterprise node classification algorithm based on a graph neural network is constructed with whether an enterprise has gone bankrupt as the label. The enterprise node attributes and the relationship structure vectors of the graph are fed together into the improved graph neural network, which is trained to output the bankruptcy probability of enterprise nodes. The model is then used to predict the bankruptcy probability of unlabeled data, thereby realizing enterprise risk prediction.
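The "improved graph neural network" is not specified in this section, so the sketch below substitutes a deliberately simple semi-supervised pipeline: one mean-aggregation step (own features concatenated with the neighbour mean), then logistic regression trained by gradient descent on the labeled nodes only (1 = bankrupt, 0 = not bankrupt), and finally probabilities for every node, including unlabeled ones. The debt-ratio feature and the toy graph are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def aggregate(X: List[List[float]], neighbors: Dict[int, List[int]]) -> List[List[float]]:
    """Concatenate each node's features with the mean of its neighbours' features."""
    F = len(X[0])
    out = []
    for v, x in enumerate(X):
        nb = neighbors.get(v, [])
        mean = [sum(X[u][f] for u in nb) / len(nb) if nb else 0.0 for f in range(F)]
        out.append(x + mean)
    return out


def train_and_predict(X: List[List[float]], neighbors: Dict[int, List[int]],
                      labels: Dict[int, int], lr: float = 0.5,
                      epochs: int = 500) -> List[float]:
    """Fit logistic regression on labelled nodes; return P(bankrupt) for all nodes."""
    H = aggregate(X, neighbors)
    w, b = [0.0] * len(H[0]), 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for v, t in labels.items():  # the loss only sees labelled nodes
            err = sigmoid(sum(wi * hi for wi, hi in zip(w, H[v])) + b) - t
            grad_w = [g + err * hk for g, hk in zip(grad_w, H[v])]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return [sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in H]


X = [[0.9], [0.8], [0.7], [0.2], [0.1], [0.3]]  # hypothetical debt ratios
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: 1, 1: 1, 3: 0, 4: 0}               # nodes 2 and 5 are unlabelled
probs = train_and_predict(X, neighbors, labels)
print([round(p, 2) for p in probs])
```

Node 2, which sits in the cluster of bankrupt enterprises, should receive a higher bankruptcy probability than node 5.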
From the above description, it can be seen that the following technical effects are achieved by the present application:
with the data processing method, apparatus, device and storage medium for enterprise bankruptcy risk prediction provided by the application, an enterprise relationship graph is established from collected enterprise data, and a prediction model is trained with at least the enterprise node attributes and enterprise node relationships contained in the graph as input. The trained model then predicts enterprise bankruptcy risk from received enterprise information. This improves the ability to anticipate enterprise operating risk and solves the technical problem of poor enterprise bankruptcy risk prediction performance.
As a preferred implementation of this embodiment, training the prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input includes:
for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted $X \in \mathbb{R}^{N \times F}$; for the node features of a graph $G = (V, E)$:
$X_{ij}$ denotes the j-th feature of node i,
where N is the number of nodes and F is the feature dimension of each node.
Specifically, given an enterprise relationship graph with N enterprise nodes, let X be the N × F matrix of node features of the whole graph. Elements of a matrix or tensor are indexed by subscripts. For the node features of a graph $G = (V, E)$, $X_{ij}$ denotes the j-th feature of node i, so all node features are represented by the N × F matrix X, where N is the number of nodes and F is the feature dimension of each node.
As a preferred implementation of this embodiment, training the prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input includes:
for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted X; for the edge features of a graph $G = (V, E)$:
$E_{ij}$ denotes the edge feature vector between node i and node j,
$E_{ijp}$ denotes the p-th element of the P-dimensional vector $E_{ij}$,
$E \in \mathbb{R}^{N \times N \times P}$ is the edge feature tensor of the graph, and $E_{ij} = \mathbf{0}$ when there is no link between nodes i and j.
Specifically, given an enterprise relationship graph with N enterprise nodes, let X be the N × F matrix of node features of the whole graph. Elements of a matrix or tensor are indexed by subscripts. For the edge features of a graph $G = (V, E)$: $E_{ij}$ is the edge feature vector of nodes i and j, $E_{ijp}$ is the p-th element of $E_{ij}$, E is the N × N × P edge feature tensor of the graph, and $E_{ij} = \mathbf{0}$ when there is no link between the two nodes.
As a preferred implementation of this embodiment, predicting the enterprise risk from received enterprise information using the trained prediction model includes:
obtaining, from the received enterprise information, low-dimensional latent feature representations of the nodes of the enterprise relationship graph by a network embedding method, and using these representations as features for a graph-based classification task.
Specifically, a network embedding method is used to learn low-dimensional latent representations of the nodes in a network; the learned representations can serve as features for various graph-based tasks such as classification and clustering. Enterprise risk prediction is thereby converted into a node classification problem on the enterprise relationship graph: enterprise relationship information is fed into the neural network as an important feature representation, making full use of the complex relationships among enterprise nodes. Combined with enterprise operation and management features, the model mines closely related enterprises and individuals to extract enterprise operating conditions and bankruptcy risk, improving the ability to anticipate enterprise operating risk.
Optionally, in the node classification setting, each node v is characterized by its feature $x_v$ and associated with a label $t_v$. Each node is represented by a d-dimensional state vector $h_v$ that contains information about its neighborhood. The goal is to learn an embedding state $h_v$ that carries the neighborhood information of each vertex v, and then to use $h_v$ to predict the labels of unlabeled nodes:
$$h_v = f\left(x_v, x_{co[v]}, h_{ne[v]}, x_{ne[v]}\right)$$
where $x_{co[v]}$ denotes the features of the edges connected to v, $h_{ne[v]}$ the states of the neighbors of v, and $x_{ne[v]}$ the features of the neighbors of v.
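The state equation defines $h_v$ implicitly, since each state depends on its neighbours' states. A standard way to solve it is to iterate the update until it reaches a fixed point, which is guaranteed when $f$ is a contraction. The sketch below uses a hypothetical, hand-chosen $f$: $h_v = \tanh\big(\beta x_v + \operatorname{mean}_{u \in ne[v]}(\alpha h_u + \gamma\, e_{vu} x_u)\big)$ with scalar edge features $e_{vu}$ standing in for $x_{co[v]}$. Because tanh is 1-Lipschitz and $|\alpha| < 1$, the update contracts. The scalar weights stand in for learned parameters.

```python
import math
from typing import Dict, List, Tuple


def gnn_fixed_point(X: List[List[float]], edges: Dict[Tuple[int, int], float],
                    alpha: float = 0.5, beta: float = 1.0, gamma: float = 0.5,
                    tol: float = 1e-10, max_iter: int = 1000) -> List[List[float]]:
    """Iterate h_v = tanh(beta*x_v + mean_{u in ne[v]}(alpha*h_u + gamma*e_vu*x_u)).

    Edges are undirected with one scalar feature e_vu each (the application
    allows P-dimensional edge features). With |alpha| < 1 the map is a
    contraction, so the iteration converges to a unique fixed point.
    """
    n, F = len(X), len(X[0])
    ne: Dict[int, List[Tuple[int, float]]] = {v: [] for v in range(n)}
    for (u, v), e in edges.items():
        ne[u].append((v, e))
        ne[v].append((u, e))
    h = [[0.0] * F for _ in range(n)]
    for _ in range(max_iter):
        new_h = []
        for v in range(n):
            msg = [0.0] * F
            for u, e in ne[v]:  # mean over neighbours of the transition terms
                for f in range(F):
                    msg[f] += (alpha * h[u][f] + gamma * e * X[u][f]) / len(ne[v])
            new_h.append([math.tanh(beta * X[v][f] + msg[f]) for f in range(F)])
        delta = max(abs(a - b) for ra, rb in zip(new_h, h) for a, b in zip(ra, rb))
        h = new_h
        if delta < tol:
            break
    return h


X = [[1.0], [1.0], [0.0]]
edges = {(0, 1): 1.0, (1, 2): 0.5}
h = gnn_fixed_point(X, edges)
```

In the original GNN formulation, an output function then maps each converged $h_v$ (together with $x_v$) to a label prediction.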
As a preferred implementation of this embodiment, predicting the enterprise risk from received enterprise information using the trained prediction model includes:
fusing the node attribute information and node association information of enterprises in the enterprise relationship graph, mapping the high-dimensional sparse matrix of the graph to low-dimensional dense vectors, and training a prediction model for node classification.
By fusing the node attribute information and node association information of enterprises in the enterprise relationship graph, the high-dimensional sparse matrix of the graph is mapped to low-dimensional dense vectors, on which a prediction model for node classification is trained.
Specifically, constructing the enterprise relationship graph expresses the association information among enterprises quantitatively: graph structure information such as an enterprise's senior executives, shareholders and external investments is used as feature input, which solves the problem that association information among enterprises is difficult to quantify. At the same time, because relationship features are used as input to the subsequent risk prediction model, the neural network's use of edge information is not restricted, which improves the accuracy of enterprise risk prediction.
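One way to keep every edge dimension available to the network, in the spirit of edge-feature GNNs such as EGNN, is to treat each channel p of the edge tensor as its own weighted adjacency, aggregate neighbour features separately per channel, and concatenate the results. The sketch below assumes the layout E[i][j][p] and non-negative edge features; it illustrates channel-wise aggregation and is not the application's own model.

```python
from typing import List

Tensor3 = List[List[List[float]]]


def edge_channel_aggregate(X: List[List[float]], E: Tensor3) -> List[List[float]]:
    """For each edge channel p, row-normalise E[:, :, p] and use it as a
    weighted adjacency; concatenate the P aggregated blocks per node.

    Assumes non-negative edge features. Output shape: N x (P * F).
    A node with no edges in channel p gets a zero block for that channel.
    """
    N, F, P = len(X), len(X[0]), len(E[0][0])
    out: List[List[float]] = [[] for _ in range(N)]
    for p in range(P):
        for i in range(N):
            row_sum = sum(E[i][j][p] for j in range(N))
            agg = [0.0] * F
            if row_sum > 0:
                for j in range(N):
                    w = E[i][j][p] / row_sum  # normalised channel-p edge weight
                    for f in range(F):
                        agg[f] += w * X[j][f]
            out[i].extend(agg)
    return out


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Channel 0: a shareholding link 0-1; channel 1: investment links 0-1 (1.0) and 0-2 (3.0).
E = [[[0.0, 0.0], [1.0, 1.0], [0.0, 3.0]],
     [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
     [[0.0, 3.0], [0.0, 0.0], [0.0, 0.0]]]
print(edge_channel_aggregate(X, E))
# [[0.0, 1.0, 0.75, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
```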
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
According to an embodiment of the present application, a data processing apparatus for enterprise bankruptcy risk prediction is also provided for implementing the above method. As shown in FIG. 2, the apparatus includes: an enterprise relationship graph module 10, configured to establish an enterprise relationship graph from the collected enterprise data; a model training module 11, configured to train a prediction model with at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph as input, where the prediction model is a graph-neural-network-based enterprise classifier and whether an enterprise has gone bankrupt is used as the data label; and a risk prediction module 12, configured to predict the bankruptcy risk of an enterprise from received enterprise information using the trained prediction model.
The enterprise relationship graph module 10 of this embodiment establishes an enterprise relationship graph from the collected enterprise data: in the data collection and preprocessing stage, the system collects enterprise data and builds the enterprise relationship graph from the enterprise information.
Specifically, the relationships between knowledge graph entities and the entity attributes can be defined through ontology modeling, and the knowledge graph is constructed using algorithms such as Chinese word segmentation, entity extraction, attribute extraction, entity alignment, entity disambiguation, semantic understanding, knowledge inference, knowledge fusion and semantic matching.
Alternatively, for semi-structured and unstructured data, knowledge representations can be obtained through knowledge extraction (entity extraction, relationship extraction and attribute extraction); for structured data, they can be obtained by direct import. Once the knowledge representations are obtained, entity alignment is performed and nodes representing the same entity are merged.
The model training module 11 of this embodiment trains a prediction model with the enterprise node attributes and enterprise node relationships of the enterprise relationship graph as input.
Specifically, based on the node and structure representations of the enterprise relationship graph, the enterprise node attributes and the relationship structure between nodes are vectorized and used as training input for the subsequent model.
For unlabeled enterprise information, the risk prediction module 12 of this embodiment predicts the probability of the enterprise's bankruptcy risk using the prediction model trained in the preceding steps.
Specifically, an enterprise node classification algorithm based on a graph neural network is constructed with whether an enterprise has gone bankrupt as the label. The enterprise node attributes and the relationship structure vectors of the graph are fed together into the improved graph neural network, which is trained to output the bankruptcy probability of enterprise nodes. The model is then used to predict the bankruptcy probability of unlabeled data, thereby realizing enterprise risk prediction.
As a preferred implementation of this embodiment, the model training module 11 is configured such that, for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted X, and for the node features of a graph $G = (V, E)$, $X_{ij}$ denotes the j-th feature of node i, where N is the number of nodes and F is the feature dimension of each node.
Specifically, given an enterprise relationship graph with N enterprise nodes, let X be the N × F matrix of node features of the whole graph. Elements of a matrix or tensor are indexed by subscripts. For the node features of a graph $G = (V, E)$, $X_{ij}$ denotes the j-th feature of node i, so all node features are represented by the N × F matrix X, where N is the number of nodes and F is the feature dimension of each node.
As a preferred implementation of this embodiment, the model training module 11 is configured such that, for an enterprise relationship graph with N enterprise nodes, the N × F node feature matrix of the whole graph is denoted X, and for the edge features of a graph $G = (V, E)$: $E_{ij}$ is the edge feature vector of nodes i and j, $E_{ijp}$ is the p-th element of $E_{ij}$, E is the N × N × P edge feature tensor of the graph, and $E_{ij} = \mathbf{0}$ when there is no link between the two nodes.
Specifically, given an enterprise relationship graph with N enterprise nodes, let X be the N × F matrix of node features of the whole graph. Elements of a matrix or tensor are indexed by subscripts. For the edge features of a graph $G = (V, E)$: $E_{ij}$ is the edge feature vector of nodes i and j, $E_{ijp}$ is the p-th element of $E_{ij}$, E is the N × N × P edge feature tensor of the graph, and $E_{ij} = \mathbf{0}$ when there is no link between the two nodes.
An embodiment of the present application also provides a computer device. As shown in fig. 3, the computer device 20 may include at least one processor 201 (e.g., a CPU), at least one network interface 204, a user interface 203, a memory 205, at least one communication bus 202, and optionally a display 206. The communication bus 202 enables connection and communication among these components. The user interface 203 may include a touch screen, a keyboard, a mouse, or the like. The network interface 204 may optionally include a standard wired interface or a wireless interface (e.g., a Wi-Fi interface), through which a communication connection with a server may be established. The memory 205 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory; in this embodiment of the present invention, the memory 205 includes flash memory. The memory 205 may optionally be at least one storage system located remotely from the processor 201. As shown in fig. 3, the memory 205, as a computer storage medium, may store an operating system, a network communication module, a user interface module, and program instructions.
It should be noted that the network interface 204 may be connected to a receiver, a transmitter, or other communication modules, which may include, but are not limited to, a Wi-Fi module and a Bluetooth module. It is understood that the computer device in this embodiment of the present invention may also include a receiver, a transmitter, other communication modules, and the like.
Processor 201 may be used to call program instructions stored in memory 205 and cause computer device 20 to perform the following operations:
establishing an enterprise relationship graph according to collected enterprise data;
training a prediction model using, as input, at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph, wherein the prediction model performs enterprise node classification based on a graph neural network and uses whether an enterprise has gone bankrupt as the data label; and
predicting the bankruptcy risk of an enterprise through the trained prediction model according to received enterprise information.
Please refer to fig. 4 to fig. 6, which illustrate the present application in detail:
As shown in fig. 4, step 1, the first phase, is data collection and preprocessing: the system collects enterprise data and constructs an enterprise relationship graph from the enterprise information;
Step 2: the nodes and structure of the enterprise relationship graph are vectorized, so that the enterprise node attributes and the relational structure between nodes can serve as training input for the subsequent model;
Step 3: an enterprise node classification algorithm based on a graph neural network is constructed; using whether an enterprise has gone bankrupt as the label, the enterprise node attributes and the relationship structure vectors of the graph are fed together into the improved graph neural network, which is trained to output the bankruptcy probability of each enterprise node;
Step 4: the trained model is used to predict the bankruptcy probability of unlabeled enterprises, thereby realizing enterprise risk prediction.
A preferred embodiment of the present application is described below with reference to the accompanying drawings:
(I) Enterprise relationship knowledge graph construction
The relationships between knowledge graph entities and the entity attributes are defined through ontology modeling, and the knowledge graph is constructed using algorithms such as Chinese word segmentation, entity extraction, attribute extraction, entity alignment, entity disambiguation, semantic understanding, knowledge inference, knowledge fusion, and semantic matching. For semi-structured and unstructured data, knowledge representations are obtained through knowledge extraction (entity extraction, relation extraction, and attribute extraction); for structured data, they are obtained by direct import. Once the knowledge representations are obtained, entity alignment is performed and nodes representing the same entity are merged.
As shown in fig. 5, the data sources include the bank's internal basic enterprise information, transaction data, external public opinion data, complaint data, and business data. Various algorithms are used to extract triples describing close associations between enterprises, according to the following rules. If the frequency and amount of fund transfers between two enterprises exceed a threshold, the two are considered closely associated. If two enterprises share a relatively large number of cross-shareholdings or senior executives, they may be considered closely associated. If there are many litigation events between two enterprises, they may be considered closely associated. If the financial transactions between an enterprise and an individual exceed a threshold, they may be considered closely associated. If an individual is a senior executive or shareholder of an enterprise, the two may be considered closely associated. If there are many disputes between an enterprise and an individual, they may be considered closely associated.
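The close-association rules above can be sketched as a threshold filter that emits (head, relation, tail) triples. This is a minimal sketch under stated assumptions: the threshold values, the `PairStats` record, and the relation labels are all hypothetical, since the source gives neither concrete thresholds nor a data schema, and "frequency and amount above the threshold" is read as both conditions holding at once.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical thresholds: the source does not specify concrete values.
FUND_COUNT_THRESHOLD = 10            # number of fund transfers between the pair
FUND_AMOUNT_THRESHOLD = 1_000_000.0  # total amount transferred between the pair
SHARED_LINK_THRESHOLD = 2            # cross-shareholdings or shared senior executives
LITIGATION_THRESHOLD = 3             # lawsuits or disputes between the pair


@dataclass
class PairStats:
    """Aggregated statistics for one enterprise-enterprise or enterprise-person pair (hypothetical schema)."""
    head: str
    tail: str
    fund_count: int = 0
    fund_amount: float = 0.0
    shared_links: int = 0
    litigation_count: int = 0
    role: Optional[str] = None  # e.g. "shareholder" or "executive" for enterprise-person pairs


def extract_triples(pairs: List[PairStats]) -> List[Tuple[str, str, str]]:
    """Emit one triple for every close-association rule that fires for a pair."""
    triples = []
    for p in pairs:
        if p.fund_count > FUND_COUNT_THRESHOLD and p.fund_amount > FUND_AMOUNT_THRESHOLD:
            triples.append((p.head, "fund_flow", p.tail))
        if p.shared_links >= SHARED_LINK_THRESHOLD:
            triples.append((p.head, "shared_equity_or_executive", p.tail))
        if p.litigation_count >= LITIGATION_THRESHOLD:
            triples.append((p.head, "litigation", p.tail))
        if p.role in ("shareholder", "executive"):
            triples.append((p.head, p.role, p.tail))
    return triples


pairs = [
    PairStats("Enterprise A", "Enterprise B", fund_count=12, fund_amount=2e6),
    PairStats("Enterprise A", "Enterprise C", litigation_count=5),
    PairStats("Enterprise A", "Person Z", role="executive"),
    PairStats("Enterprise B", "Enterprise C", fund_count=3, fund_amount=5e4),
]
print(extract_triples(pairs))
```

In practice the thresholds would be tuned on the bank's own data; keeping each rule as a separate relation label lets the resulting triples feed directly into the edge features of the relationship graph.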
(II) Node and structure representation method based on the enterprise relationship graph
Given an enterprise relationship graph with N enterprise nodes, let X be the N × F matrix of node features of the entire graph. Elements of a matrix or tensor are denoted by subscript indices. For graph G = (V, E), the following features are used:
Node features: $X_{ij}$ denotes the j-th feature of node i, so that all node features form the N × F matrix X, where N is the number of nodes and F is the feature dimension of each node.
Edge features: $E_{ij}$ denotes the edge feature vector between nodes i and j, $E_{ijp}$ denotes the p-th feature of $E_{ij}$, and E is the N × N × P tensor of edge features of the graph, with $E_{ij} = 0$ when there is no link between nodes i and j.
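As an illustration of these definitions, the sketch below assembles X and E with NumPy from per-enterprise feature lists and per-pair edge feature lists. The input dictionaries, the `build_graph_tensors` helper, and the default of mirroring each edge into both $E_{ij}$ and $E_{ji}$ (treating relations as undirected) are assumptions made for illustration; directed relations such as investment would use `symmetric=False`.

```python
import numpy as np


def build_graph_tensors(node_features, edge_features, num_edge_features, symmetric=True):
    """Build the N x F node matrix X and the N x N x P edge tensor E.

    node_features: dict mapping enterprise id -> list of F node features
    edge_features: dict mapping (id_i, id_j) -> list of P edge features
    """
    ids = sorted(node_features)
    index = {eid: k for k, eid in enumerate(ids)}
    X = np.array([node_features[eid] for eid in ids], dtype=float)  # shape (N, F)

    n = len(ids)
    E = np.zeros((n, n, num_edge_features))  # E_ij stays 0 where no link exists
    for (a, b), feats in edge_features.items():
        i, j = index[a], index[b]
        E[i, j] = feats
        if symmetric:  # hypothetical choice: treat the relation as undirected
            E[j, i] = feats
    return index, X, E


index, X, E = build_graph_tensors(
    node_features={"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]},
    edge_features={("A", "B"): [0.3, 1.0, 0.0]},
    num_edge_features=3,
)
print(X.shape, E.shape)  # (3, 2) (3, 3, 3)
```

A dense N × N × P tensor is easy to reason about but grows quadratically with N; for a large enterprise graph a sparse edge list would be the more practical storage, with the dense form used only conceptually.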
Enterprise graph edge feature table
No. | Feature | Value | Dimension
1 | Common shareholders | Shareholding ratio | 1×3
2 | Common senior executives | Count | 1×2
3 | Cross-shareholding between enterprises | Shareholding ratio | 1×3
4 | Inter-enterprise investment | Investment amount, monetary sum | 1×4
5 | Litigation between enterprises | Number and categories of lawsuits | 1×4
6 | Upstream/downstream relationship between enterprises | Upstream or downstream | 1×2
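If the six feature groups in the table are concatenated in table order, each edge feature vector $E_{ij}$ has P = 3 + 2 + 3 + 4 + 4 + 2 = 18 components. The sketch below assumes exactly that concatenation, with zeros for relation types absent between a pair; the block names are hypothetical labels, and the meaning of the individual components within each block is not specified in the source.

```python
import numpy as np

# (block name, dimension) in table order; the names are hypothetical labels.
EDGE_FEATURE_BLOCKS = [
    ("common_shareholder", 3),
    ("common_executive", 2),
    ("cross_shareholding", 3),
    ("investment", 4),
    ("litigation", 4),
    ("upstream_downstream", 2),
]
P = sum(dim for _, dim in EDGE_FEATURE_BLOCKS)  # 18


def encode_edge(relations):
    """Concatenate the per-relation blocks into one P-dimensional edge vector.

    relations: dict mapping block name -> list of values of that block's dimension.
    Blocks for relation types not present between the pair stay zero.
    """
    known = {name for name, _ in EDGE_FEATURE_BLOCKS}
    unknown = set(relations) - known
    if unknown:
        raise ValueError(f"unknown relation types: {sorted(unknown)}")

    vec = np.zeros(P)
    offset = 0
    for name, dim in EDGE_FEATURE_BLOCKS:
        if name in relations:
            values = np.asarray(relations[name], dtype=float)
            if values.shape != (dim,):
                raise ValueError(f"{name} expects {dim} values, got shape {values.shape}")
            vec[offset:offset + dim] = values
        offset += dim
    return vec


edge = encode_edge({"common_executive": [1.0, 2.0], "upstream_downstream": [1.0, 0.0]})
print(edge.shape)  # (18,)
```

Fixing the block order once keeps the p index of $E_{ijp}$ meaningful across all edges, so the same position always refers to the same kind of relation.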
(III) Classification learning method based on enterprise relationship structure representation
Network embedding aims to learn low-dimensional latent representations of the nodes in a network; the learned feature representations can be used as features for various graph-based tasks, such as classification and clustering.
In the node classification setting, each node v is characterized by its features $x_v$ and associated with a label $t_v$. Each node is represented by a d-dimensional state vector $h_v$ that contains information from its neighborhood. The goal is to learn, for each vertex v, an embedding state $h_v$ that encodes its neighborhood information, and to use $h_v$ to predict the labels of unlabeled nodes:
$$h_v = f\left(x_v, x_{co[v]}, h_{ne[v]}, x_{ne[v]}\right)$$
Here $x_{co[v]}$ denotes the features of the edges connected to v, $h_{ne[v]}$ the states of the neighbors of v, and $x_{ne[v]}$ the features of the neighbors of v. The function f is a local transition function that projects these inputs into a d-dimensional space. Since a unique solution for $h_v$ is required, the Banach fixed-point theorem can be applied and the above equation rewritten as an iterative update process:
$$H^{t+1} = F\left(H^{t}, X\right)$$
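The fixed-point iteration can be sketched in NumPy. The concrete form of f used here, $h_v = \tanh\big(W_x x_v + \sum_{u \in ne[v]} (W_h h_u + W_e E_{vu} + W_n x_u)\big)$, is an assumption chosen for illustration, not the form used in the source; it takes the same inputs as f (node features, edge features, neighbor states, neighbor features). Because tanh is 1-Lipschitz, rescaling $W_h$ so that $\lVert A \rVert_2 \lVert W_h \rVert_2 = \mu < 1$ (A being the adjacency matrix derived from E) makes the stacked map F a contraction in H, which is the condition under which the Banach fixed-point theorem guarantees a unique fixed point reachable from any $H^0$. The parameter names and helper functions are hypothetical.

```python
import numpy as np


def make_params(num_node_features, num_edge_features, state_dim, mu=0.5, seed=0):
    """Randomly initialised (hypothetical) parameters of the transition function f."""
    rng = np.random.default_rng(seed)
    return {
        "W_x": rng.normal(scale=0.1, size=(state_dim, num_node_features)),
        "W_n": rng.normal(scale=0.1, size=(state_dim, num_node_features)),
        "W_e": rng.normal(scale=0.1, size=(state_dim, num_edge_features)),
        "W_h": rng.normal(size=(state_dim, state_dim)),
        "mu": mu,  # target contraction constant, must be < 1
    }


def fixed_point_states(X, E, params, H0=None, tol=1e-6, max_iter=200):
    """Iterate H^{t+1} = F(H^t, X) until successive states differ by less than tol."""
    n = X.shape[0]
    A = (np.abs(E).sum(axis=2) > 0).astype(float)  # adjacency: 1 where E_ij != 0

    # Rescale W_h so that ||A||_2 * ||W_h||_2 = mu < 1, making F a contraction in H.
    W_h = params["W_h"]
    a_norm = np.linalg.norm(A, 2)
    if a_norm > 0:
        W_h = W_h * (params["mu"] / (a_norm * np.linalg.norm(W_h, 2)))

    # Terms of f that do not depend on the neighbour states are computed once.
    static = (X @ params["W_x"].T
              + np.einsum("vup,dp->vd", E, params["W_e"])
              + A @ X @ params["W_n"].T)

    H = np.zeros((n, W_h.shape[0])) if H0 is None else np.asarray(H0, dtype=float)
    for t in range(1, max_iter + 1):
        H_next = np.tanh(static + A @ H @ W_h.T)
        if np.max(np.abs(H_next - H)) < tol:
            return H_next, t
        H = H_next
    return H, max_iter


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = np.zeros((3, 3, 3))
E[0, 1] = E[1, 0] = [0.3, 1.0, 0.0]
E[1, 2] = E[2, 1] = [0.0, 0.0, 1.0]
params = make_params(num_node_features=2, num_edge_features=3, state_dim=4)
H_a, steps_a = fixed_point_states(X, E, params)
H_b, steps_b = fixed_point_states(X, E, params, H0=np.ones((3, 4)))
print(steps_a, steps_b, np.allclose(H_a, H_b, atol=1e-4))
```

Starting from two different initial states and reaching the same H illustrates the "any initial value" property; without the rescaling of $W_h$ the iteration is not guaranteed to converge.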
In the iterative update $H^{t+1} = F(H^{t}, X)$, H and X denote the stacked states and features of all nodes, respectively. The output of the GNN for node v is computed by passing the state $h_v$ and the feature $x_v$ to the output function g:
$$o_v = g\left(h_v, x_v\right)$$
Here F is the global transition function obtained by stacking the local transition function f over all nodes, and g is the output function; f and g are implemented as feedforward neural networks. By the Banach fixed-point theorem, if F is a contraction map, the iteration converges to the unique fixed point H for any initial value $H^0$. Taking $t_v$ as the target (label) of node v, the L1 loss function is:
$$\mathrm{loss} = \sum_{i=1}^{p} \lvert t_i - o_i \rvert$$
where p is the number of supervised nodes. The model is trained by gradient descent on the training set (enterprises with bankruptcy labels), and the trained model then infers the bankruptcy probability of the enterprises in the test set. The model structure is shown in fig. 6.
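A minimal sketch of the supervised step, under clearly simplified assumptions: the output function is taken to be $g(h_v, x_v) = \sigma(w^\top [h_v; x_v] + b)$ so that $o_v$ can be read as a bankruptcy probability, the states H are assumed to come from a converged fixed-point iteration and are held fixed, and only w and b are updated by (sub)gradient descent on the L1 loss over the p supervised nodes. The original GNN formulation also trains the parameters of the transition function f, which this sketch omits; the function names, learning rate, and epoch count are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def l1_loss(t, o):
    """loss = sum_i |t_i - o_i| over the supervised nodes."""
    return float(np.sum(np.abs(t - o)))


def train_output_function(H, X, labels, train_idx, lr=0.5, epochs=300):
    """Subgradient descent on the L1 loss for g(h_v, x_v) = sigmoid(w . [h_v, x_v] + b)."""
    Z = np.hstack([H, X])[train_idx]                # inputs of g for the p supervised nodes
    t = np.asarray(labels, dtype=float)[train_idx]  # 1 = bankrupt, 0 = not bankrupt
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        o = sigmoid(Z @ w + b)
        # d|t - o|/do = -sign(t - o) and do/dz = o * (1 - o)
        grad_z = -np.sign(t - o) * o * (1.0 - o)
        w -= lr * (Z.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b


def predict(H, X, w, b):
    """Bankruptcy probability o_v for every node, including unlabeled ones."""
    return sigmoid(np.hstack([H, X]) @ w + b)


H = np.array([[1.0], [-1.0], [0.9], [-0.8]])  # toy converged states
X = np.zeros((4, 1))
labels = [1, 0, 1, 0]
train_idx = [0, 1]                             # nodes 2 and 3 are treated as unlabeled
w, b = train_output_function(H, X, labels, train_idx)
probs = predict(H, X, w, b)
print(probs)
```

The L1 loss is not differentiable at $t_i = o_i$, hence the use of the sign subgradient; with a sigmoid output and 0/1 labels that point is never reached exactly, so the update is well defined throughout training.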
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A data processing method for enterprise bankruptcy risk prediction, comprising the following steps:
establishing an enterprise relationship graph according to collected enterprise data;
training a prediction model using, as input, at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph, wherein the prediction model performs enterprise node classification based on a graph neural network and uses whether an enterprise has gone bankrupt as the data label; and
predicting the bankruptcy risk of an enterprise through the trained prediction model according to received enterprise information.
2. The data processing method for enterprise bankruptcy risk prediction according to claim 1, wherein training a prediction model based on at least enterprise node attributes and enterprise node relationships included in the enterprise relationship graph as inputs comprises:
for an enterprise relationship graph of N enterprise nodes, the N × F matrix of node features of the entire graph is denoted X, and for node features in graph G = (V, E):
$X_{ij}$ denotes the j-th feature of node i,
where N represents the number of nodes and F represents the feature dimension of each node.
3. The data processing method for enterprise bankruptcy risk prediction according to claim 1, wherein training a prediction model based on at least enterprise node attributes and enterprise node relationships included in the enterprise relationship graph as inputs comprises:
for an enterprise relationship graph of N enterprise nodes, the N × F matrix of node features of the entire graph is denoted X, and for edge features in graph G = (V, E):
$E_{ij}$ denotes the edge feature vector between node i and node j,
$E_{ijp}$ denotes the p-th feature of $E_{ij}$,
E is the N × N × P tensor of edge features of the graph, and $E_{ij} = 0$ when there is no link between the two nodes.
4. The data processing method for enterprise bankruptcy risk prediction according to claim 1, wherein predicting the enterprise bankruptcy risk through the trained prediction model according to the received enterprise information comprises:
acquiring, according to the received enterprise information, low-dimensional latent feature representations of the nodes in the graph neural network by a network embedding method, and using these feature representations as features for a graph-based classification task.
5. The data processing method for enterprise bankruptcy risk prediction according to claim 1, wherein predicting the enterprise bankruptcy risk through the trained prediction model according to the received enterprise information comprises:
fusing the node attribute information and node association information of the enterprises in the enterprise relationship graph, mapping the high-dimensional sparse matrix of the graph to low-dimensional dense vectors, and training a prediction model for node classification prediction.
6. A data processing apparatus for enterprise bankruptcy risk prediction, comprising:
an enterprise relationship graph module, configured to establish an enterprise relationship graph according to collected enterprise data;
a model training module, configured to train a prediction model using, as input, at least the enterprise node attributes and enterprise node relationships contained in the enterprise relationship graph, wherein the prediction model performs enterprise node classification based on a graph neural network and uses whether an enterprise has gone bankrupt as the data label; and
a risk prediction module, configured to predict the bankruptcy risk of an enterprise through the trained prediction model according to received enterprise information.
7. The data processing apparatus for enterprise bankruptcy risk prediction of claim 6, wherein the model training module is configured such that,
for an enterprise relationship graph of N enterprise nodes, the N × F matrix of node features of the entire graph is denoted X, and for node features in graph G = (V, E):
$X_{ij}$ denotes the j-th feature of node i,
where N represents the number of nodes and F represents the feature dimension of each node.
8. The data processing apparatus for enterprise bankruptcy risk prediction of claim 6, wherein the model training module is configured such that,
for an enterprise relationship graph of N enterprise nodes, the N × F matrix of node features of the entire graph is denoted X, and for edge features in graph G = (V, E):
$E_{ij}$ denotes the edge feature vector between node i and node j,
$E_{ijp}$ denotes the p-th feature of $E_{ij}$,
E is the N × N × P tensor of edge features of the graph, and $E_{ij} = 0$ when there is no link between the two nodes.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the data processing method for enterprise bankruptcy risk prediction of any of claims 1 to 5.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data processing method for enterprise bankruptcy risk prediction according to any one of claims 1 to 5.
CN201911075577.8A 2019-11-05 2019-11-05 Data processing method, device, equipment and storage medium for enterprise bankruptcy risk prediction Pending CN111222681A (en)

Publications (1)

Publication Number Publication Date
CN111222681A 2020-06-02

Family

ID=70830564

Country Status (1)

Country Link
CN (1) CN111222681A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184012A (en) * 2020-09-27 2021-01-05 平安资产管理有限责任公司 Enterprise risk early warning method, device, equipment and readable storage medium
CN112200382A (en) * 2020-10-27 2021-01-08 支付宝(杭州)信息技术有限公司 Training method and device of risk prediction model
CN112215441A (en) * 2020-11-17 2021-01-12 北京明略软件系统有限公司 Prediction model training method and system
CN112380345A (en) * 2020-11-20 2021-02-19 山东省计算中心(国家超级计算济南中心) COVID-19 scientific literature fine-grained classification method based on GNN
CN112446778A (en) * 2020-11-09 2021-03-05 广东华兴银行股份有限公司 Method, device and medium for identifying enterprise credit risk based on knowledge graph
CN112580716A (en) * 2020-12-16 2021-03-30 北京百度网讯科技有限公司 Method, device and equipment for identifying edge types in map and storage medium
CN112766684A (en) * 2021-01-11 2021-05-07 上海信联信息发展股份有限公司 Enterprise credit evaluation method and device and electronic equipment
CN112766683A (en) * 2021-01-11 2021-05-07 上海信联信息发展股份有限公司 Food enterprise credit evaluation method and device and electronic equipment
CN112785414A (en) * 2021-01-04 2021-05-11 上海海事大学 Credit risk prediction method based on knowledge graph and ontology inference engine
CN113283795A (en) * 2021-06-11 2021-08-20 同盾科技有限公司 Data processing method and device based on two-classification model, medium and equipment
CN113361962A (en) * 2021-06-30 2021-09-07 支付宝(杭州)信息技术有限公司 Method and device for identifying enterprise risk based on block chain network
CN113642704A (en) * 2021-08-02 2021-11-12 上海明略人工智能(集团)有限公司 Graph feature derivation method, system, storage medium and electronic device
CN114022058A (en) * 2022-01-06 2022-02-08 成都晓多科技有限公司 Small and medium-sized enterprise confidence loss risk prediction method based on time sequence knowledge graph
CN116796909A (en) * 2023-08-16 2023-09-22 浙江同信企业征信服务有限公司 Judicial litigation risk prediction method, device, equipment and storage medium
CN117094566A (en) * 2023-10-19 2023-11-21 中节能大数据有限公司 View-oriented enterprise management analysis strategy method
CN112819308B (en) * 2021-01-23 2024-04-02 罗家德 Head enterprise identification method based on bidirectional graph convolution neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596439A (en) * 2018-03-29 2018-09-28 北京中兴通网络科技股份有限公司 A kind of the business risk prediction technique and system of knowledge based collection of illustrative plates
CN109657977A (en) * 2018-12-19 2019-04-19 重庆誉存大数据科技有限公司 A kind of Risk Identification Method and system
CN109934697A (en) * 2017-12-15 2019-06-25 阿里巴巴集团控股有限公司 A kind of credit risk control method, device and equipment based on graph structure model
CN110263227A (en) * 2019-05-15 2019-09-20 阿里巴巴集团控股有限公司 Clique based on figure neural network finds method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination