CN116662570A - Heterogeneous graph knowledge graph completion method and system for bank risk assessment - Google Patents

Publication number
CN116662570A
Authority
CN
China
Prior art keywords
entity
embedding
knowledge graph
risk identification
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310665418.3A
Other languages
Chinese (zh)
Inventor
赵晶
时俊康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN202310665418.3A
Publication of CN116662570A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a heterogeneous graph knowledge graph completion method and system for bank risk assessment, relating to the technical field of knowledge graph completion. The method comprises the following steps: acquiring risk identification data of a customer to be evaluated, and constructing a risk identification data heterogeneous graph according to different semantic association relations; extracting entity embeddings from the risk identification data heterogeneous graph through an encoder based on hierarchical attention, while vectorizing the original relations to generate relation embeddings; and processing the entity embeddings and the relation embeddings with a decoder to update the risk identification knowledge graph. The application exploits the graph structure of the heterogeneous graph, updates the feature representation of each entity through an improved hierarchical attention mechanism to obtain the entity embeddings, and achieves better link prediction performance through an improved decoder that preserves the translation property between relations and entities.

Description

Heterogeneous graph knowledge graph completion method and system for bank risk assessment
Technical Field
The application belongs to the technical field of knowledge graph completion, and particularly relates to a heterogeneous graph knowledge graph completion method and system for bank risk assessment.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
The main purpose of knowledge graph completion is to predict the missing part of a triple. Knowledge graphs are now widely applied in the banking field. Because a knowledge graph can deeply perceive data, broadly interconnect isolated data, and support highly intelligent sharing and analysis of data, it can help practitioners analyze business scenarios and make decisions. It is useful for building customer portraits, carrying out precision marketing and customer acquisition, identifying bank risks, detecting behaviors such as credit card cash-out and theft of funds, and better expressing and analyzing the full transaction picture of a financial business scenario.
Bank risk here mainly refers to the credit risk of a commercial bank: in personal credit business, a borrower fails to fulfil the agreement with the bank on time, so the bank cannot recover its funds on schedule and suffers a loss. Credit risk is a key issue in personal credit business and arises mainly from borrowers' unstable income and lack of integrity. Specifically, the information a borrower leaves with the bank when applying for a loan may be false or unreasonable; the willingness to repay may decline after the loan is granted, so that the contract is not performed on time; the borrower may change jobs and addresses frequently, carry an excessive debt burden, or state an ambiguous purpose for the loan. All of these factors can increase the bank's credit losses. Under information asymmetry, borrowers can more easily resort to fraud, forgery and similar means to obtain bank funds by deception. The root cause of these problems is that the bank has insufficient knowledge of the borrower's information and does not understand the characteristics of the borrower's social relationships. In addition, banks cannot effectively use their own data together with large volumes of external data as platform support, nor make reasonable use of multi-source heterogeneous data to identify risk in personal credit business.
A risk identification knowledge graph for identifying bank risks can be obtained by using a knowledge graph to combine information of different structures and integrate different data types. However, existing knowledge graph completion has several problems. When existing translation models, semantic matching models and neural network models handle missing triples, they treat each triple independently and ignore the hidden entity information and rich semantic relations in the neighborhood of the triple; their model structures are relatively simple and their expressive power is poor. Existing neural network models tend to ignore the overall structural characteristics of triples, cannot capture the connections between entities and relations in the low-dimensional space, and ignore the translation property within triples. Although some graph neural network models exploit graph connectivity to learn richer semantic information from adjacent entities and relations, they do not fully consider heterogeneous graphs containing different types of entities and relations, and the update of a central entity is cluttered with excessive information. Consequently, a risk identification knowledge graph constructed with existing knowledge graph techniques suffers from inaccurate and incomplete data and cannot make full use of data of different structures, so accurate risk assessment cannot be performed.
Disclosure of Invention
To solve the above problems, the application provides a heterogeneous graph knowledge graph completion method and system for bank risk assessment. The method exploits the structural characteristics of heterogeneous graphs and adopts an encoder-decoder model structure: the encoder adds a hierarchical attention mechanism to effectively fuse the information of neighbor entity nodes and update the feature representation of the center entity node, and the improved decoder Conv-TransE finally produces the score of each entity, thereby effectively improving the accuracy of the risk identification knowledge graph.
According to a first aspect of the embodiment of the present application, there is provided a heterogeneous graph knowledge graph completion method for bank risk assessment, including:
acquiring risk identification data of a customer to be evaluated, and constructing a risk identification data heterogeneous graph according to different semantic association relations;
extracting entity embeddings from the risk identification data heterogeneous graph through an encoder based on hierarchical attention, and simultaneously vectorizing the original relations to generate relation embeddings;
processing the entity embeddings and the relation embeddings with a decoder, and updating the risk identification knowledge graph;
wherein the encoder based on hierarchical attention comprises an entity attention layer and a semantic attention layer, the entity attention layer aggregates the neighbor features of different meta-paths, the semantic attention layer distinguishes the importance of different meta-paths, and the encoder aggregates the meta-path neighbor features hierarchically to obtain the entity embeddings.
Further, the risk identification data includes customer credit data, consumption data, and personal information text data;
the customer credit data comprises loan frequency and mutual lending relations;
the consumption data comprises consumption frequency and consumption preferences on related platforms;
the personal information text data is the basic information of the individual, and comprises personal identity information, educational background and employer.
Further, the risk identification data heterogeneous graph takes entities as nodes and the different semantic association relations among entities as edges; two entities may be connected by different semantic paths, which are called meta-paths.
Further, the entity attention layer learns the importance of meta-path-based neighbors to each entity in the heterogeneous graph, and aggregates neighbor entity information to update the embedded representation of the center entity.
Further, the semantic attention layer learns the importance of each semantic and fuses the entity representations under multiple different meta-paths.
Further, the decoder adopts a Conv-TransE convolutional network, which removes the reshaping step of the ConvE model, performs the convolution directly on the entity embeddings and relation embeddings, and retains the translation property.
Further, the decoder takes the entity embeddings updated by the encoder and the vectorized relation embeddings as input and outputs a score for each entity; the score serves as the basis for judging the strength of association between entities, and high-scoring entities are selected for triple completion.
The second aspect of the application provides a heterogeneous graph knowledge graph completion system for bank risk assessment.
The heterogeneous graph knowledge graph completion system for bank risk assessment comprises a heterogeneous graph construction module, a feature encoding module and a risk identification knowledge graph completion module;
a heterogeneous graph construction module configured to: acquire risk identification data of a customer to be evaluated, and construct a risk identification data heterogeneous graph according to different semantic association relations;
a feature encoding module configured to: extract entity embeddings from the risk identification data heterogeneous graph through an encoder based on hierarchical attention, and simultaneously vectorize the original relations to generate relation embeddings;
a risk identification knowledge graph completion module configured to: process the entity embeddings and the relation embeddings with a decoder, and finally update the risk identification knowledge graph.
The encoder based on hierarchical attention comprises an entity attention layer and a semantic attention layer, wherein the entity attention layer aggregates the neighbor features of different meta-paths, the semantic attention layer distinguishes the importance of different meta-paths, and the encoder aggregates the meta-path neighbor features hierarchically to obtain the entity embeddings.
A third aspect of the present application provides a computer readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the heterogeneous graph knowledge graph completion method for bank risk assessment described in the first aspect of the present application.
A fourth aspect of the present application provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps of the heterogeneous graph knowledge graph completion method for bank risk assessment according to the first aspect of the present application when the program is executed.
The one or more of the above technical solutions have the following beneficial effects:
Through the knowledge graph completion method based on a heterogeneous graph neural network with improved attention, the application constructs a knowledge graph completion framework that continuously updates and perfects an incomplete knowledge graph. The attention mechanism is introduced to better capture the semantic information between entities and relations in the knowledge graph, and the graph structure information is fully utilized, so that knowledge reasoning and link prediction can be carried out more accurately.
The decoder module adopts the Conv-TransE method for relation verification, which can verify the correctness of triples using the internal data of the knowledge graph; relation verification checks the relation between entity pairs to form correct triples, thereby updating the knowledge graph.
Additional aspects of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
Fig. 1 is a flow chart of a method of a first embodiment.
Fig. 2 is an example diagram of a heterogeneous graph of the first embodiment.
Fig. 3 is a block diagram of the encoder of the first embodiment.
Fig. 4 is a block diagram of a decoder of the first embodiment.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The heterogeneous graph knowledge graph completion method with improved attention is applied to a bank risk assessment knowledge graph, mainly in the knowledge reasoning stage. Risk identification data acquired from banks, credit investigation companies, the insurance industry, public Internet fraud blacklists, industry blacklist alliances, social media and similar sources are fused more efficiently and the risk identification knowledge graph is perfected, which helps practitioners build customer portraits more efficiently. The added attention mechanism makes it possible to better evaluate the credibility of a customer through the customer's association relations and to capture richer interaction information related to the customer, so that the customer's feature representation is updated and the transaction risk level is reduced.
Example 1
In one or more embodiments, a heterogeneous graph knowledge graph completion method for bank risk assessment is disclosed, as shown in fig. 1, including the following steps:
Step S1: acquiring risk identification data of the customer to be evaluated, and constructing a risk identification data heterogeneous graph according to different semantic association relations.
First, risk identification data of the customer to be evaluated, including customer credit data, consumption data and personal information text data, are obtained from channels such as banks, credit investigation companies, the insurance industry, public Internet fraud blacklists, industry blacklist alliances and social media.
The customer credit data comprises loan frequency and mutual lending relations; the mutual lending relations include the relation between a borrower and a guarantor and the relations between a borrower and relatives and friends, such as father, mother, colleague and classmate. The consumption data comprises consumption frequency, consumption preferences and the like on related platforms. The personal information text data comprises basic personal information, mainly text information such as personal identity information, educational background and employer.
A heterogeneous graph is defined in contrast to a homogeneous graph: a homogeneous graph has only one type of node and one type of relation in the whole graph, whereas a heterogeneous graph has multiple types of nodes and multiple types of relations. In this embodiment, based on the acquired risk identification data, a heterogeneous graph is therefore established with entities as nodes and the different semantic association relations among entities as edges, where the semantic association relations are the multiple types of relations.
Heterogeneity is an inherent property of heterogeneous graphs, that is, a heterogeneous graph contains more than one type of entity and more than one type of edge. In a heterogeneous graph, two entities can be connected through different semantic paths, which are called meta-paths. For example, the borrower's own features may involve personal information such as gender, age and educational background, while the borrower's associated attributes may involve the guarantor and the borrower's friends and relatives. Fig. 2 is an example diagram of a heterogeneous graph; as can be seen from Fig. 2, the meta-paths include friend 1-borrower-friend 2 and relative 1-borrower-relative 2.
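As an illustration of the graph construction and meta-path notions described above, the following is a minimal sketch in plain Python. The entity names, the relation labels (`friend_of`, `relative_of`, `guaranteed_by`) and the helper functions are hypothetical and chosen only to mirror the friend-borrower-friend example; they are not taken from the application.

```python
from collections import defaultdict

# Hypothetical typed edges: (head entity, relation type, tail entity).
edges = [
    ("borrower", "friend_of", "friend_1"),
    ("borrower", "friend_of", "friend_2"),
    ("borrower", "relative_of", "relative_1"),
    ("borrower", "relative_of", "relative_2"),
    ("borrower", "guaranteed_by", "guarantor"),
]

def build_adjacency(edges):
    """Index undirected neighbours by relation type."""
    adj = defaultdict(lambda: defaultdict(set))
    for head, rel, tail in edges:
        adj[rel][head].add(tail)
        adj[rel][tail].add(head)
    return adj

def metapath_neighbors(adj, start, relations):
    """Entities reachable from `start` by following `relations` in order."""
    frontier = {start}
    for rel in relations:
        frontier = {n for node in frontier for n in adj[rel][node]}
    return frontier

adj = build_adjacency(edges)
# Meta-path friend - borrower - friend: two hops over "friend_of".
print(sorted(metapath_neighbors(adj, "friend_1", ["friend_of", "friend_of"])))
# prints ['friend_1', 'friend_2']
```

In this sketch the meta-path neighbor set of a node contains the node itself, because the two-hop walk can return to its starting point.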
Step S2: extracting entity embeddings from the risk identification data heterogeneous graph through an encoder based on hierarchical attention, and simultaneously vectorizing the original relations to generate relation embeddings.
Fig. 3 is a block diagram of the encoder. As shown in Fig. 3, the hierarchical-attention-based encoder KHAN comprises an entity attention layer and a semantic attention layer. The entity attention layer learns the importance of meta-path-based neighbors to each entity in the heterogeneous graph and aggregates neighbor entity information to update the embedded representation of the center entity. The semantic attention layer learns the importance of each semantic and fuses the entity representations under multiple semantics. The encoder aggregates the meta-path neighbor features hierarchically to obtain the entity embeddings. The two layers are described in detail below:
(1) Entity attention layer
Before meta-path neighbor information is aggregated, it should be noted that the neighbor entity nodes of each meta-path play different roles and carry different importance in learning the embedding for that specific meta-path. Entity-level attention is introduced to learn the importance of meta-path-based neighbor entity nodes to each entity node in the risk identification data heterogeneous graph, and to update the embedded representation of the center entity node by aggregating the information of its neighbor entity nodes.
The input to this layer is a set of node features $h = \{h_1, h_2, \ldots, h_N\}$, $h_i \in \mathbb{R}^{F}$, where $N$ is the number of entity nodes and $h_i$ is the feature vector of the $i$-th entity node. To obtain sufficient expressive power, at least one learnable linear transformation is required to transform the input features into higher-level features. As an initial step, a shared linear transformation parameterized by a weight matrix $W$ is therefore applied to every entity node, generating a new set of entity node features $h' = \{h'_1, h'_2, \ldots, h'_N\}$. The formula is:
$$h'_i = W h_i$$
where $F$ is the number of features of each entity node.
Given any pair of entity nodes $(i, j)$ connected through a meta-path, the importance of entity node $j$ to entity node $i$ can be learned through an adaptive attention mechanism and is denoted $e^{\Phi}_{ij}$. The formula can be expressed as:
$$e^{\Phi}_{ij} = att_{node}(h'_i, h'_j; \Phi)$$
where $h'_i$ and $h'_j$ are the features of entity node $i$ and entity node $j$, respectively, $\Phi$ is a given meta-path, and $att_{node}(\cdot)$ is a deep neural network that performs the adaptive attention. For a given meta-path $\Phi$, $att_{node}(\cdot)$ is shared by all entity node pairs based on that meta-path. Note in particular that entity nodes $i$ and $j$ are asymmetric: the importance of entity node $i$ to entity node $j$ differs from the importance of entity node $j$ to entity node $i$.
It follows that the entity-level attention mechanism preserves asymmetry, which is an important property of heterogeneous graphs. Structural information is then injected into the attention mechanism by computing $e^{\Phi}_{ij}$ only for entity nodes $j \in N^{\Phi}_i$, where $N^{\Phi}_i$ denotes the meta-path-based neighbors of entity node $i$. After the importance between meta-path-based entity node pairs is obtained, it is normalized with softmax to obtain the weight coefficient $\alpha^{\Phi}_{ij}$:
$$\alpha^{\Phi}_{ij} = \mathrm{softmax}_j\left(e^{\Phi}_{ij}\right) = \frac{\exp\left(\sigma\left(a_{\Phi}^{\top}\,[h'_i \,\|\, h'_j]\right)\right)}{\sum_{k \in N^{\Phi}_i} \exp\left(\sigma\left(a_{\Phi}^{\top}\,[h'_i \,\|\, h'_k]\right)\right)} \qquad (3)$$
where $\sigma(\cdot)$ is an activation function, $\|$ denotes the concatenation operation, and $a_{\Phi}$ is the entity-node-level attention vector of meta-path $\Phi$.
As equation (3) shows, the weight coefficient between a pair of entity nodes depends mainly on their features. Note that the weight coefficient is asymmetric, that is, two entity nodes contribute differently to each other, not only because the concatenation order in the numerator differs, but also because different entity nodes have different neighbors, so the normalization terms (denominators) can differ considerably.
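The normalization in equation (3) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the use of LeakyReLU for the activation σ, the toy feature values and the function name `attention_weights` are assumptions and are not specified by the application.

```python
import math

def leaky_relu(x, slope=0.2):
    # Assumed choice for the activation sigma; the text does not fix it.
    return x if x > 0 else slope * x

def attention_weights(h_i, neighbor_feats, a):
    """Softmax-normalised weights alpha_ij over the meta-path neighbours of i.

    h_i: transformed feature of the centre node (list of floats)
    neighbor_feats: dict mapping neighbour id -> transformed feature
    a: attention vector of length 2 * len(h_i)
    """
    scores = {}
    for j, h_j in neighbor_feats.items():
        concat = h_i + h_j  # list concatenation plays the role of ||
        scores[j] = leaky_relu(sum(w * x for w, x in zip(a, concat)))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

# Toy transformed features h' (hypothetical values).
h = {"borrower": [1.0, 0.0], "friend_1": [0.0, 1.0], "guarantor": [1.0, 1.0]}
a = [0.5, -0.5, 1.0, 0.25]
alpha = attention_weights(h["borrower"], h, a)
print(alpha)
```

Because the center node's feature always occupies the first half of the concatenation and each node is normalized over its own neighbor set, the weight of j for i generally differs from the weight of i for j, which is the asymmetry discussed above.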
The features after the shared linear transformation are weighted by $\alpha^{\Phi}_{ij}$ to obtain the meta-path-based embedding $z^{\Phi}_i$ of entity node $i$:
$$z^{\Phi}_i = \sigma\Big(\sum_{j \in N^{\Phi}_i} \alpha^{\Phi}_{ij}\, h'_j\Big)$$
where $z^{\Phi}_i$ is the learned embedding of entity node $i$ for meta-path $\Phi$.
Because $\alpha^{\Phi}_{ij}$ is associated with a single meta-path, $z^{\Phi}_i$ captures the semantic information expressed by that meta-path, and the meta-path-based embedding can also be called a semantic-specific embedding. The attention operation is repeated $K$ times and the learned embeddings are concatenated into the semantic-specific embedding. The specific formula is:
$$z^{\Phi}_i = \Big\Vert_{k=1}^{K}\, \sigma\Big(\sum_{j \in N^{\Phi}_i} \alpha^{\Phi}_{ij}\, h'_j\Big)$$
where $\sigma(\cdot)$ is an activation function and $\Vert$ denotes the concatenation operation.
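The weighted aggregation and the K-fold repetition can be sketched as follows. This is a minimal illustration, assuming a sigmoid for the activation σ and precomputed attention weights per head; the function names and toy numbers are hypothetical.

```python
import math

def sigmoid(x):
    # Assumed activation; the text only says "an activation function".
    return 1.0 / (1.0 + math.exp(-x))

def aggregate(alpha, feats):
    """One head: sigma(sum_j alpha_ij * h'_j), applied element-wise."""
    dim = len(next(iter(feats.values())))
    summed = [sum(alpha[j] * feats[j][d] for j in alpha) for d in range(dim)]
    return [sigmoid(v) for v in summed]

def multi_head(alphas_per_head, feats_per_head):
    """Concatenate the outputs of K independent attention heads."""
    out = []
    for alpha, feats in zip(alphas_per_head, feats_per_head):
        out.extend(aggregate(alpha, feats))
    return out

# Two heads (K = 2) over the same two neighbours, with hypothetical weights.
feats = {"a": [0.0, 2.0], "b": [2.0, 0.0]}
z = multi_head([{"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 0.0}], [feats, feats])
print(len(z))  # prints 4, i.e. K times the per-head dimension
```

In a trained model each head would have its own transformed features and attention vector; the same feature dictionary is reused here only to keep the example short.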
Given the meta-path set $\{\Phi_1, \ldots, \Phi_P\}$, feeding the entity node features into the entity-level attention yields $P$ groups of semantic-specific entity node embeddings, denoted $\{Z_{\Phi_1}, \ldots, Z_{\Phi_P}\}$.
(2) Semantic attention layer
For the given meta-path set $\{\Phi_1, \ldots, \Phi_P\}$, in order to learn more comprehensive entity node embeddings, a semantic-level attention is proposed that automatically learns the importance of the different meta-paths and fuses them for the specific task. Semantic-level attention learns the importance of each semantic and fuses the entity node representations under multiple semantics. It is formally described as follows:
$$(\beta_{\Phi_1}, \ldots, \beta_{\Phi_P}) = att_{sem}(Z_{\Phi_1}, \ldots, Z_{\Phi_P})$$
where $att_{sem}(\cdot)$ denotes the deep neural network that performs semantic-level attention and $\beta_{\Phi_p}$ is the weight of meta-path $\Phi_p$.
Specifically, the importance of each semantic (meta-path) is learned with a single-layer neural network and a semantic-level attention vector, and is normalized by softmax.
To learn the importance of each meta-path, a nonlinear transformation is first applied to the semantic-specific embeddings; the similarity between the transformed embedding and the semantic-level attention vector $q$ is then taken as the importance of the semantic-specific embedding, and this importance is averaged over all semantic-specific entity node embeddings. The importance of each meta-path is denoted $w_{\Phi_p}$. The formula is as follows:
$$w_{\Phi_p} = \frac{1}{|\mathcal{V}|} \sum_{i \in \mathcal{V}} q^{\top} \tanh\left(W z^{\Phi_p}_i + b\right)$$
where $\mathcal{V}$ is the set of entity nodes, $W$ is the weight matrix, $b$ is the bias vector, and $q$ is the semantic-level attention vector. After the importance of each meta-path is obtained, the values are normalized with the softmax function. The weight of meta-path $\Phi_p$ is denoted $\beta_{\Phi_p}$:
$$\beta_{\Phi_p} = \frac{\exp\left(w_{\Phi_p}\right)}{\sum_{k=1}^{P} \exp\left(w_{\Phi_k}\right)}$$
where $\beta_{\Phi_p}$ is the weight of meta-path $\Phi_p$ and $w_{\Phi_p}$ is the importance of each meta-path.
Clearly, the higher $\beta_{\Phi_p}$ is, the more important meta-path $\Phi_p$ is. Note that meta-path $\Phi_p$ may have different weights for different tasks. With the learned weights as coefficients, the semantic-specific embeddings are fused to obtain the final embedding $Z$, which aggregates all semantic-specific embeddings:
$$Z = \sum_{p=1}^{P} \beta_{\Phi_p}\, Z_{\Phi_p}$$
where $\beta_{\Phi_p}$ is the weight of meta-path $\Phi_p$ and $Z_{\Phi_p}$ is the semantic-specific node embedding.
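The semantic-level fusion described above can be sketched as follows. This is an illustrative sketch only: tanh is assumed for the nonlinear transformation, and the function name, parameter shapes and toy values are hypothetical.

```python
import math

def semantic_fusion(Z_by_path, W, b, q):
    """Fuse per-meta-path embeddings with softmax-normalised weights.

    Z_by_path: dict meta-path name -> list of node embeddings (one per node)
    W, b, q: weight matrix (list of rows), bias vector, semantic attention vector
    """
    def importance(Z):
        total = 0.0
        for z in Z:
            hidden = [math.tanh(sum(w * x for w, x in zip(row, z)) + b_k)
                      for row, b_k in zip(W, b)]
            total += sum(q_k * h_k for q_k, h_k in zip(q, hidden))
        return total / len(Z)  # average over all nodes

    w = {p: importance(Z) for p, Z in Z_by_path.items()}
    top = max(w.values())  # subtract the max for numerical stability
    exps = {p: math.exp(v - top) for p, v in w.items()}
    norm = sum(exps.values())
    beta = {p: e / norm for p, e in exps.items()}

    first = next(iter(Z_by_path.values()))
    n_nodes, dim = len(first), len(first[0])
    fused = [[sum(beta[p] * Z_by_path[p][i][d] for p in Z_by_path)
              for d in range(dim)] for i in range(n_nodes)]
    return beta, fused

# Two hypothetical meta-paths, one node, two-dimensional embeddings.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
q = [1.0, 1.0]
beta, Z = semantic_fusion({"friend": [[1.0, 1.0]], "relative": [[-1.0, -1.0]]}, W, b, q)
print(beta, Z)
```

With these toy values the "friend" meta-path receives the larger weight, so the fused embedding lies closer to the friend-specific embedding.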
Step S3: and processing the entity embedding and the relation embedding by using a decoder, and finally updating the risk identification knowledge graph.
Through the improved decoder Conv-transition, the decoder is mainly optimized for a ConvE model, a remolding process is omitted, a plurality of convolution kernels are used for convolution to obtain a feature map, then a long vector is spliced, the feature map is output through linear transformation and multiplied by an entity embedding matrix, and a score function layer is further used for obtaining the score of each entity, so that the accuracy of risk identification knowledge maps is effectively improved.
By learning the existing ConvE model, it is found that the ConvE model does not merge the consistency Structure in the knowledge graph into the space, and because in ConvE, after the header entity and the relation are remodeled, an input matrix is generated and fed to the convolution layer through connection operation, the ConvE does not maintain flatness like a TransE, and is inspired by a model Structure-Aware Convolutional Network (SCAN), the Conv-TransE is adopted as a decoder, the ConvE remodelling step is eliminated, and the convolution filter is directly operated on the same dimension of the entity and the relation, so that the tranE flatness is maintained.
The decoder takes the entity embeddings updated by the encoder and the vectorized relation embeddings as input and outputs a score for each entity. A higher score is taken to indicate a stronger association between entities, which makes the bank risk identification knowledge graph more accurate.
Fig. 4 is a block diagram of the decoder. As shown in Fig. 4, the decoder first applies convolution directly to the entity and relation embeddings to generate feature maps, then applies a fully connected layer, and finally multiplies the fully connected output by all entity embeddings to compute the scores.
The decoder takes two embedding matrices as input: the entity embeddings output by the encoder, and the relation embeddings, obtained by representing the original relation vectors in the same dimension as the output entity embeddings. Its output is the score of each entity.
The convolution in the decoder is computed as follows:
$$m_c(e_s, e_r, n) = \sum_{\tau=0}^{K-1} \omega_c(\tau, 0)\,\hat{e}_s(n+\tau) + \omega_c(\tau, 1)\,\hat{e}_r(n+\tau) \quad (9)$$
where $K$ is the kernel width; $n$ indexes the entries of the output vector, with $n \in [0, F^L - 1]$; $F^L$ is the dimension of the output entity embeddings; the kernel parameters $\omega_c$ are trainable; and $\hat{e}_s$ and $\hat{e}_r$ are the padded versions of the entity embedding $e_s$ and the relation embedding $e_r$, respectively. If the kernel width $K$ is odd, the first $\lfloor K/2 \rfloor$ and last $\lfloor K/2 \rfloor$ components are filled with 0, where $\lfloor \cdot \rfloor$ denotes the floor function; otherwise, the first $\lfloor K/2 \rfloor - 1$ and last $\lfloor K/2 \rfloor$ components are filled with 0. All other components are copied directly from $e_s$ and $e_r$.
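The padded convolution described above can be sketched for a single kernel in plain Python. This is only an illustration under stated assumptions: the helper names `pad_embedding` and `conv_channel` are hypothetical, the kernel is represented as a list of K pairs (entity weight, relation weight), and the toy values are invented for the example.

```python
def pad_embedding(vec, kernel_width):
    """Zero-pad a vector so the convolution output keeps the input length."""
    half = kernel_width // 2
    # Odd width: floor(K/2) zeros on both sides.
    # Even width: floor(K/2) - 1 zeros in front, floor(K/2) zeros at the end.
    left = half if kernel_width % 2 == 1 else half - 1
    return [0.0] * left + list(vec) + [0.0] * half


def conv_channel(e_s, e_r, kernel):
    """Apply one kernel jointly to an entity and a relation embedding.

    kernel: list of K pairs (w_entity, w_relation), one pair per offset.
    Returns a feature map with the same length as e_s.
    """
    k = len(kernel)
    s_hat = pad_embedding(e_s, k)
    r_hat = pad_embedding(e_r, k)
    feature_map = []
    for n in range(len(e_s)):
        value = 0.0
        for tau in range(k):
            w_entity, w_relation = kernel[tau]
            value += w_entity * s_hat[n + tau] + w_relation * r_hat[n + tau]
        feature_map.append(value)
    return feature_map


# A kernel whose only non-zero tap is the center reproduces e_s + e_r.
center_only = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
out = conv_channel([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], center_only)
print(out)
```

The example kernel shows why the entity and relation are convolved on the same dimension without reshaping: each output entry combines aligned components of the two embeddings, so a sum of the form entity plus relation remains expressible.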
Finally, after the nonlinear convolution, the scoring function of the decoder is:
$$\psi(e_s, e_o) = f\big(\mathrm{vec}(M(e_s, e_r))\,W\big)\,e_o \quad (10)$$
where $W \in \mathbb{R}^{CF^L \times F^L}$ is the matrix of the linear transformation, $C$ is the number of convolution kernels, and $f$ denotes a nonlinear function. The feature map matrix is reshaped into a vector $\mathrm{vec}(M) \in \mathbb{R}^{CF^L}$ and projected into an $F^L$-dimensional space using $W$.
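As a hedged illustration of the scoring step, the sketch below flattens the feature maps, applies the linear projection and a nonlinearity, and takes the inner product with every candidate entity embedding. The choice of ReLU for the nonlinear function is an assumption (the text only requires some nonlinear function), and the name `score_entities` and all numbers are hypothetical.

```python
def score_entities(feature_maps, projection, entity_matrix):
    """Score every candidate entity for one (entity, relation) query.

    feature_maps: C lists of length F, one per convolution kernel.
    projection:   matrix with C*F rows and F columns (the linear transform).
    entity_matrix: one embedding of length F per candidate entity.
    """
    # Flatten the C feature maps into a single vector of length C*F.
    flat = [x for fmap in feature_maps for x in fmap]

    # Linear projection back to the entity embedding dimension F.
    dim = len(projection[0])
    projected = [
        sum(flat[i] * projection[i][j] for i in range(len(flat)))
        for j in range(dim)
    ]

    # Assumption: ReLU is used as the nonlinear function.
    hidden = [max(0.0, x) for x in projected]

    # Inner product with each candidate entity embedding gives its score.
    return [sum(h * e for h, e in zip(hidden, ent)) for ent in entity_matrix]


identity = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = score_entities([[1.0, 2.0]], identity, candidates)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

The index of the highest score identifies the candidate entity that would be proposed to complete the triple.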
The effect of the method of this embodiment was verified by the following experiments.
Four benchmark datasets (FB15k, WN18, FB15k-237 and WN18RR) were selected for the experiments; details of these datasets are shown in Table 1.
Table 1. Details of the datasets
FB15k is a commonly used link prediction dataset. It contains all entities mentioned more than 100 times in Freebase, and its n-ary relations are converted into sets of binary edges through reification, which strongly affects the structure and semantics of the graph.
WN18 is extracted from WordNet 3. To construct WN18, WordNet 3 is used as the starting point, and entities and relations with too few mentions are then filtered out iteratively.
FB15k-237 is a subset of FB15k. It was motivated by the observation that FB15k suffers from test leakage, that is, test data seen by the model during training.
WN18RR is a subset of WN18, likewise created after test leakage was observed in WN18.
The evaluation protocol uses Hits@N and MRR. For each test triple (head entity, relation, tail entity), the head entity and the tail entity are replaced in turn with every entity in the dataset, and a score is computed for each candidate. For the knowledge graph completion task, triples with equal scores are handled strictly and fairly. Two standard metrics are used to evaluate performance: the mean reciprocal rank (MRR) and Hits@N for N = 1, 3 and 10. MRR is the average, over all test triples, of the reciprocal rank of the correct entity; Hits@N is the proportion of test triples for which the correct entity is ranked within the top N.
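The two metrics can be made concrete with a short sketch in plain Python. The tie-handling rule shown here (a candidate with an equal score is ranked ahead of the correct entity) is one possible reading of "strict" and is an assumption; the function names and the toy scores are hypothetical.

```python
def rank_of_target(scores, target):
    """Return the 1-based rank of scores[target] among all candidates.

    Assumption: ties are resolved pessimistically, so any other candidate
    with an equal score is counted as ranked ahead of the target.
    """
    target_score = scores[target]
    ahead = sum(
        1 for i, s in enumerate(scores) if i != target and s >= target_score
    )
    return 1 + ahead


def mrr_and_hits(ranks, cutoffs=(1, 3, 10)):
    """Compute the mean reciprocal rank and Hits@N from a list of ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {n: sum(1 for r in ranks if r <= n) / len(ranks) for n in cutoffs}
    return mrr, hits


# Three toy test triples, each with the candidate scores and the correct index.
ranks = [
    rank_of_target([0.9, 0.5, 0.7], target=0),
    rank_of_target([0.9, 0.5, 0.7], target=2),
    rank_of_target([0.4, 0.8, 0.6, 0.1, 0.5], target=0),
]
mrr, hits = mrr_and_hits(ranks)
print(ranks, mrr, hits)
```

In a full evaluation the same computation would be repeated for both head and tail replacement of every test triple, and the resulting ranks pooled before averaging.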
A two-step training procedure is followed: first, the encoder is trained to encode information about the entities and relations of the graph; then Conv-TransE is trained as the decoder model to perform the relation prediction task. The encoder updates each central entity by aggregating the semantics of its neighbor entities along meta-paths. All parameters are optimized with Adam; the initial learning rate is set to 0.001, and the dimension of the last-layer entity and relation embeddings is set to 200.
The model was evaluated on the link prediction task using the WN18, FB15k, WN18RR and FB15k-237 benchmarks and on the triple classification task using the WN11 and FB13 benchmarks. The experimental results show that, compared with existing models, the model proposed in this application achieves a clear improvement in entity embedding interaction.
Embodiment Two
In one or more embodiments, a heterogeneous graph knowledge graph completion system for bank risk assessment is disclosed, comprising a construction module, an encoding module and a completion module:
a construction module configured to: acquire risk identification data of a customer to be evaluated, and construct a risk identification data heterogeneous graph according to different semantic association relations;
an encoding module configured to: extract entity embeddings from the risk identification data heterogeneous graph through a hierarchical-attention-based encoder; and, at the same time, vectorize the original relations to generate relation embeddings;
a completion module configured to: process the entity embeddings and the relation embeddings with a decoder, and update the risk identification knowledge graph;
wherein the hierarchical-attention-based encoder comprises an entity attention layer and a semantic attention layer; the entity attention layer aggregates the neighbor features of different meta-paths, the semantic attention layer distinguishes the importance of different meta-paths, and the encoder aggregates the meta-path neighbor features hierarchically to obtain the entity embeddings.
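To illustrate what the construction module produces and what the entity attention layer consumes, the following sketch builds a tiny heterogeneous graph and collects the meta-path-based neighbors of one entity. All node names, relation names and node types are hypothetical examples; the function `metapath_neighbors` is not part of the application, and excluding the start node from its own neighbor set is a simplifying choice made here.

```python
from collections import defaultdict


def metapath_neighbors(edges, node_types, start, metapath):
    """Return the nodes reachable from `start` along a typed meta-path.

    edges:      list of (u, relation, v) triples, treated as undirected.
    node_types: mapping from node name to node type.
    metapath:   sequence of node types, beginning with the type of `start`.
    """
    adjacency = defaultdict(set)
    for u, _relation, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    frontier = {start}
    for wanted_type in metapath[1:]:
        # Advance one hop, keeping only nodes of the required type.
        frontier = {
            v for u in frontier for v in adjacency[u]
            if node_types[v] == wanted_type
        }
    frontier.discard(start)  # simplifying choice: drop the start node itself
    return frontier


node_types = {
    "alice": "customer", "bob": "customer", "carol": "customer",
    "shop": "platform", "loan1": "loan",
}
edges = [
    ("alice", "consumes_on", "shop"),
    ("bob", "consumes_on", "shop"),
    ("alice", "borrows", "loan1"),
    ("carol", "borrows", "loan1"),
]
via_platform = metapath_neighbors(
    edges, node_types, "alice", ["customer", "platform", "customer"])
via_loan = metapath_neighbors(
    edges, node_types, "alice", ["customer", "loan", "customer"])
print(sorted(via_platform), sorted(via_loan))
```

Each meta-path yields a different neighbor set for the same customer, which is why the encoder first aggregates neighbors per meta-path and only then weighs the meta-paths against each other.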
Embodiment Three
An object of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the heterogeneous graph knowledge graph completion method for bank risk assessment described in Embodiment One of the present disclosure.
Embodiment Four
An object of this embodiment is to provide an electronic device.
An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the heterogeneous graph knowledge graph completion method for bank risk assessment described in Embodiment One of the present disclosure.
The above describes only the preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A heterogeneous graph knowledge graph completion method for bank risk assessment, characterized by comprising the following steps:
acquiring risk identification data of a customer to be evaluated, and constructing a risk identification data heterogeneous graph according to different semantic association relations;
extracting entity embeddings from the risk identification data heterogeneous graph through a hierarchical-attention-based encoder; and, at the same time, vectorizing the original relations to generate relation embeddings;
processing the entity embeddings and the relation embeddings with a decoder, and updating the risk identification knowledge graph;
wherein the hierarchical-attention-based encoder comprises an entity attention layer and a semantic attention layer; the entity attention layer aggregates the neighbor features of different meta-paths, the semantic attention layer distinguishes the importance of different meta-paths, and the encoder aggregates the meta-path neighbor features hierarchically to obtain the entity embeddings.
2. The heterogeneous graph knowledge graph completion method for bank risk assessment according to claim 1, wherein the risk identification data comprises customer credit data, consumption data and personal information text data;
the customer credit data comprises the loan frequency and mutual lending relations;
the consumption data comprises the consumption frequency and consumption preferences on relevant platforms;
the personal information text data is the basic information of the individual, comprising personal identity information, educational background and employer.
3. The heterogeneous graph knowledge graph completion method for bank risk assessment according to claim 1, wherein, in the risk identification data heterogeneous graph, entities serve as nodes and the different semantic association relations between entities serve as edges; two entities are connected by different semantic paths, which are called meta-paths.
4. The heterogeneous graph knowledge graph completion method for bank risk assessment according to claim 3, wherein the entity attention layer learns the importance of meta-path-based neighbors to each entity in the heterogeneous graph, and aggregates neighbor entity information to update the embedded representation of the central entity.
5. The heterogeneous graph knowledge graph completion method for bank risk assessment according to claim 3, wherein the semantic attention layer learns the importance of each semantic and fuses the entity representations under multiple different meta-paths.
6. The heterogeneous graph knowledge graph completion method for bank risk assessment according to claim 1, wherein the decoder adopts a Conv-TransE convolutional network, which removes the reshaping step of the ConvE model, applies convolution directly to the entity embeddings and relation embeddings, and preserves the translational property.
7. The heterogeneous graph knowledge graph completion method for bank risk assessment according to claim 1, wherein the decoder takes the entity embeddings updated by the encoder and the vectorized relation embeddings as input and outputs the scores of the entities; the scores serve as the basis for judging the strength of association between entities, and high-scoring entities are selected to complete the triples.
8. A heterogeneous graph knowledge graph completion system for bank risk assessment, characterized by comprising a construction module, an encoding module and a completion module:
a construction module configured to: acquire risk identification data of a customer to be evaluated, and construct a risk identification data heterogeneous graph according to different semantic association relations;
an encoding module configured to: extract entity embeddings from the risk identification data heterogeneous graph through a hierarchical-attention-based encoder; and, at the same time, vectorize the original relations to generate relation embeddings;
a completion module configured to: process the entity embeddings and the relation embeddings with a decoder, and update the risk identification knowledge graph;
wherein the hierarchical-attention-based encoder comprises an entity attention layer and a semantic attention layer; the entity attention layer aggregates the neighbor features of different meta-paths, the semantic attention layer distinguishes the importance of different meta-paths, and the encoder aggregates the meta-path neighbor features hierarchically to obtain the entity embeddings.
9. An electronic device, comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of any of the preceding claims 1-7.
10. A storage medium, characterized by non-transitorily storing computer-readable instructions, wherein the method of any one of claims 1-7 is performed when the computer-readable instructions are executed by a computer.
CN202310665418.3A 2023-06-05 2023-06-05 Heterogeneous graph knowledge graph completion method and system for bank risk assessment Pending CN116662570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310665418.3A CN116662570A (en) 2023-06-05 2023-06-05 Heterogeneous graph knowledge graph completion method and system for bank risk assessment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310665418.3A CN116662570A (en) 2023-06-05 2023-06-05 Heterogeneous graph knowledge graph completion method and system for bank risk assessment

Publications (1)

Publication Number Publication Date
CN116662570A true CN116662570A (en) 2023-08-29

Family

ID=87727556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310665418.3A Pending CN116662570A (en) 2023-06-05 2023-06-05 Heterogeneous graph knowledge graph completion method and system for bank risk assessment

Country Status (1)

Country Link
CN (1) CN116662570A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454979A (en) * 2023-10-26 2024-01-26 上海歆广数据科技有限公司 Individual case map updating method and system
CN117454979B (en) * 2023-10-26 2024-04-19 上海峻思寰宇数据科技有限公司 Individual case map updating method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination