CN111951079A - Credit rating method and device based on knowledge graph and electronic equipment - Google Patents


Info

Publication number
CN111951079A
Authority
CN
China
Prior art keywords
target
target object
risk
credit rating
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010820772.5A
Other languages
Chinese (zh)
Other versions
CN111951079B (en)
Inventor
张宾
孙喜民
周晶
李慧超
王帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid E Commerce Co Ltd
State Grid E Commerce Technology Co Ltd
Original Assignee
State Grid E Commerce Co Ltd
State Grid E Commerce Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid E Commerce Co Ltd and State Grid E Commerce Technology Co Ltd
Priority to CN202010820772.5A
Publication of CN111951079A
Application granted
Publication of CN111951079B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application discloses a knowledge-graph-based credit rating method, a corresponding device, and electronic equipment. The method comprises the following steps: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words from each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension; and performing risk identification on the feature words of the target object in each information dimension by using a risk identification model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training with a plurality of training feature word sets carrying credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.

Description

Credit rating method and device based on knowledge graph and electronic equipment
Technical Field
The application relates to the technical field of deep learning, in particular to a method and a device for credit rating based on a knowledge graph and electronic equipment.
Background
At present, research on enterprise credit rating focuses mainly on enterprise-related risk analysis. As risk data grows more complex, the demands placed on that analysis increase, so the application of deep learning has become a research frontier in risk management and is likely to transform the field. Deep learning developed from artificial neural networks; it comprises a complex, multi-level learning structure whose design is based on simulating the learning mechanism of the human brain. A deep learning model learns features of the data at each layer and passes new features to the next layer; these new features are obtained by applying specific transformations to the features already learned, which improves the model's predictive performance.
In the actual rating process, risk prediction indicators are numerous and interrelated. If the indicators are used directly as features for classification or regression, more complete and deeper features go unused, and a model trained by conventional machine learning methods cannot predict well. The risk indicators can therefore be quantified and used as the input of a neural network to extract deeper features; the extracted features are then fed into a classifier or regression model to train a risk early-warning model. After hyperparameter tuning and multiple rounds of iterative training, an optimal risk early-warning model is obtained and deployed on the relevant platform to predict enterprise operating risk, stock risk, foreign exchange risk, and the like.
Among the above machine-learning-based risk prediction schemes, machine learning does provide reliable and convincing predictive capability, but the difficulty in enterprise credit rating lies in acquiring multi-dimensional data and representing it in a unified, associated form, which traditional machine learning techniques cannot achieve. Current risk prediction schemes therefore suffer from low prediction accuracy because high-dimensional features cannot be obtained.
Disclosure of Invention
In view of the above, the present application provides a knowledge-graph-based credit rating method, apparatus, and electronic device, as follows:
a method for knowledge-graph based credit rating, the method comprising:
obtaining a target corpus, wherein the target corpus comprises a plurality of sentences;
extracting words of each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of characteristic words corresponding to the target corpus, wherein the characteristic words comprise characteristic words of at least one target object in at least one information dimension;
and performing risk identification on the feature words of the target object in each information dimension by using a risk identification model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.
In the above method, preferably, the knowledge graph is obtained by:
reading structured data stored in a relational database, the structured data being associated with at least one target object;
and converting the structured data into triple data by using a preset mapping relation between the structured data and the triples so as to obtain the knowledge graph.
In the above method, preferably, the knowledge graph is obtained by:
acquiring a target page related to at least one target object in an industry website by using a preset word corresponding to the target object;
reading page content in the target page;
and generating triple data according to the page content to obtain the knowledge graph.
In the method, preferably, the target page at least includes a first page associated with the preset word and a second page obtained by performing in-station acquisition on the first page.
In the above method, preferably, the risk identification model is obtained by training in the following manner:
obtaining a plurality of training feature word sets with credit rating labels; the training feature word set is a feature word set obtained by utilizing the knowledge graph to extract words of sentences in the training corpus; the training feature word set comprises training feature words on at least one information dimension;
and taking the training feature words on each information dimension as input samples of corresponding risk recognition models, taking the credit rating labels of the training feature word set as output samples of the risk recognition models, and training the risk recognition models.
In the above method, preferably, the risk recognition model performs risk recognition on the training feature word set corresponding to the training corpus to obtain a credit rating test result, and the difference between the credit rating test result and the credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.
A knowledge-graph based credit rating apparatus, the apparatus comprising:
the corpus acquiring unit is used for acquiring a target corpus, and the target corpus comprises a plurality of sentences;
the characteristic extraction unit is used for extracting words of each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of characteristic words corresponding to the target corpus, wherein the characteristic words comprise characteristic words of at least one target object in at least one information dimension;
and the risk identification unit is used for carrying out risk identification on the feature words of the target object in each information dimension by utilizing a risk identification model corresponding to each information dimension so as to obtain a credit rating result corresponding to each information dimension of the target object, the risk identification model is obtained by utilizing a plurality of training feature word sets with credit rating labels for training, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.
The above apparatus, preferably, further comprises:
a first graph construction unit for reading structured data stored in a relational database, the structured data being related to at least one target object; and converting the structured data into triple data by using a preset mapping relation between the structured data and the triples so as to obtain the knowledge graph.
The above apparatus, preferably, further comprises:
a second graph construction unit for obtaining a target page related to at least one target object in an industry website by using a preset word corresponding to the target object; reading page content in the target page; and generating triple data according to the page content to obtain the knowledge graph.
An electronic device, comprising:
a memory for storing an application program and data generated by running the application program;
a processor for executing the application program to implement: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words from each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension; and performing risk identification on the feature words of the target object in each information dimension by using a risk identification model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training with a plurality of training feature word sets carrying credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.
According to the scheme, after the target corpus is obtained, the pre-constructed knowledge graph is used for extracting words from each sentence in the target corpus to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension, and therefore risk identification is performed on the feature words of the target object in each information dimension by using the risk identification model corresponding to each information dimension to obtain a credit rating result corresponding to each information dimension of the target object, so that the credit risk of the corresponding target object in the corresponding information dimension is represented. Therefore, the knowledge graph is used for extracting the feature words in multiple information dimensions, so that the feature content input into the deep learning model is enriched, and the accuracy of the obtained credit rating result is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for rating credit based on a knowledge-graph according to an embodiment of the present application;
FIGS. 2-3 are partial flow charts of a first embodiment of the present application;
FIG. 4 is a schematic structural diagram of a knowledge-graph-based credit rating device according to a second embodiment of the present application;
FIGS. 5-6 are schematic structural diagrams of another embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to a third embodiment of the present application;
FIG. 8 is an exemplary flowchart illustrating web content retrieval according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a convolutional neural network in an embodiment of the present application;
FIG. 10 is a diagram of a system architecture implemented by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a flowchart of an implementation of a method for credit rating based on a knowledge graph is provided in an embodiment of the present application, and the method is applied to an electronic device capable of data processing, such as a computer or a server. The technical scheme in the embodiment is mainly used for rating the credit of a target object such as a business or an individual.
Specifically, the method in this embodiment may include the following steps:
Step 101: Obtain the target corpus.
The target corpus comprises a plurality of sentences. For example, the target corpus may be a news excerpt, a summary report, a speech manuscript, or the like.
It should be noted that the sentences in the target corpus describe target objects to be rated, such as enterprises or individuals, and in addition, the sentences in the target corpus also describe contents related to the target objects. For example, the statements in the target corpus describe the related content of a certain enterprise in multiple information dimensions, such as the related content on the business status, the registered capital, the financial index, etc. in the business situation dimension, and further such as the related content on the corporate shareholder change, the external investment, etc. in the business information dimension.
Step 102: Extract words from each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus.
The plurality of feature words corresponding to the target corpus comprise feature words of at least one target object on at least one information dimension. For example, the feature words corresponding to the target corpus include feature words of the enterprise a in the business situation dimension and feature words of the enterprise B in the business situation and the business information dimension.
Specifically, in this embodiment, a knowledge graph comprising a plurality of triple data may be constructed in advance. The triple data may be relational triples, such as entity-relationship-entity triples, or attribute triples, such as entity-attribute-value triples, and includes triples of multiple enterprises in multiple information dimensions. On this basis, the triple data in the knowledge graph is used to extract words from each sentence in the target corpus, yielding the feature words corresponding to the target corpus: for example, the relational triple in which "enterprise A" has an "investment" in "enterprise B", or the attribute triples in which "enterprise A" is a "sales"-type company with a sales amount of "1 million", and so on.
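The word extraction in step 102 can be illustrated with a minimal Python sketch. The source does not specify how triples are matched against sentences, so the substring-matching rule, the four-field triple layout (with the information dimension attached to each triple), and all names below are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical knowledge graph: (subject, predicate, object, information dimension).
TRIPLES = [
    ("Enterprise A", "invests in", "Enterprise B", "business information"),
    ("Enterprise A", "company type", "sales", "business situation"),
    ("Enterprise A", "sales amount", "1 million", "business situation"),
]

def extract_feature_words(sentences, triples):
    """Group matched triple terms by target object and information dimension."""
    features = defaultdict(lambda: defaultdict(list))
    for sentence in sentences:
        for subj, pred, obj, dimension in triples:
            # Assumed rule: a triple matches when its subject and object
            # both occur in the sentence as substrings.
            if subj in sentence and obj in sentence:
                for word in (pred, obj):
                    if word not in features[subj][dimension]:
                        features[subj][dimension].append(word)
    return {target: dict(dims) for target, dims in features.items()}

corpus = [
    "Enterprise A announced a new stake in Enterprise B this quarter.",
    "Enterprise A, a sales company, reported a sales amount of 1 million.",
]
result = extract_feature_words(corpus, TRIPLES)
```

Here `result` maps each target object to its feature words per information dimension, which is the shape that step 103 consumes. A production system would likely use entity linking rather than substring matching.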
Step 103: Perform risk identification on the feature words of the target object in each information dimension by using the risk identification model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension.
The risk identification model is obtained by utilizing a plurality of training feature word sets with credit rating labels for training, and the credit rating result of the target object on one information dimension finally obtained represents the credit risk of the corresponding target object on the corresponding information dimension.
For example, in this embodiment, a plurality of risk identification models are constructed in advance, each corresponding to one information dimension, such as a risk identification model for the operation status dimension and a risk identification model for the industrial and commercial information dimension. Each model is then trained with a plurality of training feature word sets carrying credit rating labels for its information dimension. The trained model can rate the credit of a target object in that dimension, and the resulting credit rating result represents the target object's credit risk in that dimension. For example, the credit rating result of enterprise A in the operation status dimension may indicate a high credit risk in that dimension, while the credit rating result of enterprise B in the industrial and commercial information dimension may indicate a low credit risk in that dimension.
In an implementation manner, the risk identification model in this embodiment may be a deep learning model constructed based on a machine learning algorithm, such as a deep learning model based on a convolutional neural network (CNN), or the like.
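The one-model-per-dimension arrangement can be sketched as a simple dispatch table. The keyword rules below are stand-ins for trained models and are purely hypothetical, as are the function names, dimension names, and rating values; the source only states that each information dimension has its own risk identification model.

```python
def rate_target(features_by_dimension, models):
    """Apply the model registered for each information dimension
    to that dimension's feature words."""
    ratings = {}
    for dimension, words in features_by_dimension.items():
        model = models.get(dimension)
        if model is None:
            continue  # no model available for this dimension
        ratings[dimension] = model(words)
    return ratings

# Stand-in "models": simple keyword rules, for illustration only.
def business_situation_model(words):
    return "high" if "revoked" in words else "low"

def business_information_model(words):
    return "high" if "shareholder change" in words else "low"

MODELS = {
    "business situation": business_situation_model,
    "business information": business_information_model,
}

ratings = rate_target(
    {
        "business situation": ["revoked", "registered capital"],
        "business information": ["external investment"],
    },
    MODELS,
)
```

Keeping the models in a dictionary keyed by dimension means a new information dimension can be added by registering one more entry, without changing the rating loop.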
According to the above scheme, in the knowledge graph-based credit rating method provided in the embodiment of the present application, after a target corpus is obtained, a pre-constructed knowledge graph is used to perform word extraction on each sentence in the target corpus to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one information dimension, and thus, a risk identification model corresponding to each information dimension is used to perform risk identification on the feature words of the target object in each information dimension to obtain a credit rating result corresponding to each information dimension of the target object, so as to represent the credit risk of the corresponding target object in the corresponding information dimension. Therefore, in the embodiment, the feature words in multiple information dimensions are extracted by using the knowledge graph, so that the feature content input into the deep learning model is enriched, and the accuracy of the obtained credit rating result is improved.
In one implementation, the knowledge-graph in the present embodiment may be obtained by the following method, as shown in fig. 2:
Step 201: Read structured data stored in a relational database.
The relational database stores structured data related to target objects. For example, a registration database includes structured data on stores, brands, users, and the like, and the structured data is related to at least one target object, such as an enterprise or an individual.
Specifically, in this embodiment, structured data such as tables and columns in the relational database may be read in a stack or queue manner.
Step 202: Convert the structured data into triple data by using a preset mapping relation between the structured data and the triples so as to obtain the knowledge graph.
In a specific implementation, the preset mapping relation in this embodiment may be understood as a mapping specification from the relational database to semantic data, and a visual specification configuration tool may be used to configure it. Specifically, the basic structure of the structured data and the structure of the knowledge graph's triples are analyzed, for example the meaning of each table, the associations between tables, and the entities and entity attributes in the triples, and the preset mapping relation between the structured data and the triples is configured accordingly. For example, the user table in the database corresponds to the concept of a person in the knowledge graph, and the phone field in that table corresponds to the contact information attribute defined on a person in the knowledge graph. On this basis, when the structured data is converted into triple data, the preset mapping relation is used to map the elements in the rows and columns of each table to elements of the triples, such as entities, entity relations, or entity attributes, so that the triple data is obtained and the knowledge graph is formed.
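A minimal sketch of such a mapping, assuming a SQLite database and a hand-written mapping dictionary in place of the visual configuration tool. The table layout, the `MAPPING` structure, the "is a" predicate, and the sample row are all hypothetical.

```python
import sqlite3

# Hypothetical mapping specification: table -> concept, column -> attribute.
MAPPING = {
    "user": {
        "concept": "Person",
        "key": "name",  # column whose value identifies the entity
        "attributes": {"phone": "contact information"},
    },
}

def rows_to_triples(conn, mapping):
    """Convert each mapped table row into entity-attribute-value triples."""
    triples = []
    for table, spec in mapping.items():
        columns = [spec["key"], *spec["attributes"]]
        # Table and column names come from the trusted mapping, not user input.
        query = f'SELECT {", ".join(columns)} FROM "{table}"'
        for row in conn.execute(query):
            entity = row[0]
            triples.append((entity, "is a", spec["concept"]))
            for column, value in zip(spec["attributes"], row[1:]):
                if value is not None:
                    triples.append((entity, spec["attributes"][column], value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (name TEXT, phone TEXT)')
conn.execute('INSERT INTO "user" VALUES (?, ?)', ("Zhang San", "555-0100"))
triples = rows_to_triples(conn, MAPPING)
```

Associations between tables (foreign keys) would map to entity-relationship-entity triples in the same way; they are omitted here to keep the sketch short.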
In one implementation, the knowledge-graph in this embodiment can be supplemented or enriched by the following means, as shown in fig. 3:
Step 301: Acquire a target page related to the target object from an industry website by using a preset word corresponding to at least one target object.
In this embodiment, based on a preset seed vocabulary, that is, a preset word, which can represent the industry where the target object is located, a search engine or a search interface or the like may be used to perform a page search on an industry website (including a page of an industry knowledge base) to obtain a target page related to the target object.
In an implementation manner, the target page may include only a first page associated with a preset word, such as a page directly including the preset word, or the target page may further include a second page obtained by performing in-station acquisition on the first page, that is, a page corresponding to a link included in the first page, and so on.
Specifically, in this embodiment, a search engine or a search interface may be used to search for a first page including a preset word, and then the first page is acquired in-station, and the maximum depth of acquisition is set to 3 layers, that is, from the first page, a depth-first acquisition policy is used to acquire 3 layers in total. In other implementations, the acquisition depth may also be set to other values, such as 2-layer or 4-layer, etc.
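The depth-limited, depth-first in-site acquisition can be sketched as follows. To stay self-contained the sketch walks a hypothetical in-memory link graph instead of fetching real pages, and it assumes the first page counts as layer 1 of the 3 layers; the source does not state how layers are counted.

```python
# Hypothetical in-site link graph standing in for real pages (no network access).
SITE_LINKS = {
    "/search-hit": ["/a", "/b"],
    "/a": ["/a1"],
    "/a1": ["/a2"],
    "/a2": [],
    "/b": ["/search-hit"],
}

def collect_pages(first_page, get_links, max_depth=3):
    """Depth-first collection; the first page is treated as layer 1."""
    visited = []
    seen = set()

    def visit(page, depth):
        if depth > max_depth or page in seen:
            return
        seen.add(page)
        visited.append(page)
        for link in get_links(page):
            visit(link, depth + 1)

    visit(first_page, 1)
    return visited

pages = collect_pages("/search-hit", lambda page: SITE_LINKS.get(page, []))
```

In this example `/a2` sits on a fourth layer and is therefore not collected, and the link from `/b` back to the first page is skipped because that page was already seen. Passing `max_depth=2` or `max_depth=4` gives the alternative depths mentioned above.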
Step 302: Read the page content in the target page.
In this embodiment, a crawler or other technologies may be used to obtain page content in the target page to obtain content such as text therein.
Step 303: Generate triple data according to the page content to obtain the knowledge graph.
The page content can be subjected to triple extraction by using a pre-constructed and trained triple extraction model to obtain triple data, so that the knowledge graph is formed. The triple extraction model can be a model constructed based on a deep learning algorithm, and training is performed by using training sentence samples with triple labels, so that the trained triple extraction model can perform triple extraction on the sentences to obtain corresponding triple data, and the triple data is added to the knowledge graph.
In an implementation manner, the risk recognition model in this embodiment may be obtained by training in the following manner:
firstly, obtaining a plurality of training feature word sets with credit rating labels, wherein each training feature word set can be a feature word set obtained by utilizing a knowledge graph to extract words of sentences in corresponding training linguistic data;
it should be noted that the training feature word set herein includes training feature words on multiple information dimensions;
and then, taking the training feature words on each information dimension as input samples of the risk recognition model corresponding to the information dimension, taking the credit rating labels of the training feature word set as output samples of the risk recognition model on the information dimension, and training the risk recognition model.
Specifically, in this embodiment, the training feature words in each information dimension are input into the risk identification model for that dimension, and the credit rating test result that the model outputs for them is obtained. The credit rating test result is then compared with the credit rating label, and the model parameters are adjusted according to the difference so that the loss function of the risk identification model decreases. This is repeated until the loss function converges, at which point training is complete.
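The compare-adjust-repeat loop can be sketched in plain Python. Note the substitution: a bag-of-words logistic regression trained by gradient descent stands in for the deep learning model described in this embodiment, purely to show the loop structure. The binary labels (1 = high risk, 0 = low risk), the vocabulary, the hyperparameters, and the sample data are all assumptions.

```python
import math

def predict(weights, bias, words):
    """Probability of high risk for a set of feature words."""
    z = bias + sum(weights[w] for w in set(words) if w in weights)
    return 1.0 / (1.0 + math.exp(-z))

def train_risk_model(samples, vocabulary, learning_rate=0.5,
                     tolerance=1e-6, max_epochs=5000):
    """samples: list of (feature_words, label), label 1 = high risk, 0 = low risk."""
    weights = {word: 0.0 for word in vocabulary}
    bias = 0.0
    previous_loss = float("inf")
    n = len(samples)
    for _ in range(max_epochs):
        loss, grad_b = 0.0, 0.0
        grad_w = {word: 0.0 for word in vocabulary}
        for words, label in samples:
            p = predict(weights, bias, words)
            # Cross-entropy between the test result p and the label.
            loss -= label * math.log(p + 1e-12) + (1 - label) * math.log(1 - p + 1e-12)
            error = p - label  # difference that drives the parameter update
            for w in set(words):
                if w in grad_w:
                    grad_w[w] += error
            grad_b += error
        loss /= n
        for w in vocabulary:
            weights[w] -= learning_rate * grad_w[w] / n
        bias -= learning_rate * grad_b / n
        if abs(previous_loss - loss) < tolerance:  # loss has converged
            break
        previous_loss = loss
    return weights, bias

VOCABULARY = ["overdue", "lawsuit", "profit", "expansion"]
SAMPLES = [
    (["overdue", "lawsuit"], 1),
    (["lawsuit"], 1),
    (["profit", "expansion"], 0),
    (["profit"], 0),
]
weights, bias = train_risk_model(SAMPLES, VOCABULARY)
```

One such model would be trained per information dimension, each on the feature words and labels for its own dimension.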
Further, in this embodiment, in order to improve the accuracy of the training samples, before the risk identification model is trained, the difficult samples are screened out. That is to say, the corpus participating in the risk model training is a sample corpus with higher accuracy, and at this time, the risk recognition model performs risk recognition on the training feature word set corresponding to the corpus to obtain a credit rating test result, and the difference between the credit rating test result and the credit rating label corresponding to the corpus is greater than or equal to the preset threshold.
In specific implementation, in this embodiment, the risk recognition model may be used to perform test training on the small-risk sample, and then after a preset threshold is obtained according to a test result, the preset threshold is used to screen out the training corpora participating in training, and after repeated iterative training of the corpora, the risk recognition model is finally obtained.
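The threshold-based screening can be sketched as a filter over labeled samples. The comparison direction below (keep a sample when the difference is greater than or equal to the threshold) follows the text above as written; it is exposed as a parameter because the source does not explain it further. The scoring function, the threshold value, and the samples are hypothetical.

```python
def screen_samples(samples, score_fn, threshold,
                   keep=lambda difference, thr: difference >= thr):
    """Return the samples whose |test result - label| satisfies `keep`."""
    kept = []
    for words, label in samples:
        difference = abs(score_fn(words) - label)
        if keep(difference, threshold):
            kept.append((words, label))
    return kept

# Stand-in for a preliminary risk recognition model (illustrative only).
def preliminary_score(words):
    return 0.9 if "lawsuit" in words else 0.2

CANDIDATES = [
    (["lawsuit"], 1),            # difference about 0.1
    (["profit"], 1),             # difference 0.8
    (["lawsuit", "profit"], 0),  # difference 0.9
]
selected = screen_samples(CANDIDATES, preliminary_score, threshold=0.5)
```

Passing `keep=lambda difference, thr: difference < thr` selects the complementary set, so the same helper covers either reading of the screening rule.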
Referring to fig. 4, a schematic structural diagram of a knowledge-graph-based credit rating apparatus provided in the second embodiment of the present application, the apparatus being suitable for use in an electronic device capable of data processing, such as a computer or a server. The technical scheme in the embodiment is mainly used for rating the credit of a target object such as a business or an individual.
Specifically, the apparatus in this embodiment may include the following units:
a corpus obtaining unit 401, configured to obtain a target corpus, where the target corpus includes a plurality of sentences;
a feature extraction unit 402, configured to perform word extraction on each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one risk dimension;
a risk identification unit 403, configured to perform risk identification on the feature words of the target object in each risk dimension by using a risk identification model corresponding to each risk dimension to obtain a credit rating result corresponding to each risk dimension of the target object, where the risk identification model is obtained by using a plurality of training feature word sets with credit rating labels for training, and the credit rating result represents the credit risk level of the corresponding target object in the corresponding risk dimension.
According to the above scheme, in the knowledge graph-based credit rating device provided in the second embodiment of the present application, after the target corpus is obtained, the pre-constructed knowledge graph is used to perform word extraction on each sentence in the target corpus to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one information dimension, and thus, a risk identification model corresponding to each information dimension is used to perform risk identification on the feature words of the target object in each information dimension to obtain a credit rating result corresponding to each information dimension of the target object, so as to represent the credit risk of the corresponding target object in the corresponding information dimension. Therefore, in the embodiment, the feature words in multiple information dimensions are extracted by using the knowledge graph, so that the feature content input into the deep learning model is enriched, and the accuracy of the obtained credit rating result is improved.
In one implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 5:
a first graph construction unit 404, configured to read structured data stored in a relational database, the structured data being related to at least one target object; and to convert the structured data into triple data by using a preset mapping relation between the structured data and triples, so as to obtain the knowledge graph.
a second graph construction unit 405, configured to obtain a target page related to at least one target object in an industry website by using a preset word corresponding to the target object; to read page content in the target page; and to generate triple data according to the page content, so as to obtain the knowledge graph.
Optionally, the target page at least includes a first page associated with the preset word and a second page obtained by performing in-station acquisition on the first page.
In another implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 6:
a model training unit 406, configured to obtain a plurality of training feature word sets with credit rating labels; the training feature word set is a feature word set obtained by utilizing the knowledge graph to extract words of sentences in the training corpus; the training feature word set comprises training feature words on at least one information dimension; and taking the training feature words on each information dimension as input samples of corresponding risk recognition models, taking the credit rating labels of the training feature word set as output samples of the risk recognition models, and training the risk recognition models.
Optionally, the risk recognition model performs risk recognition on the training feature word set corresponding to the training corpus to obtain a credit rating test result, and a difference between the credit rating test result and the credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.
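How the model training unit 406 might arrange its samples can be sketched as follows. This is only an illustration under assumptions: the function name `build_training_samples` and the dictionary layout are hypothetical, and the actual model fitting is left out. Each labeled training feature word set is split by dimension, and the label of the whole set is attached to every per-dimension sample.

```python
def build_training_samples(labeled_sets):
    """Group labeled feature word sets into per-dimension training samples.

    labeled_sets: list of (words_by_dimension, credit_rating_label), where
    words_by_dimension maps a dimension name to its training feature words.
    Returns {dimension: [(input_words, output_label), ...]}.
    """
    samples = {}
    for words_by_dimension, label in labeled_sets:
        for dimension, words in words_by_dimension.items():
            # The label of the whole set is the output sample for each dimension.
            samples.setdefault(dimension, []).append((list(words), label))
    return samples


# Hypothetical labeled data, for illustration only.
LABELED_SETS = [
    ({"finance": ["overdue payment"], "legal": ["lawsuit"]}, "high"),
    ({"finance": ["on-time repayment"]}, "low"),
]
SAMPLES = build_training_samples(LABELED_SETS)
```

Each entry of `SAMPLES` would then be passed to the training routine of the risk recognition model for that dimension.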
It should be noted that, for the specific implementation of each unit in the present embodiment, reference may be made to the corresponding content in the foregoing, and details are not described here.
Referring to fig. 7, a schematic structural diagram of an electronic device according to a third embodiment of the present disclosure is provided, where the electronic device may be an electronic device capable of performing data processing, such as a computer or a server. The technical scheme in the embodiment is mainly used for rating the credit of a target object such as a business or an individual.
Specifically, the electronic device in this embodiment may include the following structure:
a memory 701 for storing an application program and data generated by the application program;
a processor 702 for executing the application to implement: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words of each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of characteristic words corresponding to the target corpus, wherein the characteristic words comprise characteristic words of at least one target object in at least one risk dimension; and performing risk identification on the feature words of the target object in each risk dimension by using a risk identification model corresponding to each risk dimension to obtain a credit rating result of the target object in each risk dimension, wherein the risk identification model is obtained by training a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding risk dimension.
According to the above scheme, in the electronic device provided in the third embodiment of the present application, after the target corpus is obtained, the pre-constructed knowledge graph is used to perform word extraction on each sentence in the target corpus to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one information dimension, and thus, a risk identification model corresponding to each information dimension is used to perform risk identification on the feature words of the target object in each information dimension to obtain a credit rating result corresponding to each information dimension of the target object, so as to represent the credit risk of the corresponding target object in the corresponding information dimension. Therefore, in the embodiment, the feature words in multiple information dimensions are extracted by using the knowledge graph, so that the feature content input into the deep learning model is enriched, and the accuracy of the obtained credit rating result is improved.
It should be noted that, in the present embodiment, reference may be made to the corresponding contents in the foregoing, and details are not described here.
Taking the enterprise credit rating by using the technical scheme of the application as an example, the technical scheme of the application is exemplified:
First, knowledge graph technology is introduced to solve the problem of semantic representation and understanding of multi-source heterogeneous data, which improves the effectiveness of big-data enterprise credit scoring. Specifically, the implementation of the technical scheme of the application is divided into two main parts: constructing an enterprise knowledge graph, and implementing a credit scoring system based on the enterprise knowledge graph, as follows:
1. Construction of the enterprise knowledge graph
The enterprise knowledge graph is constructed mainly from structured data related to enterprises and their business, together with data from various vertical sites on the Internet, as data sources. These data sources have the following characteristics:
(1) Wide industry coverage and considerable industry depth: the data sources all consist of data strongly related to the enterprise, so the data is closely tied to the enterprise;
(2) High reliability: the internal structured data of an enterprise is usually used to support the enterprise's business, so its reliability is very high; the enterprise data is stored in a relational database, and structured triple data can be obtained with only a modest conversion of the relational data, so reliability is preserved;
(3) Strong structure: the vast majority of internal structured data is stored in relational databases, and open industry data is generally edited and published by high-quality websites, so it is well structured.
When the enterprise knowledge graph is constructed, a data schema can be predefined, adopting a top-down approach to building the knowledge graph. The data schema is the core part of the knowledge graph; once it is defined, the data layer can be filled from the various data sources. The specific steps are as follows:
1) Converting the database to triples:
The present application proposes a set of mapping specifications for mapping from a relational database to semantic data; that is, the preset mapping relation mentioned above may be named D2RML (relational database to RDF mapping language), and the specifications are described in XML. Owing to the usability and universality of XML, D2RML can be easily understood and used by ordinary users. Using the language does not require knowledge of the Resource Description Framework (RDF) or related technologies, which lowers the barrier to use. In addition, the application also provides a visual configuration tool for the specification, with which a user can formulate mapping rules through a few simple configuration steps.
The main keywords in D2RML and their functions are as follows:
(a) dbtype: the type of the source database, such as mysql, oracle or sqlserver, which determines the driver used for the connection;
(b) dburl: the database connection string, which specifies information such as the database address, the port and the database to use;
(c) dbuser: a user name of the database;
(d) dbpwd: a password for the database;
(e) table: a source data table;
(f) concept: importing a target concept;
(g) the colname attribute of name: the entity name source column;
(h) the colname attribute of synonym: the synonymous entity source column;
(i) the tabename attribute of parent: the table name of the parent concept;
(j) the colname attribute of attribute: the attribute source column; its attrname attribute specifies the attribute name.
For example, one mapping file is as follows:
[Figures BDA0002634342340000141 and BDA0002634342340000151: example mapping file, shown as images in the original publication]
When triples of the knowledge graph are mapped from the structured data, the basic structure of the structured data, including the meaning of each table and the associations between tables, is analyzed first, together with the structure of the knowledge graph; the tables in the structured data are then associated with concepts or entities in the knowledge graph by using D2RML, thereby implementing the conversion.
2) Structured data knowledge mapping
After the mapping configuration file is defined, the triples of the knowledge graph can be converted from the database according to the configured mapping relation. In this embodiment, a knowledge transformation engine may connect to the target database specified in the configuration file, read the structured data in the corresponding tables, map the tables and columns in the database to entities of concepts and attributes of those entities, respectively, and then store the knowledge obtained by mapping into the knowledge graph.
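The conversion step can be sketched in Python as follows. Since the example mapping file is available here only as images, the XML layout below is an assumption: it reuses the keywords table, concept, name, synonym and attribute described above, but the element nesting, the predicate names `instance_of` and `synonym`, and the function `rows_to_triples` are hypothetical. The connection keywords (dbtype, dburl, dbuser, dbpwd) are omitted, and an in-memory SQLite database stands in for the source database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping file; the real D2RML layout is not shown in this text.
MAPPING_XML = """
<mapping>
  <table name="company">
    <concept>Enterprise</concept>
    <name colname="full_name"/>
    <synonym colname="short_name"/>
    <attribute colname="reg_capital" attrname="registered_capital"/>
  </table>
</mapping>
"""


def rows_to_triples(conn, mapping_xml):
    """Read each mapped table and emit (subject, predicate, object) triples."""
    triples = []
    for table in ET.fromstring(mapping_xml).findall("table"):
        concept = table.findtext("concept").strip()
        name_col = table.find("name").get("colname")
        synonym = table.find("synonym")
        syn_col = synonym.get("colname") if synonym is not None else None
        attrs = [(a.get("colname"), a.get("attrname"))
                 for a in table.findall("attribute")]
        cols = [name_col] + ([syn_col] if syn_col else []) + [c for c, _ in attrs]
        # Identifiers come from the trusted mapping file; they are quoted only.
        query = "SELECT {} FROM \"{}\"".format(
            ", ".join('"{}"'.format(c) for c in cols), table.get("name"))
        for row in conn.execute(query):
            record = dict(zip(cols, row))
            entity = record[name_col]
            triples.append((entity, "instance_of", concept))
            if syn_col and record[syn_col]:
                triples.append((entity, "synonym", record[syn_col]))
            for col, attrname in attrs:
                if record[col] is not None:
                    triples.append((entity, attrname, record[col]))
    return triples


# Demonstration data (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (full_name TEXT, short_name TEXT, reg_capital INTEGER)")
conn.execute("INSERT INTO company VALUES ('Acme Power Co., Ltd.', 'Acme', 5000000)")
TRIPLES = rows_to_triples(conn, MAPPING_XML)
```

Each row becomes an entity of the mapped concept, and each mapped column becomes an attribute of that entity, which mirrors the table-to-concept and column-to-attribute mapping described above.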
3) Internet data collection and mapping
In order to enrich the knowledge graph, the application provides an algorithm, based on a search engine and an online encyclopedia, for automatically discovering industry knowledge bases and industry websites, so that more triples related to each enterprise are mined and added to the knowledge graph.
The page acquisition and content acquisition are realized by the following algorithm flow, as shown in fig. 8:
(1) Seed words that represent the industry are used to search in a search engine and in the search interface of an online encyclopedia. For the web documents returned by the search engine, a certain number of top-ranked results are selected and added directly to the target web page list. For the pages returned by the encyclopedia, the corresponding article page is opened, and two types of links in the article page, namely ordinary external links and external links in the references, are found and added to the target web page list.
(2) The target web pages in the target web page list are classified by website; different page types, such as list pages, detail pages and other pages, use different acquisition strategies.
(3) In-station acquisition is performed on the obtained web pages with the maximum acquisition depth set to 3 layers; that is, starting from the first page, a depth-first acquisition strategy is used and 3 layers are acquired in total.
(4) The content of each website is analyzed, and the content of the web pages acquired from each website is extracted and stored. If the content of a website contains industry keywords at a high frequency, the website is considered related to the industry and is selected as a target data source; otherwise, the content contains only a small number of instances and is discarded. Finally, corresponding triples are generated from the stored content and added to the knowledge graph.
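Steps (3) and (4) can be sketched as follows, under stated assumptions. The link structure is an in-memory dictionary rather than real HTTP fetching, the first page is counted as layer 1 of the 3 layers (one possible reading of the text), and the function names and the whitespace tokenization in `keyword_ratio` are hypothetical. The text does not give a numeric frequency threshold, so none is fixed here.

```python
def collect_in_site(first_page, links, max_depth=3):
    """Depth-first in-site collection, counting the first page as layer 1.

    links: dict mapping a page to the in-site pages it links to.
    Returns pages in the order they were first reached.
    """
    shallowest = {}  # page -> shallowest layer at which it has been reached
    order = []

    def visit(page, layer):
        if layer > max_depth:
            return
        if page in shallowest and shallowest[page] <= layer:
            return  # already expanded from an equal or shallower layer
        if page not in shallowest:
            order.append(page)
        shallowest[page] = layer
        for target in links.get(page, []):
            visit(target, layer + 1)

    visit(first_page, 1)
    return order


def keyword_ratio(text, keywords):
    """Share of whitespace-separated tokens that are industry keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in keywords) / len(tokens)


# Hypothetical site: page "e" sits on layer 4 and is therefore not collected.
LINKS = {"a": ["b", "c"], "b": ["d"], "d": ["e"], "c": []}
PAGES = collect_in_site("a", LINKS)
RATIO = keyword_ratio("power grid supply power contract", {"power", "grid"})
```

A site would be kept as a target data source when `keyword_ratio` over its stored content exceeds a threshold chosen by the operator.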
2. Credit scoring system implementation based on enterprise knowledge graph
In the application, the credit scoring system based on the enterprise knowledge graph, i.e. the risk identification model described above, can be constructed based on a convolutional neural network (CNN). A CNN can automatically extract features from input sentences or image data and perform classification tasks, and the extracted features can be used as input for subsequent training. In natural language processing, a CNN is generally used for tasks such as character-level information modeling: by sliding a window over the word vectors of the input, the CNN connects the current character with the characters before and after it, computes the influence of the surrounding characters on the current one, and generates a representation of word features. Taking the term "convolutional neural network" as an example, the CNN layer structure is shown in fig. 9. After the convolution is finished, the context information between characters has been extracted and representation features of words and sentences have been generated, which are then input into the next layer of the neural network.
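The sliding-window operation can be illustrated with a small pure-Python sketch. It is not the network of fig. 9: it applies a single filter with hand-picked weights, whereas a trained CNN learns many filters. The function name `conv1d_same`, the zero padding, the ReLU activation and the toy vectors are assumptions made for illustration.

```python
def conv1d_same(vectors, kernel, bias=0.0):
    """Slide one filter over a sequence of character vectors.

    vectors: list of equal-length embedding vectors, one per character.
    kernel:  list of weight vectors, one per window position (odd length).
    The sequence is zero-padded so that each character gets one output that
    mixes it with its left and right neighbours; ReLU is then applied.
    """
    window = len(kernel)
    half = window // 2
    dim = len(vectors[0])
    zero = [0.0] * dim
    padded = [zero] * half + list(vectors) + [zero] * half
    features = []
    for i in range(len(vectors)):
        total = bias
        for k in range(window):
            for d in range(dim):
                total += kernel[k][d] * padded[i + k][d]
        features.append(max(0.0, total))
    return features


# Three characters with 2-dimensional toy embeddings and a window of size 3.
CHAR_VECTORS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
KERNEL = [[0.5, 0.0], [1.0, 1.0], [0.0, 0.5]]
FEATURES = conv1d_same(CHAR_VECTORS, KERNEL)
```

Each output value depends on the current character and on the characters before and after it, which is the context effect the paragraph above describes.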
It should be noted that, when training a machine learning model for risk identification (i.e., the risk identification model), conventional machine learning algorithms often run into a problem they cannot solve: risk sample data is insufficient, so the features that can be extracted are limited. In a normal production environment, harmless data far outnumbers harmful data, whereas a traditional statistics-based machine learning algorithm can obtain a satisfactory recognition model only when trained on a large amount of high-quality sample data. The idea of a risk identification model based on the deep learning DBN (Deep Belief Network) algorithm is that training can be carried out with limited sample data, and multi-dimensional, multi-level learning is performed through iteration of multi-layer RBMs (Restricted Boltzmann Machines), so that the number of features obtained through learning grows rapidly.
In the application, a small set of risk samples is first trained with the DBN algorithm, accurate samples are obtained by applying a threshold, and the accurate samples are then trained with the DBN again; this iteration is repeated until the final risk identification model is obtained.
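One way to read this iteration is as a thresholded self-training loop, sketched below. This is an interpretation, not the method as claimed: a nearest-centroid classifier stands in for the DBN purely to keep the example short and runnable, and the function names, the confidence measure and the threshold of 0.8 are all assumptions.

```python
import math


def fit_centroids(samples):
    """Stand-in for DBN training (assumption): one centroid per label."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vector):
    """Return (label, confidence); confidence nears 1 close to a centroid."""
    dists = {label: math.dist(vector, c) for label, c in centroids.items()}
    best = min(dists, key=dists.get)
    total = sum(dists.values())
    confidence = 1.0 if total == 0 else 1.0 - dists[best] / total
    return best, confidence


def iterative_training(labeled, pool, threshold=0.8, max_rounds=10):
    """Retrain repeatedly, adding only pool samples that pass the threshold."""
    labeled, pool = list(labeled), list(pool)
    model = fit_centroids(labeled)
    for _ in range(max_rounds):
        accepted, remaining = [], []
        for vector in pool:
            label, confidence = predict(model, vector)
            if confidence >= threshold:
                accepted.append((vector, label))
            else:
                remaining.append(vector)
        if not accepted:
            break  # nothing new passed the threshold, so stop iterating
        labeled.extend(accepted)
        pool = remaining
        model = fit_centroids(labeled)
    return model, labeled, pool


# Toy one-dimensional data, invented for illustration.
SEED = [([0.0], "harmless"), ([10.0], "risk")]
MODEL, LABELED, LEFTOVER = iterative_training(SEED, [[1.0], [9.0], [5.0]])
```

In this toy run the ambiguous sample `[5.0]` never passes the threshold and stays unlabeled, while the two confident samples are absorbed into the training set before the final model is fitted.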
With reference to the design architecture diagram of the enterprise credit scoring system shown in fig. 10, the application makes full use of mainstream big data products to ensure the usability, flexibility and expandability of the product. The application layer is developed against interfaces and provides a series of service capabilities, while keeping deployment simple and extensible.
The whole system supports both single-machine and distributed deployment, and uses the graph to implement enterprise risk assessment, credit scoring and change-event prompts; new enterprise entities are added and risk relationships established through the unified data interface of the knowledge graph, and information such as the total credit of enterprises in the graph and risk early-warning trends is calculated by a distributed asynchronous algorithm. The details are as follows:
The system can be built on a cloud host, an independent server or a third-party virtual host, and is based on one or more databases such as MSSQL, MySQL and Oracle;
The data layer implements processing such as storage, caching, user-defined functions, transaction processing, and database reads and writes;
The business layer constructs the knowledge graph, i.e. builds the graph by using the mapping rules, and describes enterprises and enterprise-related events based on the graph, such as credit evaluation, monitored enterprises, monitoring statistics, event lists and rule configuration; CNN and DBN are the main implementations of business evaluation in the business layer, and computation between the data layer and the business layer is asynchronous;
The display layer performs processing such as template engine rendering and request receiving;
The front-end user interface (UI) provides an interactive interface for the user in the form of HyperText Markup Language (HTML), Cascading Style Sheets (CSS), jQuery and pictures.
Therefore, the application can deeply analyze the business state and public opinion trends of an enterprise, comprehensively depict the information of each enterprise in every dimension through the knowledge graph, and achieve timely and effective credit scoring based on the enterprise's data in every dimension. Moreover, based on enterprise characteristics and the relevant historical negative samples and complaint samples, the application uses artificial intelligence technologies such as deep learning, knowledge graphs and natural language processing to perform deep correlation analysis and risk feature extraction on enterprise operation management data, such as engineering project management, marketing management, material management and clean government construction, thereby achieving automatic, intelligent identification of internal operation management risks of the enterprise, improving the accuracy of risk identification, and raising the level of risk control.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for knowledge-graph based credit rating, the method comprising:
obtaining a target corpus, wherein the target corpus comprises a plurality of sentences;
extracting words of each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of characteristic words corresponding to the target corpus, wherein the characteristic words comprise characteristic words of at least one target object in at least one information dimension;
and performing risk identification on the feature words of the target object in each information dimension by using a risk identification model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.
2. The method of claim 1, wherein the knowledge-graph is obtained by:
reading structured data stored in a relational database, the structured data being associated with at least one target object;
and converting the structured data into triple data by using a preset mapping relation between the structured data and the triples so as to obtain the knowledge graph.
3. The method of claim 1 or 2, wherein the knowledge-graph is obtained by:
acquiring a target page related to at least one target object in an industry website by using a preset word corresponding to the target object;
reading page content in the target page;
and generating triple data according to the page content to obtain the knowledge graph.
4. The method according to claim 3, wherein the target page comprises at least a first page associated with the preset word and a second page obtained by performing in-station acquisition on the first page.
5. The method of claim 1, wherein the risk identification model is trained by:
obtaining a plurality of training feature word sets with credit rating labels; the training feature word set is a feature word set obtained by utilizing the knowledge graph to extract words of sentences in the training corpus; the training feature word set comprises training feature words on at least one information dimension;
and taking the training feature words on each information dimension as input samples of corresponding risk recognition models, taking the credit rating labels of the training feature word set as output samples of the risk recognition models, and training the risk recognition models.
6. The method according to claim 5, wherein a difference between a credit rating test result, obtained by the risk recognition model performing risk recognition on the training feature word set corresponding to the training corpus, and the credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.
7. A knowledge-graph based credit rating apparatus, the apparatus comprising:
a corpus acquiring unit, configured to acquire a target corpus, the target corpus comprising a plurality of sentences;
a feature extraction unit, configured to extract words from each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension;
and a risk identification unit, configured to perform risk identification on the feature words of the target object in each information dimension by using a risk identification model corresponding to each information dimension, so as to obtain a credit rating result corresponding to each information dimension of the target object, wherein the risk identification model is obtained by training with a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.
8. The apparatus of claim 7, further comprising:
a first graph building unit for reading structured data stored in a relational database, the structured data being related to at least one target object; and converting the structured data into triple data by using a preset mapping relation between the structured data and the triples so as to obtain the knowledge graph.
9. The apparatus of claim 7, further comprising:
a second graph building unit for obtaining a target page related to at least one target object in an industry website by using a preset word corresponding to the target object; reading page content in the target page; and generating triple data according to the page content to obtain the knowledge graph.
10. An electronic device, comprising:
a memory for storing an application program and data generated by running the application program;
a processor for executing the application to implement: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words of each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of characteristic words corresponding to the target corpus, wherein the characteristic words comprise characteristic words of at least one target object in at least one information dimension; and performing risk identification on the feature words of the target object in each information dimension by using a risk identification model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding risk dimension.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010820772.5A CN111951079B (en) 2020-08-14 2020-08-14 Credit rating method and device based on knowledge graph and electronic equipment

Publications (2)

Publication Number Publication Date
CN111951079A true CN111951079A (en) 2020-11-17
CN111951079B CN111951079B (en) 2024-04-02

Family

ID=73343663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010820772.5A Active CN111951079B (en) 2020-08-14 2020-08-14 Credit rating method and device based on knowledge graph and electronic equipment

Country Status (1)

Country Link
CN (1) CN111951079B (en)

Also Published As

Publication number Publication date
CN111951079B (en) 2024-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: No. 308 Guang'anmen Inner Street, Xicheng District, Beijing, 100053

Applicant after: State Grid Digital Technology Holdings Co.,Ltd.

Applicant after: State Grid E-Commerce Technology Co.,Ltd.

Address before: No. 311 Guang'anmen Inner Street, Xicheng District, Beijing, 100053

Applicant before: STATE GRID ELECTRONIC COMMERCE Co.,Ltd.

Country or region before: China

Applicant before: State Grid E-Commerce Technology Co.,Ltd.

GR01 Patent grant