CN111951079B

CN111951079B - Credit rating method and device based on knowledge graph and electronic equipment

Info

Publication number: CN111951079B
Application number: CN202010820772.5A
Authority: CN
Inventors: 张宾; 孙喜民; 周晶; 李慧超; 王帅
Original assignee: State Grid Digital Technology Holdings Co ltd; State Grid E Commerce Technology Co Ltd
Current assignee: State Grid Digital Technology Holdings Co ltd; State Grid E Commerce Technology Co Ltd
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2024-04-02
Anticipated expiration: 2040-08-14
Also published as: CN111951079A

Abstract

The application discloses a credit rating method and device based on a knowledge graph and electronic equipment, wherein the method comprises the following steps: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words from each sentence in the target corpus by utilizing a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension; and carrying out risk recognition on feature words of the target object in each information dimension by using a risk recognition model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk recognition model is obtained by training by using a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.

Description

Credit rating method and device based on knowledge graph and electronic equipment

Technical Field

The application relates to the technical field of deep learning, in particular to a credit rating method and device based on a knowledge graph and electronic equipment.

Background

At present, research on enterprise credit rating is mainly concentrated in the field of enterprise-related risk analysis. As risk data grow more complex, the requirements on analysis rise accordingly, so the application of deep learning has become a research frontier in risk management and has brought disruptive change to the field. Deep learning derives from the development of artificial neural networks and comprises a complex multi-level learning structure built on learning mechanisms that mimic the human brain. A deep learning model learns data features at each layer and passes new features to the next layer; these new features are obtained by applying specific feature transformations to the features already learned, which improves the prediction performance of the model.

In the actual rating process, risk prediction involves many indexes, and the indexes are correlated with one another. When the indexes are used directly as features for classification or regression, more complete and deeper-level features are not exploited, so a model trained by a conventional machine learning method cannot achieve a good prediction effect. Instead, the risk indexes can be quantified and used as the input of a neural network to extract their deeper-level features; the extracted features are then input into a classifier or regression model to train a risk early-warning model. After the hyperparameters are tuned and training is iterated repeatedly, an optimal risk early-warning model is obtained and deployed on a relevant platform to predict enterprise management risk, stock risk, foreign exchange risk and the like.

In the above machine-learning-based risk prediction schemes, although machine learning provides reliable and convincing prediction capability, the difficulty of enterprise credit rating lies in acquiring multidimensional data and representing it in a unified, associated form, which conventional machine learning techniques cannot achieve. Therefore, current risk prediction schemes have the technical problem of low prediction accuracy because high-dimensional features cannot be obtained.

Disclosure of Invention

In view of this, the present application provides a credit rating method and apparatus based on a knowledge graph, and an electronic device, as follows:

a credit rating method based on a knowledge graph, the method comprising:

obtaining a target corpus, wherein the target corpus comprises a plurality of sentences;

extracting words from each sentence in the target corpus by utilizing a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension;

and carrying out risk recognition on feature words of the target object in each information dimension by using a risk recognition model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk recognition model is obtained by training by using a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.

In the above method, preferably, the knowledge graph is obtained by:

reading structured data stored in a relational database, the structured data being associated with at least one target object;

and converting the structured data into triplet data by utilizing a preset mapping relation between the structured data and the triplet so as to obtain the knowledge graph.

In the above method, preferably, the knowledge graph is obtained by:

obtaining a target page related to the target object in an industry website by utilizing a preset word corresponding to at least one target object;

reading page content in the target page;

and generating triplet data according to the page content to obtain the knowledge graph.

In the above method, preferably, the target page at least includes a first page associated with the preset word and a second page obtained by in-station acquisition of the first page.

In the above method, preferably, the risk identification model is trained by:

obtaining a plurality of training feature word sets with credit rating labels; the training feature word set is a feature word set obtained by extracting words from sentences in the training corpus by utilizing the knowledge graph; the training feature word set comprises training feature words in at least one information dimension;

and training the risk recognition model by taking training feature words in each information dimension as input samples of the corresponding risk recognition model and taking credit rating labels of the training feature word set as output samples of the risk recognition model.

In the above method, preferably, the difference between the credit rating test result obtained by performing risk identification on the training feature word set corresponding to the training corpus by the risk identification model and the credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.

A knowledge-graph-based credit rating apparatus, the apparatus comprising:

the corpus obtaining unit is used for obtaining a target corpus, wherein the target corpus comprises a plurality of sentences;

the feature extraction unit is used for extracting words from each sentence in the target corpus by utilizing a pre-constructed knowledge graph so as to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension;

the risk identification unit is used for carrying out risk identification on feature words of the target object in each information dimension by utilizing a risk identification model corresponding to each information dimension so as to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training by utilizing a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.

The above device, preferably, further comprises:

a first map construction unit for reading structured data stored in a relational database, the structured data being related to at least one target object; and converting the structured data into triplet data by utilizing a preset mapping relation between the structured data and the triplet so as to obtain the knowledge graph.

The above device, preferably, further comprises:

the second map construction unit is used for obtaining a target page related to the target object in the industry website by utilizing a preset word corresponding to at least one target object; reading page content in the target page; and generating triplet data according to the page content to obtain the knowledge graph.

An electronic device, comprising:

the memory is used for storing the application program and data generated by the running of the application program;

a processor for executing the application program to realize: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words from each sentence in the target corpus by utilizing a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension; and carrying out risk recognition on feature words of the target object in each information dimension by using a risk recognition model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk recognition model is obtained by training by using a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension.

According to the credit rating method, device and electronic equipment based on the knowledge graph provided by the present application, after the target corpus is obtained, word extraction is performed on each sentence in the target corpus by using the pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, the feature words comprising feature words of at least one target object in at least one information dimension. Risk identification is then performed on the feature words of the target object in each information dimension by using the risk identification model corresponding to that information dimension, to obtain a credit rating result of the target object in each information dimension, which represents the credit risk of the corresponding target object in the corresponding information dimension. Because the knowledge graph is used to extract feature words in multiple information dimensions, the feature content input into the deep learning model is enriched, and the accuracy of the obtained credit rating result is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a credit rating method based on a knowledge graph according to an embodiment of the present application;

Figs. 2 to 3 are, respectively, partial flowcharts of a first embodiment of the present application;

Fig. 4 is a schematic structural diagram of a credit rating device based on a knowledge graph according to a second embodiment of the present application;

Figs. 5 to 6 are, respectively, schematic structural diagrams of the second embodiment of the present application;

Fig. 7 is a schematic structural diagram of an electronic device according to a third embodiment of the present application;

Fig. 8 is an example flowchart of web content acquisition according to an embodiment of the present application;

Fig. 9 is a schematic diagram of a convolutional neural network in an embodiment of the present application;

Fig. 10 is a system architecture diagram of an implementation of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Referring to Fig. 1, a flowchart of an implementation of a credit rating method based on a knowledge graph according to an embodiment of the present application is shown. The method is applicable to an electronic device capable of data processing, such as a computer or a server. The technical solution in this embodiment is mainly used for rating the credit of a target object, such as an enterprise or a person.

Specifically, the method in this embodiment may include the following steps:

Step 101: obtain a target corpus.

The target corpus contains a plurality of sentences. For example, the target corpus may be a news segment, a summary report, a speech manuscript, or the like.

It should be noted that the sentences in the target corpus describe the target objects to be rated, such as enterprises or individuals, together with content related to those target objects. For example, the sentences describe content about a certain enterprise in a plurality of information dimensions, such as business state, registered capital and financial indexes in the enterprise business situation dimension, and changes of corporate stakeholders, external investments and the like in the enterprise business information dimension.

Step 102: extract words from each sentence in the target corpus by using the pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus.

The plurality of feature words corresponding to the target corpus comprise feature words of at least one target object in at least one information dimension. For example, the feature words corresponding to the target corpus include feature words of enterprise A in the business condition dimension and feature words of enterprise B in the business condition and business information dimensions.

Specifically, in this embodiment, a knowledge graph containing a plurality of triplets may be pre-constructed. A triplet may be a relational triplet of the form entity-relationship-entity, or an attribute triplet of the form entity-attribute-attribute value, and the triplets cover a plurality of enterprises in a plurality of information dimensions. On this basis, word extraction is performed on each sentence in the target corpus by using the triplet data in the knowledge graph, so as to extract a plurality of feature words corresponding to the target corpus: for example, a relational triplet in which "enterprise A" has an "investment" in "enterprise B", or attribute triplets in which "enterprise A" is a "sales" company whose sales amount is "1 million", and so on.
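The word extraction of step 102 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each triplet is stored together with the information dimension it belongs to and that a triplet matches a sentence when all three of its elements occur in the sentence as substrings; the names `TRIPLETS` and `extract_feature_words` and the sample data are hypothetical and do not come from the application, which does not fix a matching rule.

```python
from collections import defaultdict

# Hypothetical knowledge graph entries:
# (head entity, relationship or attribute, tail entity or value, information dimension).
TRIPLETS = [
    ("Enterprise A", "investment", "Enterprise B", "business_information"),
    ("Enterprise A", "sales amount", "1 million", "business_condition"),
    ("Enterprise B", "registered capital", "5 million", "business_condition"),
]


def extract_feature_words(sentences, triplets):
    """Return {target object: {dimension: [feature words]}} for triplets found in the sentences."""
    features = defaultdict(lambda: defaultdict(list))
    for sentence in sentences:
        for head, relation, tail, dimension in triplets:
            # Assumed rule: a triplet matches when all of its elements occur in the sentence.
            if head in sentence and relation in sentence and tail in sentence:
                for word in (relation, tail):
                    if word not in features[head][dimension]:
                        features[head][dimension].append(word)
    return {obj: dict(dims) for obj, dims in features.items()}


corpus = [
    "Enterprise A made an investment in Enterprise B last year.",
    "Enterprise A reported a sales amount of 1 million.",
]
result = extract_feature_words(corpus, TRIPLETS)
```

Here `result` groups the extracted feature words first by target object and then by information dimension, which is the shape needed when each dimension is later handled by its own risk recognition model.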

Step 103: perform risk recognition on the feature words of the target object in each information dimension by using the risk recognition model corresponding to each information dimension, so as to obtain a credit rating result of the target object in each information dimension.

The risk recognition model is obtained by training with a plurality of training feature word sets that carry credit rating labels, and the credit rating result obtained for the target object in one information dimension represents the credit risk of the corresponding target object in that information dimension.

For example, in this embodiment a plurality of risk recognition models are pre-built, each corresponding to one information dimension, such as a risk recognition model for the operation state dimension and a risk recognition model for the industrial and commercial information dimension. The risk recognition model for each information dimension is then trained by using a plurality of training feature word sets that carry credit rating labels and correspond to that information dimension. The trained risk recognition model can rate the credit of the target object in the corresponding information dimension to obtain a credit rating result, and that result represents the credit risk of the target object in that information dimension. For instance, the credit rating result of enterprise A in the operation state dimension may indicate that the credit risk of enterprise A in the operation state dimension is relatively high, while the credit rating result of enterprise B in the industrial and commercial information dimension may indicate that the credit of enterprise B in that dimension is relatively low.

In one implementation, the risk recognition model in this embodiment may be a deep learning model constructed based on a machine learning algorithm, such as a deep learning model constructed based on a convolutional neural network (CNN).
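The one-model-per-dimension arrangement of step 103 can be sketched as follows. This is only an illustration of the dispatch logic: the keyword-counting function below is a hypothetical stand-in for a trained risk recognition model (the embodiment mentions a CNN-based model, which is not reproduced here), and the names `make_keyword_model`, `rate_credit` and the sample words are assumptions.

```python
from typing import Callable, Dict, List

# In this sketch a "model" is any callable mapping feature words to a rating label.
RiskModel = Callable[[List[str]], str]


def make_keyword_model(risky_words: set) -> RiskModel:
    """Hypothetical stand-in for a trained model: flags any risky feature word."""
    def model(feature_words: List[str]) -> str:
        hits = sum(1 for word in feature_words if word in risky_words)
        return "high risk" if hits >= 1 else "low risk"
    return model


def rate_credit(features: Dict[str, Dict[str, List[str]]],
                models: Dict[str, RiskModel]) -> Dict[str, Dict[str, str]]:
    """Apply the model of each information dimension to the feature words in that dimension."""
    ratings = {}
    for target, by_dimension in features.items():
        ratings[target] = {
            dimension: models[dimension](words)
            for dimension, words in by_dimension.items()
            if dimension in models  # dimensions without a model are skipped
        }
    return ratings


features = {
    "Enterprise A": {"operation_state": ["overdue", "sales amount"]},
    "Enterprise B": {"business_information": ["investment"]},
}
models = {
    "operation_state": make_keyword_model({"overdue"}),
    "business_information": make_keyword_model({"equity freeze"}),
}
ratings = rate_credit(features, models)
```

Keeping the models in a dictionary keyed by information dimension means a dimension can be added or retrained without touching the others.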

As can be seen from the foregoing, in the credit rating method based on a knowledge graph provided in the first embodiment of the present application, after a target corpus is obtained, word extraction is performed on each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one information dimension. Risk identification is then performed on the feature words of the target object in each information dimension by using the risk identification model corresponding to that information dimension, to obtain a credit rating result of the target object in each information dimension, which characterizes the credit risk of the corresponding target object in the corresponding information dimension. In this embodiment, therefore, the knowledge graph is used to extract feature words in multiple information dimensions, which enriches the feature content input into the deep learning model and thereby improves the accuracy of the obtained credit rating result.

In one implementation, the knowledge-graph in this embodiment may be obtained by the following manner, as shown in fig. 2:

Step 201: read the structured data stored in the relational database.

The relational database is a database storing structured data related to target objects. For example, a registration database contains structured data on shops, brands, users and the like, and the structured data is related to at least one target object, such as an enterprise or an individual.

Specifically, in this embodiment, structured data such as the tables and columns in the relational database may be read by means of a stack or a queue.

Step 202: convert the structured data into triplet data by using a preset mapping relation between the structured data and the triplets, so as to obtain the knowledge graph.

In a specific implementation, the preset mapping relation in this embodiment may be understood as a mapping specification from a relational database to semantic data, and a visual specification configuration tool may be used to configure the preset mapping relation between structured data and triplets. Specifically, the basic structure of the structured data (for example, the meaning of each table and the associations between tables) and the structure of the triplets of the knowledge graph (the entities and entity attributes in the triplets) are analyzed, and the preset mapping relation between the structured data and the triplets is configured accordingly. For example, a user table in the database corresponds to the concept of a person in the knowledge graph, and a field of that table corresponds to the contact information defined on the person in the knowledge graph. On this basis, when the structured data is converted into triplet data, the elements in the rows and columns of a table are mapped, by using the preset mapping relation, to elements of triplets such as entities, entity relationships or entity attributes, thereby obtaining the triplet data and forming the knowledge graph.
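Steps 201 and 202 can be sketched in Python with the standard `sqlite3` module. The mapping format, the table and column names, and the "is a" relationship used to attach an entity to its concept are all hypothetical choices made for this illustration; the application does not prescribe a concrete mapping syntax or a specific database.

```python
import sqlite3
from collections import deque

# Hypothetical mapping specification: table -> (concept, key column, {column: attribute name}).
MAPPING = {
    "user": ("Person", "name", {"phone": "contact information"}),
    "shop": ("Shop", "shop_name", {"brand": "brand"}),
}


def tables_to_triplets(conn, mapping):
    """Read each mapped table in turn and convert its rows into triplets."""
    triplets = []
    pending = deque(mapping)  # queue of table names still to be read
    while pending:
        table = pending.popleft()
        concept, key_column, attributes = mapping[table]
        columns = [key_column, *attributes]
        # Table and column names come from the trusted mapping, not from user input.
        query = f'SELECT {", ".join(columns)} FROM "{table}"'
        for row in conn.execute(query):
            entity = row[0]
            triplets.append((entity, "is a", concept))
            for column, value in zip(columns[1:], row[1:]):
                if value is not None:
                    triplets.append((entity, attributes[column], value))
    return triplets


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (name TEXT, phone TEXT)')
conn.execute('CREATE TABLE "shop" (shop_name TEXT, brand TEXT)')
conn.execute('INSERT INTO "user" VALUES (?, ?)', ("Zhang San", "555-0100"))
conn.execute('INSERT INTO "shop" VALUES (?, ?)', ("Shop 1", "Brand X"))
triplets = tables_to_triplets(conn, MAPPING)
```

Each row yields one triplet tying the entity to its concept and one attribute triplet per mapped column, which mirrors the "user table corresponds to person, field corresponds to contact information" example above.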

In one implementation, the knowledge graph in this embodiment may be further supplemented or enriched by the following manner, as shown in fig. 3:

Step 301: obtain a target page related to the target object in an industry website by using a preset word corresponding to the at least one target object.

In this embodiment, based on a preset seed vocabulary capable of representing the industry in which the target object is located, that is, the preset word, a search engine, a search interface or the like may be used to search the pages of an industry website (pages including an industry knowledge base) to obtain a target page related to the target object.

In one implementation, the target page may include only a first page associated with the preset word, for example, a page that directly contains the preset word. Alternatively, the target page may further include a second page obtained by in-station acquisition from the first page, that is, a page corresponding to a link contained in the first page, and so on.

Specifically, in this embodiment, a search engine or a search interface may be used to find a first page containing a preset word; in-station acquisition is then performed on the first page with the maximum acquisition depth set to 3 layers, that is, starting from the first page, a depth-first acquisition strategy is used to acquire 3 layers in total. In other implementations, the acquisition depth may be set to other values, such as 2 layers or 4 layers.
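The depth-limited, depth-first in-station acquisition can be sketched as below. To stay self-contained the sketch walks an in-memory link table instead of fetching real pages, and it assumes that the first page counts as layer 1 of the 3 layers; the function name `collect_pages` and the sample paths are hypothetical.

```python
def collect_pages(first_page, links, max_depth=3):
    """Depth-first, in-site collection starting at first_page, limited to max_depth layers."""
    collected = []
    visited = set()

    def visit(page, depth):
        # Stop below the deepest allowed layer and never collect a page twice.
        if depth > max_depth or page in visited:
            return
        visited.add(page)
        collected.append(page)
        for linked in links.get(page, []):
            visit(linked, depth + 1)

    visit(first_page, 1)  # assumption: the first page is layer 1
    return collected


# Hypothetical link structure of one site: page -> pages it links to.
SITE_LINKS = {
    "/search?q=seed": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],  # would be layer 4, beyond the limit
    "/b": [],
}
pages = collect_pages("/search?q=seed", SITE_LINKS)
```

Changing `max_depth` to 2 or 4 corresponds to the other acquisition depths mentioned above.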

Step 302: read the page content in the target page.

In this embodiment, techniques such as a web crawler may be used to acquire the page content of the target page, so as to obtain content such as the text it contains.
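One way to obtain the text of an acquired page is sketched below with the standard-library `html.parser` module. It is an illustrative assumption rather than the crawler used in the application; the class name `TextExtractor`, the helper `page_text` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def page_text(html):
    """Return the visible text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Enterprise A</h1><p>Registered capital: 5 million.</p>"
    "<script>var x = 1;</script></body></html>"
)
text = page_text(sample)
```

The resulting plain text can then be split into sentences and passed on to triplet extraction.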

Step 303: generate triplet data according to the page content to obtain the knowledge graph.

A pre-constructed and trained triplet extraction model may be used to perform triplet extraction on the page content so as to obtain triplet data, thereby forming the knowledge graph. The triplet extraction model may be a model constructed based on a deep learning algorithm and trained by using training sentence samples with triplet labels, so that the trained triplet extraction model can extract the triplets of a sentence to obtain the corresponding triplet data, which is added to the knowledge graph.

In one implementation, the risk identification model in this embodiment may be trained by:

First, a plurality of training feature word sets with credit rating labels are obtained, where each training feature word set may be a feature word set obtained by extracting words from the sentences of a corresponding training corpus by using the knowledge graph.

It should be noted that the training feature word set here includes training feature words in multiple information dimensions.

Then, the training feature words in each information dimension are taken as input samples of the risk recognition model corresponding to that information dimension, the credit rating labels of the training feature word sets are taken as output samples of that risk recognition model, and the risk recognition model is trained.

Specifically, in this embodiment, the training feature words in each information dimension are input into the risk recognition model corresponding to that information dimension, and the credit rating test result output by the risk recognition model for the input training feature words is obtained. The credit rating test result is then compared with the credit rating label, and the model parameters of the risk recognition model are adjusted according to the difference indicated by the comparison so as to reduce the loss function of the risk recognition model. This is repeated until the loss function converges, at which point training is complete.
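The compare-adjust-repeat loop described above can be illustrated with a deliberately small model. The sketch below swaps the deep learning model for a logistic regression over a bag-of-words encoding, trained by batch gradient descent on the cross-entropy loss until the change in loss falls below a tolerance; the functions `encode`, `predict` and `train`, the hyperparameters and the sample data are all assumptions made for illustration, with label 1 standing for "high risk".

```python
import math


def encode(words, vocabulary):
    """Bag-of-words vector: 1.0 if the vocabulary word occurs among the feature words."""
    present = set(words)
    return [1.0 if word in present else 0.0 for word in vocabulary]


def predict(weights, bias, vector):
    """Predicted probability of the 'high risk' label."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, vocabulary, lr=0.5, tol=1e-6, max_epochs=5000):
    """Adjust parameters from the output/label difference until the loss converges."""
    vectors = [encode(words, vocabulary) for words in samples]
    weights, bias = [0.0] * len(vocabulary), 0.0
    history = []
    for _ in range(max_epochs):
        outputs = [predict(weights, bias, v) for v in vectors]
        # Cross-entropy loss between the test results (outputs) and the labels.
        loss = 0.0
        for p, y in zip(outputs, labels):
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        loss /= len(labels)
        history.append(loss)
        if len(history) > 1 and abs(history[-2] - loss) < tol:
            break  # the loss has converged
        # Batch gradient step driven by the difference between output and label.
        for p, y, v in zip(outputs, labels, vectors):
            error = (p - y) / len(labels)
            bias -= lr * error
            for i, x in enumerate(v):
                weights[i] -= lr * error * x
    return weights, bias, history


vocabulary = ["overdue", "lawsuit", "profit", "growth"]
samples = [["overdue", "lawsuit"], ["lawsuit"], ["profit", "growth"], ["growth"]]
labels = [1, 1, 0, 0]
weights, bias, history = train(samples, labels, vocabulary)
```

In the method itself one such training run would be carried out per information dimension, each with the training feature words of that dimension.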

Furthermore, in order to improve the accuracy of the training samples in this embodiment, difficult samples are screened before the risk recognition model is trained. That is, the training corpora that participate in training the risk recognition model are sample corpora of higher accuracy; for these corpora, the difference between the credit rating test result obtained when the risk recognition model performs risk recognition on the training feature word set corresponding to the training corpus and the credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.

In a specific implementation, the risk recognition model may first be trained on a trial basis with a small number of risk samples; a preset threshold is then obtained from the test results, the training corpora that participate in training are screened by using the preset threshold, and the final risk recognition model is obtained after repeated iterative training on those corpora.
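The threshold-based screening can be sketched as follows. The sketch assumes numeric ratings in the range 0 to 1 so that a difference can be computed, uses a hypothetical word-ratio scorer in place of the trial-trained model, and takes the threshold as a given value (0.3 here), since the application does not state how the threshold is derived from the test results.

```python
def screen_corpora(corpora, model, threshold):
    """Keep the corpora whose test result differs from the label by at least threshold."""
    kept = []
    for feature_words, label in corpora:
        difference = abs(model(feature_words) - label)
        if difference >= threshold:
            kept.append((feature_words, label))
    return kept


def preliminary_model(feature_words):
    """Hypothetical trial model: fraction of feature words found in a risky-word list."""
    risky = {"overdue", "lawsuit"}
    if not feature_words:
        return 0.0
    return sum(word in risky for word in feature_words) / len(feature_words)


# (training feature words, credit rating label); 1.0 stands for high risk.
candidates = [
    (["overdue", "lawsuit"], 1.0),            # test result 1.0, difference 0.0
    (["overdue", "growth"], 1.0),             # test result 0.5, difference 0.5
    (["profit"], 0.0),                        # test result 0.0, difference 0.0
    (["lawsuit", "profit", "growth"], 0.0),   # test result 1/3, difference 1/3
]
threshold = 0.3  # assumed value
selected = screen_corpora(candidates, preliminary_model, threshold)
```

The corpora in `selected` are the ones that would go on to the repeated iterative training described above.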

Referring to Fig. 4, a schematic structural diagram of a credit rating device based on a knowledge graph according to a second embodiment of the present application is shown. The device is suitable for an electronic device capable of data processing, such as a computer or a server. The technical solution in this embodiment is mainly used for rating the credit of a target object, such as an enterprise or a person.

Specifically, the apparatus in this embodiment may include the following units:

a corpus obtaining unit 401, configured to obtain a target corpus, where the target corpus includes a plurality of sentences;

a feature extraction unit 402, configured to extract words from each sentence in the target corpus by using a pre-constructed knowledge graph, so as to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one risk dimension;

a risk recognition unit 403, configured to perform risk recognition on feature words of the target object in each risk dimension by using a risk recognition model corresponding to each risk dimension, so as to obtain a credit rating result corresponding to each risk dimension of the target object, where the risk recognition model is obtained by training by using a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding risk dimension.

As can be seen from the foregoing, in the credit rating device based on a knowledge graph provided in the second embodiment of the present application, after a target corpus is obtained, word extraction is performed on each sentence in the target corpus by using a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one information dimension. Risk identification is then performed on the feature words of the target object in each information dimension by using the risk identification model corresponding to that information dimension, to obtain a credit rating result of the target object in each information dimension, which characterizes the credit risk of the corresponding target object in the corresponding information dimension. In this embodiment, therefore, the knowledge graph is used to extract feature words in multiple information dimensions, which enriches the feature content input into the deep learning model and thereby improves the accuracy of the obtained credit rating result.

In one implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 5:

a first map construction unit 404 for reading structured data stored in a relational database, the structured data being related to at least one target object; and converting the structured data into triplet data by utilizing a preset mapping relation between the structured data and the triplet so as to obtain the knowledge graph.

a second map construction unit 405, configured to obtain a target page related to the target object in the industry website by using a preset word corresponding to at least one target object; read page content in the target page; and generate triplet data according to the page content to obtain the knowledge graph.

Optionally, the target page at least includes a first page associated with the preset word and a second page obtained by in-station acquisition of the first page.

In another implementation, the apparatus in this embodiment may further include the following units, as shown in fig. 6:

a model training unit 406, configured to obtain a plurality of training feature word sets with credit rating labels; the training feature word set is a feature word set obtained by extracting words from sentences in the training corpus by utilizing the knowledge graph; the training feature word set comprises training feature words in at least one information dimension; and training the risk recognition model by taking training feature words in each information dimension as input samples of the corresponding risk recognition model and taking credit rating labels of the training feature word set as output samples of the risk recognition model.

Optionally, the difference between the credit rating test result obtained by performing risk identification on the training feature word set corresponding to the training corpus by the risk identification model and the credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.

It should be noted that, the specific implementation of each unit in this embodiment may refer to the corresponding content in the foregoing, which is not described in detail herein.

Referring to Fig. 7, a schematic structural diagram of an electronic device according to a third embodiment of the present application is shown. The electronic device may be one capable of data processing, such as a computer or a server. The technical solution in this embodiment is mainly used for rating the credit of a target object, such as an enterprise or a person.

Specifically, the electronic device in this embodiment may include the following structure:

a memory 701 for storing an application program and data generated by the application program;

a processor 702, configured to execute the application program to implement: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words from each sentence in the target corpus by utilizing a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one risk dimension; and carrying out risk recognition on feature words of the target object in each risk dimension by using a risk recognition model corresponding to each risk dimension to obtain a credit rating result of the target object in each risk dimension, wherein the risk recognition model is obtained by training by using a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding risk dimension.

As can be seen from the foregoing, after obtaining a target corpus, the electronic device provided in the third embodiment of the present application extracts words from each sentence in the target corpus by using a pre-constructed knowledge graph, so as to obtain a plurality of feature words corresponding to the target corpus, where the feature words include feature words of at least one target object in at least one information dimension. Risk recognition is then performed on the feature words of the target object in each information dimension by using the risk recognition model corresponding to that information dimension, so as to obtain a credit rating result of the target object in each information dimension; the result represents the credit risk of the corresponding target object in the corresponding information dimension. In this embodiment, the knowledge graph is thus used to extract feature words in multiple information dimensions, which enriches the feature content input into the deep learning model and thereby improves the accuracy of the obtained credit rating result.
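The flow performed by the processor can be sketched as follows. This is only a minimal illustration under stated assumptions: the toy triples, the use of the predicate as the information dimension, and the keyword-counting stub standing in for a trained risk recognition model are all hypothetical and are not taken from the application.

```python
from collections import defaultdict

# Hypothetical toy knowledge graph: (target object, information dimension, feature word).
# Treating the predicate as the information dimension is an assumption of this sketch.
TRIPLES = [
    ("Acme Co", "litigation", "contract dispute"),
    ("Acme Co", "finance", "overdue loan"),
    ("Beta Ltd", "finance", "tax arrears"),
]

def extract_feature_words(sentences, triples):
    """Return {target object: {dimension: [feature words]}} for words found in the sentences."""
    features = defaultdict(lambda: defaultdict(list))
    for sentence in sentences:
        for target, dimension, word in triples:
            if target in sentence and word in sentence:
                features[target][dimension].append(word)
    return features

def make_stub_model(risky_words):
    """Stand-in for a trained risk recognition model: flags any known risky word."""
    def model(words):
        hits = sum(1 for w in words if w in risky_words)
        return "high risk" if hits > 0 else "low risk"
    return model

# One (stub) risk recognition model per information dimension.
MODELS = {
    "litigation": make_stub_model({"contract dispute"}),
    "finance": make_stub_model({"overdue loan", "tax arrears"}),
}

def rate(sentences, triples=TRIPLES, models=MODELS):
    """Produce a credit rating result per target object and per information dimension."""
    features = extract_feature_words(sentences, triples)
    return {
        target: {dim: models[dim](words) for dim, words in dims.items()}
        for target, dims in features.items()
    }

corpus = [
    "Acme Co was sued over a contract dispute last year.",
    "Acme Co has an overdue loan with a regional bank.",
]
ratings = rate(corpus)
```

In this sketch each information dimension has its own model, so adding a dimension only requires adding triples for it and registering one more model.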

It should be noted that, the specific implementation of the processor in this embodiment may refer to the corresponding content in the foregoing, which is not described in detail herein.

The technical solution of the application is illustrated below, taking enterprise credit rating as an example:

First, knowledge graph technology is introduced to solve the problems of semantic representation and understanding of multi-source heterogeneous data and to improve the effectiveness of big-data enterprise credit scoring. Specifically, the implementation of the technical solution of the application is divided into two main parts: constructing an enterprise knowledge graph, and implementing a credit scoring system based on the enterprise knowledge graph. These are described as follows:

1. Construction of the enterprise knowledge graph

The enterprise knowledge graph is constructed mainly from structured data related to enterprises and their business, together with data from various vertical sites on the Internet. These data sources have the following characteristics:

(1) Wide industry coverage and considerable industry depth: all data sources are strongly related to the enterprise, and the data is closely tied to it;

(2) High reliability: an enterprise's internal structured data is normally used to support its business, so it is highly reliable; because this data is stored in relational databases, structured triplet data can be obtained with only a modest conversion of the relational data, which preserves that reliability;

(3) Strong structure: the vast majority of internal structured data is stored in relational databases, and open industry data is generally published by higher-quality websites after editing, so it is well structured.

When the enterprise knowledge graph is constructed, a data schema can be predefined, and the knowledge graph is built top-down. The data schema is the core of the knowledge graph; once it is defined, the data layer can be populated from the various data sources. The specific steps are as follows:

1) Converting the database to triples:

The application proposes a set of mapping specifications from a relational database to semantic data, namely the preset mapping relation mentioned above. The specification may be named D2RML (relational Database to RDF Mapping Language) and is described in XML. Because XML is widely used and easy to work with, D2RML can be readily understood and used by ordinary users; using the language requires no knowledge of the Resource Description Framework (RDF) or related technologies, which lowers the barrier to use. In addition, the application provides a visual configuration tool for the specification, with which a user can formulate mapping rules through a few simple configuration steps.

The main keywords in D2RML and their functions are as follows:

(a) dbtype: the type of the source database, such as mysql, oracle, or sqlserver; the type determines the driver used for the connection;

(b) dburl: the database connection string, which specifies the address, port, database name, and other information of the database;

(c) dbuser: the user name of the database;

(d) dbpwd: the password of the database;

(e) table: the source data table;

(f) concept: the target concept of the import;

(g) the colname attribute of name: the source column of the entity name;

(h) the colname attribute of synonym: the source column of synonymous entities;

(i) the tab attribute of parent: the table name of the parent concept;

(j) the colname attribute of attribute specifies the attribute source column, and its attrname attribute specifies the attribute name.

For example, one mapping file is as follows:

When triples of the knowledge graph are mapped from structured data, the basic structure of the structured data, including the meaning of each table and the associations between tables, is analyzed first, together with the structure of the knowledge graph; the tables in the structured data are then associated with the concepts or entities in the knowledge graph by using the D2RML language, thereby realizing the conversion.

2) Structured data knowledge mapping

After the mapping configuration file is defined, the triples of the knowledge graph can be converted from the database according to the configured mapping relation. In this embodiment, the knowledge conversion engine may connect to the target database specified in the configuration file, read the structured data in the corresponding tables, map the tables and columns in the database to entities of the concepts and attributes of those entities, and store the knowledge obtained by the mapping into the knowledge graph.
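A minimal sketch of such a conversion engine is given below. It is illustrative only: the XML layout (element nesting, the `d2rml` root, the `name` attribute on `table`), the `instanceOf` predicate, and the sample table are hypothetical, since the text lists the D2RML keywords but not the file layout. An in-memory SQLite database stands in for the database that dbtype, dburl, dbuser, and dbpwd would normally select.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical D2RML-style mapping; only the keyword names come from the text.
MAPPING_XML = """
<d2rml>
  <table name="company">
    <concept>Enterprise</concept>
    <name colname="company_name"/>
    <attribute colname="reg_capital" attrname="registeredCapital"/>
    <attribute colname="legal_rep" attrname="legalRepresentative"/>
  </table>
</d2rml>
"""

def rows_to_triples(conn, mapping_xml):
    """Read each mapped table and emit (entity, predicate, value) triples."""
    triples = []
    root = ET.fromstring(mapping_xml)
    for table in root.findall("table"):
        concept = table.findtext("concept")
        name_col = table.find("name").get("colname")
        attrs = [(a.get("colname"), a.get("attrname")) for a in table.findall("attribute")]
        columns = [name_col] + [col for col, _ in attrs]
        # Identifiers come from the trusted mapping file, not from user input.
        query = "SELECT {} FROM {}".format(", ".join(columns), table.get("name"))
        for row in conn.execute(query):
            entity = row[0]
            triples.append((entity, "instanceOf", concept))
            for (_, attrname), value in zip(attrs, row[1:]):
                if value is not None:
                    triples.append((entity, attrname, str(value)))
    return triples

# Sample relational data (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (company_name TEXT, reg_capital INTEGER, legal_rep TEXT)")
conn.execute("INSERT INTO company VALUES ('Acme Co', 5000000, 'Zhang San')")
triples = rows_to_triples(conn, MAPPING_XML)
```

Each row yields one triple linking the entity to its concept plus one triple per mapped attribute column; NULL columns are skipped so that no empty attribute values enter the graph.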

3) Internet data acquisition and mapping

In order to enrich the knowledge graph, the application provides an algorithm, based on search engines and an online encyclopedia, for automatically discovering industry knowledge bases and industry websites, so as to find more triples related to each enterprise and add them to the knowledge graph.

The application realizes page acquisition and content acquisition through the following algorithm flow, as shown in fig. 8:

(1) Searches are performed in search engines and in the search interface of the online encyclopedia using seed words that represent the industry. For web pages returned by a search engine, a certain number of the top-ranked results are added directly to the target webpage list. For pages returned by the encyclopedia, the corresponding article page is opened first, and two types of links in it, namely ordinary external links and external links in references, are collected and added to the target webpage list.

(2) The target webpages in the target webpage list are grouped by website and classified by page type, such as list pages, detail pages, and other pages; different acquisition strategies are used for different page types.

(3) In-site acquisition is performed on the obtained webpages with a maximum depth of 3 layers; that is, starting from the first page, a total of 3 layers are collected using a depth-first acquisition strategy.

(4) The content of each website is analyzed: the content of the pages collected from each website is extracted and saved. If the content of a website contains industry keywords at high frequency, the website is considered relevant to the industry and is selected as a target data source; otherwise, if it contains only a small number of such instances, the website is discarded. Finally, the corresponding triples are generated from the saved content and added to the knowledge graph.
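Steps (3) and (4) can be sketched as follows. This is a minimal illustration that walks a hypothetical in-memory site instead of fetching real pages; the page contents, the `min_ratio` threshold, and the way "high frequency" is measured (share of collected pages mentioning a keyword) are assumptions, since the text does not fix them.

```python
# Hypothetical in-memory site: page URL -> (page text, in-site links on that page).
SITE = {
    "/": ("power grid industry news", ["/list", "/about"]),
    "/list": ("power grid equipment suppliers", ["/detail/1"]),
    "/about": ("contact us", []),
    "/detail/1": ("transformer supplier for the power grid", ["/detail/1/more"]),
    "/detail/1/more": ("deep page beyond the limit", []),
}

MAX_DEPTH = 3  # the first page counts as layer 1

def crawl(site, start, max_depth=MAX_DEPTH):
    """Depth-first, in-site collection limited to max_depth layers."""
    visited = []

    def visit(url, depth):
        if depth > max_depth or url in visited or url not in site:
            return
        visited.append(url)
        for link in site[url][1]:
            visit(link, depth + 1)

    visit(start, 1)
    return visited

def is_relevant(site, pages, keywords, min_ratio=0.5):
    """Keep the site if the share of collected pages mentioning a keyword reaches min_ratio."""
    hits = sum(1 for url in pages if any(k in site[url][0] for k in keywords))
    return hits / len(pages) >= min_ratio

pages = crawl(SITE, "/")
relevant = is_relevant(SITE, pages, ["power grid"])
```

With the sample site, the page four layers down is never collected, and three of the four collected pages mention the keyword, so the site would be kept as a target data source.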

2. Credit scoring system implementation based on enterprise knowledge graph

In the application, the credit scoring system based on the enterprise knowledge graph, namely the risk recognition model described above, can be constructed based on a convolutional neural network (CNN). A CNN can automatically extract features from input sentences or image data and perform classification tasks; in natural language processing, it can extract additional features to be used as input for the next stage of training. CNNs are commonly used for natural language processing tasks such as character-level information modeling: a sliding window connects the vector of each input character with those of the preceding and following Chinese characters, the influence of the neighbouring characters on the current character is computed, and the resulting representation expresses word features. Taking the term "convolutional neural network" as an example input, the CNN layer structure is shown in Fig. 9. After the convolution is completed, the context information between the characters has been extracted, and the resulting word and sentence representation features are input into the lower neural network.
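The window-sliding step can be illustrated with a single convolution filter over character vectors. The sketch below is plain Python and is not the network of Fig. 9: the 2-dimensional embeddings, the kernel weights, the window size of 3, and the ReLU activation are all illustrative assumptions.

```python
def conv1d_over_chars(embeddings, kernel, bias=0.0):
    """Slide a window of len(kernel) characters over the sequence (zero-padded,
    odd window size assumed) and return one ReLU-activated feature per position."""
    window = len(kernel)
    dim = len(embeddings[0])
    pad = window // 2
    zero = [0.0] * dim
    padded = [zero] * pad + embeddings + [zero] * pad
    features = []
    for i in range(len(embeddings)):
        total = bias
        for k in range(window):
            # Weight the k-th character in the window: previous, current, next.
            total += sum(w * x for w, x in zip(kernel[k], padded[i + k]))
        features.append(max(0.0, total))
    return features

# Four characters with hypothetical 2-dimensional embeddings.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter: weights for the previous, current, and next character.
KERNEL = [[0.5, 0.5], [1.0, 0.0], [0.0, -1.0]]

features = conv1d_over_chars(EMBEDDINGS, KERNEL)
sentence_feature = max(features)  # max-pooling over positions
```

Each output value mixes a character's own vector with those of its neighbours, which is how the context between characters enters the representation; a real model would learn many such filters rather than use fixed weights.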

It should be noted that, when training a machine learning model for risk recognition (i.e., the risk recognition model), conventional machine learning algorithms often run into a problem that is hard to solve: risk samples are scarce, and the features that can be extracted from them are limited. Because harmless data far outnumbers harmful data in a normal production environment, traditional statistics-based machine learning algorithms can obtain a reasonably good recognition model only when trained on a large number of high-quality samples. A risk recognition model based on the deep learning DBN (Deep Belief Network) algorithm, by contrast, can be trained with a limited amount of harmful data: multi-dimensional, multi-level learning is performed through iteration over multiple layers of RBMs (Restricted Boltzmann Machines), so that the number of learned features grows rapidly.

Specifically, a small set of risk samples is first used for training based on the DBN algorithm, and accurate samples are selected by applying a threshold; the DBN is then trained again on these accurate samples, and the final risk recognition model is obtained through repeated iteration.
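One possible reading of this iterative procedure is a threshold-filtered retraining loop, sketched below. To keep the example self-contained, a nearest-centroid classifier is used as a stand-in for the DBN; the confidence measure, the threshold of 0.75, the sample data, and the stopping rule are all assumptions of this sketch rather than details from the application.

```python
def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train(samples):
    """Stand-in for DBN training: one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}

def predict(model, features):
    """Return (label, confidence); confidence grows as the nearest centroid dominates."""
    ranked = sorted(model, key=lambda label: distance(model[label], features))
    d_best = distance(model[ranked[0]], features)
    d_next = distance(model[ranked[1]], features)
    confidence = d_next / (d_best + d_next) if d_best + d_next > 0 else 0.5
    return ranked[0], confidence

def iterative_training(seed, pool, threshold=0.75, max_rounds=5):
    """Train on a small seed, keep pool samples scored above the threshold, retrain, repeat."""
    labelled = list(seed)
    remaining = list(pool)
    model = train(labelled)
    for _ in range(max_rounds):
        accepted, rejected = [], []
        for features in remaining:
            label, confidence = predict(model, features)
            (accepted if confidence >= threshold else rejected).append((features, label))
        if not accepted:
            break
        labelled.extend(accepted)
        remaining = [features for features, _ in rejected]
        model = train(labelled)
    return model, labelled

# Small labelled seed and an unlabelled pool (hypothetical data).
seed = [([0.0, 0.0], "harmless"), ([1.0, 1.0], "risky")]
pool = [[0.1, 0.0], [0.9, 1.0], [0.5, 0.5], [0.7, 0.8]]
model, labelled = iterative_training(seed, pool)
```

In this run the sample `[0.7, 0.8]` falls just below the threshold in the first round and is accepted only after the model has been retrained on the first batch of accurate samples, while the ambiguous sample `[0.5, 0.5]` is never admitted.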

In combination with the design architecture diagram of the enterprise credit scoring system shown in Fig. 10, mainstream big data products are fully combined in the application to guarantee the usability, flexibility, and scalability of the enterprise credit scoring system. The application layer exposes a series of service capabilities through developed interfaces, while keeping deployment simple and extensible.

The whole system supports both single-machine and distributed deployment. It uses the graph to realize enterprise risk assessment, credit scoring, and change-event prompting; enterprise entities are added and risk relations established through the unified data interface of the knowledge graph, and a distributed asynchronous algorithm computes information such as the overall picture of the enterprises in the graph and risk early-warning trends. Specifically:

the system can be built on a cloud host, an independent server, or a third-party virtual host, and is based on one or more databases such as MSSQL, MySQL, and Oracle;

storage, caching, custom functions, transaction processing, database reads and writes, and similar processing are implemented in the data layer;

the knowledge graph is constructed in the business layer, that is, the graph is built by using the mapping rules to describe enterprises and enterprise-related events, supporting functions such as credit evaluation, enterprise monitoring, monitoring statistics, event lists, and rule configuration; CNN and DBN are the main implementation of credit evaluation in the business layer, and computation between the data layer and the business layer is asynchronous;

template-engine rendering, request handling, and the like are performed in the display layer;

an interactive interface is provided to the user at the front-end UI (User Interface) in the form of HTML (HyperText Markup Language), CSS (Cascading Style Sheets), jQuery, pictures, and the like.

Therefore, the business state and public opinion trends of enterprises can be analyzed in depth, each enterprise is comprehensively profiled through the knowledge graph, and timely credit scoring based on data in each dimension of the enterprise is achieved. In addition, based on enterprise characteristics and relevant historical negative samples and complaint samples, the application performs deep association analysis and risk feature extraction on enterprise management data, such as engineering project management, marketing management, material management, and low-cost construction, through artificial intelligence technologies such as deep learning, knowledge graphs, and natural language processing, thereby realizing automatic intelligent identification of internal enterprise management risks, improving risk identification accuracy, and raising the level of risk management and control.

In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A credit rating method based on a knowledge graph, the method comprising:

performing risk recognition on feature words of the target object in each information dimension by using a risk recognition model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk recognition model is obtained by training a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension;

the knowledge graph is obtained by the following steps:

converting the structured data into triplet data by utilizing a preset mapping relation between the structured data and the triplet so as to obtain the knowledge graph;

alternatively, the knowledge-graph is obtained by:

reading page content in the target page;

generating triplet data according to the page content to obtain the knowledge graph;

the risk identification model is obtained through training in the following way:

2. The method of claim 1, wherein the target page comprises at least a first page associated with the preset word and a second page obtained by in-station capturing of the first page.

3. The method of claim 1, wherein a difference between a credit rating test result obtained by risk recognition of the training feature word set corresponding to the training corpus by the risk recognition model and a credit rating label corresponding to the training corpus is greater than or equal to a preset threshold.

4. A credit rating device based on a knowledge graph, the device comprising:

the risk identification unit is used for carrying out risk identification on feature words of the target object in each information dimension by utilizing a risk identification model corresponding to each information dimension so as to obtain a credit rating result of the target object in each information dimension, wherein the risk identification model is obtained by training by utilizing a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding information dimension;

a first map construction unit for reading structured data stored in a relational database, the structured data being related to at least one target object; converting the structured data into triplet data by utilizing a preset mapping relation between the structured data and the triplet so as to obtain the knowledge graph;

a second map construction unit for obtaining a target page related to the target object in the industry website by utilizing a preset word corresponding to at least one target object; reading page content in the target page; generating triplet data according to the page content to obtain the knowledge graph;

5. An electronic device, comprising:

a processor for executing the application program to realize: obtaining a target corpus, wherein the target corpus comprises a plurality of sentences; extracting words from each sentence in the target corpus by utilizing a pre-constructed knowledge graph to obtain a plurality of feature words corresponding to the target corpus, wherein the feature words comprise feature words of at least one target object in at least one information dimension; performing risk recognition on feature words of the target object in each information dimension by using a risk recognition model corresponding to each information dimension to obtain a credit rating result of the target object in each information dimension, wherein the risk recognition model is obtained by training a plurality of training feature word sets with credit rating labels, and the credit rating result represents the credit risk of the corresponding target object in the corresponding risk dimension;

the knowledge graph is obtained by the following steps:

alternatively, the knowledge-graph is obtained by:

reading page content in the target page;