CN111598702A - Knowledge graph-based method for searching investment risk semantics - Google Patents

Knowledge graph-based method for searching investment risk semantics


Publication number
CN111598702A
Authority
CN
China
Prior art keywords
investment
risk
knowledge
data
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010291157.XA
Other languages
Chinese (zh)
Inventor
徐佳慧
裴乐琪
郝庆一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202010291157.XA
Publication of CN111598702A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Marketing (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a knowledge graph-based semantic search method for risk investment, which comprises the following steps: providing a risk investment ontology knowledge model; taking a multi-source heterogeneous risk investment information corpus as the data source for constructing a knowledge base, extracting first risk investment knowledge data from the obtained semi-structured text information and second risk investment knowledge data from the unstructured text information; performing data fusion on the first risk investment knowledge data and the second risk investment knowledge data to obtain fused risk investment knowledge data, wherein the data fusion unifies different names that denote the same thing; expressing the fused risk investment knowledge data as triples according to the risk investment ontology knowledge model to generate a risk investment knowledge graph; and providing a semantic search engine based on the risk investment knowledge graph. The invention provides a semantic search scheme that makes retrieval convenient and helps the user obtain the required risk investment information quickly and correctly.

Description

Knowledge graph-based method for searching investment risk semantics
Technical Field
The embodiment of the invention relates to the field of artificial intelligence, and in particular to a knowledge graph-based semantic search method for risk investment.
Background
In recent years the risk investment industry has developed rapidly, a large number of successful investment cases have emerged, and efficient independent innovation in many industries has been supported. Meanwhile, reports on the risk investment industry keep multiplying, and social and news information is growing sharply in both volume and complexity. Extracting and storing information from such scattered and voluminous sources, and subsequently searching and analyzing it, is challenging. Traditional data are associated through hyperlinks, search results are displayed as web page text, and users usually have to screen the results further. A more effective and convenient scheme is therefore needed for managing risk investment data, so that users can retrieve results conveniently and intuitively and search quality is improved.
A knowledge graph models the associations between pieces of knowledge as a graph structure, and provides strong support for discovering entities and for mining and analyzing complex relationships. Semantic search based on a knowledge graph can directly discover and reason over real-world objects and relationships, which further improves search quality. Knowledge graph technology for a vertical domain can exploit the characteristics of that domain to express the semantics of data in a clearer organizational structure, and can expand complex data on a large scale while maintaining high quality.
How to provide a semantic search scheme that makes retrieval convenient and helps the user obtain the required risk investment information quickly and correctly is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a knowledge graph-based semantic search method for risk investment that provides a semantic search scheme, makes retrieval convenient, and helps the user obtain the required risk investment information quickly and correctly.
In order to achieve the above object, the embodiments of the present invention mainly provide the following technical solutions:
The embodiment of the invention provides a knowledge graph-based semantic search method for risk investment, which comprises the following steps: S1: providing a risk investment ontology knowledge model; S2: taking a multi-source heterogeneous risk investment information corpus as the data source for constructing a knowledge base, extracting first risk investment knowledge data from the obtained semi-structured text information and second risk investment knowledge data from the unstructured text information; S3: performing data fusion on the first risk investment knowledge data and the second risk investment knowledge data to obtain fused risk investment knowledge data, wherein the data fusion unifies different names that denote the same thing; S4: expressing the fused risk investment knowledge data as triples according to the risk investment ontology knowledge model to generate a risk investment knowledge graph; S5: providing a semantic search engine based on the risk investment knowledge graph.
According to one embodiment of the invention, extracting the first risk investment knowledge data from the obtained semi-structured text information comprises: acquiring the structural information of the semi-structured text information, and designing a corresponding matching template with a regular expression to extract the first risk investment knowledge data.
According to an embodiment of the present invention, extracting the second risk investment knowledge data from the unstructured text information comprises: extracting entity information related to risk investment from risk investment news text data with a BiLSTM-CRF deep learning model trained on labeled unstructured risk investment news text data; merging the first risk investment knowledge data and the entity information to obtain first risk investment knowledge information; extracting investment relation pairs from the first risk investment knowledge information, and extracting from the risk investment news text data the context relation expression templates that contain the investment relation pairs; calculating a feature vector for each context relation expression template with a Doc2Vec model trained on the risk investment news text data; calculating the cosine similarity between the feature vector of each sentence in the risk investment news text data and the feature vector of each context relation expression template, and screening out the risk investment relation similar sentences whose cosine similarity is greater than a first preset threshold; and extracting the second risk investment knowledge data from the risk investment relation similar sentences with the BiLSTM-CRF deep learning model.
According to an embodiment of the present invention, S3 specifically includes: S3-1: calculating, with an edit distance algorithm, the ratio R of the minimum number of single-character edit operations between the entity information and each other institution entity to the entity character length, measuring the similarity between entities by 1-R, and setting a first similarity threshold to control the number of first candidate similar institution entities; S3-2: training on the risk investment news text data with the Word2Vec algorithm in Skip-gram mode to obtain word vectors of all investment entities, calculating the similarity among all institution entities, and setting a second similarity threshold to control the number of second candidate similar institution entities; S3-3: merging the first candidate similar institution entities and the second candidate similar institution entities to obtain merged candidate similar institution entities; S3-4: comparing the attribute values of the risk investment relations of the candidate similar entities among the merged candidate similar institution entities; if the candidate similar entities share two identical attribute values, fusing them into the same entity and fusing the corresponding investment relations, while automatically completing the risk investment relations that are in a coreference relation; S3-5: repeating S3-4 until the total number of investment relations is stable and no longer changes; S3-6: setting the final attribute values for risk investment events that are in a coreference relation across multiple sources; S3-7: converting numeric attribute values into values in the same unit, converting date attribute values into strings in a uniform format, and adopting a uniform convention for character-type attribute values.
According to an embodiment of the present invention, S5 specifically includes: S5-1: constructing a risk investment dictionary library for each entity and relation category in the risk investment knowledge graph; S5-2: providing risk investment query template expressions with variables; S5-3: combining the risk investment dictionary library with the BiLSTM-CRF deep learning model to perform word segmentation on the user's query sentence and obtain the risk investment entity, attribute or relation keywords to be searched; S5-4: obtaining, with the Word2Vec model, the word vectors of the extracted entity, attribute or relation keywords to be searched, finding the most similar keywords by cosine similarity against the word vectors of the risk investment dictionary library, and thereby linking the keywords entered by the user to the related entities, attributes or relations in the risk investment knowledge graph; S5-5: combining the risk investment query template expressions with the keywords to be searched to construct candidate query statements and obtain query results; S5-6: parsing the query results and returning them to the user.
According to an embodiment of the present invention, in S4, the processed data is represented, according to the risk investment ontology knowledge model, as Resource Description Framework triples to generate the risk investment knowledge graph, which is stored in a triple database oriented to the Resource Description Framework.
According to an embodiment of the present invention, S1 further includes: constructing the risk investment ontology knowledge model from the risk investment relationship, the investment institution and the initial enterprise, their corresponding attribute descriptions, and the pairwise relationships among them.
According to an embodiment of the invention, the first preset threshold is 0.65.
According to an embodiment of the invention, the first similarity threshold is 0.6.
According to an embodiment of the invention, the second similarity threshold is 0.6.
The technical scheme provided by the embodiment of the invention at least has the following advantages:
The knowledge graph-based semantic search method for risk investment provided by the embodiment of the invention designs a risk investment ontology knowledge model and combines natural language processing technology to perform knowledge extraction and knowledge fusion on risk investment news information, obtaining a semantic representation of risk investment knowledge. The constructed risk investment knowledge graph benefits semantic retrieval: it helps users retrieve the entities and association relations in investment relations efficiently and directly, and can meet their retrieval needs.
The method automatically extracts risk investment knowledge from the multi-source heterogeneous data sources used to construct the risk investment knowledge graph, and automatically performs entity alignment and relation merging of the multi-source data in the knowledge fusion stage. This reduces the manual effort required, improves the efficiency of knowledge extraction, and allows new data to be added on a large scale with high quality.
The invention designs a semantic search scheme tailored to the domain characteristics of risk investment data; the scheme can understand the user's search intention and return query results that meet the user's retrieval needs.
Experiments show that the knowledge graph-based semantic search method for risk investment is effective: the risk investment knowledge graph can be expanded with new data quickly and with high quality, and the retrieval results meet the user's need to accurately retrieve the entities and association relations in investment relations.
Drawings
FIG. 1 is a flow chart of a knowledge graph-based semantic search method for risk investment according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
FIG. 1 is a flow chart of a knowledge graph-based semantic search method for risk investment according to an embodiment of the present invention. As shown in FIG. 1, the knowledge graph-based semantic search method for risk investment of the embodiment of the present invention includes:
S1: providing a risk investment ontology knowledge model. Specifically, the risk investment ontology knowledge model is constructed from the risk investment relationship, the investment institution and the initial enterprise, their corresponding attribute descriptions, and the pairwise relationships among them.
S2: taking the multi-source heterogeneous risk investment information corpus as the data source for constructing a knowledge base, extracting first risk investment knowledge data from the obtained semi-structured text information and second risk investment knowledge data from the unstructured text information. The risk investment knowledge involved includes: entity extraction of investment institutions and initial enterprises, investment event extraction, and extraction of the related attributes of each entity and each event.
In one embodiment of the present invention, S2 is specifically as follows:
S2-1: acquiring the structural information of the semi-structured text information, and designing a corresponding matching template with a regular expression to extract the first risk investment knowledge data.
S2-2: extracting second risk investment knowledge data from the unstructured text information.
In one embodiment of the present invention, S2-2 is specifically as follows:
S2-2-1: extracting the various entities related to risk investment from the risk investment news text data with a BiLSTM-CRF deep learning model trained on labeled unstructured risk investment news text data.
S2-2-2: merging the first risk investment knowledge data and the entity information to obtain first risk investment knowledge information.
S2-2-3: according to the investment relation pairs (initial enterprise, investment institution) extracted in S2-2-2, extracting from the risk investment news text data the context relation representation templates that contain the investment relation pairs.
S2-2-4: calculating the feature vector of each context relation representation template from S2-2-3 with a Doc2Vec model trained on the unstructured risk investment news text data.
S2-2-5: calculating the cosine similarity between the feature vector of each sentence in the risk investment news text data and the feature vector of each context relation representation template, and screening out the risk investment relation similar sentences whose cosine similarity is greater than a first preset threshold T1. Illustratively, the first preset threshold is 0.65.
S2-2-6: extracting knowledge from the risk investment relation similar sentences screened in S2-2-5 with the BiLSTM-CRF deep learning model from S2-2-1 to obtain the second risk investment knowledge data.
S3: performing knowledge fusion on the extracted second risk investment knowledge data.
In one embodiment of the present invention, S3 is specifically as follows:
S3-1: calculating, with an edit distance algorithm, the ratio R of the minimum number of single-character edit operations between each obtained institution entity and every other institution entity to the entity character length, measuring the similarity between entities by 1-R, and setting a first similarity threshold T2 to control the number of candidate similar institution entities. Illustratively, the first similarity threshold T2 is 0.6.
S3-2: training on the unstructured risk investment news text data with the Word2Vec algorithm in Skip-gram mode to obtain word vectors of all investment entities, calculating the similarity among all institution entities, and setting a second similarity threshold T3 to control the number of candidate similar institution entities. Illustratively, the second similarity threshold T3 is 0.6.
S3-3: merging the candidate similar institution entities extracted in S3-1 and S3-2 to obtain merged candidate similar institution entities.
S3-4: comparing the attribute values of the risk investment relations of the candidate similar entities; if the candidate similar entities share two identical attribute values, fusing them into the same entity and fusing the corresponding investment relations, while automatically completing the risk investment relations that are in a coreference relation.
S3-5: repeating S3-4 until the total number of investment relations is stable and no longer changes.
S3-6: deciding each attribute value of risk investment events that are in a coreference relation across multiple sources by voting, that is, the value reported by the majority of sources is taken as the final attribute value.
S3-7: converting numeric attribute values into values in the same unit, converting date attribute values into strings in a uniform format, and adopting a uniform convention for character-type attribute values.
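The fusion loop of S3-4 and S3-5 can be sketched as a fixed-point iteration. This is one possible reading, not the patent's implementation: the relation fields (`investor`, `startup`, `round`, `time`, `amount`), the helper names, and the rule that missing values are filled from the merged relation are all assumptions made for illustration.

```python
# Attributes compared when deciding whether two relations describe the same event.
COMPARE_KEYS = ("startup", "round", "time", "amount")


def shared_values(r1, r2):
    """Count attributes that are present and equal in both relations."""
    return sum(
        1 for k in COMPARE_KEYS
        if r1.get(k) is not None and r1.get(k) == r2.get(k)
    )


def fuse(relations, candidate_pairs, min_shared=2):
    """Repeat the S3-4 style merge until the relation count stops changing."""
    relations = [dict(r) for r in relations]
    while True:
        before = len(relations)
        # Fuse candidate entities whose relations agree on enough attributes.
        for keep, drop in candidate_pairs:
            rel_keep = [r for r in relations if r["investor"] == keep]
            rel_drop = [r for r in relations if r["investor"] == drop]
            if any(shared_values(a, b) >= min_shared
                   for a in rel_keep for b in rel_drop):
                for r in rel_drop:
                    r["investor"] = keep
        # Fuse coreferent relations and complete their missing attributes.
        merged = []
        for r in relations:
            for m in merged:
                if (m["investor"] == r["investor"]
                        and shared_values(m, r) >= min_shared):
                    for k, v in r.items():
                        if m.get(k) is None:
                            m[k] = v
                    break
            else:
                merged.append(r)
        relations = merged
        if len(relations) == before:
            return relations


relations = [
    {"investor": "Alpha Capital Group", "startup": "S1", "round": "A",
     "time": "2019-07", "amount": None},
    {"investor": "Alpha Capital", "startup": "S1", "round": "A",
     "time": None, "amount": 100},
    {"investor": "Other Fund", "startup": "S2", "round": "B",
     "time": "2020-01", "amount": 5},
]
fused = fuse(relations, [("Alpha Capital Group", "Alpha Capital")])
print(fused)
```

In this toy input the two Alpha relations agree on the enterprise and the round, so they are fused into one relation whose missing amount is completed from the other source.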
S4: according to the risk investment ontology knowledge model, representing the processed data as Resource Description Framework triples to generate the risk investment knowledge graph, and storing the risk investment knowledge graph in a triple database oriented to the Resource Description Framework.
S5: providing a semantic search function over the risk investment knowledge graph, so that users can conveniently retrieve risk investment information.
In one embodiment of the present invention, S5 is specifically as follows:
S5-1: constructing a risk investment dictionary library for each entity and relation category in the risk investment knowledge graph.
S5-2: providing risk investment query template expressions with variables.
S5-3: performing word segmentation on the user's query sentence by combining the dictionary library from S5-1 with the BiLSTM-CRF deep learning model trained in S2-2-1, to obtain the risk investment entity, attribute or relation keywords to be searched.
S5-4: obtaining, with the Word2Vec model trained in S3-2, the word vectors of the entity, attribute or relation keywords extracted in S5-3, and finding the most similar keywords by cosine similarity against the word vectors of the risk investment dictionary library from S5-1, so that the keywords entered by the user are linked to the related entities, attributes or relations in the risk investment knowledge graph.
S5-5: constructing candidate SPARQL query statements by combining the template expressions from S5-2 with the keywords to be searched from S5-4, and obtaining the query results.
S5-6: parsing the query results and returning them to the user.
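Steps S5-2, S5-4 and S5-5 can be illustrated with a small sketch: a query keyword vector is linked to the closest dictionary entry by cosine similarity, and the linked URIs are substituted into a SPARQL template with variables. The toy two-dimensional vectors stand in for Word2Vec output, and the template text, the `startup/other` and `address` URIs, and the function names are hypothetical.

```python
import math
from string import Template

# Hypothetical query template with two variables (S5-2).
QUERY_TEMPLATE = Template(
    "SELECT ?value WHERE { <$entity> <$attribute> ?value . }"
)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_keyword(query_vector, dictionary):
    """Return the dictionary URI whose vector is most similar (S5-4)."""
    best_uri, best_score = None, -1.0
    for uri, vector in dictionary.items():
        score = cosine_similarity(query_vector, vector)
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


# Toy dictionaries: URI -> word vector (assumed values).
entity_dictionary = {
    "http://www.v2s.com#startup/lead": [1.0, 0.0],
    "http://www.v2s.com#startup/other": [0.0, 1.0],
}
attribute_dictionary = {
    "http://www.v2s.com#build_time": [0.0, 1.0],
    "http://www.v2s.com#address": [1.0, 0.0],
}

entity = link_keyword([0.9, 0.2], entity_dictionary)
attribute = link_keyword([0.1, 0.8], attribute_dictionary)

# Fill the template to obtain a candidate SPARQL query (S5-5).
query = QUERY_TEMPLATE.substitute(entity=entity, attribute=attribute)
print(query)
```

The resulting string would then be sent to whatever SPARQL endpoint fronts the triple database; that step is omitted here because the source does not name a specific store.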
In order that those skilled in the art may further understand the invention, it is further described by way of the following example.
A knowledge graph-based semantic search method for risk investment includes the following steps:
S1: constructing a risk investment ontology knowledge model around the three concepts of the risk investment relationship, the investment institution and the initial enterprise, together with their corresponding attribute descriptions and the pairwise relationships among them.
The risk investment ontology knowledge model involves three concepts: the risk investment relationship, the investment institution and the initial enterprise. The initial enterprise has attribute information such as name, slogan, region, official website, product description, establishment time, business status, contact telephone, contact mailbox, company address, and whether it is overseas. The investment institution has attribute information such as investor name, official website, investor description, fields of interest, investor telephone, investor mailbox, investor address, and whether the investor is overseas. The risk investment relationship concept comprises the time of the investment event, the description of the investment event, the round, the investment amount, the investor and the financing party. The risk investment relationship is linked to the investment institution by the investor relation and to the initial enterprise by the financing-party relation, and the investment institution is linked to the initial enterprise by the investment relation.
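As a rough illustration of the three concepts and their relations, the ontology could be mirrored in Python dataclasses. The class and field names below are assumptions chosen to echo a subset of the listed attributes; the patent does not prescribe any particular schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InitialEnterprise:
    name: str
    slogan: Optional[str] = None
    region: Optional[str] = None
    website: Optional[str] = None
    established: Optional[str] = None
    overseas: Optional[bool] = None


@dataclass
class InvestmentInstitution:
    name: str
    website: Optional[str] = None
    description: Optional[str] = None
    interest_fields: Optional[str] = None
    overseas: Optional[bool] = None


@dataclass
class RiskInvestmentRelation:
    investor: InvestmentInstitution        # "investor" relation
    financing_party: InitialEnterprise     # "financing party" relation
    time: Optional[str] = None
    description: Optional[str] = None
    round_name: Optional[str] = None
    amount: Optional[float] = None


def invests_in(relation):
    """Derive the direct institution-to-enterprise 'investment' edge."""
    return (relation.investor.name, "invests_in", relation.financing_party.name)


enterprise = InitialEnterprise(name="Example Startup", established="2019.07")
institution = InvestmentInstitution(name="Example Capital")
relation = RiskInvestmentRelation(
    investor=institution, financing_party=enterprise, round_name="Series A"
)
edge = invests_in(relation)
print(edge)
```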
S2: taking the multi-source heterogeneous risk investment information corpus as the data source for constructing a knowledge base, extracting knowledge separately from the obtained semi-structured text and unstructured text, wherein the knowledge involved includes: entity extraction of investment institutions and initial enterprises, investment event extraction, and extraction of the related attributes of each entity and each event. The specific steps are as follows:
S2-1: extracting data from the semi-structured risk investment data: analyzing the organizational structure of each text, and designing a corresponding matching template with a regular expression to extract knowledge.
For example, for the obtained semi-structured text data, the div tag positions of the relevant elements are analyzed manually, and the attribute value at each element position is extracted with a regular expression such as ">(.*?)<".
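A minimal sketch of such a matching template is shown below. The HTML snippet, the `class` attribute names and the field list are invented for illustration; only the idea of capturing the text between `>` and `<` with a non-greedy group comes from the description.

```python
import re

# Hypothetical semi-structured snippet; field names are assumptions.
html = """
<div class="name">Example Startup</div>
<div class="round">Series A</div>
<div class="amount">10 million CNY</div>
"""

# One matching template per field: locate the div by its class, then capture
# the text between '>' and '<' with a non-greedy group.
FIELD_TEMPLATE = r'<div class="{field}">(.*?)<'


def extract_fields(text, fields):
    """Return a dict mapping each field to its captured value, or None."""
    record = {}
    for field in fields:
        match = re.search(FIELD_TEMPLATE.format(field=re.escape(field)), text)
        record[field] = match.group(1).strip() if match else None
    return record


record = extract_fields(html, ["name", "round", "amount", "investor"])
print(record)
```

Fields that are absent from the page, such as `investor` here, come back as `None` rather than raising, which keeps a batch extraction running over pages with uneven structure.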
S2-2: extracting data from the unstructured risk investment news text data. The specific steps are as follows:
S2-2-1: extracting the various entities related to risk investment from a large amount of news text data with a BiLSTM-CRF deep learning model trained on labeled unstructured risk investment news text data.
S2-2-2: merging the risk investment knowledge extracted in S2-1 and S2-2-1.
S2-2-3: according to the investment relation pairs (initial enterprise, investment institution) extracted in S2-2-2, extracting from the risk investment news text data, through manual editing, the context relation representation templates that contain the investment relation pairs.
S2-2-4: calculating the feature vector of each context relation representation template from S2-2-3 with a Doc2Vec model trained on the unstructured risk investment news text data.
S2-2-5: calculating the cosine similarity between the feature vector of each sentence in the risk investment news text data and the feature vector of each context relation representation template, and screening out the risk investment relation similar sentences whose cosine similarity is greater than a threshold T1.
S2-2-6: extracting knowledge from the risk investment relation similar sentences screened in S2-2-5 with the BiLSTM-CRF deep learning model from S2-2-1 to obtain new risk investment knowledge.
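The screening in S2-2-5 can be sketched as follows. The toy three-dimensional vectors stand in for Doc2Vec feature vectors, and keeping a sentence when its best similarity to any template exceeds the threshold is an assumed reading of the step; the 0.65 default is the illustrative threshold given earlier in the description.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_similar_sentences(sentence_vectors, template_vectors, threshold=0.65):
    """Keep sentences whose best similarity to any template exceeds threshold."""
    selected = []
    for sentence, vector in sentence_vectors.items():
        best = max(cosine_similarity(vector, t) for t in template_vectors)
        if best > threshold:
            selected.append(sentence)
    return selected


# Toy vectors (assumed values) in place of trained Doc2Vec output.
template_vectors = [[1.0, 0.0, 0.0]]
sentence_vectors = {
    "Fund A invested in Company B": [0.9, 0.1, 0.0],
    "Unrelated weather report": [0.0, 1.0, 0.0],
}

selected = filter_similar_sentences(sentence_vectors, template_vectors)
print(selected)
```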
S3: performing knowledge fusion on the new risk investment knowledge extracted in S2-2-6. The specific steps are as follows:
S3-1: calculating, with an edit distance algorithm, the ratio R of the minimum number of single-character edit operations between each obtained institution entity and every other institution entity to the entity character length, measuring the similarity between entities by 1-R, and setting a similarity threshold T2 to control the number of candidate similar institution entities.
For example, in the original Chinese the names "Alibaba Group" (阿里巴巴集团, 6 characters) and "Alibaba" (阿里巴巴) differ by a minimum of 2 single-character edit operations under the edit distance algorithm, so their similarity is 1 - 2/6 = 0.667. Since this is greater than the preset similarity threshold T2 = 0.6, the two are combined.
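The calculation in this example can be reproduced with a standard Levenshtein distance. The sketch assumes that the example refers to the Chinese names 阿里巴巴集团 and 阿里巴巴, and that the "entity character length" is the length of the longer name; the source does not state which length is used.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Similarity 1 - R, with R = distance / length of the longer name."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


T2 = 0.6
distance = edit_distance("阿里巴巴集团", "阿里巴巴")
similarity = name_similarity("阿里巴巴集团", "阿里巴巴")
is_candidate = similarity > T2
print(distance, round(similarity, 3), is_candidate)
```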
S3-2: A Word2Vec model in Skip-gram training mode is trained on the unstructured risk investment news text data to obtain word vectors for all investment entities, the similarity among all institution entities is calculated, and a similarity threshold T3 is set to control the number of candidate similar institution entities.
S3-3: The candidate similar institution entities obtained in S3-1 and S3-2 are merged.
S3-4: The attribute values of the risk investment relations of the candidate similar entities are compared. If the candidate similar entities share two identical attribute values, they are fused into the same entity and their corresponding investment relations are fused, and the risk investment relations linked by coreference are completed automatically.
S3-5: Step S3-4 is repeated until the total number of investment relations is stable and no longer changes.
S3-6: For risk investment events that are coreferent across multiple sources, each attribute value is decided by voting, i.e., the value given by the majority of sources is taken as the final attribute value.
S3-7: Numeric attribute values are converted to the same unit, date attribute values are converted to character strings in a uniform format, and character-type attribute values are expressed according to a uniform convention.
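Steps S3-6 and S3-7 can be illustrated with a minimal sketch. The unit table, the target date format "YYYY-MM" and the function names are all assumptions made for illustration; the source does not specify them.

```python
import re
from collections import Counter

# Hypothetical unit table; the source only says values are converted to the same unit.
UNIT_FACTORS = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

def normalize_amount(value, unit):
    """Express an amount in base units (assumed target unit)."""
    return value * UNIT_FACTORS[unit]

def normalize_date(text):
    """Rewrite 'YYYY<sep>M[<sep>D]' as 'YYYY-MM' or 'YYYY-MM-DD' (assumed uniform format)."""
    m = re.fullmatch(r"\s*(\d{4})\D+(\d{1,2})(?:\D+(\d{1,2}))?\D*", text)
    if m is None:
        raise ValueError(f"unrecognized date: {text!r}")
    year, month, day = m.groups()
    result = f"{year}-{int(month):02d}"
    if day is not None:
        result += f"-{int(day):02d}"
    return result

def majority_vote(values):
    """Return the most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]

# Three hypothetical sources reporting the date of the same investment event.
reported = ["2019.07", "2019-7", "2019.08"]
final_date = majority_vote([normalize_date(d) for d in reported])
```

The sketch normalizes before voting so that equivalent spellings of one value count as the same vote; the source lists voting (S3-6) before normalization (S3-7), so this ordering is a design choice of the sketch.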
S4: According to the ontology knowledge model in S1, the data processed in S3 is represented as triples, with the Resource Description Framework (RDF) as the basic framework, and stored in an RDF-oriented triple database.
For example: the founding date of the initial enterprise "lead cover" is July 2019, which is expressed in RDF as follows: <http://www.v2s.com#startup/lead> <http://www.v2s.com#build_time> "2019.07".
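A triple like the one above can be serialized in the N-Triples line format with a few lines of code. This is a sketch only: the helper names are hypothetical, the base IRI is taken from the example, and the escaping covers only backslashes and double quotes.

```python
BASE = "http://www.v2s.com#"  # namespace taken from the example above

def escape_literal(text):
    """Escape backslashes and double quotes (minimal escaping, not the full rule set)."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def to_ntriple(subject, predicate, literal):
    """Format one (subject, predicate, literal) triple as an N-Triples line."""
    return f'<{BASE}{subject}> <{BASE}{predicate}> "{escape_literal(literal)}" .'

line = to_ntriple("startup/lead", "build_time", "2019.07")
```

Lines produced this way could be bulk-loaded into a triple store; which store is used is not stated in the source.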
S5: A semantic search function over the risk investment knowledge graph is provided so that users can conveniently retrieve risk investment information. The specific steps are as follows:
S5-1: A risk investment dictionary library is constructed for each entity and relation category in the risk investment knowledge graph;
S5-2: A series of risk investment query template expressions with variables is provided;
S5-3: The user's query sentence is segmented by combining the risk investment dictionary library from S5-1 with the trained BiLSTM-CRF deep learning model from S2-2-1, yielding the risk investment entity, attribute or relation keywords to be searched;
S5-4: Word vectors of the keywords extracted in S5-3 are obtained from the Word2Vec model trained in S3-2, and cosine similarity against the word vectors of the risk investment dictionary library from S5-1 identifies the most similar keywords, thereby linking the keywords entered by the user to the related entities, attributes or relations in the risk investment knowledge graph;
S5-5: Candidate SPARQL query statements are constructed by combining the risk investment query template expressions from S5-2 with the keywords to be searched from S5-4, and the query results are obtained;
S5-6: The query results are parsed and returned to the user.
For example: for the user input "When was NOME home furnishing invested in?", the system identifies the entity to be queried as "NOME furniture", maps the attribute to be queried to the occurrence time of an investment event, and retrieves the occurrence time of the investment event under the risk investment relation concept by traversing the SPARQL query statement templates.
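Steps S5-2 and S5-5 amount to filling variables in a query template. The sketch below shows one way this could look; the predicate names `invested_startup` and `event_time`, the template name and the IRI layout are hypothetical and do not come from the source.

```python
from string import Template
from urllib.parse import quote

BASE = "http://www.v2s.com#"  # namespace reused from the RDF example; an assumption here

# One hypothetical template with variables for the entity and the attribute.
TEMPLATES = {
    "event_attribute": Template(
        "SELECT ?value WHERE {\n"
        "  ?event <${base}invested_startup> <${base}startup/${entity}> .\n"
        "  ?event <${base}${attribute}> ?value .\n"
        "}"
    ),
}

def build_query(template_name, entity, attribute):
    """Fill a template with a linked entity name and attribute keyword."""
    return TEMPLATES[template_name].substitute(
        base=BASE,
        entity=quote(entity, safe=""),  # percent-encode spaces and reserved characters
        attribute=attribute,
    )

# Entity and attribute as they might come out of the linking step in S5-4.
query = build_query("event_attribute", "NOME furniture", "event_time")
```

The resulting string would be sent to the triple store's SPARQL endpoint, and the bindings of `?value` parsed and returned as in S5-6.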
The embodiments above describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments and are not intended to limit the scope of the present invention; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention fall within its scope of protection.

Claims (10)

1. A knowledge graph-based method for risk investment semantic search, characterized by comprising the following steps:
S1: providing a risk investment ontology knowledge model;
S2: taking a multi-source heterogeneous risk investment information corpus as the data source for constructing a knowledge base, extracting first risk investment knowledge data from the obtained semi-structured text information and extracting second risk investment knowledge data from the unstructured text information;
S3: performing data fusion on the first risk investment knowledge data and the second risk investment knowledge data to obtain fused risk investment knowledge data, wherein the data fusion is used for unifying names that represent the same thing;
S4: expressing the fused risk investment knowledge data in triple form according to the risk investment ontology knowledge model to generate a risk investment knowledge graph;
S5: providing a semantic search engine based on the risk investment knowledge graph.
2. The knowledge graph-based method for risk investment semantic search according to claim 1, wherein extracting the first risk investment knowledge data from the obtained semi-structured text information comprises:
acquiring the structural information of the semi-structured text information, and designing a corresponding matching template with regular expressions to extract the first risk investment knowledge data.
3. The knowledge graph-based method for risk investment semantic search according to claim 2, wherein extracting the second risk investment knowledge data from the unstructured text information comprises:
extracting entity information related to risk investment from the risk investment news text data through a BiLSTM-CRF deep learning model obtained by training on labeled unstructured risk investment news text data;
merging the first risk investment knowledge data and the entity information to obtain first risk investment knowledge information;
extracting investment relation pairs from the first risk investment knowledge information, and extracting context relation expression templates containing the investment relation pairs from the risk investment news text data;
calculating a feature vector of each context relation expression template with a Doc2Vec model obtained by training on the risk investment news text data;
calculating the cosine similarity between the feature vector of each sentence in the risk investment news text data and the feature vector of each context relation expression template, and screening out risk investment relation similar sentences whose cosine similarity is greater than a first preset threshold;
and extracting the second risk investment knowledge data from the risk investment relation similar sentences through the BiLSTM-CRF deep learning model.
4. The knowledge graph-based method for risk investment semantic search according to claim 3, wherein S3 specifically comprises:
S3-1: calculating, by an edit distance algorithm, the ratio R of the minimum number of single-character edit operations between the entity information and other institution entities to the entity character length, measuring the similarity between the entities by 1-R, and setting a first similarity threshold to control the number of first candidate similar institution entities;
S3-2: training a Word2Vec model in Skip-gram training mode on the risk investment news text data to obtain word vectors of all investment entities, calculating the similarity among all institution entities, and setting a second similarity threshold to control the number of second candidate similar institution entities;
S3-3: merging the first candidate similar institution entities and the second candidate similar institution entities to obtain merged candidate similar institution entities;
S3-4: comparing the attribute values of the risk investment relations of the candidate similar entities among the merged candidate similar institution entities, and, if the candidate similar entities share two identical attribute values, fusing them into the same entity and fusing the corresponding investment relations, while automatically completing the risk investment relations linked by coreference;
S3-5: repeating S3-4 until the total number of investment relations is stable and no longer changes;
S3-6: determining the final attribute value of each attribute of the risk investment events that are coreferent across multiple sources;
S3-7: converting numeric attribute values to the same unit, converting date attribute values to character strings in a uniform format, and expressing character-type attribute values according to a uniform convention.
5. The knowledge graph-based method for risk investment semantic search according to claim 4, wherein S5 specifically comprises:
S5-1: constructing a risk investment dictionary library for each entity and relation category in the risk investment knowledge graph;
S5-2: providing risk investment query template expressions with variables;
S5-3: combining the risk investment dictionary library with the BiLSTM-CRF deep learning model to perform word segmentation on a user query input sentence, obtaining the risk investment entity, attribute or relation keywords to be searched;
S5-4: obtaining word vectors of the extracted risk investment entity, attribute or relation keywords to be searched from the Word2Vec model, calculating the most similar keywords by cosine similarity against the word vectors of the risk investment dictionary library, and linking the keywords to be searched that were input by the user to the related entities, attributes or relations in the risk investment knowledge graph;
S5-5: combining the risk investment query template expressions and the keywords to be searched to construct candidate query statements and obtain query results;
S5-6: parsing the query results and returning them to the user.
6. The knowledge graph-based method for risk investment semantic search according to claim 1, wherein in S4, knowledge representation of the processed data is performed, according to the risk investment ontology knowledge model, in the form of triples of the Resource Description Framework to generate the risk investment knowledge graph, and the risk investment knowledge graph is stored in a triple database oriented to the Resource Description Framework.
7. The knowledge graph-based method for risk investment semantic search according to claim 1, further comprising, at S1: constructing the risk investment ontology knowledge model from the risk investment relation, the attribute descriptions corresponding to investment institutions and initial enterprises, and the pairwise relations between investment institutions and initial enterprises.
8. The knowledge graph-based method for risk investment semantic search according to claim 3, wherein the first preset threshold is 0.65.
9. The knowledge graph-based method for risk investment semantic search according to claim 4, wherein the first similarity threshold is 0.6.
10. The knowledge graph-based method for risk investment semantic search according to claim 4, wherein the second similarity threshold is 0.6.
CN202010291157.XA 2020-04-14 2020-04-14 Knowledge graph-based method for searching investment risk semantics Pending CN111598702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010291157.XA CN111598702A (en) 2020-04-14 2020-04-14 Knowledge graph-based method for searching investment risk semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010291157.XA CN111598702A (en) 2020-04-14 2020-04-14 Knowledge graph-based method for searching investment risk semantics

Publications (1)

Publication Number Publication Date
CN111598702A true CN111598702A (en) 2020-08-28

Family

ID=72190293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010291157.XA Pending CN111598702A (en) 2020-04-14 2020-04-14 Knowledge graph-based method for searching investment risk semantics

Country Status (1)

Country Link
CN (1) CN111598702A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107665252A (en) * 2017-09-27 2018-02-06 深圳证券信息有限公司 A kind of method and device of creation of knowledge collection of illustrative plates
CN108596439A (en) * 2018-03-29 2018-09-28 北京中兴通网络科技股份有限公司 A kind of the business risk prediction technique and system of knowledge based collection of illustrative plates
CN108932340A (en) * 2018-07-13 2018-12-04 华融融通(北京)科技有限公司 The construction method of financial knowledge mapping under a kind of non-performing asset operation field

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330183A (en) * 2020-11-18 2021-02-05 布瑞克农业大数据科技集团有限公司 Method and system for constructing big data portrait of agricultural enterprise
CN112765310A (en) * 2020-12-11 2021-05-07 北京航天云路有限公司 Knowledge graph question-answering method based on deep learning and similarity matching
CN112597315B (en) * 2020-12-28 2023-07-14 中国航天系统科学与工程研究院 System model map construction method based on SysML meta-model ontology
CN112597315A (en) * 2020-12-28 2021-04-02 中国航天系统科学与工程研究院 System model map construction method based on SysML meta-model ontology
CN113051249A (en) * 2021-03-22 2021-06-29 江苏杰瑞信息科技有限公司 Cloud service platform design method based on multi-source heterogeneous big data fusion
CN112732945A (en) * 2021-03-30 2021-04-30 中国电子技术标准化研究院 Standard knowledge graph construction and standard query method and device
CN113342987A (en) * 2021-04-21 2021-09-03 国网浙江省电力有限公司杭州供电公司 Composite network construction method of special corpus for power distribution DTU acceptance
CN113342987B (en) * 2021-04-21 2024-05-14 国网浙江省电力有限公司杭州供电公司 Composite network construction method of distribution DTU acceptance special corpus
CN113872794A (en) * 2021-08-17 2021-12-31 北京邮电大学 IT operation and maintenance platform system based on cloud resource support and operation and maintenance method thereof
CN115080694A (en) * 2022-06-27 2022-09-20 国网甘肃省电力公司电力科学研究院 Power industry information analysis method and equipment based on knowledge graph
CN115358201A (en) * 2022-08-03 2022-11-18 浙商期货有限公司 Processing method and system for delivery and research report in futures field
CN116975313A (en) * 2023-09-25 2023-10-31 国网江苏省电力有限公司电力科学研究院 Semantic tag generation method and device based on electric power material corpus
CN116975313B (en) * 2023-09-25 2023-12-05 国网江苏省电力有限公司电力科学研究院 Semantic tag generation method and device based on electric power material corpus
CN117633518A (en) * 2024-01-25 2024-03-01 北京大学 Industrial chain construction method and system
CN117633518B (en) * 2024-01-25 2024-04-26 北京大学 Industrial chain construction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination