CN114896423A - Construction method and system of enterprise basic information knowledge graph - Google Patents


Info

Publication number
CN114896423A
Authority
CN
China
Prior art keywords
enterprise
information
knowledge
entity
optimal
Legal status
Pending
Application number
CN202210686880.7A
Other languages
Chinese (zh)
Inventor
关皓天
张宏莉
王星
刘立坤
刘春雨
孟超
孙庆伟
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN202210686880.7A
Publication of CN114896423A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)

Abstract

The invention provides a method and a system for constructing an enterprise basic information knowledge graph. The construction method comprises the following steps: first, crawling websites that contain basic company information to acquire the data needed to build the knowledge graph; second, performing knowledge extraction (entity extraction, relationship extraction and attribute extraction) on the constructed data set and identifying the objects of study within the complex data set; then, performing knowledge fusion on the resulting sets of entities, relationships and attributes to establish entity-relationship-entity and entity-attribute-value triples, completing construction of the knowledge graph; finally, performing knowledge reasoning by combining the constructed knowledge graph with Markov logic network structure learning. The invention builds a small-scale knowledge graph that not only serves as an encyclopedic knowledge base of enterprise information but also, by further using predicate expressions and a Markov logic network, can predict many aspects of the missing information of enterprises whose records are incomplete.

Description

Construction method and system of enterprise basic information knowledge graph
Technical Field
The invention relates to the technical field of knowledge graph construction, in particular to a method and a system for constructing an enterprise basic information knowledge graph.
Background
Because business risks are ubiquitous, an enterprise selecting partners for cooperation needs comprehensive information about them, such as their operating condition, loss records, assets, patent applications, operating scale, financial information, senior management information and violation records, to use as a reference for cooperation. Likewise, an enterprise's clear understanding of its own information can serve as an important basis for its decision-making. At the national level, it is also beneficial to know enterprise information such as tax payment status. A domain knowledge graph built around basic enterprise information can store this information efficiently and completely, and supports further work such as knowledge reasoning.
The foreign scholar Hook summarized the applications of knowledge graphs as four purposes (discovery, understanding, communication and education) and six applications (microscopic display of specific fields, macroscopic visualization of disciplines, assisting educators in course teaching, coordinating document knowledge, facilitating the use of digital libraries, and displaying knowledge dissemination). Although knowledge graph theory and practice have advanced continuously in recent years and knowledge graph applications keep emerging, most published research explains only one link or aspect of knowledge graph construction, either theoretically or as a survey, and enterprise knowledge graphs have received comparatively little study.
Disclosure of Invention
It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.
According to one aspect of the invention, a method for constructing an enterprise basic information knowledge graph is provided, and the method comprises the following steps:
constructing a data set containing enterprise information, enterprise academic paper information and enterprise patent information of a plurality of enterprises;
selecting, from the data set, the entities, the attributes and the relationships among the entities for the knowledge graph framework;
and carrying out knowledge fusion on the entity, relationship and attribute set, completing the triple establishing process of entity-relationship-entity or entity-attribute value, and completing the establishment of the enterprise basic information knowledge graph.
Further, the method also includes: converting the triples of the constructed knowledge graph into predicate expressions, and combining the predicate expressions with a Markov logic network structure to complete knowledge reasoning.
Further, enterprises, enterprise senior management, shareholding information records, funds, institutions, enterprise academic papers and enterprise patent information are selected from the data set as entities of the knowledge graph, where the shareholding information records include fund shareholding information records and institution shareholding information records; the attributes of each entity are selected from the data set as follows:
a. an enterprise: company name, English name, president, main stockholder, established date, main business, company brief introduction, number of employees, number of management layer persons, listed date, issued amount, issued price, trading market, contact telephone, postal code, fax, e-mail box, company website, registered address, office address;
b. enterprise senior management: executive name, executive position, executive salary, currency unit of the executive annual salary;
c. fund holdings information record or institution holdings information record: date, holder, number of shares held, shareholding proportion, rate of change, change in shares, amount of change, percentage of portfolio;
d. enterprise academic papers: academic paper number, academic paper title, academic paper author, paper abstract, publication date;
e. enterprise patent information: patent title, patent application number, patent application date, patent publication date, patent applicant.
Further, the relationships between different entities are determined as follows: the relationship between the enterprise and the enterprise senior management is manager; between the enterprise and the enterprise academic papers, holding academic papers; between the enterprise and the enterprise patent information, patent information; between the enterprise and the fund holdings information record, fund holding; between the enterprise and the organization holdings information record, organization holding; between the fund and the fund holdings information record, fund holding; and between the organization and the organization holdings information record, organization holding.
Further, the Markov logic network structure is learned as follows:
obtaining a clause set;
initializing learning weight and optimal expected value; setting a flag bit equal to 0;
searching an optimal clause, if the optimal clause is empty, adding 1 to the flag bit, and continuing searching; if the optimal clause is not empty, adding the optimal clause into the Markov logic network, and calculating the optimal expectation;
judging whether the value of the flag bit is equal to 2, if so, ending, and if not, continuing to search for the optimal clause;
the optimal clauses are obtained after the clauses are connected with the predicates; the optimal expectation is an evaluation standard for judging the result of the connection between the clause and the predicate, and influences the weight of the clause finally obtained.
According to another aspect of the invention, a system for constructing an enterprise basic information knowledge graph is provided, which comprises:
a data set acquisition module configured to construct a data set containing enterprise information, enterprise academic paper information, and enterprise patent information of a plurality of enterprises;
a knowledge-graph building module configured to select entities, attributes, and relationships between entities in a knowledge-graph framework in the dataset; and carrying out knowledge fusion on the entity, relation and attribute set, completing the process of establishing an entity-relation-entity or entity-attribute value triple, and completing the construction of the enterprise basic information knowledge graph.
Further, the system includes a knowledge inference module configured to convert the triples of the constructed knowledge graph into predicate expressions and to combine them with the Markov logic network structure to complete knowledge inference.
Further, in the knowledge graph construction module, enterprises, enterprise senior management, shareholding information records, funds, institutions, enterprise academic papers and enterprise patent information are selected from the data set as entities of the knowledge graph, where the shareholding information records include fund shareholding information records and institution shareholding information records; the attributes of each entity are selected from the data set as follows:
a. an enterprise: company name, English name, president, main stockholder, established date, main business, company brief introduction, number of employees, number of management layer persons, listed date, issued amount, issued price, trading market, contact telephone, postal code, fax, e-mail box, company website, registered address, office address;
b. enterprise senior management: executive name, executive position, executive salary, currency unit of the executive annual salary;
c. fund holdings information record or institution holdings information record: date, holder, number of shares held, shareholding proportion, rate of change, change in shares, amount of change, and percentage of portfolio;
d. enterprise academic papers: academic paper number, academic paper title, academic paper author, paper abstract, publication date;
e. enterprise patent information: patent title, patent application number, patent application date, patent publication date, patent applicant.
Further, the relationships between different entities in the knowledge graph construction module are determined as follows: the relationship between the enterprise and the enterprise senior management is manager; between the enterprise and the enterprise academic papers, holding academic papers; between the enterprise and the enterprise patent information, patent information; between the enterprise and the fund holdings information record, fund holding; between the enterprise and the organization holdings information record, organization holding; between the fund and the fund holdings information record, fund holding; and between the organization and the organization holdings information record, organization holding.
Further, in the knowledge inference module, the Markov logic network structure is learned as follows:
obtaining a clause set;
initializing learning weight and optimal expected value; setting a flag bit equal to 0;
searching an optimal clause, if the optimal clause is empty, adding 1 to the flag bit, and continuing searching; if the optimal clause is not empty, adding the optimal clause into the Markov logic network, and calculating the optimal expectation;
judging whether the value of the flag bit is equal to 2, if so, ending, and if not, continuing to search for the optimal clause;
the optimal clauses are obtained after the clauses are connected with the predicates; the optimal expectation is an evaluation standard for judging the result of the connection between the clause and the predicate, and influences the weight of the clause finally obtained.
The beneficial technical effects of the invention are as follows:
the invention uses basic information collected from different enterprises as a data set, constructs a complete knowledge graph, and performs simple analysis based on the graph. The method includes: acquiring basic enterprise information with a web crawler to construct the data set; performing knowledge extraction according to the actual content of the data set, establishing entities, relationships and attributes, and designing the schema of the enterprise basic information knowledge graph; and storing the data in a graph database according to the schema to complete construction of the knowledge graph. The invention builds a small-scale knowledge graph that not only serves as an encyclopedic knowledge base of enterprise information but also uses predicate expressions and a Markov logic network to compute weights for nodes with missing relationships in the knowledge graph, so that for newly added enterprises with missing information, the calculation results can be used to predict various aspects of that information more accurately.
Drawings
The present invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification, and which are used to further illustrate preferred embodiments of the present invention and to explain the principles and advantages of the present invention.
FIG. 1 is a flow chart of a method for constructing an enterprise basic information knowledge graph according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an example of distribution of data volumes of enterprise information in an embodiment of the present invention;
FIG. 3 is a flow chart of a Markov logic network structure learning algorithm in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, exemplary embodiments or examples of the disclosure are described below with reference to the accompanying drawings. It is obvious that the described embodiments or examples are only some, but not all embodiments or examples of the invention. All other embodiments or examples obtained by a person of ordinary skill in the art based on the embodiments or examples of the present invention without any creative effort shall fall within the protection scope of the present invention.
The invention mainly studies a knowledge graph constructed from basic enterprise information, and the overall design is the construction process of that knowledge graph. First, websites containing basic company information are crawled to acquire the data needed to build the knowledge graph; second, knowledge extraction (entity extraction, relationship extraction and attribute extraction) is performed on the constructed data set to determine the objects of study within the complex data set; then, knowledge fusion and merging are performed on the resulting sets of entities, relationships and attributes to establish entity-relationship-entity and entity-attribute-value triples; finally, the triples are stored in a graph database to complete the ontology construction of the knowledge graph, and further work such as knowledge reasoning is completed by combining the constructed knowledge graph with Markov logic network structure learning.
The embodiment of the invention provides a method for constructing an enterprise basic information knowledge graph, comprising the following steps:
Step one: constructing the knowledge graph data set.
According to the embodiment of the invention, a crawler program collects data from three types of websites: financial websites, academic paper and invention patent websites, and Wikipedia. The data are stored in the relational database management system MySQL, completing construction of the enterprise basic information data set.
Specifically, a crawler program accesses the source data of the financial website's pages, which is in JSON format, and stores it directly in MySQL without parsing; after storage, a Java program parses the crawled data and stores the results in MySQL again. In this embodiment, 13 pages of basic information were collected for each of 418 companies, 5,434 crawled pages in total, yielding about 160,000 records in the MySQL database; the distribution of the data volume is shown in FIG. 2.
Academic paper and patent website data are acquired with the requests library and then processed and extracted with the Beautiful Soup package. Academic papers and patents are searched on the websites using the company name as the keyword, and the search results are stored in the database as that company's academic papers and patents. Because academic papers and patent applications usually have multiple authors, these fields hold multiple values per record. A MongoDB database is therefore used to store the company's academic papers and invention patents, with all data kept in the same collection; after crawling is completed, the data are migrated from MongoDB to MySQL so that storage is unified in one database.
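Because one paper or patent record can list several authors or inventors, moving it from a document store into MySQL means splitting that list into a separate table. The sketch below illustrates this flattening under stated assumptions: plain Python dicts stand in for MongoDB documents, the standard-library sqlite3 module stands in for MySQL, and the table names, field names and sample values are hypothetical.

```python
import sqlite3
from typing import Dict, List

# Hypothetical documents as they might sit in one MongoDB collection;
# "authors" is a multi-valued field.
papers: List[Dict] = [
    {"paper_no": "P001", "title": "Graph storage at scale", "company": "ExampleCo",
     "authors": ["Li", "Wang"]},
    {"paper_no": "P002", "title": "Entity linking for filings", "company": "ExampleCo",
     "authors": ["Zhang"]},
]


def migrate(docs: List[Dict], conn: sqlite3.Connection) -> None:
    """Flatten document-style records into two relational tables.

    sqlite3 stands in for MySQL; the schema is an illustrative assumption.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS paper "
                 "(paper_no TEXT PRIMARY KEY, title TEXT, company TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS paper_author (paper_no TEXT, author TEXT)")
    for doc in docs:
        conn.execute("INSERT INTO paper VALUES (?, ?, ?)",
                     (doc["paper_no"], doc["title"], doc["company"]))
        # One row per author: the list becomes a one-to-many table.
        conn.executemany("INSERT INTO paper_author VALUES (?, ?)",
                         [(doc["paper_no"], author) for author in doc["authors"]])
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(papers, conn)
print(conn.execute("SELECT COUNT(*) FROM paper_author").fetchone()[0])  # 3
```

Keeping authors in their own table lets a later step turn each row directly into an author-paper relation without re-parsing lists.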
Wikipedia web page data are obtained by a crawler program using the requests library, and the website data are processed and extracted with the Beautiful Soup package. Patent-information crawler URLs for different companies are built by changing the query_txt value in the URL, and different result pages are crawled by changing the page parameter p in the URL, which yields the URL of each patent. The enterprise patent information is stored directly in the relational database MySQL, with the URLs, patent inventors and patent details each stored in their own data table.
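The URL scheme described above can be sketched as follows. The base URL is a hypothetical placeholder (the actual site is not named in the text), and only the query_txt and p parameters are varied; urlencode also takes care of escaping company names that contain spaces or non-ASCII characters.

```python
from typing import Iterator
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical search endpoint; the real patent site's URL is not given in the text.
BASE_URL = "https://patent-search.example.com/search"


def patent_search_urls(company: str, pages: int) -> Iterator[str]:
    """Yield one search-results URL per page for a company.

    Only query_txt (the keyword) and p (the page) are varied, as described above;
    any other parameters a real site needs are not modeled.
    """
    for page in range(1, pages + 1):
        yield f"{BASE_URL}?{urlencode({'query_txt': company, 'p': page})}"


urls = list(patent_search_urls("Harbin Institute of Technology", 2))
for url in urls:
    # Decoding the query string round-trips the parameters.
    print(parse_qs(urlparse(url).query))
```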
Step two: screening and optimizing the data set in MySQL so that it is easier to process in the next step, including splitting and merging data tables and writing and modifying data fields.
Step three: establishing the entities, relationships and attributes of the knowledge graph, and performing knowledge fusion on the entity, relationship and attribute sets to establish entity-relationship-entity and entity-attribute-value triples, completing construction of the enterprise basic information knowledge graph.
According to the embodiment of the invention, the acquired information, such as company profile, senior management, shareholding information, dynamic key indicators, company income, growth rate, cash flow status, financial health indicators, asset turnover indicators, dividends, balance sheet, statement of comprehensive income and cash flow statement, is used as the data source. Enterprises, enterprise senior management, shareholding information records, funds and organizations are selected from the data source as entity types of the knowledge graph. In the data set, enterprise academic papers and enterprise patent information have their own data tables, and academic papers additionally carry attributes such as authors and titles, so academic papers and patents also exist as entities in the knowledge graph; the remaining data appear in the knowledge graph as attributes of these entity types.
After the entities are established, relationships are added between them according to the entity types defined in the previous paragraph. The relationships among the entities are as follows:
enterprise-manager-enterprise senior management;
enterprise-holding academic papers-enterprise academic papers;
enterprise-holding patent information-enterprise patent information;
enterprise-fund holding-fund holding record;
fund holding record-fund holding-fund;
enterprise-organization holding-organization holding record;
organization holding record-organization holding-organization.
Attributes for specific entity types are established as follows:
a. an enterprise: company name, English name, president, main stockholder, established date, main business, company brief introduction, number of employees, number of management layer persons, listed date, issued amount, issued price, trading market, contact telephone, postal code, fax, e-mail box, company website, registered address, office address;
b. enterprise senior management: executive name, executive position, executive salary, currency unit of the executive annual salary;
c. fund and organization holdings information records: date, holder, number of shares held, shareholding proportion, rate of change, change in shares, amount of change, percentage of portfolio;
d. enterprise patent information: patent title, patent application number, patent application date, patent publication date, patent applicant;
e. enterprise academic papers: academic paper number, academic paper title, academic paper author, paper abstract, publication date.
After the entity types, relationships and attributes are established from the data set, the knowledge graph schema is obtained, and the data are then stored in a graph database according to this design.
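A minimal sketch of turning records into the two kinds of triples, assuming the relation list above as the schema. The short English labels are illustrative choices, the two holding-record relation names follow the relation definitions given earlier in the disclosure, and the record values are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Typed = Tuple[str, str]  # (entity_type, entity_id)

# (head type, relation, tail type); labels are illustrative, not names from the source.
SCHEMA: List[Triple] = [
    ("enterprise", "manager", "executive"),
    ("enterprise", "holding_academic_paper", "academic_paper"),
    ("enterprise", "holding_patent_information", "patent"),
    ("enterprise", "fund_holding", "fund_holding_record"),
    ("fund_holding_record", "fund_holding", "fund"),
    ("enterprise", "organization_holding", "organization_holding_record"),
    ("organization_holding_record", "organization_holding", "organization"),
]


def attribute_triples(entity_id: str, attrs: Dict[str, str]) -> List[Triple]:
    """Entity-attribute-value triples; attributes with empty values are skipped."""
    return [(entity_id, name, value) for name, value in attrs.items() if value]


def relation_triple(head: Typed, relation: str, tail: Typed) -> Triple:
    """Entity-relation-entity triple, rejected if the schema does not allow it."""
    if (head[0], relation, tail[0]) not in SCHEMA:
        raise ValueError(f"{relation} is not defined from {head[0]} to {tail[0]}")
    return (head[1], relation, tail[1])


# Usage with hypothetical record values:
company = ("enterprise", "E001")
triples = attribute_triples("E001", {"company_name": "ExampleCo", "fax": ""})
triples.append(relation_triple(company, "manager", ("executive", "X001")))
print(triples)
```

Checking each relation against the schema at triple-building time catches table-join mistakes before they reach the graph database.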
Storage of knowledge in the graph database: after the entities, relationships and attributes are clarified, the related knowledge stored in the relational database MySQL is transferred to the graph database JanusGraph. Entities are stored as vertices of the graph database, relationships between entities are stored as edges, and attributes are stored as vertex properties.
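The vertex/edge/property mapping can be illustrated with a tiny in-memory property graph. This is a stand-in for JanusGraph, not its API: against a real JanusGraph server the same operations would be issued as Gremlin traversals (addV, addE, property). The vertex identifiers and property names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PropertyGraph:
    """Tiny in-memory stand-in for a graph database such as JanusGraph."""
    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)  # id -> properties
    labels: Dict[str, str] = field(default_factory=dict)               # id -> entity type
    edges: List[Tuple[str, str, str]] = field(default_factory=list)     # (out, label, in)

    def add_vertex(self, vid: str, label: str, **props: str) -> None:
        """Entity -> vertex; attributes -> vertex properties."""
        self.labels[vid] = label
        self.vertices.setdefault(vid, {}).update(props)

    def add_edge(self, out_v: str, label: str, in_v: str) -> None:
        """Relationship -> labeled edge between two existing vertices."""
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((out_v, label, in_v))

    def neighbors(self, vid: str, label: str) -> List[str]:
        """Vertices reached from vid along outgoing edges with the given label."""
        return [i for o, l, i in self.edges if o == vid and l == label]


g = PropertyGraph()
g.add_vertex("E1", "enterprise", company_name="Apple Inc.")
g.add_vertex("X1", "executive", name="Tim Cook", position="chief executive officer")
g.add_edge("E1", "manager", "X1")
print(g.neighbors("E1", "manager"))
```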
Step four: obtaining predicate logic from the knowledge graph data and completing knowledge reasoning in combination with a Markov logic network.
According to embodiments of the present invention, predicate logic, also known as predicate calculus, is a formal system used in mathematics, philosophy, linguistics and computer science that allows quantified formulas. In predicate logic, an atomic proposition is decomposed into individual terms and predicates. Individual terms denote things or objects that can exist independently, and predicates describe the properties of individuals or the relations among them. In knowledge graph generation, predicate calculus is a natural form of expression, and many valuable predicates can be extracted from a knowledge graph.
Predicates are usually expressed by phrases such as "… is …", "… has the property …" and "… and … have the relationship …", which state a property or action of a thing or a relationship between things. A knowledge graph contains a large number of triples, which fall into two basic types: entity-relationship-entity and entity-attribute value. Both can be converted into predicate expressions. For example, given entity A "Tim Cook" and entity B "Apple Inc.", with a relationship Manager pointing from A to B, the predicate expression Manager(A, B) is obtained, meaning "Tim Cook is the manager of Apple Inc.". The entity-attribute value type converts in the same way: again taking Tim Cook as an example, the predicate expression Position(A, c) is obtained, where entity A is "Tim Cook" and attribute value c is "chief executive officer", meaning "Tim Cook's position is chief executive officer".
All knowledge stored in the knowledge graph can be converted into predicate expressions in the same way. Because the knowledge graph contains too many entity-attribute value predicate expressions to list, only the entity-relationship-entity predicate expressions are listed here:
Manager(A, B): entity type A "enterprise senior management", entity type B "enterprise";
hold_paper(B, C): entity type B "enterprise", entity type C "enterprise academic paper";
hold_depend(B, D): entity type B "enterprise", entity type D "enterprise patent information";
fund_holding(E, B): entity type E "fund holding record", entity type B "enterprise";
organization_holding(F, B): entity type F "organization holding record", entity type B "enterprise";
fund_holding(G, E): entity type G "fund", entity type E "fund holding record";
organization_holding(H, F): entity type H "organization", entity type F "organization holding record";
Author(I, C): entity type I "academic paper author", entity type C "enterprise academic paper";
Inventor(J, D): entity type J "patent inventor", entity type D "enterprise patent information".
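Converting triples into ground atoms is mechanical once the predicate signatures are fixed. The sketch below uses shortened type labels of its own (an assumption), checks each relation triple against the signatures listed above, and reuses the Tim Cook example with hypothetical identifiers TimCook and AppleInc.

```python
from typing import Dict, List, Tuple

# (predicate, first-argument type, second-argument type), transcribed from the list above.
# fund_holding and organization_holding each appear with two signatures.
SIGNATURES: List[Tuple[str, str, str]] = [
    ("Manager", "executive", "enterprise"),
    ("hold_paper", "enterprise", "academic_paper"),
    ("hold_depend", "enterprise", "patent"),
    ("fund_holding", "fund_holding_record", "enterprise"),
    ("organization_holding", "organization_holding_record", "enterprise"),
    ("fund_holding", "fund", "fund_holding_record"),
    ("organization_holding", "organization", "organization_holding_record"),
    ("Author", "paper_author", "academic_paper"),
    ("Inventor", "patent_inventor", "patent"),
]


def relation_atom(subj: str, pred: str, obj: str, types: Dict[str, str]) -> str:
    """Entity-relation-entity triple -> ground atom such as Manager(A, B)."""
    if (pred, types[subj], types[obj]) not in SIGNATURES:
        raise ValueError(f"{pred}({types[subj]}, {types[obj]}) is not a declared signature")
    return f"{pred}({subj}, {obj})"


def attribute_atom(entity: str, attribute: str, value: str) -> str:
    """Entity-attribute-value triple -> ground atom such as Position(A, c)."""
    return f'{attribute}({entity}, "{value}")'


types = {"TimCook": "executive", "AppleInc": "enterprise"}
print(relation_atom("TimCook", "Manager", "AppleInc", types))            # Manager(TimCook, AppleInc)
print(attribute_atom("TimCook", "Position", "chief executive officer"))  # Position(TimCook, "chief executive officer")
```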
The Markov logic network is a statistical relational learning model that combines Markov networks with first-order logic; it is a Markov network built on a probabilistic graphical model. Its basic idea is to use first-order logic rules while relaxing them from hard constraints: when a possible world violates a rule, its probability decreases, but not to 0, and the fewer rules a world violates, the more probable it is. The strength of each constraint is expressed by a weight attached to the rule, which reflects how strongly the rule constrains the worlds that satisfy it. The larger a rule's weight, the greater the difference in probability between worlds that satisfy it and worlds that do not. As the weights tend to infinity, the Markov logic network approaches pure first-order logic: it then represents a uniform distribution over the worlds that satisfy the knowledge base, and entailment can be decided by checking whether a query holds with probability 1.
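The weighting idea can be made concrete with the standard Markov logic network distribution P(X = x) = exp(Σ_i w_i n_i(x)) / Z, where n_i(x) is the number of true groundings of formula i in world x and Z normalizes over all worlds. The sketch enumerates the four worlds of a single ground formula Inventor(j, d) => Employs(b, j); the Employs predicate and the weights used are assumptions for illustration.

```python
import itertools
import math
from typing import Dict, Tuple

World = Tuple[bool, bool]  # (Inventor(j, d), Employs(b, j))


def implies(a: bool, b: bool) -> bool:
    """Truth value of a => b."""
    return (not a) or b


def world_distribution(w: float) -> Dict[World, float]:
    """P(x) = exp(w * n(x)) / Z for the single ground formula Inventor(j, d) => Employs(b, j).

    n(x) is 1 if the formula is true in world x and 0 otherwise.
    """
    unnormalized = {x: math.exp(w * implies(*x))
                    for x in itertools.product([False, True], repeat=2)}
    z = sum(unnormalized.values())
    return {x: s / z for x, s in unnormalized.items()}


# The violating world (Inventor true, Employs false) keeps a nonzero probability
# that shrinks as the weight grows; the three satisfying worlds approach 1/3 each.
for w in (1.0, 5.0, 50.0):
    print(w, world_distribution(w)[(True, False)])
```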
FIG. 3 shows the flowchart of the Markov logic network structure learning algorithm, which includes:
obtaining a clause set;
initializing learning weight and optimal expected value; setting a flag bit equal to 0;
searching an optimal clause, if the optimal clause is empty, adding 1 to the flag bit, and continuing searching; if the optimal clause is not empty, adding the optimal clause into the Markov logic network, and calculating the optimal expectation;
and judging whether the value of the flag bit is equal to 2, ending if the value of the flag bit is equal to 2, and continuously searching for the optimal clause if the value of the flag bit is not equal to 2.
The optimal clause (BestClause) is obtained by connecting the clause with the predicate; and the optimal expectation (BestScore) is an evaluation standard for evaluating the result of the connection between the clause and the predicate, and influences the weight of the clause finally obtained.
The clause set CLAUSES contains all clauses in the Markov logic network MLN and in the predicate logic P.
Pseudocode for the Markov logic network structure learning algorithm is as follows:
Input: predicate set P, Markov logic network MLN, and enterprise knowledge base EKB
Output: Markov logic network MLN
[Figure: pseudocode of the Markov logic network structure learning algorithm]
Using the predicate set P obtained from the knowledge graph, the Markov logic network MLN, and the enterprise knowledge base EKB, the Markov logic network structure learning algorithm completes the process of reasoning over the knowledge graph and acquiring new knowledge. Several results are listed below.
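Deriving the predicate set P and the evidence from the knowledge graph could look like the Python sketch below. It assumes each entity-relationship-entity triple (head, relation, tail) becomes a binary ground atom Relation(head, tail); the CamelCase naming convention and the "Name/arity" notation for P are illustrative choices, not part of the patent.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def to_predicate_name(relation: str) -> str:
    """Turn a relation label such as 'fund holding' into 'FundHolding'."""
    return "".join(word.capitalize() for word in relation.split())


def triples_to_atoms(triples: Iterable[Triple]) -> Tuple[Set[str], List[str]]:
    """Map (head, relation, tail) triples to ground atoms Relation("head", "tail").

    Returns the predicate set P (written as Name/arity) and the ground atoms
    that serve as evidence for Markov logic network learning.
    """
    predicates: Set[str] = set()
    atoms: List[str] = []
    for head, relation, tail in triples:
        name = to_predicate_name(relation)
        predicates.add(f"{name}/2")
        atoms.append(f'{name}("{head}", "{tail}")')
    return predicates, atoms


if __name__ == "__main__":
    p, evidence = triples_to_atoms([
        ("Acme Inc.", "manager", "Jane Doe"),
        ("Acme Inc.", "patent information", "Patent-001"),
    ])
    print(sorted(p))
    print(evidence)
```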
1) Relationship between an enterprise and the inventors of the patents it holds:
Enterprise patents are important basic information about an enterprise. A patent may have been purchased by the company from its inventor, or the inventor may be an employee of the enterprise, in which case the patent belongs to the enterprise as an enterprise asset. This problem fits the Markov logic network model: using the knowledge graph, the weight of the event "the inventor of a patent is an employee of the enterprise that holds the patent" is 3.3, and the clause is expressed as follows:
[Figure: clause expression for result 1)]
2) Relationship between the keywords of patents invented by an enterprise's employees and the enterprise's research direction:
A company employs an employee, the employee invents a patent, and the patent has a keyword. Using Markov logic network structure learning and the knowledge graph, the weight of the event "the company's research direction is related to the keyword" is 2.9, and the clause is expressed as:
[Figure: clause expression for result 2)]
3) Relationship between the keywords of academic papers published by an enterprise's employees and the enterprise's research direction:
A company employs an employee, the employee publishes a paper, and the paper has a keyword. Using Markov logic network structure learning and the knowledge graph, it can be inferred that the weight of the event "the company's research direction is related to the keyword" is 5.6, and the clause is expressed as:
[Figure: clause expression for result 3)]
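The clause images are not reproduced here, so the Python sketch below uses a hypothetical first-order form of result 2), Employs(c, e) ∧ Invents(e, p) ∧ HasKeyword(p, k) ⇒ ResearchDirection(c, k); the predicate names and the toy knowledge base are assumptions. It counts satisfied and violated groundings and shows what a weight means in isolation: if a query atom appears in only one ground clause whose body is true, its conditional probability is exp(w) / (1 + exp(w)), the logistic function of w.

```python
import math
from itertools import product
from typing import Iterable, Set, Tuple

Atom = Tuple[str, str, str]  # (predicate, argument 1, argument 2)


def count_groundings(kb: Set[Atom], companies: Iterable[str],
                     employees: Iterable[str], patents: Iterable[str],
                     keywords: Iterable[str]) -> Tuple[int, int]:
    """Count (satisfied, violated) groundings of the hypothetical rule
    Employs(c,e) AND Invents(e,p) AND HasKeyword(p,k) => ResearchDirection(c,k)."""
    satisfied = violated = 0
    for c, e, p, k in product(companies, employees, patents, keywords):
        body = (("Employs", c, e) in kb and ("Invents", e, p) in kb
                and ("HasKeyword", p, k) in kb)
        head = ("ResearchDirection", c, k) in kb
        if body and not head:
            violated += 1   # body true, head false: the implication is violated
        else:
            satisfied += 1
    return satisfied, violated


def isolated_head_probability(weight: float) -> float:
    """P(head true | body true) when the head atom occurs in this ground clause only."""
    return math.exp(weight) / (1.0 + math.exp(weight))


if __name__ == "__main__":
    kb = {("Employs", "AcmeCo", "Ann"), ("Invents", "Ann", "P1"),
          ("HasKeyword", "P1", "robotics")}
    print(count_groundings(kb, ["AcmeCo"], ["Ann"], ["P1"], ["robotics", "biotech"]))
    for w in (3.3, 2.9, 5.6):  # weights reported for results 1) to 3)
        print(w, round(isolated_head_probability(w), 3))
```

Read this way, the reported weights correspond to isolated conditional probabilities of roughly 0.95 (w = 2.9) up to about 0.996 (w = 5.6); in the full network, other ground clauses that share the same atom also shift these values.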
Basic enterprise information is among the most important information about an enterprise. It can provide partners with comprehensive details such as operating status, records of dishonesty, enterprise assets, patent applications, business scale, financial information, senior management information, and violation records; it can also serve as the enterprise's knowledge base for understanding its operations and research activities and for planning its development.
The invention builds a small knowledge graph by collecting basic information on more than 400 large U.S. enterprises, extracting knowledge, and storing it in a graph database. This not only yields an encyclopedic knowledge base of enterprise information, but also uses predicate representations and a Markov logic network to compute weights for nodes with missing relations in the knowledge graph, so that for newly added enterprises with missing information, the computed results allow more accurate prediction of the missing information in every respect.
Another embodiment of the present invention provides a system for constructing an enterprise basic information knowledge graph, including:
a data set acquisition module configured to construct a data set containing enterprise information, enterprise academic paper information, and enterprise patent information of a plurality of enterprises;
a knowledge graph construction module configured to select, from the data set, the entities, attributes, and relationships between entities of a knowledge graph framework, and to perform knowledge fusion on the sets of entities, relationships, and attributes to establish entity-relationship-entity or entity-attribute-value triples, thereby completing the construction of the enterprise basic information knowledge graph.
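Knowledge fusion is not specified in further detail. As one hedged illustration, the Python sketch below merges enterprise records from several sources that refer to the same company and then emits entity-attribute-value triples. The name-normalization rule (lower-casing, dropping punctuation and trailing legal suffixes) and the "first non-empty value wins" conflict policy are assumptions; the attribute names follow the enterprise attribute list in this description.

```python
import re
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]
Triple = Tuple[str, str, str]

# Trailing tokens ignored when matching company names (an assumed, incomplete list).
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def entity_key(name: str) -> str:
    """Matching key: lower-case alphanumeric tokens without trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def fuse_records(records: Iterable[Record],
                 name_field: str = "company name") -> List[Triple]:
    """Merge records describing the same enterprise and emit (entity, attribute, value)."""
    merged: Dict[str, Record] = {}
    for record in records:
        key = entity_key(record[name_field])
        # The first record seen for an enterprise supplies its canonical name.
        target = merged.setdefault(key, {name_field: record[name_field]})
        for attribute, value in record.items():
            if value and not target.get(attribute):
                target[attribute] = value  # first non-empty value wins
    triples: List[Triple] = []
    for entity in merged.values():
        name = entity[name_field]
        for attribute, value in entity.items():
            if attribute != name_field:
                triples.append((name, attribute, value))
    return triples


if __name__ == "__main__":
    sources = [
        {"company name": "Acme Inc.", "number of employees": "1200", "office address": ""},
        {"company name": "ACME", "office address": "1 Main St", "number of employees": "1300"},
    ]
    for triple in fuse_records(sources):
        print(triple)
```

A production pipeline would likely replace the string key with a learned entity-matching model and record provenance for conflicting values; the fixed policy here only keeps the sketch short.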
In this embodiment, optionally, the system further comprises a knowledge inference module configured to convert the triples into predicate representations using the constructed knowledge graph and to complete knowledge inference in combination with the Markov logic network structure.
In this embodiment, optionally, the knowledge graph construction module selects enterprises, enterprise senior executives, shareholding information records, funds, institutions, enterprise academic papers, and enterprise patent information from the data set as the entities of the knowledge graph, where the shareholding information records include fund shareholding records and institutional shareholding records. The attributes selected for each entity in the data set are as follows:
a. Enterprise: company name, English name, president, major shareholder, date of establishment, main business, company profile, number of employees, number of management personnel, listing date, issue volume, issue price, trading market, contact telephone, postal code, fax, email address, company website, registered address, office address;
b. Enterprise senior executive: name, position, salary, currency unit of annual salary;
c. Fund shareholding record or institutional shareholding record: date, holder, shares held, shareholding ratio, rate of change, change in shares, amount of change, percentage of portfolio;
d. Enterprise academic paper: paper number, title, authors, abstract, publication date;
e. Enterprise patent information: patent title, application number, application date, publication date, applicant.
In this embodiment, optionally, the relationships between entities in the knowledge graph construction module are determined as follows: the relationship between an enterprise and its senior executives is "manager"; between an enterprise and its academic papers, "holds academic paper"; between an enterprise and its patent information, "patent information"; between an enterprise and a fund shareholding record, "fund holding"; between an enterprise and an institutional shareholding record, "institutional holding"; between a fund and a fund shareholding record, "fund holding"; and between an institution and an institutional shareholding record, "institutional holding".
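The relationship rules above amount to a small schema keyed by the pair of entity types. The Python sketch below encodes it as a lookup table and uses it to build entity-relationship-entity triples, rejecting type pairs for which no relationship is defined. The English type and relation identifiers are illustrative labels chosen here, not names used by the patent.

```python
from typing import Dict, Tuple

# (head entity type, tail entity type) -> relation, per the rules listed above.
RELATION_SCHEMA: Dict[Tuple[str, str], str] = {
    ("Enterprise", "SeniorExecutive"): "manager",
    ("Enterprise", "AcademicPaper"): "holds_academic_paper",
    ("Enterprise", "Patent"): "patent_information",
    ("Enterprise", "FundHoldingRecord"): "fund_holding",
    ("Enterprise", "InstitutionHoldingRecord"): "institutional_holding",
    ("Fund", "FundHoldingRecord"): "fund_holding",
    ("Institution", "InstitutionHoldingRecord"): "institutional_holding",
}


def make_relation_triple(head: str, head_type: str,
                         tail: str, tail_type: str) -> Tuple[str, str, str]:
    """Build an entity-relationship-entity triple, rejecting undefined type pairs."""
    try:
        relation = RELATION_SCHEMA[(head_type, tail_type)]
    except KeyError:
        raise ValueError(
            f"no relationship defined between {head_type} and {tail_type}"
        ) from None
    return (head, relation, tail)


if __name__ == "__main__":
    print(make_relation_triple("Acme Inc.", "Enterprise", "Jane Doe", "SeniorExecutive"))
```

Keying on the type pair keeps the schema declarative: adding a new entity type or relationship means adding one table row rather than new construction code.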
In this embodiment, optionally, the learning process of the Markov logic network structure in the knowledge inference module is as follows:
obtaining a clause set;
initializing the learning weights and the optimal expected value, and setting a flag to 0;
searching for the optimal clause: if the optimal clause is empty, incrementing the flag by 1 and continuing the search; if it is not empty, adding the optimal clause to the Markov logic network and computing the optimal expectation;
judging whether the flag equals 2: if so, ending; if not, continuing to search for the optimal clause;
the optimal clause is obtained by connecting a clause with a predicate; the optimal expectation is the evaluation criterion for the result of connecting the clause with the predicate, and it affects the weight finally obtained for the clause.
The functions of the system for constructing an enterprise basic information knowledge graph according to this embodiment correspond to the method for constructing an enterprise basic information knowledge graph described above; for details, refer to the method embodiment above, which is not repeated here.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. A construction method of an enterprise basic information knowledge graph is characterized by comprising the following steps:
constructing a data set containing enterprise information, enterprise academic paper information and enterprise patent information of a plurality of enterprises;
selecting, from the data set, the entities, attributes, and relationships between entities of a knowledge graph framework;
and performing knowledge fusion on the sets of entities, relationships, and attributes to establish entity-relationship-entity or entity-attribute-value triples, thereby completing the construction of the enterprise basic information knowledge graph.
2. The method for constructing an enterprise basic information knowledge graph according to claim 1, further comprising: converting the triples into predicate representations using the constructed knowledge graph, and combining the predicate representations with a Markov logic network structure to complete knowledge reasoning.
3. The method for constructing an enterprise basic information knowledge graph according to claim 1 or 2, wherein enterprises, enterprise senior executives, shareholding information records, funds, institutions, enterprise academic papers, and enterprise patent information are selected from the data set as entities of the knowledge graph, the shareholding information records comprising fund shareholding records and institutional shareholding records; and the attributes of each entity in the data set are selected as follows:
a. Enterprise: company name, English name, president, major shareholder, date of establishment, main business, company profile, number of employees, number of management personnel, listing date, issue volume, issue price, trading market, contact telephone, postal code, fax, email address, company website, registered address, office address;
b. Enterprise senior executive: name, position, salary, currency unit of annual salary;
c. Fund shareholding record or institutional shareholding record: date, holder, shares held, shareholding ratio, rate of change, change in shares, amount of change, percentage of portfolio;
d. Enterprise academic paper: paper number, title, authors, abstract, publication date;
e. Enterprise patent information: patent title, application number, application date, publication date, applicant.
4. The method for constructing an enterprise basic information knowledge graph according to claim 3, wherein the relationships between different entities are determined as follows: the relationship between an enterprise and its senior executives is "manager"; between an enterprise and its academic papers, "holds academic paper"; between an enterprise and its patent information, "patent information"; between an enterprise and a fund shareholding record, "fund holding"; between an enterprise and an institutional shareholding record, "institutional holding"; between a fund and a fund shareholding record, "fund holding"; and between an institution and an institutional shareholding record, "institutional holding".
5. The method for constructing an enterprise basic information knowledge graph according to claim 2, wherein the learning process of the Markov logic network structure comprises:
obtaining a clause set;
initializing the learning weights and the optimal expected value, and setting a flag to 0;
searching for an optimal clause: if the optimal clause is empty, incrementing the flag by 1 and continuing the search; if it is not empty, adding the optimal clause to the Markov logic network and computing the optimal expectation;
judging whether the flag equals 2: if so, ending; if not, continuing to search for the optimal clause;
wherein the optimal clause is obtained by connecting a clause with a predicate, and the optimal expectation is the evaluation criterion for the result of connecting the clause with the predicate and affects the weight finally obtained for the clause.
6. A system for constructing an enterprise basic information knowledge graph, characterized by comprising:
a data set acquisition module configured to construct a data set containing enterprise information, enterprise academic paper information, and enterprise patent information of a plurality of enterprises;
a knowledge graph construction module configured to select, from the data set, the entities, attributes, and relationships between entities of a knowledge graph framework, and to perform knowledge fusion on the sets of entities, relationships, and attributes to establish entity-relationship-entity or entity-attribute-value triples, thereby completing the construction of the enterprise basic information knowledge graph.
7. The system for constructing an enterprise basic information knowledge graph according to claim 6, further comprising a knowledge inference module configured to convert triples into predicate representations using the constructed knowledge graph, and to perform knowledge inference in combination with the Markov logic network structure.
8. The system for constructing an enterprise basic information knowledge graph according to claim 6 or 7, wherein the knowledge graph construction module selects enterprises, enterprise senior executives, shareholding information records, funds, institutions, enterprise academic papers, and enterprise patent information from the data set as the entities of the knowledge graph, the shareholding information records comprising fund shareholding records and institutional shareholding records; and the attributes of each entity in the data set are selected as follows:
a. Enterprise: company name, English name, president, major shareholder, date of establishment, main business, company profile, number of employees, number of management personnel, listing date, issue volume, issue price, trading market, contact telephone, postal code, fax, email address, company website, registered address, office address;
b. Enterprise senior executive: name, position, salary, currency unit of annual salary;
c. Fund shareholding record or institutional shareholding record: date, holder, shares held, shareholding ratio, rate of change, change in shares, amount of change, percentage of portfolio;
d. Enterprise academic paper: paper number, title, authors, abstract, publication date;
e. Enterprise patent information: patent title, application number, application date, publication date, applicant.
9. The system for constructing an enterprise basic information knowledge graph according to claim 8, wherein the relationships between different entities in the knowledge graph construction module are determined as follows: the relationship between an enterprise and its senior executives is "manager"; between an enterprise and its academic papers, "holds academic paper"; between an enterprise and its patent information, "patent information"; between an enterprise and a fund shareholding record, "fund holding"; between an enterprise and an institutional shareholding record, "institutional holding"; between a fund and a fund shareholding record, "fund holding"; and between an institution and an institutional shareholding record, "institutional holding".
10. The system for constructing an enterprise basic information knowledge graph according to claim 7, wherein the learning process of the Markov logic network structure in the knowledge inference module is as follows:
obtaining a clause set;
initializing the learning weights and the optimal expected value, and setting a flag to 0;
searching for an optimal clause: if the optimal clause is empty, incrementing the flag by 1 and continuing the search; if it is not empty, adding the optimal clause to the Markov logic network and computing the optimal expectation;
judging whether the flag equals 2: if so, ending; if not, continuing to search for the optimal clause;
wherein the optimal clause is obtained by connecting a clause with a predicate, and the optimal expectation is the evaluation criterion for the result of connecting the clause with the predicate and affects the weight finally obtained for the clause.
CN202210686880.7A 2022-06-17 2022-06-17 Construction method and system of enterprise basic information knowledge graph Pending CN114896423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210686880.7A CN114896423A (en) 2022-06-17 2022-06-17 Construction method and system of enterprise basic information knowledge graph

Publications (1)

Publication Number Publication Date
CN114896423A 2022-08-12

Family

ID=82729190

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220812
