CN112115271B - Knowledge graph construction method and device - Google Patents

Info

Publication number
CN112115271B
Authority
CN
China
Prior art keywords
data
attribute
knowledge
values
factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010484904.1A
Other languages
Chinese (zh)
Other versions
CN112115271A (en)
Inventor
杨铭
刘设伟
陈利琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd and Taikang Online Property Insurance Co Ltd
Priority to CN202010484904.1A
Publication of CN112115271A
Application granted
Publication of CN112115271B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph construction method and device. The method comprises: acquiring knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data; and constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values. The invention can decompose and organize a complex logical structure so that knowledge data is represented clearly.

Description

Knowledge graph construction method and device
Technical Field
The invention relates to the technical field of knowledge representation and knowledge storage, in particular to a knowledge graph construction method and device.
Background
The knowledge graph is a relatively new technical concept that belongs to the field of knowledge representation and knowledge storage. Knowledge representation means that a knowledge graph is mainly used to describe the entities and concepts that exist in the real world and the relationships among them. The representation commonly used at present describes the relationships between entities and concepts in a piece of knowledge as triples of the form (entity, attribute, attribute value). For example, the knowledge "Li Mou is from Guang'an, Sichuan" can be expressed as (Li Mou, hometown, Guang'an), where "Li Mou" is the entity, "hometown" is the attribute, and "Guang'an" is the value of that attribute for the entity. As another example, the knowledge "the basic premium of the health insurance is 200 yuan and the premium is 1 million yuan" can be split into two triples: (health insurance, basic premium, 200 yuan) and (health insurance, premium, 1 million yuan). Knowledge storage means that a knowledge graph can be stored in a graph database in the form of triples, and that a dedicated query language is used when data is retrieved from the graph database.
Existing knowledge graphs can only describe knowledge data with a single structure and simple logic, and cannot clearly represent knowledge with a complex logical structure. In particular, when the attribute value of an entity's attribute is affected by other factors and is therefore not fixed, existing knowledge graphs cannot represent the knowledge data clearly and unambiguously.
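The conventional (entity, attribute, attribute value) form described above can be illustrated with a minimal Python sketch. The `Triple` type and the `lookup` helper are names assumed here for illustration only; they do not come from the patent.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A conventional triple: one fixed attribute value per (entity, attribute)."""
    entity: str
    attribute: str
    value: str


# Example knowledge taken from the background section.
triples = [
    Triple("Li Mou", "hometown", "Guang'an"),
    Triple("health insurance", "basic premium", "200 yuan"),
]


def lookup(store, entity, attribute):
    """Return every attribute value stored for the given entity and attribute."""
    return [t.value for t in store if t.entity == entity and t.attribute == attribute]
```

In this form each (entity, attribute) pair carries a single fixed value, which is the limitation the invention addresses.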
Disclosure of Invention
The embodiment of the invention provides a knowledge graph construction method, which is used for constructing a knowledge graph and clearly representing knowledge data, and comprises the following steps:
acquiring knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label;
if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data;
and constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values.
The embodiment of the invention provides a knowledge graph construction device, which is used for constructing a knowledge graph and clearly representing knowledge data, and comprises:
an acquisition module, configured to acquire knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label;
an extraction module, configured to extract entity data, attribute factor data and corresponding attribute values from the knowledge data if the complexity label corresponding to the knowledge data is an unstructured data label;
and a construction module, configured to construct a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the knowledge graph construction method when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the knowledge graph construction method.
The embodiment of the invention acquires knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, extracts entity data, attribute factor data and corresponding attribute values from the knowledge data; and constructs a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values. By adding attribute factor data when the knowledge graph is constructed, the embodiment of the invention decomposes and organizes the complex logical structure, and then constructs the knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, so that the knowledge data is represented clearly.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
FIG. 1 is a schematic diagram of a knowledge graph construction method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a complexity tag in a knowledge graph construction method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of attribute factor data in the knowledge graph construction method according to the embodiment of the invention;
FIG. 4 is a schematic diagram of a knowledge graph construction device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
First, the terms involved in the embodiments of the present application are described:
Entity data: data describing something that is distinguishable and exists independently. For example, "Yao Mou", "A World Without Thieves" and "cancer prevention insurance for the elderly" can all serve as entity data.
Attribute data: typically a property, relationship, description or interpretation of an entity; attribute data points from an entity to its attribute value. For example, in "the height of Yao Mou is 226 cm" and "the waiting period of the cancer prevention insurance for the elderly is 30 days", "height" and "waiting period" are the attribute data of the two entities "Yao Mou" and "cancer prevention insurance for the elderly", respectively.
Attribute value: the value that an entity takes for a specified attribute. The attribute value may be a simple text or number with no attributes of its own. For example, in "the territorial area of China is 9.6 million square kilometers", the value of the attribute data "territorial area" for the entity data "China" is "9.6 million square kilometers", which is a text and is not generally treated as entity data. The attribute value may also be another piece of entity data. For example, in the statement "the father of 'house' is 'member'", the value of the attribute data "father" for the entity data "house" is "member", which can itself be a separate entity with attributes of its own.
Attribute factor name and attribute factor value: an attribute factor name is a factor that affects the attribute value of an entity's attribute; there may be one or more. Each attribute factor name can take different values, namely the attribute factor values, and when the attribute factors take different values, the attribute value of the entity's attribute differs. For example, in the knowledge "the product is cancer prevention insurance for the elderly; the premium for men is 660 yuan and the premium for women is 600 yuan", the entity data, attribute data and attribute value are "cancer prevention insurance for the elderly", "premium" and "XXX yuan", respectively. The specific attribute value is clearly influenced by the factor "sex", which is the attribute factor name; when "sex" takes different values, that is, when the attribute factor value differs, the attribute value of the entity's attribute also differs. When "sex" is "male", the attribute value is "660 yuan"; when "sex" is "female", the attribute value is "600 yuan".
Triple representation: knowledge data is expressed in a fixed form composed of the entity, the attribute, the attribute value and the like.
In order to construct a knowledge graph and clearly represent knowledge data, an embodiment of the present invention provides a knowledge graph construction method, as shown in fig. 1, where the method may include:
Step 101, acquiring knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label;
Step 102, if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data;
Step 103, constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values.
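The flow of steps 101 to 103 can be sketched in Python as follows. This is only an illustrative sketch: the `Extraction` container, the `construct` function and the caller-supplied `extract` callback are hypothetical names, and the patent does not prescribe how extraction from unstructured text is actually performed.

```python
from dataclasses import dataclass

UNSTRUCTURED = "unstructured"  # assumed label value, for illustration only


@dataclass
class Extraction:
    """What step 102 is assumed to yield for one piece of knowledge data."""
    entity: str
    attribute: str
    rows: list  # each row: (dict of attribute factor name -> value, attribute value)


def build_triple(extraction):
    """Step 103: entity in position 1, attribute in position 2, and the factor
    data together with the attribute values in position 3."""
    third = [{**factors, "attribute value": value} for factors, value in extraction.rows]
    return (extraction.entity, extraction.attribute, third)


def construct(knowledge, label, extract):
    """Steps 101-103 for one item; `extract` is a hypothetical extractor."""
    if label != UNSTRUCTURED:
        return None  # other complexity labels are outside this sketch
    return build_triple(extract(knowledge))


def stub_extract(text):
    # Stand-in for a real extractor; returns the patent's first-embodiment data.
    return Extraction(
        "cancer prevention insurance for the elderly",
        "premium",
        [({"sex": "male"}, "660 yuan"), ({"sex": "female"}, "600 yuan")],
    )


triple = construct("raw text of the knowledge data", UNSTRUCTURED, stub_extract)
```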
As can be seen from FIG. 1, the embodiment of the present invention acquires knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, extracts entity data, attribute factor data and corresponding attribute values from the knowledge data; and constructs a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values. By adding attribute factor data when the knowledge graph is constructed, the embodiment of the invention decomposes and organizes the complex logical structure, and then constructs the knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, so that the knowledge data is represented clearly.
The inventor found that knowledge sources in the insurance industry are quite complex. Unlike industries such as medical care, securities and banking, the insurance industry has only a small number of standard concepts and little structured data; most of its data is unstructured text. It is therefore necessary to design a knowledge graph construction method that takes the characteristics of the insurance industry into account.
In specific implementation, knowledge data and a corresponding complexity label are acquired, wherein the complexity label comprises: unstructured data labels.
In an embodiment, insurance knowledge data and corresponding complexity labels may be acquired, the complexity labels comprising an unstructured data label. Knowledge data may be obtained as follows: knowledge data is obtained from a data resource website by using crawler technology. In particular, insurance knowledge data may be obtained from an insurance data resource website by using crawler technology.
In this embodiment, crawler technology is used to crawl data automatically from data resource websites such as public network resources: a crawler script is written to crawl text data automatically from public data websites such as Baidu Baike, Hudong Baike, Wikipedia, Sina Finance and NetEase Finance. The public network resources include any public data on the Internet that can be obtained with crawler technology and are not limited to the websites listed above.
In an embodiment, knowledge data may be obtained as follows: knowledge data is obtained from an electronic document and/or text database. Wherein the insurance knowledge data may be obtained from insurance industry electronic documents and/or text databases.
In this embodiment, knowledge data may be obtained from electronic documents. The electronic document data may be collected and organized manually, including electronic documents published on the network and electronic documents that may be disclosed within a company. Formats of the electronic document data include, but are not limited to, Word, Excel, PDF, TXT and other document formats whose text content can be extracted by a document parsing tool.
In this embodiment, knowledge data may be obtained from a text database, for example text data in a company's internal text database. Insurance knowledge data may likewise be obtained from an insurance industry text database, for example an insurance company's internal text database. Specifically, after the address, account and password of the text database are obtained, the database is logged into, its table structure is analyzed, and the databases and tables from which triple data can be obtained are recorded for later use.
In an embodiment, as shown in FIG. 2, the complexity label may be one of three kinds, namely an unstructured data label, a structured data label and a semi-structured data label, and marks the complexity of the knowledge data. The complexity of the knowledge data is judged from the format in which the knowledge was acquired: collected data that is fully regular is regarded as structured data; collected encyclopedia knowledge and other partially regular data is regarded as semi-structured data; and plain text without regular structure, such as documents published on the network or held inside a company, is regarded as unstructured data. Of the three kinds of complexity label, the structured and semi-structured data labels indicate simple knowledge data, and the unstructured data label indicates complex knowledge data. Because the insurance industry has a large proportion of unstructured text data, the embodiment of the invention uses the complexity label to distinguish the complexity of the data and processes insurance knowledge data of different complexity separately, thereby providing a knowledge graph construction method better suited to the insurance industry.
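The labelling rule described above can be sketched as a simple lookup. The source-kind strings and the choice to fall back to the unstructured label for unknown sources are assumptions made for this illustration; the patent only states which kinds of collected data correspond to which label.

```python
from enum import Enum


class Complexity(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Hypothetical source categories mapped to the three labels of FIG. 2.
SOURCE_TO_LABEL = {
    "database_table": Complexity.STRUCTURED,          # fully regular data
    "encyclopedia_page": Complexity.SEMI_STRUCTURED,  # partially regular data
    "plain_text_document": Complexity.UNSTRUCTURED,   # text without regular structure
}


def label_for(source_kind):
    """Return the complexity label for a source kind.

    Unknown kinds default to UNSTRUCTURED; that default is an assumption."""
    return SOURCE_TO_LABEL.get(source_kind, Complexity.UNSTRUCTURED)


def is_complex(label):
    """Only the unstructured label marks complex knowledge data."""
    return label is Complexity.UNSTRUCTURED
```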
In implementation, if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data.
In an embodiment, if the complexity label corresponding to the insurance knowledge data is an unstructured data label, entity data, attribute factor data and corresponding attribute values are extracted from the insurance knowledge data.
And in the specific implementation, constructing a knowledge graph represented by the triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to a first position of the triples, the attribute data is assigned to a second position of the triples, and the third position of the triples is assigned according to the attribute factor data and the corresponding attribute values.
In an embodiment, a knowledge graph represented by a triplet is constructed according to entity data, attribute factor data and corresponding attribute values extracted from insurance knowledge data.
In an embodiment, as shown in fig. 3, the attribute factor data includes: attribute factor name and attribute factor value; constructing a knowledge graph represented by a triplet according to the entity data, the attribute factor data and the corresponding attribute values, wherein the knowledge graph comprises: and constructing a knowledge graph represented by the triples according to the entity data, the attribute factor name, the attribute factor value and the corresponding attribute value.
In an embodiment, constructing a knowledge graph represented by a triplet according to the entity data, the attribute factor data and the corresponding attribute values, includes: assigning the entity data to a first position of a triplet; assigning the attribute data to a second position of the triplet; and assigning a third position of the triplet according to the attribute factor data and the corresponding attribute value.
In an embodiment, assigning the third position of the triple according to the attribute factor data and the corresponding attribute values includes: generating a key-value pair list according to the attribute factor data and the corresponding attribute values; and assigning the key-value pair list to the third position of the triple. In this embodiment, placing a key-value pair list in the third position of the triple can effectively improve data parsing speed and reduce time complexity.
Taking the following triples as an example, the assignment of the third position of the triples according to the attribute factor data and the corresponding attribute values can be expressed as:
(insurance product, premium, [
{ age: 18-26 years old, sex: male, social security: yes, attribute value: 2400 yuan },
{ age: 18-26 years old, sex: male, social security: no, attribute value: 2800 yuan },
{ age: 18-26 years old, sex: female, social security: yes, attribute value: 2200 yuan },
{ age: 18-26 years old, sex: female, social security: no, attribute value: 2560 yuan },
{ age: 27-35 years old, sex: male, social security: yes, attribute value: 2600 yuan },
{ age: 27-35 years old, sex: male, social security: no, attribute value: 3000 yuan },
......]).
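As a minimal sketch of how such a key-value pair list could be generated, the following Python code builds the third position from rows of factor values and attribute values. The function name `key_value_list` and the tuple layout of `rows` are assumptions for illustration; the figures are those of the example above.

```python
FACTOR_NAMES = ("age", "sex", "social security")

# One row per combination: the attribute factor values, then the attribute value.
rows = [
    ("18-26", "male", "yes", "2400 yuan"),
    ("18-26", "male", "no", "2800 yuan"),
    ("18-26", "female", "yes", "2200 yuan"),
    ("18-26", "female", "no", "2560 yuan"),
    ("27-35", "male", "yes", "2600 yuan"),
    ("27-35", "male", "no", "3000 yuan"),
]


def key_value_list(factor_names, rows):
    """Pair each factor name with its value and append the attribute value."""
    result = []
    for *factor_values, attribute_value in rows:
        entry = dict(zip(factor_names, factor_values))
        entry["attribute value"] = attribute_value
        result.append(entry)
    return result


# Entity in position 1, attribute in position 2, key-value pair list in position 3.
triple = ("insurance product", "premium", key_value_list(FACTOR_NAMES, rows))
```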
If the method is applied to an automatic question-answering system and a user wants to know the premium of a certain insurance product for a 25-year-old woman with social security, the system can first identify the entity and the attribute in the question, and then obtain the attribute factor data and the attribute values through the following query statement:
SELECT ?result
WHERE
{ health_insurance insured_age ?result }
The results of the query are:
[
{ age: 18-26 years old, sex: male, social security: yes, attribute value: 2400 yuan },
{ age: 18-26 years old, sex: male, social security: no, attribute value: 2800 yuan },
{ age: 18-26 years old, sex: female, social security: yes, attribute value: 2200 yuan },
{ age: 18-26 years old, sex: female, social security: no, attribute value: 2560 yuan },
{ age: 27-35 years old, sex: male, social security: yes, attribute value: 2600 yuan },
{ age: 27-35 years old, sex: male, social security: no, attribute value: 3000 yuan },
......].
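Once the key-value pair list has been returned, a question-answering system still has to select the entry that matches the user's situation. The patent does not spell out this matching step; the following Python sketch shows one possible way, assuming age ranges are stored as "low-high" strings and that `answer` and `age_in_range` are hypothetical helper names.

```python
def age_in_range(age, age_range):
    """Check an integer age against a range written as 'low-high' (inclusive)."""
    low, high = (int(part) for part in age_range.split("-"))
    return low <= age <= high


def answer(entries, age, sex, social_security):
    """Return the attribute value of the first entry whose factors all match."""
    for entry in entries:
        if (
            age_in_range(age, entry["age"])
            and entry["sex"] == sex
            and entry["social security"] == social_security
        ):
            return entry["attribute value"]
    return None  # no entry covers this combination of factor values


query_result = [
    {"age": "18-26", "sex": "male", "social security": "yes", "attribute value": "2400 yuan"},
    {"age": "18-26", "sex": "male", "social security": "no", "attribute value": "2800 yuan"},
    {"age": "18-26", "sex": "female", "social security": "yes", "attribute value": "2200 yuan"},
    {"age": "18-26", "sex": "female", "social security": "no", "attribute value": "2560 yuan"},
    {"age": "27-35", "sex": "male", "social security": "yes", "attribute value": "2600 yuan"},
    {"age": "27-35", "sex": "male", "social security": "no", "attribute value": "3000 yuan"},
]

# A 25-year-old woman with social security:
premium = answer(query_result, 25, "female", "yes")  # "2200 yuan"
```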
In an embodiment, assigning the third position of the triple according to the attribute factor data and the corresponding attribute values includes: generating a nested list according to the attribute factor data and the corresponding attribute values; and assigning the nested list to the third position of the triple.
Taking the following as an example, the third position of the triple assigned according to the attribute factor data and the corresponding attribute values can be expressed as: [ [ age, sex, social security ], [ 18-26 years old, male, no, 2800 yuan ], [ 18-26 years old, female, yes, 2200 yuan ], [ 18-26 years old, female, no, 2560 yuan ] ].
In an embodiment, generating a nested list according to the attribute factor data and the corresponding attribute value includes: assigning the attribute factor name to a first list; assigning the attribute factor value and the corresponding attribute value to a second list; and generating a nested list according to the first list and the second list.
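The nested-list form can be sketched in Python as follows. The function names are assumptions for illustration, and the conversion back to key-value pairs is included only to show that the two representations carry the same information.

```python
def nested_list(factor_names, rows):
    """First list: attribute factor names. Following lists: the attribute
    factor values together with the corresponding attribute value."""
    first_list = list(factor_names)
    second_list = [list(row) for row in rows]
    return [first_list, *second_list]


def to_key_value_list(nested):
    """Convert the nested list back into a key-value pair list."""
    factor_names, *rows = nested
    entries = []
    for *factor_values, attribute_value in rows:
        entry = dict(zip(factor_names, factor_values))
        entry["attribute value"] = attribute_value
        entries.append(entry)
    return entries


nested = nested_list(
    ["age", "sex", "social security"],
    [
        ("18-26", "male", "no", "2800 yuan"),
        ("18-26", "female", "yes", "2200 yuan"),
        ("18-26", "female", "no", "2560 yuan"),
    ],
)
```

The nested list stores each factor name once rather than once per entry, at the cost of relying on position to match values to names.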
In an embodiment, the knowledge graph construction method further includes: and storing the knowledge graph into a structured database. For example, the structured database may be MySQL.
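As a rough sketch of storing such triples in a structured database, the following code uses Python's built-in sqlite3 module as a stand-in for MySQL so that the example is self-contained. The table layout and the decision to serialize the third position as JSON text are assumptions, not something the patent specifies.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory database with one table of triples (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        "entity TEXT, attribute TEXT, value_json TEXT, "
        "PRIMARY KEY (entity, attribute))"
    )
    return conn


def store(conn, triple):
    """Store one triple; the third position is serialized as JSON text."""
    entity, attribute, value = triple
    conn.execute(
        "INSERT OR REPLACE INTO triples VALUES (?, ?, ?)",
        (entity, attribute, json.dumps(value, ensure_ascii=False)),
    )
    conn.commit()


def load(conn, entity, attribute):
    """Return the deserialized third position, or None if nothing is stored."""
    row = conn.execute(
        "SELECT value_json FROM triples WHERE entity = ? AND attribute = ?",
        (entity, attribute),
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = open_store()
store(conn, ("cancer prevention insurance for the elderly", "premium", [
    {"sex": "male", "attribute value": "660 yuan"},
    {"sex": "female", "attribute value": "600 yuan"},
]))
```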
In an embodiment, the knowledge graph construction method further includes: converting the knowledge graph in the structured database into RDF data, and storing the RDF data in a graph database. The knowledge graph in the structured database can be converted into RDF data by means of tools such as D2RQ, and the RDF data is stored in the graph database Jena.
For example, to enter the triple (health insurance, insured age, within 100 years old) into the Jena graph database, an automated script can convert the triple into a SPARQL insert statement, as follows:
INSERT DATA
{
health_insurance insured_age "within 100 years old" .
}
With the SPARQL statement above, the data represented by the triple is stored in the Jena database. The next time the insured age of the health insurance needs to be queried, the following SPARQL statement can be used:
SELECT ?result
WHERE
{ health_insurance insured_age ?result }
This SPARQL statement returns the result "within 100 years old".
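The automated script mentioned above could, for instance, generate the statements as plain strings. The sketch below is only one possible form: the `ex:` prefix and its namespace URI are hypothetical, the entity and attribute are assumed to be valid prefixed-name local parts, and the attribute value is written as a string literal. Sending the statements to a Jena endpoint is not shown.

```python
# Hypothetical namespace; a real deployment would use its own URIs.
PREFIX = "PREFIX ex: <http://example.org/kg/>"


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def insert_statement(entity, attribute, value):
    """Build an INSERT DATA statement for one (entity, attribute, value) triple."""
    return (
        f"{PREFIX}\n"
        "INSERT DATA\n"
        "{\n"
        f'  ex:{entity} ex:{attribute} "{escape_literal(value)}" .\n'
        "}"
    )


def select_statement(entity, attribute):
    """Build a SELECT query that retrieves the value of one attribute."""
    return (
        f"{PREFIX}\n"
        "SELECT ?result\n"
        "WHERE\n"
        "{\n"
        f"  ex:{entity} ex:{attribute} ?result .\n"
        "}"
    )


insert_text = insert_statement("health_insurance", "insured_age", "within 100 years old")
select_text = select_statement("health_insurance", "insured_age")
```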
In an embodiment, the knowledge graph construction method further includes: exporting the knowledge graph as a CSV file, and storing the CSV file in a graph database. The knowledge graph can be exported as a CSV file by using a database management tool (such as Navicat), and the CSV file can be stored directly by the graph database Neo4j.
It should be noted that a graph database is a type of database that applies graph theory to store relationship information between entities. Commonly used graph databases include Neo4j and Jena, and each accepts text data in its own standard formats: CSV is the format most commonly accepted by Neo4j, and RDF is the format most commonly accepted by Jena. CSV (comma-separated values) is a very common plain-text file format. RDF (Resource Description Framework) is a W3C standard for describing web resources, and RDF uses URIs (Uniform Resource Identifiers) to identify elements.
It should be noted that entity data, attribute values and the triple representation are general concepts of knowledge graphs, whereas attribute factor data (including attribute factor names and attribute factor values) and the specific triple representation described here are particular to the present invention. When data is organized into triples and stored in a knowledge graph, the complex questions that users are expected to ask are taken into account and organized manually using the concept of attribute factors. When a knowledge point is converted into a triple, if the attribute value of an entity is influenced by other factors, those factors are treated as attribute factors of the entity's attribute, and the unique attribute value corresponding to each combination of attribute factor values is determined.
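The patent performs the export with a database management tool. Purely as an illustration of what the exported file might contain, the following Python sketch writes triples to CSV text with the standard csv module. The column names and the choice to serialize a list-valued third position as JSON are assumptions; importing the file into Neo4j is not shown.

```python
import csv
import io
import json


def triples_to_csv(triples):
    """Return CSV text with one row per triple and an assumed header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["entity", "attribute", "value"])
    for entity, attribute, value in triples:
        # Plain strings are written as-is; lists are serialized as JSON text.
        cell = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        writer.writerow([entity, attribute, cell])
    return buffer.getvalue()


csv_text = triples_to_csv([
    ("cancer prevention insurance for the elderly", "premium", [
        {"sex": "male", "attribute value": "660 yuan"},
        {"sex": "female", "attribute value": "600 yuan"},
    ]),
    ("cancer prevention insurance for the elderly", "waiting period", "30 days"),
])
```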
Specific embodiments are given below to illustrate specific applications of the knowledge graph construction method in the case that the complexity label is an unstructured data label in the embodiment of the present invention.
First embodiment: the knowledge "the product is cancer prevention insurance for the elderly; the premium for men is 660 yuan and the premium for women is 600 yuan" can be expressed as the following complete triple: (cancer prevention insurance for the elderly, premium, [ { sex: male, attribute value: 660 yuan }, { sex: female, attribute value: 600 yuan } ]). In this example, each attribute value is listed together with its associated attribute factors.
Second embodiment: the knowledge point "premium of a certain insurance product" is taken as a practical example. The premium of an insurance product may differ greatly according to the applicant's age, sex, social security record and so on. For example, the premium of a certain insurance product may be priced according to three attribute factors, namely the applicant's age, sex and social security status, as shown in Table 1 below.
TABLE 1
According to the general data form of a knowledge graph, the knowledge "the premium of a certain insurance product is XXX yuan" in this example would be stored as the triple (certain insurance product, premium, XXX yuan). In fact, however, the premium of the insurance product varies with the applicant's age, sex and social security status. Here, the concept of "attribute factor" proposed by the present invention is used to handle separately the case in which the attribute value in such a triple depends on certain factors. The factors affecting the attribute value are taken as attribute factors: age, sex and social security are the attribute factor names, and the attribute value differs when the attribute factor values differ. The values are listed in Table 2 below.
TABLE 2
All attribute factors and their values | Attribute | Attribute value
Age: 18-26 years old, sex: male, social security: yes | Premium | 2400 yuan
Age: 18-26 years old, sex: male, social security: no | Premium | 2800 yuan
Age: 18-26 years old, sex: female, social security: yes | Premium | 2200 yuan
Age: 18-26 years old, sex: female, social security: no | Premium | 2560 yuan
Age: 27-35 years old, sex: male, social security: yes | Premium | 2600 yuan
Age: 27-35 years old, sex: male, social security: no | Premium | 3000 yuan
…… | …… | ……
The triplet finally assembled for this knowledge is as follows:
(insurance product, premium, [ { age: 18-26 years old, sex: male, presence or absence of social security: yes, attribute value: 2400 yuan },
{ age: 18-26 years old, sex: male, presence or absence of social security: no, attribute value: 2800 yuan },
{ age: 18-26 years old, sex: female, presence or absence of social security: yes, attribute value: 2200 yuan },
{ age: 18-26 years old, sex: female, presence or absence of social security: no, attribute value: 2560 yuan },
{ age: 27-35 years old, sex: male, presence or absence of social security: yes, attribute value: 2600 yuan },
{ age: 27-35 years old, sex: male, presence or absence of social security: no, attribute value: 3000 yuan },
......]).
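The triplet above can be sketched in Python as a tuple whose third position holds a list of key-value records. This is only an illustrative sketch: the key names (`age`, `sex`, `social_security`, `attribute_value`) and the `lookup` helper are assumptions made for this example and do not come from the patent.

```python
# Illustrative sketch; key names and the lookup helper are assumptions.
premium_triplet = (
    "insurance product",  # first position: entity data
    "premium",            # second position: attribute data
    [                     # third position: attribute factor values + attribute value
        {"age": "18-26", "sex": "male", "social_security": "yes", "attribute_value": "2400 yuan"},
        {"age": "18-26", "sex": "male", "social_security": "no", "attribute_value": "2800 yuan"},
        {"age": "18-26", "sex": "female", "social_security": "yes", "attribute_value": "2200 yuan"},
        {"age": "18-26", "sex": "female", "social_security": "no", "attribute_value": "2560 yuan"},
        {"age": "27-35", "sex": "male", "social_security": "yes", "attribute_value": "2600 yuan"},
        {"age": "27-35", "sex": "male", "social_security": "no", "attribute_value": "3000 yuan"},
    ],
)


def lookup(triplet, **factors):
    """Return the attribute value of the first record matching all given factor values."""
    _entity, _attribute, records = triplet
    for record in records:
        if all(record.get(name) == value for name, value in factors.items()):
            return record["attribute_value"]
    return None  # no record matches this combination of factor values
```

With this layout, a query such as `lookup(premium_triplet, age="18-26", sex="female", social_security="no")` resolves a single attribute value from the combination of attribute factor values.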
Third embodiment: the premium calculation rule for serious illness insurance is as follows:
① For an insured age of 1 year or less, the monthly premium of the product is 131 yuan for males and 105 yuan for females.
② For an insured age above 1 year and up to 19 years, the monthly premium of the product is 61 yuan for males and 105 yuan for females.
③ For an insured age above 19 years and up to 24 years, the monthly premium of the product is 80 yuan for males and 74 yuan for females.
④ For an insured age above 24 years and up to 29 years, the monthly premium of the product is 110 yuan for males and 125 yuan for females.
The text above describes a premium calculation rule for serious illness insurance. The entity data, attribute data and attribute value are "serious illness insurance", "premium" and "XXX yuan (to be determined)", respectively. The insured age and sex (the "male" and "female" above), which recur throughout the rule, are set as attribute factor names. The attribute value of the attribute data "premium" is uniquely determined by the attribute factor values of insured age and sex, so the following triplet can be extracted from the text:
(serious illness insurance, premium, [ { sex: male, age: 0-1 years old, attribute value: 131 yuan },
{ sex: female, age: 0-1 years old, attribute value: 105 yuan },
{ sex: male, age: 1-19 years old, attribute value: 61 yuan },
{ sex: female, age: 1-19 years old, attribute value: 105 yuan },
......]).
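As a rough illustration of how rules ① to ④ map onto such a triplet, the sketch below encodes each rule as an age interval (lower bound exclusive, upper bound inclusive) with a male and a female premium, builds the triplet, and resolves a premium for a given age and sex. The `RULES` layout and the function names are assumptions for this example; the patent does not prescribe how the rule text is parsed.

```python
from typing import Optional

# (lower bound, exclusive or None; upper bound, inclusive; male yuan; female yuan)
# Values are taken from rules 1 to 4 in the text; the tuple layout is an assumption.
RULES = [
    (None, 1, 131, 105),
    (1, 19, 61, 105),
    (19, 24, 80, 74),
    (24, 29, 110, 125),
]


def build_triplet(rules):
    """Expand each rule into one record per sex and wrap the records in a triplet."""
    records = []
    for low, high, male, female in rules:
        age = f"{0 if low is None else low}-{high} years old"
        records.append({"sex": "male", "age": age, "attribute_value": f"{male} yuan"})
        records.append({"sex": "female", "age": age, "attribute_value": f"{female} yuan"})
    return ("serious illness insurance", "premium", records)


def monthly_premium(age: float, sex: str) -> Optional[int]:
    """Return the monthly premium in yuan, or None if no rule covers the input."""
    for low, high, male, female in RULES:
        if (low is None or age > low) and age <= high:
            return {"male": male, "female": female}.get(sex)
    return None
```

Keeping numeric bounds next to the display string avoids re-parsing labels such as "1-19 years old" when a premium has to be resolved for a concrete age.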
In an embodiment, the complexity label further comprises: structured data labels and semi-structured data labels;
If the complexity label corresponding to the knowledge data is a structured data label or a semi-structured data label, extracting entity data, attribute data and corresponding attribute values from the knowledge data;
and constructing a knowledge graph according to the entity data, the attribute data, the corresponding attribute values and a predefined triplet representation model.
In this embodiment, a knowledge extraction tool may be used to extract the entity data, attribute data and corresponding attribute values from the knowledge data. The knowledge extraction tool is a knowledge extraction script written in a scripting language (such as Python) according to the data rules of the knowledge data, i.e. the database table structure of the structured data and the data structure of the semi-structured data. In the script, the knowledge role (entity data, attribute data or attribute value) of each field of the database table and of each item in the semi-structured data is specified in the form of preset rules.
Specific embodiments are given below to illustrate specific applications of the knowledge graph construction method in the case that the complexity labels are structured data labels and semi-structured data labels in the embodiments of the present invention.
Fourth embodiment: knowledge is extracted from a database, and the entity data, attribute data and attribute values in it are identified. Table 3 below shows the data of several fields of a table in the database.
TABLE 3
Product name | Waiting period | Insuring path
Million medical insurance | 30 days | XXX
Million cancer-prevention insurance | 15 days | XXX
Serious illness insurance | 15 days | XXX
Hospitalization insurance | 60 days | XXX
...... | ...... | ......
While collecting and organizing the knowledge, rules are set so that "product name" is the entity class and each value under the "product name" field is entity data, "insuring path" and "waiting period" are attribute data, and the values under the "insuring path" and "waiting period" fields are the attribute values of the corresponding entity data. After processing by the automatic extraction tool, the entity data, attribute data and attribute values extracted from the table include:
(million medical insurance, waiting period, 30 days);
(million cancer-prevention insurance, waiting period, 15 days);
(serious illness insurance, waiting period, 15 days);
(hospitalization insurance, waiting period, 60 days);
(million medical insurance, insuring path, XXX).
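A rule-driven extraction of this kind might look like the following sketch, in which each field of the table is declared to be either the entity field or an attribute field and every row is expanded into triplets. The rows mirror Table 3 with product names written in a normalized English form; `FIELD_ROLES` and `extract_triples` are hypothetical names introduced for illustration, not the tool described in the patent.

```python
# Rows mirroring Table 3; "XXX" is kept as the elided placeholder from the source.
ROWS = [
    {"Product name": "Million medical insurance", "Waiting period": "30 days", "Insuring path": "XXX"},
    {"Product name": "Million cancer-prevention insurance", "Waiting period": "15 days", "Insuring path": "XXX"},
    {"Product name": "Serious illness insurance", "Waiting period": "15 days", "Insuring path": "XXX"},
    {"Product name": "Hospitalization insurance", "Waiting period": "60 days", "Insuring path": "XXX"},
]

# Preset rule: which field names the entity and which fields are attributes.
FIELD_ROLES = {
    "Product name": "entity",
    "Waiting period": "attribute",
    "Insuring path": "attribute",
}


def extract_triples(rows, field_roles):
    """Turn each row into (entity, attribute, value) triplets according to the field roles."""
    entity_fields = [name for name, role in field_roles.items() if role == "entity"]
    if len(entity_fields) != 1:
        raise ValueError("exactly one field must be marked as the entity field")
    entity_field = entity_fields[0]
    attribute_fields = [name for name, role in field_roles.items() if role == "attribute"]
    triples = []
    for row in rows:
        for field in attribute_fields:
            triples.append((row[entity_field], field.lower(), row[field]))
    return triples
```

Because the roles live in a plain dictionary, supporting another table only requires a new `FIELD_ROLES` mapping rather than new extraction code.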
Fifth embodiment: knowledge is extracted from semi-structured data crawled from the Internet, and the entity data, attribute data and attribute values in it are identified. Table 4 below shows semi-structured data crawled from Baidu Baike (Baidu Encyclopedia).
TABLE 4
In the process of collecting and organizing the semi-structured data, the main body "XX" of the table data is set as entity data (the main body can be obtained from a website URL or other specific marks), and among the data outside the table main body, the left column is attribute data, the right column is attribute value, and the entity data, attribute data and attribute value which can be extracted from the above table include:
(XX, Chinese name, XX Co., Ltd.);
(XX, foreign name, XX Co., Ltd.);
(XX, headquarters, Beijing, China).
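The left-column/right-column rule can be sketched as follows. The example assumes the crawled table has already been reduced to (left cell, right cell) pairs and that the subject is the last path segment of the page URL; the URL, the function names and the sample rows are all hypothetical and stand in for whatever the crawler actually returns.

```python
from urllib.parse import unquote, urlsplit


def subject_from_url(url):
    """Take the last path segment of the page URL as the entity (an assumed convention)."""
    path = urlsplit(url).path.rstrip("/")
    return unquote(path.rsplit("/", 1)[-1])


def extract_infobox_triples(subject, rows):
    """Left cell becomes the attribute, right cell the attribute value; blank cells are skipped."""
    triples = []
    for left, right in rows:
        attribute, value = left.strip(), right.strip()
        if attribute and value:
            triples.append((subject, attribute, value))
    return triples


# Hypothetical sample input standing in for the crawled table.
rows = [
    ("Chinese name ", "XX Co., Ltd."),
    ("Foreign name", " XX Co., Ltd."),
    ("Headquarters", "Beijing, China"),
    ("", "stray cell without a label"),
]
subject = subject_from_url("https://example.com/item/XX")
triples = extract_infobox_triples(subject, rows)
```

Skipping rows with an empty cell is a design choice of this sketch; a production crawler might instead log them for manual review.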
Based on the same inventive concept, the embodiment of the invention also provides a knowledge graph construction device, as described in the following embodiment. Because the principles of solving the problems are similar to those of the knowledge graph construction method, the implementation of the device can be referred to the implementation of the method, and the repetition is omitted.
Fig. 4 is a block diagram of a knowledge graph construction apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain knowledge data and a corresponding complexity label, where the complexity label includes: unstructured data labels;
an extraction module 402, configured to extract entity data, attribute factor data and corresponding attribute values from the knowledge data if the complexity label corresponding to the knowledge data is an unstructured data label;
and a construction module 403, configured to construct a knowledge graph represented by a triplet according to the entity data, the attribute factor data and the corresponding attribute values, where the entity data is assigned to a first position of the triplet, the attribute data is assigned to a second position of the triplet, and the third position of the triplet is assigned according to the attribute factor data and the corresponding attribute values.
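The three modules can be pictured as three methods on one object, as in the sketch below. It is a structural illustration only: the class, the `Extraction` record and the injected `extractor` callable are assumptions, and the actual extraction of entities and attribute factors from unstructured text is deliberately left to that callable because this passage does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Extraction:
    entity: str                     # entity data
    attribute: str                  # attribute data
    records: List[Dict[str, str]]   # attribute factor values plus "attribute_value"


class KnowledgeGraphBuilder:
    def __init__(self, extractor: Callable[[str], Extraction]):
        # The extractor stands in for the (unspecified) text extraction step.
        self.extractor = extractor

    def acquire(self, source: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
        """Obtaining module: collect (knowledge data, complexity label) pairs."""
        return list(source)

    def extract(self, data: str, label: str) -> Optional[Extraction]:
        """Extraction module: only unstructured data goes through factor extraction here."""
        if label != "unstructured":
            return None
        return self.extractor(data)

    def construct(self, extraction: Extraction) -> tuple:
        """Construction module: fill the three positions of the triplet."""
        return (extraction.entity, extraction.attribute, extraction.records)

    def run(self, source: Iterable[Tuple[str, str]]) -> List[tuple]:
        graph = []
        for data, label in self.acquire(source):
            extraction = self.extract(data, label)
            if extraction is not None:
                graph.append(self.construct(extraction))
        return graph
```

Injecting the extractor keeps the module boundaries visible and lets a stub be substituted when the pipeline is exercised without a real text-extraction model.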
In summary, the embodiment of the present invention obtains knowledge data and a corresponding complexity label, where the complexity label includes an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, entity data, attribute factor data and corresponding attribute values are extracted from the knowledge data; and a knowledge graph represented by triplets is constructed according to the entity data, the attribute factor data and the corresponding attribute values, where the entity data is assigned to the first position of the triplet, the attribute data is assigned to the second position of the triplet, and the third position of the triplet is assigned according to the attribute factor data and the corresponding attribute values. By adding attribute factor data when the knowledge graph is constructed, the embodiment of the present invention breaks down and sorts out the complex logical structure of the knowledge, and then constructs the triplet-based knowledge graph from the entity data, the attribute factor data and the corresponding attribute values, so that the knowledge data is represented clearly.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments described; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. The knowledge graph construction method is characterized by comprising the following steps of:
Acquiring knowledge data and a corresponding complexity label, the complexity label comprising: unstructured data labels;
If the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data;
Constructing a knowledge graph represented by a triplet according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to a first position of the triplet, the attribute data is assigned to a second position of the triplet, and a third position of the triplet is assigned according to the attribute factor data and the corresponding attribute values;
wherein acquiring knowledge data and a corresponding complexity label comprises: acquiring insurance knowledge data and a corresponding complexity label, the insurance knowledge data being acquired from an insurance data resource website by using crawler technology;
the method further comprising:
exporting the knowledge graph into a CSV file, and storing the CSV file into a graph database;
wherein an attribute factor name refers to a factor that influences the attribute value corresponding to an entity attribute, each attribute factor name corresponds to different values, namely attribute factor values, and when the attribute factors take different values, the attribute values corresponding to the entity attribute are different.
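By way of illustration only, the CSV export step could be sketched as below. The column names and the choice to serialize a list-valued third position as JSON are assumptions of this example; loading the resulting file into a graph database depends on the database chosen and is not shown.

```python
import csv
import io
import json


def triples_to_csv(triples):
    """Serialize triplets to CSV text; a non-string third position is stored as JSON."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity", "attribute", "value"])  # assumed header names
    for entity, attribute, value in triples:
        if not isinstance(value, str):
            value = json.dumps(value, ensure_ascii=False)
        writer.writerow([entity, attribute, value])
    return buffer.getvalue()


def write_csv_file(triples, path):
    """Write the same CSV text to a file, using UTF-8 so non-ASCII text survives."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(triples_to_csv(triples))
```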
2. The knowledge graph construction method of claim 1, wherein the attribute factor data includes: attribute factor name and attribute factor value;
Constructing a knowledge graph represented by a triplet according to the entity data, the attribute factor data and the corresponding attribute values, wherein the knowledge graph comprises: and constructing a knowledge graph represented by the triples according to the entity data, the attribute factor name, the attribute factor value and the corresponding attribute value.
3. The knowledge graph construction method of claim 1, wherein assigning a third location of a triplet based on the attribute factor data and the corresponding attribute value comprises:
generating a key value pair list according to the attribute factor data and the corresponding attribute values;
And assigning a third position of the triplet to the key value pair list.
4. The knowledge graph construction method of claim 1, wherein assigning a third location of a triplet based on the attribute factor data and the corresponding attribute value comprises:
Generating a nested list according to the attribute factor data and the corresponding attribute values;
And assigning a third position of the triplet to the nested list.
5. The knowledge graph construction method of claim 4 wherein generating a nested list from the attribute factor data and corresponding attribute values comprises:
assigning the attribute factor name to a first list;
Assigning the attribute factor value and the corresponding attribute value to a second list;
And generating a nested list according to the first list and the second list.
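For illustration, one possible reading of the nested list in claims 4 and 5 is sketched below: the first list carries the attribute factor names, and the second list carries one row per combination of attribute factor values followed by the corresponding attribute value. The exact layout is an assumption of this example, as are the function and key names.

```python
def to_nested_list(records, factor_names, value_key="attribute_value"):
    """Build [first list, second list] from key-value records."""
    first = list(factor_names)  # attribute factor names
    second = [
        [record[name] for name in first] + [record[value_key]]
        for record in records
    ]  # attribute factor values followed by the attribute value
    return [first, second]


# Sample records using values from Table 2 of the description.
records = [
    {"age": "18-26", "sex": "male", "social_security": "yes", "attribute_value": "2400 yuan"},
    {"age": "18-26", "sex": "male", "social_security": "no", "attribute_value": "2800 yuan"},
]
nested = to_nested_list(records, ["age", "sex", "social_security"])
triplet = ("insurance product", "premium", nested)
```

Compared with a list of key-value records, this form states the factor names once, at the cost of relying on column order inside each row.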
6. The knowledge graph construction method of claim 1, wherein the complexity label further comprises: structured data labels and semi-structured data labels;
If the complexity label corresponding to the knowledge data is a structured data label or a semi-structured data label, extracting entity data, attribute data and corresponding attribute values from the knowledge data;
And constructing a knowledge graph represented by the triplet according to the entity data, the attribute data and the corresponding attribute values.
7. The knowledge graph construction device is characterized by comprising:
the acquisition module is used for acquiring knowledge data and a corresponding complexity label, wherein the complexity label comprises: unstructured data labels;
the extraction module is used for extracting entity data, attribute factor data and corresponding attribute values from the knowledge data if the complexity label corresponding to the knowledge data is an unstructured data label;
the construction module is used for constructing a knowledge graph represented by the triplet according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to a first position of the triplet, the attribute data is assigned to a second position of the triplet, and the third position of the triplet is assigned according to the attribute factor data and the corresponding attribute values;
Acquiring knowledge data and corresponding complexity labels, comprising: acquiring insurance knowledge data and corresponding complexity labels; acquiring insurance knowledge data from an insurance data resource website by utilizing a crawler technology;
the construction module is also used for exporting the knowledge graph into a CSV file and storing the CSV file into a graph database;
The attribute factor names refer to factors influencing attribute values corresponding to the entity attributes, each attribute factor name corresponds to different values, namely, attribute factor values, and when the attribute factors take different values, the attribute values corresponding to the entity attributes are different.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 6 when executing the computer program.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program for executing the method of any one of claims 1 to 6.
CN202010484904.1A 2020-06-01 2020-06-01 Knowledge graph construction method and device Active CN112115271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010484904.1A CN112115271B (en) 2020-06-01 2020-06-01 Knowledge graph construction method and device

Publications (2)

Publication Number Publication Date
CN112115271A CN112115271A (en) 2020-12-22
CN112115271B true CN112115271B (en) 2024-05-03

Family

ID=73799230

Country Status (1)

Country Link
CN (1) CN112115271B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113238865A (en) * 2021-05-18 2021-08-10 苏明 Method for quickly constructing knowledge graph based on Excel one-key import

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183869A (en) * 2015-09-16 2015-12-23 分众(中国)信息技术有限公司 Building knowledge mapping database and construction method thereof
CN109344262A (en) * 2018-10-31 2019-02-15 百度在线网络技术(北京)有限公司 Architectonic method for building up, device and storage medium
CN110287334A (en) * 2019-06-13 2019-09-27 淮阴工学院 A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model
CN111061841A (en) * 2019-12-19 2020-04-24 京东方科技集团股份有限公司 Knowledge graph construction method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268581A (en) * 2017-07-14 2018-07-10 广东神马搜索科技有限公司 The construction method and device of knowledge mapping
US10915577B2 (en) * 2018-03-22 2021-02-09 Adobe Inc. Constructing enterprise-specific knowledge graphs

Also Published As

Publication number Publication date
CN112115271A (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant