CN112115271A - Knowledge graph construction method and device - Google Patents

Publication number
CN112115271A
Authority
CN
China
Prior art keywords
data
attribute
knowledge
triples
factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010484904.1A
Other languages
Chinese (zh)
Other versions
CN112115271B (en)
Inventor
杨铭
刘设伟
陈利琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd, Taikang Online Property Insurance Co Ltd
Priority to CN202010484904.1A
Publication of CN112115271A
Application granted
Publication of CN112115271B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph construction method and a knowledge graph construction device. The method comprises the following steps: acquiring knowledge data and a corresponding complexity label, wherein the complexity label comprises an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data; and constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of a triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values. The invention can break down and organize a complex logical structure and express knowledge data clearly.

Description

Knowledge graph construction method and device
Technical Field
The invention relates to the technical field of knowledge representation and knowledge storage, in particular to a knowledge graph construction method and a knowledge graph construction device.
Background
The knowledge graph is a relatively new technical concept in the field of knowledge representation and knowledge storage. In terms of knowledge representation, a knowledge graph mainly describes the entities and concepts that exist in the real world and the relationships among them. The most common representation today is the triple, which describes the relationship between an entity and a concept within a knowledge point and can be written as (entity, attribute, attribute value). For example, the knowledge "Li is from Guang'an, Sichuan" can be expressed as (Li, hometown, Guang'an), where "Li" is the entity, "hometown" is the attribute, and "Guang'an" is the attribute value of that attribute for the entity. As another example, the knowledge "the basic premium of a health insurance product is 200 yuan, and the sum insured reaches 1 million yuan" can be split into two triples: (health insurance, basic premium, 200 yuan) and (health insurance, sum insured, 1 million yuan). In terms of knowledge storage, a knowledge graph can be stored in a graph database as triples, and dedicated query statements are used to query and retrieve data from the graph database.
Existing knowledge graphs can only describe knowledge data with a single structure and simple logic, and cannot clearly express knowledge with a complex logical structure. In particular, when the attribute value of an entity's attribute is affected by other factors and is therefore not fixed, existing knowledge graphs cannot express the knowledge clearly and unambiguously.
Disclosure of Invention
An embodiment of the invention provides a knowledge graph construction method for constructing a knowledge graph that clearly represents knowledge data. The method comprises the following steps:
acquiring knowledge data and a corresponding complexity label, wherein the complexity label comprises an unstructured data label;
if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data;
and constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of a triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values.
An embodiment of the invention provides a knowledge graph construction device for constructing a knowledge graph that clearly represents knowledge data. The device comprises:
an acquisition module for acquiring knowledge data and a corresponding complexity label, the complexity label comprising an unstructured data label;
an extraction module for extracting entity data, attribute factor data and corresponding attribute values from the knowledge data if the complexity label corresponding to the knowledge data is an unstructured data label;
and a construction module for constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of a triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method for constructing the knowledge graph is implemented.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program for executing the above-mentioned knowledge graph construction method is stored.
In the embodiment of the invention, knowledge data and a corresponding complexity label are acquired, the complexity label comprising an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, entity data, attribute factor data and corresponding attribute values are extracted from the knowledge data; and a knowledge graph represented by triples is constructed according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of a triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values. Because attribute factor data is added when the knowledge graph is constructed, a complex logical structure can be broken down and organized before the triples are built, so that the knowledge data is represented clearly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the technical solutions in the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. In the drawings:
FIG. 1 is a schematic diagram of a knowledge graph construction method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a complexity label in a knowledge graph construction method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating attribute factor data in a knowledge graph construction method according to an embodiment of the present invention;
FIG. 4 is a diagram of a knowledge graph constructing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention and not to limit the present invention.
First, the terms used in the embodiments of the present application are described:
Entity data: data representing something that is distinguishable and exists independently. For example, "Yao Ming", "no future will be recognized", and "senile cancer prevention insurance" can all serve as entity data.
Attribute data: attribute data refers to an attribute that an entity points to, usually a property, relationship, description, or interpretation of the entity. For example, in "the height of Yao Ming is 226 cm" and "the waiting period of senile cancer prevention insurance is 30 days", "height" and "waiting period" are attribute data of the entity data "Yao Ming" and "senile cancer prevention insurance", respectively.
Attribute values: an attribute value is the value that an entity data object takes for a given piece of attribute data. The attribute value may be simple text or a number with no further attributes of its own. For example, in "the territorial area of China is 9.6 million square kilometers", the attribute value of the attribute data "territorial area" for the entity data "China" is "9.6 million square kilometers", which is plain text and is not generally used as entity data. The attribute value may also be another piece of entity data. For example, in "the father of Fang Zuming (Jaycee Chan) is Cheng Long (Jackie Chan)", the attribute value of the attribute data "father" for the entity data "Fang Zuming" is "Cheng Long", and "Cheng Long" can itself be a separate piece of entity data with its own associated attributes.
Attribute factor name and attribute factor value: an attribute factor name refers to a factor that affects the attribute value corresponding to an entity attribute; there may be one or more such factors. Each attribute factor name can take different values, namely attribute factor values, and when the attribute factors take different values, the attribute value corresponding to the entity attribute differs. For example, in "for the product senile cancer prevention insurance, the premium for males is 660 yuan and the premium for females is 600 yuan", the entity data, attribute data and attribute value are "senile cancer prevention insurance", "premium" and "XXX yuan", respectively. The specific attribute value is clearly affected by gender, so gender is the attribute factor name, and when gender takes different values, that is, when the attribute factor values differ, the attribute value corresponding to the entity attribute also differs. When "gender" is "male", the attribute value is "660 yuan"; when "gender" is "female", the attribute value is "600 yuan".
Triple representation: representing the entities, attributes, attribute values and so on obtained from the knowledge data in a defined form.
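The terms above can be illustrated with a minimal Python sketch. The dictionary layout, the key name "attribute value", and the helper `attribute_value_for` are assumptions made for illustration only; they are not prescribed by the text.

```python
# Each row pairs attribute factor values with the attribute value that holds
# under them. The key "attribute value" is an assumed naming convention.
premium_rows = [
    {"gender": "male", "attribute value": "660 yuan"},
    {"gender": "female", "attribute value": "600 yuan"},
]


def attribute_value_for(rows, **factor_values):
    """Return the attribute value of the first row matching all given factor values."""
    for row in rows:
        if all(row.get(name) == value for name, value in factor_values.items()):
            return row["attribute value"]
    return None  # no row matches the requested factor values


male_premium = attribute_value_for(premium_rows, gender="male")
female_premium = attribute_value_for(premium_rows, gender="female")
```

Here "gender" plays the role of the attribute factor name, "male" and "female" are attribute factor values, and the looked-up premium is the attribute value.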
In order to construct a knowledge graph that clearly represents knowledge data, an embodiment of the present invention provides a knowledge graph construction method which, as shown in FIG. 1, may include:
step 101, acquiring knowledge data and a corresponding complexity label, wherein the complexity label comprises an unstructured data label;
step 102, if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and corresponding attribute values from the knowledge data;
step 103, constructing a knowledge graph represented by triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of a triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values.
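The three steps can be sketched roughly as follows. This is only an outline under stated assumptions: the label string `UNSTRUCTURED`, the function `construct_knowledge_graph`, and the stand-in extractor `toy_extract` are hypothetical, and the actual extraction technique is not specified by the text.

```python
UNSTRUCTURED = "unstructured"  # hypothetical label value


def construct_knowledge_graph(items, extract):
    """items: iterable of (knowledge_data, complexity_label) pairs (step 101).

    extract: callable returning (entity, attribute, rows) for one piece of
    knowledge data, where each row holds attribute factor values plus the
    attribute value (step 102).
    """
    graph = []
    for knowledge_data, label in items:
        if label != UNSTRUCTURED:
            continue  # handling of other labels is outside this sketch
        entity, attribute, rows = extract(knowledge_data)
        # Step 103: entity in position 1, attribute in position 2,
        # factor-dependent attribute values in position 3.
        graph.append((entity, attribute, rows))
    return graph


def toy_extract(text):
    """Stand-in extractor returning a fixed result; real extraction is not shown."""
    return (
        "senile cancer prevention insurance",
        "premium",
        [
            {"gender": "male", "attribute value": "660 yuan"},
            {"gender": "female", "attribute value": "600 yuan"},
        ],
    )


graph = construct_knowledge_graph(
    [("free text about premiums", UNSTRUCTURED), ("a table row", "structured")],
    toy_extract,
)
```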
As shown in FIG. 1, in the embodiment of the present invention, knowledge data and a corresponding complexity label are acquired, the complexity label comprising an unstructured data label; if the complexity label corresponding to the knowledge data is an unstructured data label, entity data, attribute factor data and corresponding attribute values are extracted from the knowledge data; and a knowledge graph represented by triples is constructed according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to the first position of a triple, the attribute data is assigned to the second position of the triple, and the third position of the triple is assigned according to the attribute factor data and the corresponding attribute values. Because attribute factor data is added when the knowledge graph is constructed, a complex logical structure can be broken down and organized before the triples are built, so that the knowledge data is represented clearly.
The inventors found that knowledge sources in the insurance industry are numerous and complex. Unlike industries such as medical care, securities and banking, the insurance industry has only a few standard concepts and little structured data, and most of its data is unstructured text. A knowledge graph construction method therefore needs to be designed around the characteristics of the insurance industry.
In specific implementation, knowledge data and a corresponding complexity tag are obtained, wherein the complexity tag comprises: an unstructured data tag.
In an embodiment, insurance knowledge data and corresponding complexity labels may be acquired, the complexity labels including an unstructured data label. Knowledge data may be acquired as follows: acquiring knowledge data from data resource websites using crawler technology. Insurance knowledge data can likewise be acquired from insurance data resource websites using crawler technology.
In this embodiment, crawlers are used to automatically collect data from data resource websites such as public network resources: crawler scripts are written to automatically collect text data from public data websites such as online encyclopedias, Interactive Encyclopedia, Wikipedia, Sina Finance, and similar financial news sites. Public network resources include any public text data on the internet that can be accessed using crawler technology and are not limited to the few websites listed above.
In an embodiment, knowledge data may be acquired as follows: acquiring knowledge data from electronic documents and/or a text database. Insurance knowledge data can be acquired from insurance industry electronic documents and/or text databases.
In this embodiment, knowledge data may be acquired from electronic documents. Electronic document data may be collected and collated manually, including electronic documents published on management websites and electronic documents that may be disclosed inside a company. The formats of the electronic documents include, but are not limited to, Word, Excel, PDF, TXT, and other document formats whose text content can be obtained with a document parsing tool.
In this embodiment, knowledge data may also be acquired from a text database, for example text data in a company's internal text database. Insurance knowledge data can be acquired from an insurance industry text database, such as the internal text database of an insurance company. Specifically, after the address, account and password of the text database are obtained, the database is logged in to, its table structure is analyzed, and the databases and tables from which triple data can be obtained are recorded and collected for later use.
In an embodiment, as shown in FIG. 2, the complexity label may be one of three types, namely an unstructured data label, a structured data label and a semi-structured data label, which mark the complexity of the knowledge data. The complexity of the knowledge data is judged from the format in which the knowledge was acquired: collected data that is completely regular is regarded as structured data; collected data that is partially regular, such as encyclopedic knowledge data, is regarded as semi-structured data; and collected data in plain-text form with no regular structure, such as data published on the network or various internal company documents, is regarded as unstructured data. Of the three types of complexity label, the structured data label and the semi-structured data label represent simple knowledge data, and the unstructured data label represents complex knowledge data. Because the insurance industry has a large amount of unstructured text data, the embodiment of the invention uses complexity labels to distinguish data by complexity and processes insurance knowledge data of different complexity separately, yielding a knowledge graph construction method better suited to the insurance industry.
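A minimal sketch of the three complexity labels is given below. The source-kind names in `SOURCE_KIND_TO_LABEL` and the choice to default unknown kinds to the unstructured label are assumptions for illustration; the text only states that the label is judged from the format of the acquired knowledge.

```python
from enum import Enum


class ComplexityLabel(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Hypothetical mapping from the kind of source to a complexity label.
SOURCE_KIND_TO_LABEL = {
    "database_table": ComplexityLabel.STRUCTURED,        # completely regular data
    "encyclopedia_page": ComplexityLabel.SEMI_STRUCTURED,  # partially regular data
    "plain_text_document": ComplexityLabel.UNSTRUCTURED,   # no regular structure
}


def label_for(source_kind):
    """Return the complexity label; unknown kinds default to UNSTRUCTURED (an assumption)."""
    return SOURCE_KIND_TO_LABEL.get(source_kind, ComplexityLabel.UNSTRUCTURED)


def is_complex(label):
    """Only the unstructured label marks complex knowledge data."""
    return label is ComplexityLabel.UNSTRUCTURED
```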
In specific implementation, if the complexity tag corresponding to the knowledge data is an unstructured data tag, entity data, attribute factor data and a corresponding attribute value are extracted from the knowledge data.
In the embodiment, if the complexity tag corresponding to the insurance knowledge data is an unstructured data tag, entity data, attribute factor data and a corresponding attribute value are extracted from the insurance knowledge data.
In specific implementation, a knowledge graph represented by a triplet is constructed according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to a first position of the triplet, the attribute data is assigned to a second position of the triplet, and a third position of the triplet is assigned according to the attribute factor data and the corresponding attribute values.
In an embodiment, a knowledge graph represented by triples is constructed according to entity data, attribute factor data and corresponding attribute values extracted from insurance knowledge data.
In an embodiment, as shown in fig. 3, the attribute factor data includes: an attribute factor name and an attribute factor value; according to the entity data, the attribute factor data and the corresponding attribute values, constructing a knowledge graph represented by the triples, wherein the knowledge graph comprises the following steps: and constructing the knowledge graph represented by the triples according to the entity data, the attribute factor names, the attribute factor values and the corresponding attribute values.
In an embodiment, constructing a knowledge graph represented by a triplet according to the entity data, the attribute factor data, and the corresponding attribute values includes: assigning the entity data to a first location of a triplet; assigning the attribute data to a second location of the triplet; and assigning a value to the third position of the triple according to the attribute factor data and the corresponding attribute value.
In an embodiment, assigning a value to the third position of the triple according to the attribute factor data and the corresponding attribute value includes: generating a key-value pair list according to the attribute factor data and the corresponding attribute values; assigning the list of key-value pairs to a third position of the triple. In this embodiment, the third position of the triple is assigned by the key-value pair list, so that the data analysis speed can be effectively increased, and the time complexity can be reduced.
Taking the following triples as an example, the assignment of the third position of the triples according to the attribute factor data and the corresponding attribute values may be expressed as:
(certain insurance product, premium, [{age: 18-26 years old, sex: male, social security: yes, attribute value: 2400 yuan},
{age: 18-26 years old, sex: male, social security: no, attribute value: 2800 yuan},
{age: 18-26 years old, sex: female, social security: yes, attribute value: 2200 yuan},
{age: 18-26 years old, sex: female, social security: no, attribute value: 2560 yuan},
{age: 27-35 years old, sex: male, social security: yes, attribute value: 2600 yuan},
{age: 27-35 years old, sex: male, social security: no, attribute value: 3000 yuan},
......]).
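As a rough illustration of how such a key-value pair list might be generated, the sketch below builds the third position of the triple from tabular rows. The function name `build_key_value_triple` and the key "attribute value" are hypothetical choices, not part of the described method.

```python
def build_key_value_triple(entity, attribute, factor_names, rows):
    """Build (entity, attribute, key-value pair list).

    Each row lists one value per attribute factor name, followed by the
    attribute value that applies under those factor values.
    """
    pairs = []
    for row in rows:
        *factor_values, attribute_value = row
        if len(factor_values) != len(factor_names):
            raise ValueError("row does not match the attribute factor names")
        entry = dict(zip(factor_names, factor_values))
        entry["attribute value"] = attribute_value
        pairs.append(entry)
    return (entity, attribute, pairs)


triple = build_key_value_triple(
    "certain insurance product",
    "premium",
    ["age", "sex", "social security"],
    [
        ("18-26", "male", "yes", "2400 yuan"),
        ("18-26", "male", "no", "2800 yuan"),
        ("18-26", "female", "yes", "2200 yuan"),
    ],
)
```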
In an automatic question-answering system, if a user wants to know the premium of a certain insurance product for a 25-year-old female with social security, the entity and attribute in the question, namely "certain insurance product" and "premium", are identified first, and the attribute factor data and attribute values are then retrieved with the following query statement:
SELECT ?result
WHERE
{ certain insurance product  premium  ?result }
The result of the query is:
[{age: 18-26 years old, sex: male, social security: yes, attribute value: 2400 yuan},
{age: 18-26 years old, sex: male, social security: no, attribute value: 2800 yuan},
{age: 18-26 years old, sex: female, social security: yes, attribute value: 2200 yuan},
{age: 18-26 years old, sex: female, social security: no, attribute value: 2560 yuan},
{age: 27-35 years old, sex: male, social security: yes, attribute value: 2600 yuan},
{age: 27-35 years old, sex: male, social security: no, attribute value: 3000 yuan},
......].
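Once the key-value pair list has been returned, the question-answering system still has to pick the row that matches the user's attribute factor values. One possible way to do this is sketched below; the row format (age ranges written as "18-26"), the key names, and the helper names are assumptions for illustration.

```python
def parse_age_range(text):
    """Parse an age range written as 'low-high' into two integers."""
    low, high = text.split("-")
    return int(low), int(high)


def find_attribute_value(rows, age, sex, social_security):
    """Return the attribute value of the first row matching all three factors."""
    for row in rows:
        low, high = parse_age_range(row["age"])
        if (low <= age <= high
                and row["sex"] == sex
                and row["social security"] == social_security):
            return row["attribute value"]
    return None  # no row covers this combination of factor values


query_result = [
    {"age": "18-26", "sex": "male", "social security": "yes", "attribute value": "2400 yuan"},
    {"age": "18-26", "sex": "male", "social security": "no", "attribute value": "2800 yuan"},
    {"age": "18-26", "sex": "female", "social security": "yes", "attribute value": "2200 yuan"},
    {"age": "18-26", "sex": "female", "social security": "no", "attribute value": "2560 yuan"},
    {"age": "27-35", "sex": "male", "social security": "yes", "attribute value": "2600 yuan"},
    {"age": "27-35", "sex": "male", "social security": "no", "attribute value": "3000 yuan"},
]

answer = find_attribute_value(query_result, age=25, sex="female", social_security="yes")
```

With the rows listed above, a 25-year-old female with social security falls in the 18-26 range, so the lookup yields the 2200 yuan row.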
In an embodiment, assigning a value to the third position of the triple according to the attribute factor data and the corresponding attribute value includes: generating a nested list according to the attribute factor data and the corresponding attribute values; and assigning the nested list to the third position of the triple.
For example, the third position of the triple, assigned according to the attribute factor data and the corresponding attribute values, may be expressed as: [[age, sex, social security], [18-26 years old, male, no, 2800 yuan], [18-26 years old, female, yes, 2200 yuan], [18-26 years old, female, no, 2560 yuan]].
In an embodiment, generating a nested list according to the attribute factor data and the corresponding attribute value includes: assigning the attribute factor name to a first list; assigning the attribute factor values and corresponding attribute values to a second list; and generating a nested list according to the first list and the second list.
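The nested-list variant can be sketched in the same spirit. The function name `build_nested_list` is hypothetical; the layout (one list of attribute factor names followed by lists of factor values with their attribute value) follows the description above.

```python
def build_nested_list(factor_names, rows):
    """Return [factor_names, row_1, row_2, ...].

    The first inner list holds the attribute factor names. Every following
    inner list holds the attribute factor values in the same order, with the
    corresponding attribute value as its last item.
    """
    first_list = list(factor_names)
    nested = [first_list]
    for row in rows:
        if len(row) != len(first_list) + 1:
            raise ValueError("each row needs one value per factor plus the attribute value")
        nested.append(list(row))
    return nested


nested = build_nested_list(
    ["age", "sex", "social security"],
    [
        ("18-26 years old", "male", "no", "2800 yuan"),
        ("18-26 years old", "female", "yes", "2200 yuan"),
    ],
)
```

Compared with the key-value pair list, this layout stores each attribute factor name only once, at the cost of relying on position to interpret each row.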
In an embodiment, the method for constructing a knowledge graph further comprises: storing the knowledge-graph in a structured database. For example, the structured database may be MySQL.
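A minimal sketch of storing such triples in a structured database follows. The text names MySQL; to keep the example runnable without a server, Python's built-in `sqlite3` module is used here as a stand-in. The table name, column names, and the choice to serialize the third position as JSON text are all assumptions.

```python
import json
import sqlite3


def create_store():
    """Create an in-memory database with a hypothetical 'triples' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        "entity TEXT NOT NULL, attribute TEXT NOT NULL, value_json TEXT NOT NULL, "
        "PRIMARY KEY (entity, attribute))"
    )
    return conn


def save_triple(conn, entity, attribute, value):
    """Store the third position (a plain value or a list) as JSON text."""
    conn.execute(
        "INSERT OR REPLACE INTO triples VALUES (?, ?, ?)",
        (entity, attribute, json.dumps(value, ensure_ascii=False)),
    )
    conn.commit()


def load_value(conn, entity, attribute):
    """Return the decoded third position, or None if the triple is absent."""
    row = conn.execute(
        "SELECT value_json FROM triples WHERE entity = ? AND attribute = ?",
        (entity, attribute),
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = create_store()
save_triple(conn, "certain insurance product", "premium",
            [{"sex": "male", "attribute value": "2400 yuan"}])
stored = load_value(conn, "certain insurance product", "premium")
```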
In an embodiment, the method for constructing a knowledge graph further comprises: converting the knowledge graph in the structured database into RDF data, and storing the RDF data in a graph database. The knowledge graph in a structured database can be converted into RDF data with tools such as D2RQ and stored in the graph database Jena.
For example, suppose (health insurance, insurable age, within 100 years old) is a triple that needs to be stored in the Jena database. The triple can be converted into a SPARQL statement for storage by an automated script, as follows:
INSERT DATA
{
health insurance  insurable age  within 100 years old .
}
With this SPARQL statement, the data represented by the triple is stored in the Jena database. When the "insurable age of health insurance" needs to be queried later, the following SPARQL statement can be used:
SELECT ?result
WHERE
{ health insurance  insurable age  ?result }
The above SPARQL statement returns the result "within 100 years old".
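The automated script mentioned above is not shown in the text. A rough sketch of what such a script might look like is given below: it renders a triple as SPARQL `INSERT DATA` and `SELECT` statements. The base IRI `http://example.org/kg/`, the decision to encode names as IRIs, and storing the attribute value as a plain string literal are assumptions for illustration.

```python
from urllib.parse import quote

BASE_IRI = "http://example.org/kg/"  # hypothetical namespace


def to_iri(name):
    """Turn a plain name into an IRI reference, percent-encoding spaces and the like."""
    return f"<{BASE_IRI}{quote(name)}>"


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def insert_statement(entity, attribute, value):
    """Build an INSERT DATA statement that stores one triple."""
    return (
        f"INSERT DATA {{ {to_iri(entity)} {to_iri(attribute)} "
        f'"{escape_literal(value)}" . }}'
    )


def select_statement(entity, attribute):
    """Build a SELECT query that retrieves the attribute value."""
    return f"SELECT ?result WHERE {{ {to_iri(entity)} {to_iri(attribute)} ?result }}"


insert_sparql = insert_statement("health insurance", "insurable age", "within 100 years old")
select_sparql = select_statement("health insurance", "insurable age")
```

The generated strings would then be sent to the graph database's update and query endpoints; that step depends on the deployment and is left out here.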
In an embodiment, the method for constructing a knowledge graph further comprises: exporting the knowledge graph as a CSV file, and storing the CSV file in a graph database. The knowledge graph may be exported as a CSV file using a database management tool (e.g., Navicat), and the CSV file can be stored directly by the graph database Neo4j.
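The text performs the CSV export with a database management tool. For illustration only, the same export can be sketched with Python's standard `csv` module; the column names and the choice to serialize list-valued third positions as JSON are assumptions, and the import into Neo4j is not shown.

```python
import csv
import io
import json


def triples_to_csv(triples):
    """Render triples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["entity", "attribute", "value"])  # hypothetical column names
    for entity, attribute, value in triples:
        # Plain strings are written as-is; lists are serialized as JSON text.
        cell = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        writer.writerow([entity, attribute, cell])
    return buffer.getvalue()


csv_text = triples_to_csv([
    ("health insurance", "insurable age", "within 100 years old"),
    ("certain insurance product", "premium",
     [{"sex": "male", "attribute value": "2400 yuan"}]),
])
```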
It should be noted that a graph database is a type of database that applies graph theory to store relationship information between entities. Commonly used graph databases include Neo4j and Jena, and each accepts a different standard format for storage: Neo4j most commonly accepts text data in CSV format, while Jena most commonly accepts text data in RDF format. CSV (comma-separated values) is a very common plain-text file format. RDF (Resource Description Framework) is the W3C standard for describing web resources, and it uses URIs (uniform resource identifiers) to identify elements.
It should be noted that entity data, attribute values, and triple representation are all general knowledge graph concepts, whereas attribute factor data (including attribute factor names and attribute factor values) and the specific triple representation described here are unique to the present invention. When data is organized into triples and stored in the knowledge graph, complex questions that users are expected to ask are organized manually using the concept of attribute factors. When a knowledge point is converted into a triple, if the attribute value of an entity's attribute varies under the influence of other factors, those factors are treated as attribute factors of the entity attribute, and the unique attribute value corresponding to each combination of attribute factor values is determined.
The following provides a specific embodiment, which illustrates a specific application of the method for constructing a knowledge graph in the case that the complexity label is an unstructured data label in the embodiment of the present invention.
The first embodiment: the knowledge that "the premium of senile cancer prevention insurance is 660 yuan for men and 600 yuan for women" can be fully represented as the following triple: (senile cancer prevention insurance, premium, [{gender: male, attribute value: 660 yuan}, {gender: female, attribute value: 600 yuan}]). In this example, the attribute values are listed together with their associated attribute factors in the form of a list.
The second embodiment: a real-world knowledge point, "the premium of a certain insurance product", is taken as an example. The premium of an insurance product generally varies considerably with the applicant's age, sex, social security record and so on. For example, the premium of a certain insurance product is priced according to three attribute factors, namely the applicant's age, sex and social security status, as shown in Table 1 below.
TABLE 1
[Table 1 is an image in the original publication: premium pricing of a certain insurance product by applicant age, sex and social security status.]
In the usual data form of a knowledge graph, the knowledge "the premium of a certain insurance product is XXX" in this example would be stored as the triple (certain insurance product, premium, XXX), although in fact the premium varies with the applicant's age, sex and social security status. The concept of the "attribute factor" proposed by the present invention is used here to handle triples in which the attribute value depends on factors associated with the entity attribute. The factors that influence the attribute value of the entity attribute are taken as attribute factors, whose names are age, sex and social security; the attribute value differs for different attribute factor values, as listed in Table 2 below.
TABLE 2
All attribute factors and their values | Attribute | Attribute value
Age: 18-26 years old; sex: male; social security: yes | Premium | 2400 yuan
Age: 18-26 years old; sex: male; social security: no | Premium | 2800 yuan
Age: 18-26 years old; sex: female; social security: yes | Premium | 2200 yuan
Age: 18-26 years old; sex: female; social security: no | Premium | 2560 yuan
Age: 27-35 years old; sex: male; social security: yes | Premium | 2600 yuan
Age: 27-35 years old; sex: male; social security: no | Premium | 3000 yuan
…… | …… | ……
The final collated triple for this knowledge is as follows:
(certain insurance product, premium, [{age: 18-26 years old, sex: male, social security: yes, attribute value: 2400 yuan},
{age: 18-26 years old, sex: male, social security: no, attribute value: 2800 yuan},
{age: 18-26 years old, sex: female, social security: yes, attribute value: 2200 yuan},
{age: 18-26 years old, sex: female, social security: no, attribute value: 2560 yuan},
{age: 27-35 years old, sex: male, social security: yes, attribute value: 2600 yuan},
{age: 27-35 years old, sex: male, social security: no, attribute value: 3000 yuan},
......]).
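The triple above can be sketched in Python as a tuple whose third position is a list of key-value entries. This is a minimal illustration rather than the patent's implementation: the names `premium_triple` and `lookup_attribute_value` are hypothetical, age ranges are kept as plain strings, and social security is written as "yes" or "no".

```python
from typing import Dict, List, Optional, Tuple

# One entry of the third position: attribute factor names mapped to their
# values, plus the attribute value that this combination determines.
FactorEntry = Dict[str, str]
Triple = Tuple[str, str, List[FactorEntry]]

# Only the six rows listed in the text; the remaining rows are elided there too.
premium_triple: Triple = (
    "certain insurance product",
    "premium",
    [
        {"age": "18-26", "sex": "male", "social security": "yes", "attribute value": "2400 yuan"},
        {"age": "18-26", "sex": "male", "social security": "no", "attribute value": "2800 yuan"},
        {"age": "18-26", "sex": "female", "social security": "yes", "attribute value": "2200 yuan"},
        {"age": "18-26", "sex": "female", "social security": "no", "attribute value": "2560 yuan"},
        {"age": "27-35", "sex": "male", "social security": "yes", "attribute value": "2600 yuan"},
        {"age": "27-35", "sex": "male", "social security": "no", "attribute value": "3000 yuan"},
    ],
)


def lookup_attribute_value(triple: Triple, factors: Dict[str, str]) -> Optional[str]:
    """Return the attribute value of the first entry matching every given factor."""
    for entry in triple[2]:
        if all(entry.get(name) == value for name, value in factors.items()):
            return entry["attribute value"]
    return None  # no listed combination matches
```

Keeping every factor combination inside one triple means a query for the premium needs only the entity, the attribute and the applicant's factor values.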
The third embodiment: the premium calculation rules for a critical illness insurance product are as follows:
(1) For an insured age of 1 year or less, the premium of the product is 131 yuan for males and 105 yuan for females.
(2) For an insured age over 1 year and up to 19 years, the monthly premium of the product is 61 yuan for males and 105 yuan for females.
(3) For an insured age over 19 years and up to 24 years, the monthly premium of the product is 80 yuan for males and 74 yuan for females.
(4) For an insured age over 24 years and up to 29 years, the monthly premium of the product is 110 yuan for males and 125 yuan for females.
The text above describes the premium calculation rules for the critical illness insurance product. The entity data, attribute data and attribute value are set to "critical illness insurance", "premium" and "XXX yuan (to be determined)" respectively. The insured age and the sex (male and female), which the calculation rules mention repeatedly, are set as attribute factor names, and the attribute value of the attribute data "premium" is uniquely determined by the attribute factor values of insured age and sex. The following triple can therefore be extracted from the text:
(critical illness insurance, premium, [{sex: male, age: 0-1 year old, attribute value: 131 yuan},
{sex: female, age: 0-1 year old, attribute value: 105 yuan},
{sex: male, age: 1-19 years old, attribute value: 61 yuan},
{sex: female, age: 1-19 years old, attribute value: 105 yuan},
......]).
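The rules in this embodiment use age bands with an exclusive lower bound and an inclusive upper bound, which a lookup has to respect. The sketch below expands the four rules into factor entries and resolves a premium for a given sex and age. Storing the bounds as numbers (`age_low`, `age_high`) is an assumption of this sketch; the triple in the text stores the age range as text. The names `RULES`, `rules_to_entries` and `premium_for` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (lower bound, exclusive; upper bound, inclusive; male premium; female premium)
# The first rule states no lower bound, so None is used for it.
RULES: List[Tuple[Optional[int], int, int, int]] = [
    (None, 1, 131, 105),
    (1, 19, 61, 105),
    (19, 24, 80, 74),
    (24, 29, 110, 125),
]


def rules_to_entries(rules) -> List[Dict[str, object]]:
    """Expand each rule into one entry per sex (the third position of the triple)."""
    entries = []
    for low, high, male, female in rules:
        for sex, value in (("male", male), ("female", female)):
            entries.append(
                {"sex": sex, "age_low": low, "age_high": high, "attribute value": value}
            )
    return entries


def premium_for(entries, sex: str, age: float) -> Optional[int]:
    """Return the premium in yuan for the band containing `age`, or None."""
    for entry in entries:
        above_low = entry["age_low"] is None or age > entry["age_low"]
        if entry["sex"] == sex and above_low and age <= entry["age_high"]:
            return entry["attribute value"]
    return None  # age is outside every listed band


triple = ("critical illness insurance", "premium", rules_to_entries(RULES))
```

With these bounds, an insured age of exactly 19 falls in the second band and an age just above 19 falls in the third, matching the wording of the rules.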
In an embodiment, the complexity label further comprises: a structured data label and a semi-structured data label;
if the complexity label corresponding to the knowledge data is a structured data label or a semi-structured data label, entity data, attribute data and corresponding attribute values are extracted from the knowledge data;
and a knowledge graph is constructed according to the entity data, the attribute data, the corresponding attribute values and a predefined triple representation model.
In this embodiment, the entity data, the attribute data and the corresponding attribute values may be extracted from the knowledge data with a knowledge extraction tool. The knowledge extraction tool is a knowledge extraction script written in a scripting language (such as Python) according to the data rules of the knowledge data, that is, the database table structure of the structured data and the data structure of the semi-structured data. In the script, the knowledge identity of each field of the database table and of each piece of data in the semi-structured data is set in the form of preset rules.
Specific embodiments are given below to illustrate the application of the knowledge graph construction method of the embodiments of the present invention when the complexity label is a structured data label or a semi-structured data label.
The fourth embodiment: knowledge is extracted from a database, and the entity data, attribute data and attribute values in the knowledge are identified. Table 3 below shows the data of several fields of a table in the database.
TABLE 3
Product name | Waiting period | Insurance path
Million medical insurance | 30 days | XXX
Million cancer prevention insurance | 15 days | XXX
Critical illness insurance | 15 days | XXX
Hospitalization care | 60 days | XXX
...... | ...... | ......
In the process of collecting and organizing the knowledge, rules are set as follows: "product name" is an entity class, and each value under the "product name" field is entity data; "insurance path" and "waiting period" are attribute data, and the values under these two fields are the attribute values corresponding to each entity data. After processing by the automatic extraction tool, the entity data, attribute data and attribute values that can be extracted from the table above include:
(million medical insurance, waiting period, 30 days);
(million cancer prevention insurance, waiting period, 15 days);
(critical illness insurance, waiting period, 15 days);
(hospitalization care, waiting period, 60 days);
(million medical insurance, insurance path, XXX).
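A rule-driven extraction of this kind can be sketched as follows. The rule format (`entity_field`, `attribute_fields`) and the function name `extract_triples` are assumptions made for illustration, and the rows are plain dictionaries standing in for records fetched from the database table.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Preset rule: which field holds the entity data and which fields are attribute data.
RULE = {
    "entity_field": "product name",
    "attribute_fields": ["waiting period", "insurance path"],
}

# Two rows standing in for records of the table.
ROWS: List[Dict[str, str]] = [
    {"product name": "million medical insurance", "waiting period": "30 days", "insurance path": "XXX"},
    {"product name": "critical illness insurance", "waiting period": "15 days", "insurance path": "XXX"},
]


def extract_triples(rows: List[Dict[str, str]], rule: Dict[str, object]) -> List[Triple]:
    """Emit one (entity, attribute, attribute value) triple per attribute field of each row."""
    triples: List[Triple] = []
    for row in rows:
        entity = row[rule["entity_field"]]
        for field in rule["attribute_fields"]:
            triples.append((entity, field, row[field]))
    return triples
```

Because the mapping lives in the rule rather than in the loop, the same function can serve another table by supplying a different rule.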
The fifth embodiment: knowledge is extracted from semi-structured data crawled from the Internet, and the entity data, attribute data and attribute values in the knowledge are identified. Table 4 below shows semi-structured data crawled from Baidu Baike (Baidu Encyclopedia).
TABLE 4
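[Table 4 is provided as an image in the original publication: a two-column table for the subject "XX", with attribute names in the left column and attribute values in the right column.]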
When these semi-structured data are collected and organized, the subject "XX" of the table is set as entity data (the subject can be obtained from the URL of the web page or from another specific marker). For the rest of the table, the left column is attribute data and the right column is the attribute value. The entity data, attribute data and attribute values that can be extracted from the table above include:
(XX, Chinese name, XX Co., Ltd.);
(XX, foreign name, XX co., Ltd.);
(XX, headquarters, Beijing, China).
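The left-column and right-column rule can be sketched with Python's standard `html.parser` module. The markup below is hypothetical: it assumes attribute names sit in `<dt>` elements and attribute values in `<dd>` elements, which may not match the structure of any real encyclopedia page. The names `InfoboxParser` and `infobox_triples` are likewise illustrative.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class InfoboxParser(HTMLParser):
    """Collects the text of <dt> (attribute) and <dd> (value) cells in order."""

    def __init__(self) -> None:
        super().__init__()
        self._current: Optional[str] = None
        self.names: List[str] = []
        self.values: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current = tag
            (self.names if tag == "dt" else self.values).append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the open cell.
        if self._current == "dt":
            self.names[-1] += data
        elif self._current == "dd":
            self.values[-1] += data


def infobox_triples(entity: str, html: str) -> List[Tuple[str, str, str]]:
    """Pair each attribute name with its value and attach the page subject."""
    parser = InfoboxParser()
    parser.feed(html)
    return [(entity, n.strip(), v.strip()) for n, v in zip(parser.names, parser.values)]


# Hypothetical fragment; the entity "XX" would come from the page URL or title.
HTML = """
<dl>
  <dt>Chinese name</dt><dd>XX Co., Ltd.</dd>
  <dt>headquarters</dt><dd>Beijing, China</dd>
</dl>
"""
```

Here the entity is passed in separately because, as the text notes, the subject is taken from the URL or another marker rather than from the table rows.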
Based on the same inventive concept, an embodiment of the present invention further provides a knowledge graph construction apparatus, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the knowledge graph construction method, its implementation may refer to the implementation of the method, and details already described are not repeated.
Fig. 4 is a block diagram of a knowledge graph constructing apparatus according to an embodiment of the present invention, and as shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain knowledge data and a corresponding complexity label, where the complexity label includes: an unstructured data tag;
an extracting module 402, configured to extract entity data, attribute factor data, and a corresponding attribute value from the knowledge data if the complexity tag corresponding to the knowledge data is an unstructured data tag;
a building module 403, configured to build a knowledge graph represented by the triplets according to the entity data, the attribute factor data, and the corresponding attribute values, where the entity data is assigned to a first position of the triplets, the attribute data is assigned to a second position of the triplets, and a third position of the triplets is assigned according to the attribute factor data and the corresponding attribute values.
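The data flow through the three modules can be sketched as three functions. This is only an outline under stated assumptions: the input record is already annotated with its entity, attribute and factor entries, so `extract` is a stub and does not show how extraction from unstructured text is actually performed. The label spelling `"unstructured"` and all function and key names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

UNSTRUCTURED = "unstructured"  # hypothetical spelling of the unstructured data tag

Triple = Tuple[str, str, List[Dict[str, str]]]


def acquire(record: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Obtaining module 401: return the knowledge data and its complexity label."""
    return record["data"], record["label"]


def extract(data: Dict[str, Any], label: str):
    """Extracting module 402 (stub): reads fields that are already annotated."""
    if label != UNSTRUCTURED:
        raise ValueError(f"unsupported complexity label: {label}")
    return data["entity"], data["attribute"], data["factor_entries"]


def build(entity: str, attribute: str, factor_entries: List[Dict[str, str]]) -> Triple:
    """Building module 403: fill the first, second and third positions of the triple."""
    return (entity, attribute, [dict(entry) for entry in factor_entries])


def construct(record: Dict[str, Any]) -> Triple:
    """Run the three modules in order on one record."""
    data, label = acquire(record)
    return build(*extract(data, label))
```

For example, a record labelled `"unstructured"` whose data holds the entity "critical illness insurance", the attribute "premium" and a list of factor entries yields a triple with those three positions.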
In summary, in the embodiments of the present invention, knowledge data and a corresponding complexity label are acquired, where the complexity label includes an unstructured data tag. If the complexity label corresponding to the knowledge data is an unstructured data label, entity data, attribute factor data and a corresponding attribute value are extracted from the knowledge data. A knowledge graph represented by triples is then constructed according to the entity data, the attribute factor data and the corresponding attribute values, where the entity data is assigned to the first position of the triple, the attribute data is assigned to the second position, and the third position is assigned according to the attribute factor data and the corresponding attribute values. By adding attribute factor data when constructing the knowledge graph, the embodiments of the present invention disassemble and sort out complex logical structures before building the knowledge graph represented by triples, so that the knowledge data is represented clearly.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A knowledge graph construction method is characterized by comprising the following steps:
acquiring knowledge data and a corresponding complexity label, wherein the complexity label comprises: an unstructured data tag;
if the complexity label corresponding to the knowledge data is an unstructured data label, extracting entity data, attribute factor data and a corresponding attribute value from the knowledge data;
and constructing a knowledge graph represented by the triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to a first position of the triples, the attribute data is assigned to a second position of the triples, and a third position of the triples is assigned according to the attribute factor data and the corresponding attribute values.
2. The method of knowledge-graph construction according to claim 1 wherein the attribute factor data comprises: an attribute factor name and an attribute factor value;
according to the entity data, the attribute factor data and the corresponding attribute values, constructing a knowledge graph represented by the triples, wherein the knowledge graph comprises the following steps: and constructing the knowledge graph represented by the triples according to the entity data, the attribute factor names, the attribute factor values and the corresponding attribute values.
3. The method of knowledge-graph construction according to claim 1 wherein assigning a value to a third location of a triple based on the attribute factor data and corresponding attribute values comprises:
generating a key-value pair list according to the attribute factor data and the corresponding attribute values;
assigning the list of key-value pairs to a third position of the triple.
4. The method of knowledge-graph construction according to claim 1 wherein assigning a value to a third location of a triple based on the attribute factor data and corresponding attribute values comprises:
generating a nested list according to the attribute factor data and the corresponding attribute values;
assigning the nested list to a third position of the triple.
5. The method of knowledge-graph construction according to claim 4 wherein generating a nested list from the attribute factor data and corresponding attribute values comprises:
assigning the attribute factor name to a first list;
assigning the attribute factor values and corresponding attribute values to a second list;
and generating a nested list according to the first list and the second list.
6. The method of knowledge-graph construction of claim 1 wherein the complexity labels further comprise: structured data tags and semi-structured data tags;
if the complexity label corresponding to the knowledge data is a structured data label or a semi-structured data label, extracting entity data, attribute data and corresponding attribute values from the knowledge data;
and constructing a knowledge graph represented by the triples according to the entity data, the attribute data and the corresponding attribute values.
7. The method of knowledge-graph construction according to claim 1, further comprising:
exporting the knowledge graph into a CSV file, and storing the CSV file into a graph database.
8. A knowledge-graph building apparatus, comprising:
an acquisition module for acquiring knowledge data and corresponding complexity tags, the complexity tags comprising: an unstructured data tag;
the extraction module is used for extracting entity data, attribute factor data and corresponding attribute values from the knowledge data if the complexity label corresponding to the knowledge data is an unstructured data label;
and the construction module is used for constructing the knowledge graph represented by the triples according to the entity data, the attribute factor data and the corresponding attribute values, wherein the entity data is assigned to a first position of the triples, the attribute data is assigned to a second position of the triples, and a third position of the triples is assigned according to the attribute factor data and the corresponding attribute values.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 7.
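The nested list recited in claims 4 and 5 can be illustrated with a short sketch. The claims do not fix an exact layout, so the one used here is an assumption: the first list holds the attribute factor names, and the second list holds one row per combination, consisting of the attribute factor values followed by the corresponding attribute value. The function name `to_nested_list` and the key `"attribute value"` are hypothetical.

```python
from typing import Dict, List


def to_nested_list(entries: List[Dict[str, str]], factor_names: List[str]) -> list:
    """Build [first list, second list] from key-value entries."""
    first = list(factor_names)  # attribute factor names
    second = [
        [entry[name] for name in factor_names] + [entry["attribute value"]]
        for entry in entries
    ]  # factor values, then the attribute value they determine
    return [first, second]


entries = [
    {"sex": "male", "age": "0-1 year", "attribute value": "131 yuan"},
    {"sex": "female", "age": "0-1 year", "attribute value": "105 yuan"},
]
nested = to_nested_list(entries, ["sex", "age"])
triple = ("critical illness insurance", "premium", nested)
```

Compared with a list of key-value pairs, this layout states each factor name once, at the cost of relying on column order within each row.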
CN202010484904.1A 2020-06-01 2020-06-01 Knowledge graph construction method and device Active CN112115271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010484904.1A CN112115271B (en) 2020-06-01 2020-06-01 Knowledge graph construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010484904.1A CN112115271B (en) 2020-06-01 2020-06-01 Knowledge graph construction method and device

Publications (2)

Publication Number Publication Date
CN112115271A true CN112115271A (en) 2020-12-22
CN112115271B CN112115271B (en) 2024-05-03

Family

ID=73799230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010484904.1A Active CN112115271B (en) 2020-06-01 2020-06-01 Knowledge graph construction method and device

Country Status (1)

Country Link
CN (1) CN112115271B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113238865A (en) * 2021-05-18 2021-08-10 苏明 Method for quickly constructing knowledge graph based on Excel one-key import

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183869A (en) * 2015-09-16 2015-12-23 分众(中国)信息技术有限公司 Building knowledge mapping database and construction method thereof
US20190019088A1 (en) * 2017-07-14 2019-01-17 Guangdong Shenma Search Technology Co., Ltd. Knowledge graph construction method and device
CN109344262A (en) * 2018-10-31 2019-02-15 百度在线网络技术(北京)有限公司 Architectonic method for building up, device and storage medium
US20190294732A1 (en) * 2018-03-22 2019-09-26 Adobe Inc. Constructing enterprise-specific knowledge graphs
CN110287334A (en) * 2019-06-13 2019-09-27 淮阴工学院 A kind of school's domain knowledge map construction method based on Entity recognition and attribute extraction model
CN111061841A (en) * 2019-12-19 2020-04-24 京东方科技集团股份有限公司 Knowledge graph construction method and device

Also Published As

Publication number Publication date
CN112115271B (en) 2024-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant