CN111143394B

CN111143394B - Knowledge data processing method, device, medium and electronic equipment

Info

Publication number: CN111143394B
Application number: CN201911139105.4A
Authority: CN
Inventors: 杨铭; 潘申龄; 刘设伟
Original assignee: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Current assignee: Taikang Insurance Group Co Ltd; Taikang Online Property Insurance Co Ltd
Priority date: 2019-11-20
Filing date: 2019-11-20
Publication date: 2023-06-13
Anticipated expiration: 2039-11-20
Also published as: CN111143394A

Abstract

The invention provides a knowledge data processing method, a knowledge data processing device, a knowledge data processing medium and electronic equipment. The knowledge data processing method comprises the following steps: acquiring knowledge data; extracting entity information and attribute information in the knowledge data; if the attribute value of the entity in the entity information in the attribute information is not unique, determining sub-item information according to the entity information and the attribute value; determining a composite triplet according to the entity information and the sub-item information; receiving a query statement based on the structure of the composite triplet; and obtaining a query result from the database according to the query statement. The technical scheme provided by the invention has the advantages of strong data processing capability, high query speed and high processing efficiency.

Description

Knowledge data processing method, device, medium and electronic equipment

Technical Field

The present invention relates to the field of knowledge graph technologies, and in particular, to a knowledge data processing method, a knowledge data processing device, a knowledge data processing medium, and an electronic device.

Background

Knowledge graphs have become one of the most active topics in artificial intelligence, and knowledge computation over them supports a large number of intelligent applications, such as question-answering systems, recommendation systems, and search engines.

Current knowledge graph representations generally decompose knowledge points into triples, expressed in the form (entity, attribute, attribute value), to describe the relationships between the entities and concepts in a knowledge point. However, this representation cannot clearly express knowledge with a complex logical structure. In particular, when the value of an entity's attribute varies with external factors, current knowledge graphs cannot represent it clearly and unambiguously.

Therefore, existing knowledge graphs can only represent knowledge data with a single structure and simple logic, and cannot clearly represent knowledge with a complex logical structure. When such knowledge is stored in a database, query efficiency is low.

It should be noted that the information of the present invention in the above background section is only for enhancing the understanding of the background of the present invention and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.

Disclosure of Invention

The invention provides a knowledge data processing method, a knowledge data processing device, a knowledge data processing medium and electronic equipment for solving the problem of low data processing capacity.

Other features and advantages of the invention will be apparent from the following detailed description, or may be learned by the practice of the invention.

According to an aspect of the present invention, there is provided a knowledge data processing method including: obtaining knowledge data by using a crawler script or an electronic document parsing tool; extracting entity information and attribute information from the knowledge data; if the attribute value, in the attribute information, of an entity in the entity information is not unique, determining sub-item information according to the entity information and the attribute value; generating a composite triplet according to the entity information and the sub-item information, wherein the composite triplet comprises a main triplet (entity, contains, sub-entity) and sub-triplets (sub-entity, sub-item, sub-item value) and (sub-entity, attribute, attribute value); receiving a query statement based on the structure of the composite triplet; and obtaining a query result from the database according to the query statement.

In one embodiment, the method further comprises: if the attribute value of the entity in the entity information in the attribute information is unique, determining a simple triplet according to the entity information and the attribute information; wherein the simple triplet includes the entity, an attribute of the entity in the attribute information, and the attribute value.

In one embodiment, the knowledge data includes structured data, semi-structured data, and unstructured data; the entity information and the attribute information are extracted from the structured data and the semi-structured data by using a knowledge extraction script; extracting the entity information and the attribute information from the unstructured data by a trained artificial intelligence processing model.

In one embodiment, the sub-item information includes a sub-entity, a sub-item, and a sub-item value.

In one embodiment, determining the sub-item information according to the entity information and the attribute value includes: taking a factor that influences the relationship between an entity in the entity information and its attribute as a sub-item of the entity; taking the value of that factor as a sub-item value of the entity; and splitting the entity into sub-entities according to the sub-item values.

In one embodiment, the method further comprises: validating the composite triplet and the simple triplet; and storing the verified composite triplet and the simple triplet into a graph database.

In one embodiment, the query statement based on the structure of the composite triplet includes:

Select ?result
Where
{ entity contains ?subItem
  ?subItem sub-item sub-item-value
  ?subItem attribute ?result }.

According to another aspect of the present invention, there is provided a knowledge data processing apparatus comprising: a data acquisition module for acquiring knowledge data; an extraction information module for extracting entity information and attribute information from the knowledge data; a sub-item information determining module for determining sub-item information according to the entity information and the attribute value if the attribute value, in the attribute information, of an entity in the entity information is not unique; a triplet determining module for determining a composite triplet according to the entity information and the sub-item information, wherein the composite triplet comprises a main triplet (entity, contains, sub-entity) and sub-triplets (sub-entity, sub-item, sub-item value) and (sub-entity, attribute, attribute value); a query statement receiving module for receiving a query statement based on the structure of the composite triplet; and a query execution module for obtaining a query result from the database according to the query statement.

According to another aspect of the present invention, there is provided a computer readable medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements a knowledge data processing method as described in any of the above embodiments.

According to another aspect of the present invention, there is provided an electronic apparatus, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the knowledge data processing method as described in any one of the above embodiments.

The technical scheme provided by the embodiment of the invention can have the following beneficial effects:

The invention provides a knowledge data processing method, device, medium and electronic equipment in which sub-item information is determined according to the entity information and the attribute value, and a composite triplet is determined based on that sub-item information, so that knowledge data with a complex logical structure can be expressed and processed.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 shows a flow chart of a knowledge data processing method in an embodiment of the invention;

FIG. 2 shows a flow chart of a knowledge data processing method in an embodiment of the invention;

FIG. 3 shows a block diagram of a knowledge data processing apparatus in an embodiment of the invention;

FIG. 4 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

The knowledge graph is a relatively new technical concept that belongs to the fields of knowledge storage and knowledge representation. In terms of knowledge representation, a knowledge graph is mainly used to describe the entities and concepts that exist in the real world and the relationships among them. Current knowledge graph representations generally decompose knowledge points into triples, expressed in the form (entity, attribute, attribute value), to describe the relationships between the entities and concepts in a knowledge point. For example, the knowledge "Zhang Zhen's hometown is Shaoshan" may be expressed as (Zhang Zhen, hometown, Shaoshan), where "Zhang Zhen" is the entity, "hometown" is the attribute, and "Shaoshan" is the value of that attribute for the entity. As another example, the knowledge "the basic premium of the health insurance is 200 yuan and the insured amount is 1 million yuan" can be split into two triples: (health insurance, basic premium, 200 yuan) and (health insurance, insured amount, 1 million yuan). In terms of knowledge storage, a knowledge graph can store data in a graph database in the form of triples, and dedicated query languages are used to query that data; for example, SPARQL (SPARQL Protocol and RDF Query Language) is used when the knowledge graph is stored as RDF (Resource Description Framework). Taking (Zhang Zhen, hometown, Shaoshan) as an example, to query the attribute value of the "hometown" attribute of the entity "Zhang Zhen", the query statement is as follows:

Select ?result
Where
{ Zhang Zhen hometown ?result }
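
The triple form and the query above can be illustrated with a minimal in-memory sketch. It is not the SPARQL engine or graph database the description refers to; the `match` helper and its parameter names are hypothetical, and a `None` argument plays the role of a query variable such as ?result.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, attribute value)


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    attribute: Optional[str] = None,
    value: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every fixed (non-None) position."""
    pattern = (subject, attribute, value)
    return [
        t for t in triples
        if all(p is None or p == part for p, part in zip(pattern, t))
    ]


# The example triple from the description.
triples = [("Zhang Zhen", "hometown", "Shaoshan")]

# Equivalent in spirit to: Select ?result Where { Zhang Zhen hometown ?result }
hometowns = [v for _, _, v in match(triples, subject="Zhang Zhen", attribute="hometown")]
print(hometowns)  # ['Shaoshan']
```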

The inventors of the present application found that knowledge graphs currently built in various fields can only handle knowledge data with a single structure and simple logic, and cannot clearly represent knowledge with a complex logical structure. In particular, when the value of an entity's attribute varies with external factors, current knowledge graphs cannot represent it clearly and unambiguously. For example, consider the knowledge "for the elderly cancer-prevention insurance product, the premium is 660 yuan for men and 600 yuan for women". The core of this knowledge is (elderly cancer-prevention insurance, premium, XXX), which cannot be clearly represented with the current knowledge graph representation method.

FIG. 1 shows a flow chart of a knowledge data processing method in an embodiment of the invention.

In step S101, knowledge data is acquired.

In one embodiment, knowledge data is obtained using a crawler script or electronic document parsing tool.

In step S102, entity information and attribute information in the knowledge data are extracted.

In one embodiment, the knowledge data includes simple knowledge data, namely structured data and semi-structured data, and complex knowledge data, namely unstructured data. Entity information and attribute information are extracted from the structured data and the semi-structured data by using a knowledge extraction script, and from the unstructured data by human judgment.

In step S103, if the attribute value, in the attribute information, of an entity in the entity information is not unique, sub-item information is determined according to the entity information and the attribute value.

In one embodiment, the sub-item information includes a sub-entity, a sub-item, and a sub-item value.

In step S104, a composite triplet is determined according to the entity information and the sub-item information, wherein the composite triplet includes a main triplet (entity, contains, sub-entity) and sub-triplets (sub-entity, sub-item, sub-item value) and (sub-entity, attribute, attribute value).

In one embodiment, the composite triplet is determined according to the entity in the entity information and the sub-item and sub-item value corresponding to that entity. The composite triplet includes an entity, a sub-item, and a sub-item value.

In step S105, a query statement based on the composite triplet structure is received.

In one embodiment, the query statement based on the composite triplet structure includes:

Select ?result
Where
{ entity contains ?subItem
  ?subItem sub-item sub-item-value
  ?subItem attribute ?result }.

In step S106, a query result is obtained from the database according to the query statement.

According to the knowledge data processing method in this embodiment, sub-item information is determined from the entity information and the attribute value, a composite triplet with sub-items is determined, and the data is stored in the database in composite triplet form. The data can then be queried with query statements based on the composite triplet structure, which gives fast processing and efficient queries. The technical scheme can express knowledge data with complex logical structures and has strong data processing capability.

FIG. 2 shows a flowchart of a knowledge data processing method in an embodiment of the invention.

In step S201, knowledge data is acquired using a crawler script or an electronic document parsing tool.

In one embodiment, knowledge data is obtained by using a crawler script. Specifically, a purpose-written crawler script automatically crawls text data from public data websites such as Baidu Encyclopedia, Interactive Encyclopedia, Wikipedia, Sina Finance, and NetEase Finance. It should be noted that public network resources include any public data on the internet that can be obtained with crawler technology, and are not limited to the websites listed above.

In one embodiment, obtaining knowledge data by using an electronic document parsing tool specifically refers to obtaining text data from a company's internal database. After the address, account, and password of the database are obtained, the database is logged into, its table structure is analyzed, and the databases and tables from which triplet data can be obtained are recorded and collected. The obtained text data can then be parsed with common electronic document parsing software, such as an electronic document processor or an electronic document processing teacher; the invention is not limited thereto.

In some embodiments, the knowledge data is obtained by manually collecting and organizing electronic document materials. Specifically, this refers to collecting and organizing electronic documents published on the internet and electronic documents that a company can disclose internally. Their formats include, but are not limited to, Word, Excel, PDF, TXT, and other document formats from which text content can be obtained with a document parsing tool.

In step S202, entity information and attribute information are extracted from the structured data and the semi-structured data in the knowledge data using the knowledge extraction script. Specifically, the entity information includes an entity, and the attribute information includes an attribute and an attribute value of the entity.

In one embodiment, the complexity of the knowledge data is judged from the format of the acquired knowledge data: the simple knowledge data set comprises structured data and semi-structured data, and the complex knowledge data comprises unstructured data. Step S201 describes ways of collecting knowledge data. Among them, the knowledge data collected from databases is fully regular structured data, the collected encyclopedia data is partially regular semi-structured data, and the documents collected from the public internet or from inside a company are unstructured plain text without a regular structure. Structured and semi-structured data are processed together by automated tools, while unstructured data can also be processed manually. Specifically, for structured and semi-structured data, a knowledge extraction script is used to extract the entity information and attribute information. For example, the knowledge extraction script is written in a scripting language such as Python according to the data rules of the simple knowledge set (that is, the database table structure of the structured data and the data structure of the semi-structured data). In the knowledge extraction script, preset rules specify the knowledge role of each field of the database table and of each piece of data in the semi-structured data. The automated knowledge extraction of structured data and semi-structured data is described below by way of example.

Example 1, knowledge is extracted from a database, and entities, attributes, and attribute values in the knowledge are identified. Examples of some of the fields of the structured data are shown in table 1 below.

Product name                           Waiting period    Insuring path
Million medical insurance              30 days           XXX
Million cancer-prevention insurance    15 days           XXX
Critical illness insurance             15 days           XXX
Hospital insurance                     60 days           XXX
......                                 ......            ......

TABLE 1

Table 1 above shows an example of fields of structured data of an embodiment of the present invention.

As shown in Table 1, when the knowledge is collected and organized, a rule sets the "Product name" field as the entity class, so that each value under "Product name" is an entity. "Insuring path" and "Waiting period" are set as attributes, and the values under "Insuring path" and "Waiting period" are the attribute values corresponding to each entity.
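
The field rule described for Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the function `rows_to_triples` and its parameters are hypothetical names, the rows are assumed to have already been read from the database into dictionaries, and the sample values paraphrase Table 1.

```python
def rows_to_triples(rows, entity_field, attribute_fields):
    """Apply a field rule: one field names the entity, the others are attributes."""
    triples = []
    for row in rows:
        entity = row[entity_field]
        for field in attribute_fields:
            # Each attribute column yields one (entity, attribute, value) triple.
            triples.append((entity, field, row[field]))
    return triples


# Rows as they might come back from a database query (values follow Table 1).
rows = [
    {"Product name": "Million medical insurance", "Waiting period": "30 days", "Insuring path": "XXX"},
    {"Product name": "Hospital insurance", "Waiting period": "60 days", "Insuring path": "XXX"},
]

triples = rows_to_triples(
    rows,
    entity_field="Product name",
    attribute_fields=["Waiting period", "Insuring path"],
)
for triple in triples:
    print(triple)
```

Keeping the rule (which field is the entity, which are attributes) as arguments rather than hard-coding it mirrors the idea that the rule is set once per table.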

Example 2, knowledge is extracted from semi-structured data crawled from the internet, and the entities, attributes, and attribute values in the knowledge are identified. Table 2 below shows some of the semi-structured data crawled from Baidu Encyclopedia.

TABLE 2

Table 2 above shows an example of fields of semi-structured data of an embodiment of the present invention.

As shown in Table 2, when the semi-structured data is collected and organized, the subject of the table data, "Taikang Life", is set as the entity (the subject can be obtained from the URL of the web page or from other specific marks). For the remaining table data, the left column gives the attributes and the right column gives the attribute values.

The above examples merely illustrate the extraction of entity information and attribute information from structured and semi-structured data, and are not intended to limit the present invention.

In step S203, entity information and attribute information are extracted from the unstructured data.

In one embodiment, entity information and attribute information may be identified manually from the unstructured text by knowledge management personnel, or identified by a trained artificial intelligence processing model such as an artificial neural network. Example 3 considers the following text: the premium calculation rules for critical illness insurance are as follows:

(1) For an insured age of 1 year or less, the monthly premium of the product is 131 yuan for males and 105 yuan for females.

(2) For an insured age over 1 year and up to 19 years, the monthly premium of the product is 61 yuan for males and 105 yuan for females.

(3) For an insured age over 19 years and up to 24 years, the monthly premium of the product is 80 yuan for males and 74 yuan for females.

(4) For an insured age over 24 years and up to 29 years, the monthly premium of the product is 110 yuan for males and 125 yuan for females.

The above text describes the premium calculation rules for critical illness insurance. The entity, attribute, and attribute value are set to "critical illness insurance", "premium", and "XXX yuan (to be determined)" respectively, and the attribute value of the premium is not unique.

In step S204, it is determined whether the attribute value of the entity in the entity information in the attribute information is unique. The process proceeds to step S205 when the attribute value is unique, and proceeds to step S206 when the attribute value is not unique.
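
The uniqueness test of step S204 can be sketched as a grouping step. This is a minimal illustration, assuming the extracted facts are available as (entity, attribute, value) tuples; the function name `split_by_uniqueness` is hypothetical and not taken from the source.

```python
from collections import defaultdict


def split_by_uniqueness(observations):
    """Group values by (entity, attribute) and separate unique from non-unique ones."""
    values = defaultdict(set)
    for entity, attribute, value in observations:
        values[(entity, attribute)].add(value)
    # Exactly one value: a simple triplet can be formed (step S205).
    unique = {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}
    # More than one value: sub-item information is needed (step S206).
    ambiguous = {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}
    return unique, ambiguous


observations = [
    ("critical illness insurance", "waiting period", "15 days"),
    ("critical illness insurance", "premium", "131 yuan"),
    ("critical illness insurance", "premium", "105 yuan"),
]
unique, ambiguous = split_by_uniqueness(observations)
print(unique)
print(ambiguous)
```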

In step S205, a simple triplet is determined according to the entity information and the attribute information; the simple triples include entities, attributes of the entities in the attribute information, and attribute values.

For example, the triples of entity, attribute, and attribute value for Example 1 in step S202 are:

(Million medical insurance, waiting period, 30 days)

(Million cancer-prevention insurance, waiting period, 15 days)

(Critical illness insurance, waiting period, 15 days)

(Hospital insurance, waiting period, 60 days)

(Million medical insurance, insuring path, XXX)

......

The triples of entity, attribute, and attribute value for Example 2 are:

(Taikang Life, Chinese name, Taikang Life Insurance Stock Co., Ltd.)

(Taikang Life, foreign name, Taikang Life Insurance Co., Ltd.)

(Taikang Life, headquarters, Beijing, China)

......

In step S206, sub-item information is determined based on the entity information and the attribute value. Specifically, the sub-item information refers to the relevant factors that influence the entity in the entity information.

In step S207, the sub-item information includes a sub-entity, a sub-item, and a sub-item value.

In step S208, a factor that influences the relationship between an entity in the entity information and its attribute is taken as a sub-item of the entity.

In one embodiment, a sub-item is a factor that influences the attribute value corresponding to an attribute of the entity. A sub-item has corresponding sub-item values, and the attribute value of the entity's attribute differs according to the sub-item value. For example, in the knowledge "for the elderly cancer-prevention insurance product, the premium is 660 yuan for men and 600 yuan for women", the entity, attribute, and attribute value are "elderly cancer-prevention insurance", "premium", and "XXX yuan" respectively. The specific attribute value depends on gender: when "gender" is "male", the attribute value is "660 yuan"; when "gender" is "female", the attribute value is "600 yuan". Here "gender" can be used as the sub-item, and when the sub-item takes different values, the attribute value of the entity's attribute also differs.

In step S209, the value of the factor that influences the relationship between the entity and its attribute is taken as a sub-item value of the entity.

In one embodiment, a sub-item value is a specific value taken by the sub-item, and a sub-item associated with an entity attribute has at least two sub-item values. Continuing the example in step S208, the sub-item "gender" has the sub-item values "male" and "female", and when the sub-item takes different sub-item values, the attribute value of the corresponding entity attribute differs.

In step S210, the entity is split into sub-entities according to the sub-item values.

In one embodiment, a sub-entity is obtained by splitting the entity according to the sub-item values of the sub-item so that the attribute value of the corresponding attribute becomes unique. The sub-entity is an intermediate concept rather than a real entity, and it belongs to the original entity. Continuing the example in step S208, the entity "elderly cancer-prevention insurance" may be split according to the sub-item values "male" and "female" of the sub-item "gender" into sub-entity 1 and sub-entity 2.

In step S211, determining a composite triplet according to the entity information and the sub-item information; the composite triplet includes the entities, sub-items, and sub-item values of the sub-items in the entity information.

In one embodiment, to represent complex knowledge involving sub-items as triplets, the entity is first split into multiple sub-entities according to the sub-items, generating main triplets of the form (entity, contains, sub-entity). Each sub-entity then takes the attributes of the original entity and all associated sub-items as its attributes, and takes as values the sub-item values of those sub-items and the attribute value uniquely determined by the original entity and the sub-item values. Continuing the example in step S208, the composite triplet representation is as follows:

(elderly cancer-prevention insurance, contains, sub-entity 1)

(sub-entity 1, gender, male)

(sub-entity 1, premium, 660 yuan)

(elderly cancer-prevention insurance, contains, sub-entity 2)

(sub-entity 2, gender, female)

(sub-entity 2, premium, 600 yuan)
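
The construction of these composite triplets can be sketched as below. It is a hedged illustration rather than the patented implementation: `build_composite`, the `records` layout (a dictionary of sub-item values paired with the resulting attribute value), and the sequential "sub-entity N" naming are assumptions made for the example.

```python
def build_composite(entity, attribute, records, relation="contains"):
    """Turn (sub-item values, attribute value) records into composite triplets.

    Each record becomes one sub-entity, so that the attribute value of every
    sub-entity is unique.
    """
    triples = []
    for index, (sub_items, value) in enumerate(records, start=1):
        sub_entity = f"sub-entity {index}"  # intermediate node, not a real entity
        triples.append((entity, relation, sub_entity))        # main triplet
        for sub_item, sub_item_value in sub_items.items():
            triples.append((sub_entity, sub_item, sub_item_value))  # sub-item triplet
        triples.append((sub_entity, attribute, value))        # attribute triplet
    return triples


triples = build_composite(
    "elderly cancer-prevention insurance",
    "premium",
    [({"gender": "male"}, "660 yuan"), ({"gender": "female"}, "600 yuan")],
)
for triple in triples:
    print(triple)
```

In a real store the sub-entity identifiers would need to be unique across entities; a per-entity counter is used here only to keep the example short.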

To query "the premium of the elderly cancer-prevention insurance for men", a query statement similar to the following can be used:

Select ?result
Where
{ elderly cancer-prevention insurance contains ?subItem
  ?subItem gender male
  ?subItem premium ?result }
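
The three patterns of this query amount to a small join over the composite triplets: find the sub-entities of the entity, keep those whose sub-item values match, and read off the requested attribute. The sketch below shows that logic over an in-memory list; it does not use a SPARQL engine, and `query_composite` is a hypothetical helper name.

```python
def query_composite(triples, entity, sub_item_values, attribute, relation="contains"):
    """Return attribute values of the sub-entities that match all sub-item values."""
    facts = set(triples)
    # Pattern 1: entity contains ?subItem
    sub_entities = [o for s, p, o in triples if s == entity and p == relation]
    results = []
    for sub in sub_entities:
        # Pattern 2: ?subItem <sub-item> <sub-item value>, for every given sub-item
        if all((sub, item, val) in facts for item, val in sub_item_values.items()):
            # Pattern 3: ?subItem <attribute> ?result
            results.extend(o for s, p, o in triples if s == sub and p == attribute)
    return results


triples = [
    ("elderly cancer-prevention insurance", "contains", "sub-entity 1"),
    ("sub-entity 1", "gender", "male"),
    ("sub-entity 1", "premium", "660 yuan"),
    ("elderly cancer-prevention insurance", "contains", "sub-entity 2"),
    ("sub-entity 2", "gender", "female"),
    ("sub-entity 2", "premium", "600 yuan"),
]

result = query_composite(
    triples, "elderly cancer-prevention insurance", {"gender": "male"}, "premium"
)
print(result)  # ['660 yuan']
```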

As another example, for Example 3 in step S203, the composite triplets are expressed as:

(critical illness insurance, contains, sub-entity 1)

(sub-entity 1, gender, male)

(sub-entity 1, age, 0-1 years)

(sub-entity 1, premium, 131 yuan)

(critical illness insurance, contains, sub-entity 2)

(sub-entity 2, gender, female)

(sub-entity 2, age, 0-1 years)

(sub-entity 2, premium, 105 yuan)

(critical illness insurance, contains, sub-entity 3)

(sub-entity 3, gender, male)

(sub-entity 3, age, 1-19 years)

(sub-entity 3, premium, 61 yuan)

......
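
When two sub-items are involved, as in Example 3, each combination of sub-item values yields one sub-entity. The sketch below expands the four premium rules of Example 3 into composite triplets; the `RULES` layout, the `expand_rules` name, and the age labels such as "0-1 years" are illustrative assumptions, while the premium figures are those quoted in the rules.

```python
# (age range, premiums by gender), following rules (1) to (4) of Example 3.
RULES = [
    ((0, 1), {"male": 131, "female": 105}),
    ((1, 19), {"male": 61, "female": 105}),
    ((19, 24), {"male": 80, "female": 74}),
    ((24, 29), {"male": 110, "female": 125}),
]


def expand_rules(entity, rules):
    """Create one sub-entity per (age range, gender) combination."""
    triples = []
    counter = 0
    for (low, high), premiums in rules:
        for gender, premium in premiums.items():
            counter += 1
            sub = f"sub-entity {counter}"
            triples += [
                (entity, "contains", sub),
                (sub, "gender", gender),
                (sub, "age", f"{low}-{high} years"),
                (sub, "premium", f"{premium} yuan"),
            ]
    return triples


triples = expand_rules("critical illness insurance", RULES)
print(len(triples))  # 32: eight sub-entities with four triplets each
print(triples[8:12])  # the four triplets of sub-entity 3
```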

In step S212, the composite triplets and the simple triplets are audited. Specifically, knowledge management personnel audit the accuracy of the constructed composite and simple triplets, checking for example for triplets with incomplete information or inaccurate expression.
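
The audit in step S212 is described as a manual review. As a possible complement, an automated pre-check could flag structurally incomplete triplets before that review; the sketch below is an assumption of how such a check might look, and `find_incomplete` is a hypothetical name.

```python
def find_incomplete(triples):
    """Return triplets that do not have exactly three non-empty parts."""
    return [
        t for t in triples
        if len(t) != 3 or any(not str(part).strip() for part in t)
    ]


candidates = [
    ("sub-entity 1", "age", "18-26 years"),
    ("sub-entity 1", "presence or absence of social security"),  # value missing
    ("sub-entity 1", "premium", ""),                             # empty value
]
flagged = find_incomplete(candidates)
print(flagged)
```

A check like this only catches missing parts; judging whether a triplet is expressed accurately still requires the human review described above.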

In step S213, the audited composite triplet and simple triplet are stored in the graph database.

In one embodiment, the composite and simple triplets are converted into a standard format accepted by the graph database and then stored in it. Commonly used graph databases include Neo4j and Jena, and each accepts text data in different standard formats: CSV is the format most commonly accepted by Neo4j, and RDF is the format most commonly accepted by Jena. RDF (Resource Description Framework) is a W3C standard for describing web resources and uses URIs (Uniform Resource Identifiers) to identify elements. CSV files can be exported directly with a database management tool (such as Navicat); conversion to RDF files can be done with tools such as D2RQ.
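
As an alternative to exporting through a database management tool, triplets held in memory could be serialized to CSV with Python's standard `csv` module, as sketched below. The column names `subject`, `predicate`, and `object` are assumptions for illustration and are not a format required by Neo4j; the actual import header depends on how the graph database is configured.

```python
import csv
import io


def triples_to_csv(triples):
    """Serialize triplets to CSV text with one triplet per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])  # assumed header names
    writer.writerows(triples)
    return buffer.getvalue()


text = triples_to_csv([("Taikang Life", "headquarters", "Beijing, China")])
print(text)
# subject,predicate,object
# Taikang Life,headquarters,"Beijing, China"
```

Using the `csv` module rather than joining strings by hand matters here because values such as "Beijing, China" contain commas and must be quoted.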

According to the knowledge data processing method in this embodiment, sub-item information is determined from the entity information and the attribute value, and a composite triplet with sub-items is determined accordingly.

Based on the knowledge data processing method in the above embodiment, the sub-item information plays a key role in the actual process of processing the knowledge data, and is exemplified as follows.

The premium of a general insurance product may be greatly differentiated according to the age, sex, social security record, etc. of the applicant, for example, a specific premium of a certain insurance product may be priced according to three attributes of the age, sex, and presence or absence of social security of the applicant, and the pricing is as shown in table 3 below.

TABLE 3

Table 3 above shows a premium pricing schedule for certain insurance products.

In the usual knowledge graph data form, the knowledge "the premium of a certain insurance product is XXX yuan" in this example would be stored as the triplet (certain insurance product, premium, XXX yuan). In fact, the premium of the product varies with the applicant's age, gender, and social security status. Factors that influence the relationship between the entity and its attribute are taken as sub-items of the entity, and the values of those factors are taken as sub-item values. The entity is split into sub-entities level by level according to the sub-items and sub-item values, until the attribute value of each sub-entity is unique (that is, no longer affected by any factor). In the above example, age, gender, and presence or absence of social security are all sub-items of the entity "certain insurance product" (with respect to the attribute "premium"), and the result of splitting into sub-entities is shown in Table 4 below.

Table 4

Table 4 above shows the result of splitting a certain insurance product into sub-entities with respect to the premium.

The triplets finally compiled for the knowledge "the premium of a certain insurance product is XXX yuan" are as follows:

(certain insurance product, contains, sub-entity 1)

(sub-entity 1, age, 18-26 years)

(sub-entity 1, sex, man)

(sub-entity 1, presence or absence of social security)

(sub-entity 1, premium, 2400 yuan)

(certain insurance product, contains, sub-entity 2)

(sub-entity 2, age, 18-26 years)

(sub-entity 2, sex, man)

(sub-entity 2, presence or absence of social security)

(sub-entity 2, premium, 2800 yuan)

......
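Triplets of the kind listed above can be generated mechanically once the sub-entities are known. The following Python sketch shows one way to do this; the function name `build_composite_triples`, the sub-entity naming scheme, and the "yes"/"no" social security values are assumptions made for illustration and are not taken from the invention.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_composite_triples(
    entity: str,
    attribute: str,
    sub_entities: List[Tuple[Dict[str, str], str]],
) -> List[Triple]:
    """Emit one main triplet and the sub-triplets for every sub-entity."""
    triples: List[Triple] = []
    for number, (sub_item_values, attribute_value) in enumerate(sub_entities, start=1):
        name = f"sub-entity {number}"
        # Main triplet: (entity, contains, sub-entity)
        triples.append((entity, "contains", name))
        # Sub-triplets: (sub-entity, sub-item, sub-item value)
        for sub_item, value in sub_item_values.items():
            triples.append((name, sub_item, value))
        # Sub-triplet: (sub-entity, attribute, attribute value)
        triples.append((name, attribute, attribute_value))
    return triples


# Hypothetical input; the social security values are assumed.
sub_entities = [
    ({"age": "18-26 years", "sex": "man",
      "presence or absence of social security": "yes"}, "2400 yuan"),
    ({"age": "18-26 years", "sex": "man",
      "presence or absence of social security": "no"}, "2800 yuan"),
]

triples = build_composite_triples("certain insurance product", "premium", sub_entities)
for triple in triples:
    print(triple)
```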

With the knowledge data processing method in the above embodiment, knowledge data with a complex logic structure can be expressed as triplets.

Fig. 3 shows a block diagram of a knowledge data processing apparatus in an embodiment of the invention.

Referring to fig. 3, the knowledge data processing apparatus 300 includes: a data acquisition module 301, an extraction information module 302, a sub-item information module 303, a triplet determination module 304, a query statement receiving module 305 and a query execution module 306.

The data acquisition module 301 is configured to acquire knowledge data.

The extraction information module 302 is configured to extract entity information and attribute information in the knowledge data.

The sub-item information module 303 is configured to determine sub-item information according to the entity information and the attribute value if the attribute value of the entity in the entity information in the attribute information is not unique.

The triplet determination module 304 is configured to determine a composite triplet according to the entity information and the sub-item information, where the composite triplet includes a main triplet (entity, contains, sub-entity) and sub-triplets (sub-entity, sub-item, sub-item value) and (sub-entity, attribute, attribute value).

The query statement receiving module 305 is configured to receive a query statement based on the structure of the composite triplet.

The query execution module 306 is configured to obtain a query result from the database according to the query statement.

The knowledge data processing apparatus 300 in the above embodiment determines the composite triplet with sub-items according to the entity information and the attribute value, and stores the data in the database as composite triplets, so that the data can be queried with query statements based on the composite triplet structure, with fast processing and efficient querying. The technical scheme can express knowledge data with complex logic structures and has strong data processing capability.
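As a rough illustration of how a query based on the composite triplet structure can be answered, the Python sketch below matches the pattern "the entity contains a sub-entity, the sub-entity has the given sub-item values, return the sub-entity's attribute value" against an in-memory list of triplets. The in-memory list stands in for the database, and the function name `query_attribute` and the sample data (including the "yes"/"no" social security values) are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def query_attribute(
    triples: List[Triple],
    entity: str,
    conditions: Dict[str, str],
    attribute: str,
) -> List[str]:
    """Return attribute values of sub-entities matching all sub-item conditions."""
    facts = set(triples)
    results: List[str] = []
    for subject, predicate, sub_entity in triples:
        # Pattern 1: (entity, contains, ?subItem)
        if subject != entity or predicate != "contains":
            continue
        # Pattern 2: (?subItem, sub-item, sub-item value) for every condition
        if all((sub_entity, item, value) in facts for item, value in conditions.items()):
            # Pattern 3: (?subItem, attribute, ?result)
            results.extend(o for s, p, o in triples if s == sub_entity and p == attribute)
    return results


# Hypothetical stored triplets; the social security values are assumed.
stored = [
    ("certain insurance product", "contains", "sub-entity 1"),
    ("sub-entity 1", "sex", "man"),
    ("sub-entity 1", "social security", "yes"),
    ("sub-entity 1", "premium", "2400 yuan"),
    ("certain insurance product", "contains", "sub-entity 2"),
    ("sub-entity 2", "sex", "man"),
    ("sub-entity 2", "social security", "no"),
    ("sub-entity 2", "premium", "2800 yuan"),
]

print(query_attribute(stored, "certain insurance product",
                      {"sex": "man", "social security": "no"}, "premium"))
```

A graph database would evaluate the same three patterns with its own query engine and indexes; the linear scan here is only meant to make the matching logic visible.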

Since each functional module of the knowledge data processing apparatus according to the exemplary embodiment of the present invention corresponds to a step of the exemplary embodiment of the knowledge data processing method, for details not disclosed in the apparatus embodiment of the present invention, please refer to the embodiment of the knowledge data processing method according to the present invention.

Referring now to FIG. 4, there is illustrated a schematic diagram of a computer system 400 suitable for use in implementing an electronic device of an embodiment of the present invention. The computer system 400 of the electronic device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of embodiments of the invention.

As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the system operation are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output portion 407 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage section 408 including a hard disk or the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. The drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 410 as needed, so that a computer program read therefrom is installed into the storage section 408 as needed.

In particular, according to embodiments of the present invention, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 409 and/or installed from the removable medium 411. The above-described functions defined in the system of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 401.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present invention may be implemented by software or by hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiment, or may exist alone without being incorporated into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the knowledge data processing method as described in the above embodiment.

For example, the electronic device may implement the method as shown in fig. 1: in step S101, knowledge data is acquired. In step S102, entity information and attribute information in the knowledge data are extracted. In step S103, if the attribute value of the entity in the entity information is not unique in the attribute information, the sub-item information is determined according to the entity information and the attribute value. In step S104, a composite triplet is determined according to the entity information and the sub-item information.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present invention.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A knowledge data processing method, comprising:

obtaining knowledge data by using a crawler script or an electronic document parsing tool;

extracting entity information and attribute information in the knowledge data;

if the attribute value of the entity in the entity information in the attribute information is not unique, determining sub-item information according to the entity information and the attribute value, wherein the sub-item information comprises a sub-entity, a sub-item and a sub-item value, and the determining the sub-item information according to the entity information and the attribute value comprises: taking factors influencing the relation between the attributes of the entities in the entity information as the sub-items of the entities; taking the value of the factor influencing the relation between the attributes of the entity as the sub-item value of the entity; splitting the entity into sub-entities according to the sub-item values;

generating a composite triplet according to the entity information and the sub-item information, wherein the composite triplet comprises a main triplet (entity, contains, sub-entity) and sub-triplets (sub-entity, sub-item, sub-item value) and (sub-entity, attribute, attribute value);

receiving a query statement based on the structure of the composite triplet;

and obtaining a query result from the database according to the query statement.

2. The knowledge data processing method according to claim 1, characterized by further comprising:

if the attribute value of the entity in the entity information in the attribute information is unique, determining a simple triplet according to the entity information and the attribute information; wherein the simple triplet includes the entity, an attribute of the entity in the attribute information, and the attribute value.

3. The knowledge data processing method according to claim 1, wherein the knowledge data includes structured data, semi-structured data, and unstructured data; the entity information and the attribute information are extracted from the structured data and the semi-structured data by using a knowledge extraction script; extracting the entity information and the attribute information from the unstructured data by a trained artificial intelligence processing model.

4. The knowledge data processing method according to claim 2, characterized by further comprising:

validating the composite triplet and the simple triplet;

and storing the verified composite triplet and the simple triplet into a graph database.

5. The knowledge data processing method of claim 1, wherein the query statement based on the structure of the composite triplet comprises:

Select ?result

Where

{ entity contains ?subItem

?subItem sub-item sub-item value

?subItem attribute ?result }.

6. A knowledge data processing apparatus, comprising:

the data acquisition module is used for acquiring knowledge data;

the extraction information module is used for extracting entity information and attribute information in the knowledge data;

the sub-item information determining module is used for determining sub-item information according to the entity information and the attribute value if the attribute value of the entity in the entity information in the attribute information is not unique, wherein the sub-item information comprises a sub-entity, a sub-item and a sub-item value, and the determining the sub-item information according to the entity information and the attribute value comprises: taking factors influencing the relation between the attributes of the entities in the entity information as the sub-items of the entities; taking the value of the factor influencing the relation between the attributes of the entity as the sub-item value of the entity; splitting the entity into sub-entities according to the sub-item values;

the triplet determining module is used for determining a composite triplet according to the entity information and the sub-item information, wherein the composite triplet comprises a main triplet (entity, contains, sub-entity) and sub-triplets (sub-entity, sub-item, sub-item value) and (sub-entity, attribute, attribute value);

a query statement receiving module for receiving a query statement based on the structure of the composite triplet;

and the query execution module is used for obtaining a query result from the database according to the query statement.

7. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the knowledge data processing method according to any one of claims 1 to 5.

8. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the knowledge data processing method of any of claims 1 to 5.