CN109766445B

CN109766445B - Knowledge graph construction method and data processing device

Info

Publication number: CN109766445B
Application number: CN201811530489.8A
Authority: CN
Inventors: 张昊; 张力锋; 郑毅
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2024-03-26
Anticipated expiration: 2038-12-13
Also published as: CN109766445A

Abstract

The application discloses a knowledge graph construction method, a data processing device and a computer readable storage medium, which are applied to the field of knowledge relation construction, wherein the method comprises the following steps: acquiring a graphic database containing entity definitions, entity relation definitions and attribute definitions and a medical treatment record of a user; extracting entities, entity relationships, attributes and attribute values pointed by the attributes in the medical treatment records; identifying entity definitions of entities, entity relationship definitions of entity relationships, and attribute definitions of attributes in the medical treatment record; and filling the entity, entity relation, attribute and attribute value in the medical treatment record into a graphic database to obtain a knowledge graph. According to the method and the system, the knowledge graph is established, complex and various association relations between data in the medical treatment records can be extracted and stored in the form of the graph database, so that a medical professional knowledge network is established, and the data acquisition efficiency is improved.

Description

Knowledge graph construction method and data processing device

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a knowledge graph construction method, a data processing device, and a computer readable storage medium.

Background

In the medical field, medical big data is receiving more and more attention. It covers a person's whole life cycle and includes data on personal health, medical services, disease prevention and control, health protection, food safety, health care and the like. Making full use of medical big data promotes applications such as internet health consultation, online appointment and triage, and mobile payment, and thereby helps optimize a standardized, shared and mutually trusted diagnosis and treatment process. However, medical big data is a data set characterized by huge volume, scattered sources and varied formats, so how to store and correlate the acquired data is critical.

Most traditional databases are relational databases built on the relational model. A relational database is read through tables, fields and the like; the hierarchy and expression of its relationships are complex and varied, which hinders fast reading and modification as well as the writing of large amounts of data. Because the relationships among the data are not intuitive, the data is also underused.

Symptoms, diseases and diagnosis and treatment means in the medical field are generally related in intricate ways. A medical database built on a traditional relational database cannot quickly provide an intuitive reference for medical staff and is inconvenient for expanding a medical knowledge system. A medical data storage mode that intuitively reflects the associations among medical data and allows flexible data access is therefore lacking.

Disclosure of Invention

The embodiment of the application provides a knowledge graph construction method, which can collect scattered real and reliable medical data to form a knowledge graph so as to provide a medical data storage mode which can intuitively embody the association of the medical data and is convenient for flexible data access.

In a first aspect, an embodiment of the present application provides a knowledge graph construction method, where the method includes:

acquiring a graphic database and a medical treatment record of a user, wherein the graphic database comprises entity definitions, entity relation definitions and attribute definitions;

extracting an entity, an entity relationship, an attribute and an attribute value pointed by the attribute in the medical treatment record;

identifying an entity definition of an entity in the medical visit record, an entity relationship definition of the entity relationship, and an attribute definition of the attribute;

filling entities, entity relationships, attributes and attribute values pointed by the attributes in the medical treatment records into the graphic database, so that the entities are filled into corresponding entity definitions, the entity relationships are filled into corresponding entity relationship definitions, and the attributes and the attribute values are filled into corresponding attribute definitions to obtain at least one extraction map;

and fusing the at least one extraction map into a knowledge graph by means of entity alignment.
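As a rough illustration of the acquiring, extracting, identifying and filling steps above, the following Python sketch fills one already-structured record into an extraction map. It is a minimal sketch under stated assumptions: the record is a plain dictionary, the definitions are held in memory rather than in a graph database, and every name (`SCHEMA`, `build_extraction_map`, the field names and sample values) is hypothetical rather than taken from the application.

```python
# Hypothetical definitions: each attribute name maps to its attribute definition (type).
SCHEMA = {
    "entity_definitions": {"user"},
    "relationship_definitions": {"direct relatives"},
    "attribute_definitions": {
        "date of birth": "identity information",
        "symptoms": "complaint condition",
        "medicines": "treatment plan information",
    },
}


def build_extraction_map(record, schema=SCHEMA):
    """Extract the entity, relationships and attributes of one structured
    record and fill them under the matching definitions."""
    entity = {"name": record["name"], "definition": "user", "attributes": {}}
    for attribute, value in record.get("attributes", {}).items():
        definition = schema["attribute_definitions"].get(attribute)
        if definition is None:
            continue  # attribute type not defined, so it is not extracted
        entity["attributes"].setdefault(definition, {})[attribute] = value
    relationships = [
        {"source": record["name"], "target": target, "definition": rel_type}
        for target, rel_type in record.get("relations", [])
        if rel_type in schema["relationship_definitions"]
    ]
    return {"entities": [entity], "relationships": relationships}


# Made-up sample record; "shoe size" and "colleague" are not defined in SCHEMA.
record = {
    "name": "Zhang San",
    "attributes": {"date of birth": "1969.9.21", "symptoms": "cough", "shoe size": 42},
    "relations": [("Li Si", "direct relatives"), ("Wang Wu", "colleague")],
}
extraction_map = build_extraction_map(record)
```

Only the information whose type is defined ends up in the extraction map, which mirrors the idea that the definitions decide what is extracted from a record.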

With reference to the first aspect, in a first implementation manner of the first aspect, the medical treatment record includes identity information, a complaint condition, a medical history record condition, a treatment condition and/or treatment plan information of at least one user.

With reference to the first aspect, in a second implementation manner of the first aspect, the graphic database is a Neo4j graphic database.

With reference to the first aspect, in a third implementation manner of the first aspect, before extracting an entity, an entity relationship, an attribute, and an attribute value pointed to by the attribute in the medical treatment record, the method further includes:

acquiring medical insurance claim records, family relationship information and/or medical common knowledge information, wherein the medical insurance claim records are claim records for insurance and comprise the insurance application time, the claim occurrence time and/or the insured diseases of at least one user;

the medical insurance claim records, family relationship information, and/or medical common knowledge information are added to the medical visit records.

With reference to the first aspect, in a fourth implementation manner of the first aspect, after the obtaining the graphic database and the medical treatment record of the user, before extracting the entity, the entity relationship, the attribute, and the attribute value pointed to by the attribute in the medical treatment record, the method further includes:

displaying a visual graphical interface comprising the graphical database;

receiving a modification instruction, wherein the modification instruction comprises a creation instruction and a deletion instruction;

and modifying part of entity definitions, entity relation definitions and/or attribute definitions in the graphic database according to the modification instruction.
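The creation and deletion instructions described above could be applied to the definitions roughly as follows. This is a minimal sketch, assuming the definitions are kept as in-memory sets and that an instruction is a small dictionary; the function name `apply_modification` and the instruction keys are hypothetical.

```python
def apply_modification(definitions, instruction):
    """Apply one creation or deletion instruction to the stored definitions."""
    target = definitions[instruction["kind"]]
    if instruction["action"] == "create":
        target.add(instruction["name"])
    elif instruction["action"] == "delete":
        target.discard(instruction["name"])  # deleting a missing definition is a no-op
    else:
        raise ValueError(f"unsupported modification instruction: {instruction['action']}")
    return definitions


definitions = {
    "entity": {"user"},
    "relationship": {"direct relatives"},
    "attribute": {"identity information", "treatment condition"},
}
apply_modification(
    definitions,
    {"action": "create", "kind": "attribute", "name": "complaint condition"},
)
apply_modification(
    definitions,
    {"action": "delete", "kind": "attribute", "name": "treatment condition"},
)
```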

With reference to the first aspect, in a fifth implementation manner of the first aspect, after the fusing the at least one extraction map into one knowledge-graph, the method further includes:

displaying a visual graphical interface containing the knowledge graph;

receiving a hiding instruction that specifies some of the entities, entity relationships and/or attributes;

and hiding part of entities, entity relations and/or attributes in the knowledge graph in response to the hiding instruction.
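One way to read the hiding step is as a filtered view over the knowledge graph that leaves the stored data untouched. The sketch below assumes a graph held as a node list plus `(source, relationship, target)` triples; `visible_view` and its parameters are hypothetical names, and hiding individual attributes is left out for brevity.

```python
def visible_view(graph, hidden_nodes=(), hidden_relationships=()):
    """Return a copy of the graph without the hidden entities and relationships."""
    hidden_nodes = set(hidden_nodes)
    hidden_relationships = set(hidden_relationships)
    nodes = [node for node in graph["nodes"] if node not in hidden_nodes]
    edges = [
        (source, relationship, target)
        for source, relationship, target in graph["edges"]
        if relationship not in hidden_relationships
        and source not in hidden_nodes
        and target not in hidden_nodes  # an edge to a hidden entity is hidden too
    ]
    return {"nodes": nodes, "edges": edges}


graph = {
    "nodes": ["Zhang San", "Li Si", "car"],
    "edges": [
        ("Zhang San", "loves", "Li Si"),
        ("Li Si", "loves", "Zhang San"),
        ("Zhang San", "drives", "car"),
        ("Li Si", "owns", "car"),
    ],
}
view = visible_view(graph, hidden_nodes=["car"])
```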

With reference to the first aspect, in a sixth implementation manner of the first aspect, after the fusing the at least one extraction map into one knowledge graph, the method further includes:

checking conflicting information in the knowledge graph;

acquiring the reliability level of each piece of information in the conflicting pieces of information;

and retaining only the information with the highest reliability level among the conflicting information, so as to correct the knowledge graph.
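The correction step can be sketched as keeping, for each conflicting entity and attribute pair, the value with the highest reliability level. How reliability levels are assigned is not specified here, so the sketch assumes each fact simply carries a number where larger means more reliable; `resolve_conflicts` and the sample facts are hypothetical.

```python
def resolve_conflicts(facts):
    """facts: iterable of (entity, attribute, value, reliability) tuples.

    Keeps one value per (entity, attribute): the one with the highest
    reliability. On a tie, the fact seen first is kept.
    """
    best = {}
    for entity, attribute, value, reliability in facts:
        key = (entity, attribute)
        if key not in best or reliability > best[key][1]:
            best[key] = (value, reliability)
    return {key: value for key, (value, _) in best.items()}


# Made-up sample facts: two sources disagree about the date of birth.
facts = [
    ("Zhang San", "date of birth", "1996.9.21", 1),
    ("Zhang San", "date of birth", "1969.9.21", 3),
    ("Zhang San", "place of residence", "Shenzhen", 2),
]
resolved = resolve_conflicts(facts)
```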

In a second aspect, an embodiment of the present application provides a data processing apparatus, including a unit for executing the knowledge graph construction method of the first aspect, where the data processing apparatus includes:

an acquisition unit, configured to acquire a graphic database and a medical treatment record of a user, wherein the graphic database comprises entity definitions, entity relation definitions and attribute definitions;

the extraction unit is used for extracting an entity, an entity relationship, an attribute and an attribute value pointed by the attribute in the medical treatment record;

an identification unit for identifying an entity definition of an entity in the medical visit record, an entity relationship definition of the entity relationship, and an attribute definition of the attribute;

a filling unit, configured to fill an entity, an entity relationship, an attribute, and an attribute value pointed by the attribute in the medical treatment record into the graphic database, so that the entity is filled into a corresponding entity definition, the entity relationship is filled into a corresponding entity relationship definition, and the attribute and the attribute value are filled into a corresponding attribute definition, thereby obtaining at least one extraction map;

and the fusion unit is used for fusing the at least one extraction map into a knowledge map by adopting an entity alignment mode.

With reference to the second aspect, in a first implementation manner of the second aspect, the medical treatment record includes identity information, a complaint condition, a medical history record condition, a treatment condition, and/or treatment plan information of at least one user.

With reference to the second aspect, in a second implementation manner of the second aspect, the graphic database is a Neo4j graphic database.

With reference to the second aspect, in a third implementation manner of the second aspect, the obtaining unit is further configured to obtain a medical insurance claim record, family relationship information and/or medical common sense information, where the medical insurance claim record is a claim record for insurance and includes the insurance application time, the claim occurrence time and/or the insured diseases of at least one user;

the data processing apparatus further comprises an adding unit for adding the medical insurance claim record, family relation information and/or medical common sense information to the medical visit record.

With reference to the second aspect, in a fourth implementation manner of the second aspect, the data processing apparatus further includes a display unit, a receiving unit, and a modifying unit, and specifically:

a display unit for displaying a visual graphical interface comprising the graphical database;

the receiving unit is used for receiving modification instructions, wherein the modification instructions comprise creation instructions and deletion instructions;

and the modifying unit is used for modifying part of entity definitions, entity relation definitions and/or attribute definitions in the graphic database according to the modifying instruction.

With reference to the second aspect, in a fifth implementation manner of the second aspect, the data processing apparatus further includes a display unit and a receiving unit, and specifically:

the display unit is used for displaying a visual graphical interface containing the knowledge graph;

a receiving unit, configured to receive a hiding instruction that specifies some of the entities, entity relationships and/or attributes;

the display unit is further configured to conceal a part of the entities, entity relationships, and/or attributes in the knowledge graph in response to the concealing instruction.

With reference to the second aspect, in a sixth implementation manner of the second aspect, the data processing apparatus further includes a checking unit and a correction unit, specifically:

the checking unit is used for checking the conflicting information in the knowledge graph;

the acquisition unit is further used for acquiring the reliability level of each piece of information in the conflicting pieces of information;

and the correcting unit is used for retaining only the information with the highest reliability level among the conflicting information, so as to correct the knowledge graph.

In a third aspect, an embodiment of the present application provides another data processing apparatus, including a processor, an input device, an output device, and a memory that are connected to each other, where the memory is configured to store a computer program that supports the data processing apparatus in executing the knowledge graph construction method described above, the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the knowledge graph construction method of the first aspect or of any implementation manner of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to perform the knowledge graph construction method of the first aspect or of any implementation manner of the first aspect.

In the embodiments of the present application, a graph database containing at least one entity definition, entity relationship definition and/or attribute definition and at least one medical treatment record of a user are first obtained, and the corresponding important information is then extracted from the medical treatment record according to the content defined in the graph database. Assuming that entity definitions, entity relationship definitions and attribute definitions are defined in the graph database, the entities, entity relationships, attributes and attribute values pointed to by the attributes are extracted from each user's medical treatment record and filled into the corresponding entity definitions, entity relationship definitions and attribute definitions in the graph database, so that one extraction map is obtained for each user's medical treatment record. All the extraction maps are then fused by entity alignment to obtain a complete knowledge graph. Managing users' medical treatment records with a knowledge graph thus makes it possible to extract and associate records from the real treatment process and to build a medical professional knowledge network on them, which provides a medical data storage mode that intuitively represents the associations among medical data and facilitates flexible data access.
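The fusion step can be illustrated with a deliberately simple form of entity alignment. The sketch below assumes that two entities are the same when their identifiers match exactly, which is the crudest possible alignment rule; practical entity alignment usually also compares names and attributes for similarity. The function `fuse` and the data layout are hypothetical.

```python
def fuse(extraction_maps):
    """Merge extraction maps into one graph, aligning entities by identical id."""
    entities = {}
    relationships = set()
    for extraction_map in extraction_maps:
        for entity in extraction_map["entities"]:
            merged = entities.setdefault(
                entity["id"], {"id": entity["id"], "attributes": {}}
            )
            merged["attributes"].update(entity["attributes"])  # later maps overwrite
        # Relationships are (source, definition, target) tuples; the set removes repeats.
        relationships.update(extraction_map["relationships"])
    return {"entities": entities, "relationships": relationships}


map_a = {
    "entities": [{"id": "Zhang San", "attributes": {"date of birth": "1969.9.21"}}],
    "relationships": [("Zhang San", "direct relatives", "Li Si")],
}
map_b = {
    "entities": [
        {"id": "Zhang San", "attributes": {"social account": "123"}},
        {"id": "Li Si", "attributes": {}},
    ],
    "relationships": [("Zhang San", "direct relatives", "Li Si")],
}
knowledge_graph = fuse([map_a, map_b])
```

Here the two maps both mention Zhang San, so the fused graph holds a single Zhang San entity carrying the attributes from both.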

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below.

FIG. 1 is a schematic flow chart of a knowledge graph construction method provided in an embodiment of the present application;

FIG. 2 is a schematic flow chart of a knowledge graph construction method according to another embodiment of the present application;

FIG. 3 is a schematic flow chart of a knowledge graph construction method according to another embodiment of the present application;

FIG. 4 is an exemplary diagram of a knowledge graph provided by an embodiment of the present application;

FIG. 5 is an exemplary diagram of a visual graphical interface including a graphical database provided in an embodiment of the present application;

FIG. 6 is an exemplary diagram of a graph database provided by an embodiment of the present application;

FIG. 7 is a schematic block diagram of a data processing apparatus provided in an embodiment of the present application;

FIG. 8 is a block diagram of a data processing apparatus according to an embodiment of the present application;

FIG. 9 is an exemplary diagram of another knowledge graph provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".

The present application is mainly applied to a data processing device, which may be a conventional data processing device, a large storage system, a desktop computer, a notebook computer, a tablet computer, a palm computer, a smart phone, a portable digital player, a smart watch, a smart bracelet, and the like, which is not limited in this application.

The terminal device described in the embodiments of the present application includes, but is not limited to, devices with communication functions, smart phones, tablet computers, notebook computers, desktop computers, portable digital players, smart bracelets, smart watches, and the like. When the terminal device sends data to the data processing device, it records and transmits the characteristics of the data in a preset format, where the characteristics of the data include time, place, type and the like.

A graph database is one type of NoSQL database, for example the Neo4j, ArangoDB, OrientDB, FlockDB, GraphDB, InfiniteGraph, Titan or Cayley graph database. A graph database is a non-relational database and may also be called a graph-oriented or graph-based database. Its basic idea is to store and query data in a "graph" data structure. Its data model consists mainly of nodes and edges, and its advantage over a traditional relational database is that it can quickly answer questions about complex relationships, such as the relationships among people in a social network.

Nodes in the graph database can be entity definitions or attribute definitions, and edges represent entity relationship definitions. When the entities, entity relationships, attributes and attribute values in the data are filled into the corresponding entity definitions, entity relationship definitions and attribute definitions in the graph database, extraction maps are obtained, and fusing several extraction maps together yields a knowledge graph. In other words, the graph database is the framework of the knowledge graph, and the knowledge graph is obtained once data has been filled into the graph database in the format and with the content that the graph database requires. The entity definition, entity relationship definition and attribute definition refer to the type of an entity, the type of an entity relationship and the type of an attribute, respectively.

It should be noted that the entity is the most basic element of a knowledge graph, and different entity relationships exist between different entities. An entity is an object that is distinguishable and exists independently, such as a particular person, city, plant or commodity; everything in the world is made up of such concrete objects. An entity relationship is a relationship between two entities. Entities contain attributes such as "area", "population" and "capital". An attribute value is the actual content that an attribute of an entity points to; for example, the attribute "area" of the entity "China" points to the attribute value "9634057 square kilometers".

For example, FIG. 4 shows three entities: Zhang San, Li Si and a car. Zhang San has the attributes date of birth and social account, with the attribute values 1969.9.21 and 123 respectively. There are two entity relationships between Zhang San and Li Si: one points from Zhang San to Li Si and indicates that Zhang San loves Li Si, and the other points from Li Si to Zhang San and indicates that Li Si loves Zhang San. One entity relationship points from Zhang San to the car and indicates that Zhang San drives the car, and another points from Li Si to the car and indicates that Li Si owns the car. Overall, the graph in FIG. 4 expresses that Zhang San, whose date of birth is 1969.9.21 and whose social account is 123, and Li Si love each other, and that the car Zhang San drives belongs to Li Si.
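The FIG. 4 example can be written down as a small property graph. The sketch below is illustrative only: it uses plain Python structures rather than a graph database, writes the second person's name as Li Si, and the helper names `outgoing` and `owners_of` are hypothetical.

```python
# Entities with their attributes and attribute values.
nodes = {
    "Zhang San": {"date of birth": "1969.9.21", "social account": "123"},
    "Li Si": {},
    "car": {},
}

# Directed entity relationships as (source, relationship, target).
edges = [
    ("Zhang San", "loves", "Li Si"),
    ("Li Si", "loves", "Zhang San"),
    ("Zhang San", "drives", "car"),
    ("Li Si", "owns", "car"),
]


def outgoing(entity):
    """All relationships that start at the given entity."""
    return [(rel, target) for source, rel, target in edges if source == entity]


def owners_of(thing):
    """Entities that have an 'owns' relationship pointing at the given entity."""
    return [source for source, rel, target in edges if rel == "owns" and target == thing]
```

Reading off "the car Zhang San drives belongs to Li Si" then takes two edge lookups: `outgoing("Zhang San")` gives the car he drives, and `owners_of("car")` gives its owner.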

It should be noted that an entity has an entity definition, an entity relationship has an entity relationship definition, and an attribute has an attribute definition. Different entities may share the same entity definition; for example, the entity definition of both "Zhang San" and "Li Si" is "user". Different entity relationships may share the same entity relationship definition; for example, the entity relationship definition of both "mother and daughter" and "father and son" is "direct relatives". Different attributes may share the same attribute definition; for example, the attribute definition of both "date of birth" and "place of residence" is "identity information".

It should be further noted that, because the knowledge graph of the present application is based on a graph database, it inherits the graph database's ability to represent the connections between data clearly and intuitively, and it handles large amounts of complex, interconnected and changeable networked data better than a traditional relational database. For such data, a knowledge graph can be hundreds, thousands or even tens of thousands of times more efficient than a traditional relational database, which makes it especially suitable for fields such as social networks, real-time recommendation, bank transaction loops and financial credit investigation systems.

For example, to find the people from Zhang San's hometown among Zhang San's friends in the knowledge graph, it is only necessary to read, among the users whose relationship length from Zhang San is 5, those who share Zhang San's hometown. With a traditional relational database, by contrast, it is necessary to first find the table containing Zhang San's friends and then look up, in turn, the tables containing those friends' friends.
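The lookup above amounts to a bounded graph traversal followed by a filter. The sketch below uses a breadth-first search over an in-memory friendship map; it reads "relationship length" as the number of hops and collects everyone within `max_depth` hops, which is an assumption about the intended meaning. All names (`within_distance`, `same_hometown`) and the sample data are hypothetical.

```python
from collections import deque


def within_distance(friends, start, max_depth):
    """Breadth-first search: map each reachable person to their hop count."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_depth:
            continue  # do not expand beyond the allowed relationship length
        for other in friends.get(person, ()):
            if other not in distance:
                distance[other] = distance[person] + 1
                queue.append(other)
    del distance[start]
    return distance


def same_hometown(friends, hometown, start, max_depth=5):
    """People within max_depth hops of start who share start's hometown."""
    reachable = within_distance(friends, start, max_depth)
    return sorted(p for p in reachable if hometown.get(p) == hometown[start])


# Made-up sample data.
friends = {"Zhang San": ["A", "B"], "A": ["C"], "B": [], "C": ["D"]}
hometown = {"Zhang San": "X", "A": "Y", "B": "X", "C": "X", "D": "X"}
```

Each hop follows edges directly from the current node, whereas the relational approach described above would repeat a table lookup for every level of friends.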

The knowledge graph therefore has the following advantages. First, it can handle complex and varied association analysis, establish associations between scattered medical data and expert experience efficiently, integrate them, build a medical professional knowledge network and break down information silos. Second, the knowledge graph stores data as a graph, so data retrieval is faster than with traditional storage and human-computer interaction can respond in real time, which enables fast intelligent retrieval and intelligent question answering. Third, the knowledge graph can simulate the human thinking process to discover, verify, reason and correct errors. Finally, through learning functions such as reasoning and error correction, the knowledge graph can continuously accumulate knowledge logic and models. In general, the knowledge graph displays data intuitively, greatly improves the efficiency and speed of reading and storing data, responds in real time, and can simulate the human thinking process to discover, verify, reason and correct errors. The present application can thus provide an efficient method of constructing a medical expertise network.

It should be noted that, in the present application, a knowledge graph based on the Neo4j graph database may be constructed. Neo4j is a NoSQL database, but compared with traditional SQL relational databases and with NoSQL databases other than graph databases, it reads and stores data quickly, and its read and query performance remains efficient as the data in the database grows on a large scale. In addition, Neo4j represents the links between data better and more intuitively. In detail, the advantages of Neo4j over relational databases and other NoSQL databases are high query performance, flexibility of design and agility of development.

In the field of graph databases, besides Neo4j there are various others, such as the OrientDB, Giraph and AllegroGraph graph databases. Compared with these, Neo4j has two advantages. On the one hand, Neo4j is a native graph computing engine: it stores and processes data as native graph structures throughout, unlike graph databases that use graph-structured data only during computation and store the data in a relational database. On the other hand, Neo4j is an open-source database. Its open-source community edition has attracted use and promotion by many third parties, has won the embrace and support of more developers, and has gathered rich resources and cases for communication and learning. This support, promotion and widespread use in turn drive the development of Neo4j.

Overall, Neo4j's high query performance, ease of use, flexible design, agile development and rock-solid transaction management make it a good choice. In particular: its read and write speeds are very fast, giving high performance; its unstructured data storage gives great flexibility in database design; it adapts well to changing requirements and suits agile development methods; it is easy to use, and the database can run in embedded mode, server mode, distributed mode and the like; the data model can be designed with simple block diagrams, which makes modeling convenient; the structure of graph data supports more, and better, algorithm designs; it provides a distributed high-availability mode that supports large-scale data growth; the database is safe and reliable, can back up data in real time and makes data recovery convenient; and the graph data structure represents real-world application scenarios intuitively and vividly.

Referring to fig. 1, a schematic flowchart of a knowledge graph construction method according to an embodiment of the present application is provided, where the knowledge graph construction method shown in fig. 1 may include:

101: a graphic database and a medical visit record of the user are obtained.

In this embodiment of the application, a graph database and a medical treatment record of a user are acquired. The graph database is a Neo4j, FlockDB, AllegroGraph, GraphDB or InfiniteGraph graph database or the like. The medical treatment record comes from a trusted, legitimate institution and is generated from the diagnosis record that a professional doctor fills in according to a template when diagnosing and treating the user. It includes identity information, complaint condition, medical history record condition, treatment condition and/or treatment plan information and the like. The identity information includes the date of birth, the place of birth and the like. The complaint condition describes the user's illness, including the user's diseases, the symptoms shown, the time of illness and the like. The medical history record describes the user's past illnesses, including all of the user's illness information and the like. The treatment plan information describes the treatment plan that medical staff such as doctors set for the user, including medicines, treatment content, expected treatment time, expected treatment effect and the like. The treatment condition describes the treatment that medical staff such as doctors have already carried out for the user, including the treatment progress, treatment time, treatment effect and the like.

The graphic database is defined with entity definition, entity relationship definition, attribute definition, etc., and the entity definition, the entity relationship definition, and the attribute definition refer to an entity type, an entity relationship type, and an attribute type, which indicate types of entities, types of entity relationships, and types of attributes that need to be extracted from a medical treatment record.

It should be further noted that an entity has an entity definition, an entity relationship has an entity relationship definition, and an attribute has an attribute definition. Different entities may share the same entity definition; for example, the entity definition of both "Zhang San" and "Li Si" is "user". Different entity relationships may share the same entity relationship definition; for example, the entity relationship definition of both "mother and daughter" and "father and son" is "direct relatives". Different attributes may share the same attribute definition; for example, the attribute definition of both "date of birth" and "place of residence" is "identity information".

For example, the framework shown in fig. 6 is defined in the graphic database. It contains 6 attribute definitions, 1 entity definition, and 1 entity relationship definition: the entity definition "user"; the entity relationship definition "relatives"; and the attribute definitions "treatment plan information", "medical order care condition", "identity information", "insurance application information", "main complaint", and "treatment condition".

It should also be noted that medical treatment records are generally written in a unified format by medical institutions according to the requirements of the national health authorities, and can therefore be regarded as structured data.

Further, a medical insurance claim record is added to the medical treatment record. The medical insurance claim record is a record of insurance claims and includes the insurance application time, the claim occurrence time, and/or the insured diseases of at least one user.

The medical insurance claim records come from the major medical insurance institutions and are filled in by insurance staff according to a fixed template, so they are also structured data. They include the insurance application time, the claim occurrence time, and/or the insured diseases, that is, the diseases the user has insured against, the application time for each insured disease, the time at which a claim occurred for an insured disease, and the like.

It can be seen that adding the medical insurance claim records to the medical treatment records yields new medical treatment records. A knowledge graph built on these records contains not only the user's illness, treatment, and recovery information but also the user's insurance application and claim history. Because it draws on more sources of reliable information, the knowledge graph is more accurate, and it can also serve as a claims database in the medical insurance field, helping medical insurance company staff with underwriting and the like.

Further, structured data from encyclopedia sites and various vertical sites is added to the medical treatment records; this data can supply most common medical knowledge.

Further, unstructured medical data in the form of HyperText Markup Language (HTML) is added to the medical treatment records to enrich the sources of medical data, so that the coverage of the knowledge graph is continuously expanded by acquiring unstructured data.

It should be noted that, compared with the high-quality common-sense knowledge described above, the knowledge obtained from unstructured data is larger in volume, reflects current users' query needs, and surfaces the latest entities or facts in time, but its quality is relatively poor and it contains some errors. The redundancy of the Internet is therefore exploited: in subsequent mining, the confidence of this knowledge is evaluated by voting or other aggregation algorithms, and the knowledge is added to the knowledge graph only after manual review.

Further, family relation information is added to the medical treatment record, and the family relation information includes relative information related to the user.

Optionally, the graphic database is a Neo4j graphic database.

It should be noted that the Neo4j graphic database is a kind of NoSQL database. Compared with non-graph NoSQL databases and traditional SQL relational databases, it reads and stores data quickly, and its performance is not degraded by large-scale data growth in the database; it maintains efficient read and query performance throughout. In addition, Neo4j represents the links between data better and more intuitively. In detail, Neo4j has several advantages over relational databases and other NoSQL databases, namely high query performance, design flexibility, and development agility. Moreover, Neo4j is a native graph engine: it stores and processes data as native graph structures throughout, unlike some other graph databases that use graph structures only during computation and store the data in a relational database. Neo4j is also an open-source database; its open-source community edition has attracted use and promotion by many third parties, has the support of a large number of developers, and has accumulated rich resources and cases for exchange and learning. This support, promotion, and widespread use in turn drive the development of Neo4j.

Further, after the graphic database and the medical treatment record of the at least one user are acquired, and before at least one entity is extracted from the medical treatment record, a visual graphical interface including the graphic database is displayed; a modification instruction is received, the modification instruction including a creation instruction and a deletion instruction; and some of the entity definitions, entity relationship definitions, and/or attribute definitions in the graphic database are modified according to the modification instruction.

In the embodiment of the application, new entity definitions, entity relationship definitions, and/or attribute definitions can be created in the graphic database according to the user's requirements, and existing entity definitions, entity relationship definitions, and/or attribute definitions can be deleted. Specifically, after the data processing device acquires the graphic database, the graphic database and, optionally, at least one candidate entity definition, entity relationship definition, and/or attribute definition are displayed on a visual graphical interface of the data processing device. The user may delete an entity definition and/or entity relationship definition from the graphic database by dragging or clicking an icon, or may drag or click a candidate entity definition, entity relationship definition, and/or attribute definition displayed beside the graphic database to add it to the graphic database. The data processing device then receives the user's modification instruction through the display interface, the modification instruction including a creation instruction and a deletion instruction, and modifies the entity definitions, entity relationship definitions, and/or attribute definitions in the graphic database accordingly, adding them to or deleting them from the graphic database.

For example, as shown in fig. 5, the visual graphical interface includes a definition interface and an alternative interface. The definition interface displays the defined graphic database. The alternative interface includes a "click to connect" icon for drawing connecting lines, as well as a number of attribute definitions and entity definitions for the user to choose from. The user can drag an entity definition and/or attribute definition from the alternative interface to the definition interface and then click the "click to connect" icon to connect it. Alternatively, once the user drags an entity definition and/or attribute definition to the definition interface, the definition interface directly indicates which of the already defined entities the dragged item can be connected to, so that the user can select an entity relationship definition. The entity relationship definition can be customized by the user at connection time or generated automatically after the connection is made.

It can be seen that, in the graphic database in the embodiment of the present application, the entity definition, the entity relationship definition and/or the attribute definition may be customized by the user, so that the user may obtain a new graphic database by deleting the entity definition and/or the entity relationship definition in the graphic database or adding a new entity definition and/or entity relationship definition in the graphic database, so that a more practical, specialized and personalized knowledge graph may be built based on the new graphic database.

Further, before the graphic database is acquired, the graphic database shown in fig. 6 is established, and medical insurance claim records, family relationship information, and common medical knowledge are also added to the medical treatment records.

102: an entity, entity relationship, attribute, and attribute value pointed to by the attribute in the medical visit record are extracted.

In the embodiment of the application, the entities, entity relationships, attributes, and attribute values pointed to by the attributes in the user's medical treatment record are extracted.

It should be noted that the entities include the name information, identity number information, and the like whose entity definition is "user". The entity relationships include the father-son relationship, the mother-daughter relationship, and the like whose category is "relatives". The attributes include: the date of birth, place of birth, and the like whose category is identity information; the disease, symptoms, time of onset, and the like whose category is the main complaint; the historical illnesses and the like whose category is the medical history; the treatment progress, treatment time, treatment effect, and the like whose category is the treatment condition; and/or the medicines, treatment content, expected treatment time, expected treatment effect, and the like whose category is treatment plan information. The attribute value is the specific content that an attribute points to. Here, the symptoms refer to the signs that appear in the patient, the disease refers to the diagnosis made by the doctor based on the patient's symptoms, the medicines refer to the drugs prescribed by the doctor based on the diagnosis, and the treatment plan refers to the treatment method formulated by the doctor based on the patient's condition.

The medical treatment data itself is structured data. If unformatted data has been added to the medical treatment record, algorithmic extraction or the like may be used to extract the entities, entity relationships, attributes, and the like from it. In algorithmic extraction, named entity recognition is performed on the text using natural language processing: proper nouns and meaningful phrases are identified in the unformatted text and classified. For example, from the text "Baidu is an Internet company", the two entities "Baidu" and "Internet company" and the relationship "is" between them can be extracted.

It should be further noted that, if the medical treatment record also contains semi-formatted data, that data may be extracted using regular expressions. Semi-formatted data is text, such as a section of a resume, that typically follows a pattern such as "Name: Zhang San, Company name: XX Technology Co., Ltd.".
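As an illustration only, regular-expression extraction of semi-formatted "label: value" text might be sketched as follows. The pattern, the separators (comma or semicolon), and the sample record are assumptions made for this sketch and are not taken from the application.

```python
import re

# Assumed layout: "label: value" pairs separated by commas or semicolons.
FIELD_PATTERN = re.compile(r"\s*([^:,;]+?)\s*:\s*([^,;]+?)\s*(?:[,;]|$)")

def extract_fields(text: str) -> dict:
    """Return a {label: value} mapping for every "label: value" pair found."""
    return {label: value for label, value in FIELD_PATTERN.findall(text)}

# Hypothetical semi-formatted record.
record = "name: Zhang San, date of birth: 1980-01-01, disease: A"
fields = extract_fields(record)
```

Each extracted label can then be treated as a candidate attribute and each value as the attribute value it points to.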

Further, the extraction results are manually corrected to ensure the accuracy of the recognized content.

103: The entity definitions of the entities, the entity relationship definitions of the entity relationships, and the attribute definitions of the attributes in the medical treatment record are identified.

In this embodiment of the present application, the entities, entity relationships, and attributes extracted from the medical treatment record may belong to entity definitions, entity relationship definitions, and attribute definitions that are not defined in the graphic database, that is, content the knowledge graph of the present application does not need. Therefore, after the entities, entity relationships, attributes, and attribute values in the medical treatment record are extracted, the entity definition of each entity, the entity relationship definition of each entity relationship, and the attribute definition of each attribute are identified. Only the entities, entity relationships, and attributes whose definitions match those defined in the graphic database are then selected, together with the attribute values pointed to by the selected attributes.
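The selection step can be pictured with a minimal sketch. The `Extracted` record type, the `SCHEMA` contents, and the sample items below are hypothetical and only illustrate filtering extracted items against the definitions held in the graphic database.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Extracted:
    kind: str                    # "entity", "relationship", or "attribute"
    name: str                    # e.g. "Zhang San", "father-son", "disease"
    definition: str              # the identified definition (type)
    value: Optional[str] = None  # attribute value, for attributes only

# Hypothetical subset of the definitions in the graphic database.
SCHEMA = {
    "entity": {"user"},
    "relationship": {"relatives"},
    "attribute": {"identity information", "main complaint"},
}

def select_defined(items, schema):
    """Keep only items whose identified definition exists in the schema."""
    return [it for it in items if it.definition in schema.get(it.kind, set())]

items = [
    Extracted("entity", "Zhang San", "user"),
    Extracted("attribute", "disease", "main complaint", "A"),
    # This definition is not in the schema, so the item is dropped.
    Extracted("attribute", "employer", "employment information", "XX Co."),
]
selected = select_defined(items, SCHEMA)
```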

Optionally, the entity, entity relationship, attribute and attribute value pointed by the attribute in the medical treatment record are extracted directly according to the entity definition, entity relationship definition and attribute definition defined in the graphic database.

104: The entities, entity relationships, attributes, and attribute values pointed to by the attributes in the medical treatment record are filled into the graphic database to obtain at least one extraction graph.

In this embodiment of the present application, after the entities, entity relationships, attributes, and attribute values pointed to by the attributes are extracted from the medical treatment record, they are filled into the defined graphic database. Specifically, because the entity definition of each entity, the entity relationship definition of each entity relationship, and the attribute definition of each attribute have been identified, each item can be filled into its corresponding definition: each entity into the corresponding entity definition, each entity relationship into the corresponding entity relationship definition, and each attribute, together with the attribute value it points to, into the corresponding attribute definition. In this way, an extraction graph can be built from the user's medical treatment record.

For example, the framework shown in fig. 6 is defined in the graphic database, with 6 attribute definitions, 1 entity definition, and 1 entity relationship definition: the entity definition "user"; the entity relationship definition "relatives"; and the attribute definitions "treatment plan information", "medical order care condition", "identity information", "insurance application information", "main complaint", and "treatment condition". From the text "Zhang San has disease A and is treated with drug B", the entity "Zhang San", the attribute "disease" with its attribute value "A", and the attribute "drug" with its attribute value "B" are extracted. The entity definition of the entity "Zhang San" is identified as "user", the attribute definition of the attribute "disease" as "main complaint", and the attribute definition of the attribute "drug" as "treatment plan information". Since this entity definition and these attribute definitions already exist in the graphic database shown in fig. 6, the entity "Zhang San" is filled into the entity definition "user", the attribute "disease" and its attribute value "A" are filled into the attribute definition "main complaint", and the attribute "drug" and its attribute value "B" are filled into the attribute definition "treatment plan information", thereby obtaining the knowledge graph shown in fig. 9.
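The "Zhang San" example can be traced with a small in-memory sketch. The nested-dictionary layout and the `fill` helper are assumptions made for illustration; they stand in for the graphic database and are not the storage format of Neo4j or of the application.

```python
# Hypothetical definitions, a subset of those named for fig. 6.
SCHEMA = {
    "entity": {"user"},
    "attribute": {"main complaint", "treatment plan information"},
}

def fill(graph, schema, entity, entity_def, attributes):
    """Fill one entity and its attributes into the graph.

    attributes is a list of (attribute, attribute_definition, value) tuples.
    Items whose definition is not in the schema are skipped.
    """
    if entity_def not in schema["entity"]:
        return graph
    node = graph.setdefault(entity, {"definition": entity_def, "attributes": {}})
    for attribute, attribute_def, value in attributes:
        if attribute_def in schema["attribute"]:
            node["attributes"].setdefault(attribute_def, {})[attribute] = value
    return graph

# "Zhang San has disease A and is treated with drug B"
extraction_graph = fill(
    {}, SCHEMA, "Zhang San", "user",
    [("disease", "main complaint", "A"),
     ("drug", "treatment plan information", "B")],
)
```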

105: The at least one extraction graph is integrated into a knowledge graph by means of entity alignment.

In the embodiment of the application, knowledge graph extraction refers to extracting, from various types of data sources, the candidate entities and their attribute associations needed to build the knowledge graph, which yields individual, isolated extraction graphs (Extraction Graphs). To form a real knowledge graph, these information islands need to be integrated. Specifically: the entities in any two extraction graphs are identified; the entities common to both extraction graphs are found; and the two extraction graphs are fused on those common entities. This is repeated until all extraction graphs are fused together, yielding the knowledge graph.

It should be noted that entity alignment may be used to integrate multiple extraction graphs into one knowledge graph. Entity alignment (Object Alignment) aims to discover entities that have different IDs but represent the same real-world object, and to merge them into a single entity object with a globally unique identifier, which is then added to the knowledge graph. Entity alignment uses a clustering method, and the key to clustering is defining suitable similarity measures. These measures follow the rules below: (1) entities with the same description may represent the same entity (character similarity); (2) entities with the same attribute-value pairs may represent the same object (attribute similarity); (3) entities with the same neighbors may point to the same object (structural similarity).

For example, under rule (1), entities with the same description may represent the same entity (character similarity); that is, two entities containing the same identifying information are in essence the same. For instance, when each of two extraction graphs contains an entity with the same identity card number, the two entities are in essence the same, so they can be aligned and merged into one entity.

For example, under rule (2), entities with the same attribute-value pairs may represent the same object (attribute similarity); that is, when two different entities are found in two extraction graphs but the attributes surrounding them are identical, the two entities are in essence one entity. For instance, a patient may be identified either by an identity card number or by a social security number, so one extraction graph may contain an entity holding the patient's identity card number and the other an entity holding the same patient's social security number. Although the two numbers differ, they represent the same patient, so the attributes surrounding the two entities, such as the patient's date of birth, name, sex, and address, should be identical. This indicates that the entities representing the patient in the two extraction graphs are in essence one entity, so they can be aligned and merged into one entity.

For example, under rule (3), entities with the same neighbors may point to the same object (structural similarity); that is, in addition to entities with the same attributes under rule (2), entities with the same neighbors may also be the same. For instance, if patient 1 is the father of patient 2, the mother of patient 2 is patient 3, and the husband of patient 3 is patient 4, then patient 1 and patient 4 are in essence the same person. This example only illustrates the concept and does not consider cases such as divorce.

Optionally, to make entity alignment more efficient, the entities are divided into subsets using a data partitioning or segmentation algorithm, and potentially identical objects are then searched for within these subsets using similarity-based calculations. Specifically, existing alignment annotation data is used as training data, and a graph-based semi-supervised learning algorithm such as label propagation (Label Propagation) is combined with similarity calculation to find more pairs of identical entities.
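Rules (1) and (2) can be illustrated with a minimal sketch that aligns entities across two extraction graphs and merges each aligned pair under one identifier. The Jaccard measure, the 0.75 threshold, the identifiers, and the sample attributes are all assumptions chosen for this sketch; the application does not specify a particular similarity formula.

```python
def attribute_similarity(a: dict, b: dict) -> float:
    """Jaccard similarity over (attribute, value) pairs (rule 2)."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)

def align(graph_a: dict, graph_b: dict, threshold: float = 0.75) -> list:
    """Return (id_a, id_b) pairs judged to denote the same real-world object."""
    pairs = []
    for id_a, attrs_a in graph_a.items():
        for id_b, attrs_b in graph_b.items():
            same_id = id_a == id_b  # rule 1: same identifying information
            if same_id or attribute_similarity(attrs_a, attrs_b) >= threshold:
                pairs.append((id_a, id_b))
    return pairs

def merge(graph_a: dict, graph_b: dict, pairs: list) -> dict:
    """Fuse two graphs; aligned entities keep the identifier used in graph_a."""
    alias = {id_b: id_a for id_a, id_b in pairs}
    merged = {k: dict(v) for k, v in graph_a.items()}
    for id_b, attrs in graph_b.items():
        merged.setdefault(alias.get(id_b, id_b), {}).update(attrs)
    return merged

# Hypothetical data: the same patient under an ID card number and a social security number.
person = {"name": "Zhang San", "date of birth": "1980-01-01", "sex": "male"}
graph_a = {"ID-110": dict(person)}
graph_b = {"SSN-220": dict(person),
           "ID-330": {"name": "Li Si", "date of birth": "1975-05-05", "sex": "male"}}

pairs = align(graph_a, graph_b)
knowledge_graph = merge(graph_a, graph_b, pairs)
```

Rule (3) would additionally compare the neighbors of each entity; that comparison is omitted here to keep the sketch short.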
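The label propagation idea can be sketched on a small similarity graph. This is a simplified, assumed variant (synchronous weighted-majority updates with the labeled seeds held fixed), not necessarily the algorithm used in the application; the node names, edge weights, and labels are hypothetical.

```python
def propagate_labels(edges: dict, seeds: dict, max_iter: int = 10) -> dict:
    """Spread seed labels over a weighted similarity graph.

    edges maps (u, v) to a similarity weight; seeds maps already aligned
    nodes to the identifier of the real-world object they denote.
    """
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight
    labels = dict(seeds)
    for _ in range(max_iter):
        updates = {}
        for node in sorted(neighbors):
            if node in seeds:
                continue  # labeled training data stays fixed
            scores = {}
            for nb, weight in neighbors[node].items():
                if nb in labels:
                    scores[labels[nb]] = scores.get(labels[nb], 0.0) + weight
            if scores:
                best = max(sorted(scores), key=scores.get)
                if labels.get(node) != best:
                    updates[node] = best
        if not updates:
            break  # converged
        labels.update(updates)
    return labels

# Hypothetical similarity graph over entity mentions from several extraction graphs.
edges = {("a1", "b1"): 0.9, ("b1", "c1"): 0.8, ("a2", "b2"): 0.9, ("c1", "b2"): 0.1}
seeds = {"a1": "patient-1", "a2": "patient-2"}
labels = propagate_labels(edges, seeds)
```

Nodes that end up with the same label are candidate pairs of identical entities, which can then be passed to manual review.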

Optionally, the results of the entity alignment are used as candidates for further manual review and filtering.

Further, after the at least one extraction graph is integrated into one knowledge graph, a visual graphical interface containing the knowledge graph is displayed; a hiding instruction specifying part of the entities, entity relationships, and/or attributes is received; and, in response to the hiding instruction, that part of the entities, entity relationships, and/or attributes in the knowledge graph is hidden.

In the embodiment of the application, after the knowledge graph is built, it is displayed on the visual interface. Because the knowledge graph shows the links between data intuitively, the user can view it directly on the visual interface. While viewing, the user can also slide or click on the visual interface to hide entities or attributes of no interest. The terminal device receives the hiding instruction through the visual graphical interface and hides the corresponding part of the knowledge graph, that is, part of the entities, entity relationships, and/or attributes.

It can be seen that after the knowledge graph is built, the user can hide some entities and/or entity relationships which are not concerned by himself in the knowledge graph displayed on the visual graphical interface, and only display the parts which are concerned by himself, so that the flexibility of the knowledge graph built by the embodiment of the application is further improved.
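One way to picture the hiding step is as a filter applied to the displayed view while the stored knowledge graph stays unchanged. That non-destructive design, the function name, and the sample data are assumptions for this sketch.

```python
def visible_subgraph(nodes, edges, hidden_nodes=frozenset(), hidden_relations=frozenset()):
    """Return the nodes and edges left on screen after a hiding instruction.

    nodes maps an entity to its attributes; edges is a list of
    (source, relation, target) triples. The inputs are not modified.
    """
    shown_nodes = {n: attrs for n, attrs in nodes.items() if n not in hidden_nodes}
    shown_edges = [
        (s, r, t) for s, r, t in edges
        if r not in hidden_relations and s in shown_nodes and t in shown_nodes
    ]
    return shown_nodes, shown_edges

# Hypothetical knowledge graph fragment.
nodes = {"patient 1": {}, "patient 2": {}, "patient 3": {}}
edges = [("patient 1", "father-son", "patient 2"),
         ("patient 3", "mother-son", "patient 2")]

shown_nodes, shown_edges = visible_subgraph(nodes, edges, hidden_nodes={"patient 3"})
```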

Further, after the knowledge graph is established, conflicting information in the knowledge graph is checked; the reliability level of each piece of conflicting information is acquired; and only the piece with the highest reliability level is retained, so as to correct the knowledge graph.

In the embodiment of the application, when information from different data sources is fused into a knowledge graph, inconsistencies can arise: an entity may belong to two mutually exclusive categories at the same time, or one attribute of an entity may have several values. Therefore, after the knowledge graph is established, it must also be checked for inconsistencies and corrected. Specifically, the knowledge graph is checked for mutually exclusive entities, entity relationships, and/or attributes, and if any exist, the knowledge graph is corrected according to the result of the check.

It should be noted that, when the knowledge graph is corrected according to the result of the check, the conflicting information in the knowledge graph is first identified, and then the source of each piece of conflicting information and the reliability level of that source are obtained. The higher the reliability level of a source, the more reliable the information from that source, and the reliability level of the source is taken as the reliability level of the information it provides. Finally, the piece of conflicting information with the highest reliability level is retained and the others are discarded, thereby correcting the knowledge graph. In other words, the present application decides which of the conflicting pieces of information to keep by considering the reliability of the data sources, preferring information extracted from highly reliable data sources such as encyclopedia sites or structured data.

For example, the knowledge graph contains conflicting data a and data B, the data a and the data B are derived from the source a and the source B respectively, and the reliability levels of the source a and the source B are a first reliability level and a second reliability level respectively, the first reliability level is higher than the second reliability level, and the reliability level of the source of the information is the reliability level of the information, so that the reliability levels of the data a and the data B are respectively the first reliability level and the second reliability level, the reliability level of the data a is higher than the reliability level of the data B, and therefore the data a in the knowledge graph is reserved, and the data B is discarded to correct the knowledge graph.
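The reliability-based correction in this example can be sketched as follows. The numeric levels and source names are hypothetical; the application only requires that sources can be ranked by reliability.

```python
# Hypothetical reliability levels: a larger number means a more reliable source.
RELIABILITY = {"source A": 2, "source B": 1}

def resolve_by_reliability(candidates, reliability):
    """Keep the value whose source has the highest reliability level.

    candidates is a list of (value, source) pairs that conflict with each other.
    """
    value, _source = max(candidates, key=lambda c: reliability[c[1]])
    return value

kept = resolve_by_reliability(
    [("data A", "source A"), ("data B", "source B")], RELIABILITY
)
```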

Optionally, counting the frequency of occurrence of each of the conflicting information in the plurality of data sources, and then retaining the information with the highest frequency of occurrence to correct the knowledge graph.

In this embodiment of the present application, if a piece of information appears in multiple data sources, it can be considered relatively reliable. Therefore, the occurrence frequency of each piece of conflicting information across the data sources is counted and compared, the piece with the highest occurrence frequency is retained in the knowledge graph, and the other conflicting pieces are discarded, thereby correcting the knowledge graph. For example, suppose there are 5 data sources and data A conflicts with data B. Data A appears in 3 of the 5 data sources, so its occurrence frequency is 60%; data B appears in 2 of the 5 data sources, so its occurrence frequency is 40%. Since data A occurs more frequently than data B, data A is retained in the knowledge graph and data B is discarded, completing the correction of the knowledge graph.

Further, category alignment needs to be completed before the occurrence frequency of the conflicting information across the data sources is counted; for example, for numerical attribute values, the units need to be unified.
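The frequency-based correction with 5 data sources can be reproduced with a short sketch. The source names are hypothetical, and ties are not handled specially here.

```python
from collections import Counter

def resolve_by_frequency(observations: dict):
    """Return (value, frequency) for the value reported by the most sources.

    observations maps each data source to the value it reports.
    """
    counts = Counter(observations.values())
    value, count = counts.most_common(1)[0]
    return value, count / len(observations)

# Data A appears in 3 of 5 sources, data B in 2 of 5.
observations = {"s1": "A", "s2": "A", "s3": "B", "s4": "A", "s5": "B"}
kept, frequency = resolve_by_frequency(observations)
```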
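Unit unification before counting might look like the sketch below. The choice of milligrams as the common unit and the conversion table are assumptions for illustration.

```python
# Conversion factors to a common unit (milligrams); a hypothetical table.
TO_MG = {"mg": 1.0, "g": 1000.0}

def normalize_dose(amount: float, unit: str) -> float:
    """Convert a dose to milligrams so that values can be compared and counted."""
    return amount * TO_MG[unit]

# "0.5 g" and "500 mg" stop looking like conflicting values once normalized.
doses = [normalize_dose(0.5, "g"), normalize_dose(500, "mg")]
```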

In the embodiment of the application, a graphic database defining at least one entity definition is acquired first; the entities, entity relationships, and/or attributes in each user's medical treatment record are then extracted and filled into the graphic database to obtain an extraction graph for each user; finally, the extraction graphs of the users are fused to obtain a knowledge graph. Managing patients' medical treatment records with a knowledge graph thus extracts and associates scattered medical data, builds a network of medical professional knowledge, and breaks down information islands. Data can be stored quickly and queried in real time, and human reasoning can be simulated to discover, verify, infer, correct errors, and so on. The present application therefore provides a more flexible and efficient way of managing medical data.

Referring to fig. 2, which is a schematic flowchart of another knowledge graph construction method according to an embodiment of the present application, the knowledge graph construction method shown in fig. 2 may include:

201: a medical treatment record of a user and a graphic database containing entity definitions, entity relationship definitions and attribute definitions are obtained.

202: A visual graphical interface including the graphic database is displayed.

In the embodiment of the application, new entity definitions, entity relationship definitions, and/or attribute definitions can be created in the graphic database according to the user's requirements, and existing entity definitions, entity relationship definitions, and/or attribute definitions can be deleted. Specifically, after the data processing device acquires the graphic database, the graphic database and, optionally, at least one candidate entity definition, entity relationship definition, and/or attribute definition are displayed on a visual graphical interface of the data processing device.

For example, as shown in fig. 5, the visual graphical interface includes a definition interface, which displays the defined graphic database, and an alternative interface, which includes a "click to connect" icon for drawing connecting lines and a number of attribute definitions and entity definitions for the user to choose from.

203: a modification instruction is received, the modification instruction including a create instruction and a delete instruction.

In this embodiment of the present application, the user may delete the entity definition and/or the entity relationship definition in the graphic database by using an operation manner such as dragging or clicking on an icon, or drag or click on the entity definition, the entity relationship definition and/or the attribute definition beside the graphic database to add to the graphic database. The data processing apparatus may then receive a modification instruction of the user via the display interface, the modification instruction comprising a creation instruction for adding at least any one of an entity definition, an entity relationship definition and an attribute definition in the graphic database and a deletion instruction for deleting at least any one of the entity definition, the entity relationship definition and the attribute definition in the graphic database.

204: The entity definitions, entity relationship definitions, and/or attribute definitions in the graphic database are modified according to the modification instruction.

In the embodiment of the application, after receiving the modification instruction, the terminal device modifies the entity definition, the entity relationship definition and/or the attribute definition in the graphic database according to the modification instruction so as to add or delete the entity definition and/or the entity relationship definition from the graphic database. The modification instruction comprises a creation instruction and a deletion instruction, and when the terminal equipment receives the creation instruction, entity definitions and/or attribute definitions selected by a user are added into the graphic database, or when the terminal equipment receives the deletion instruction, the entity definitions and/or attribute definitions selected by the user are deleted from the graphic database.

For example, the user can drag an entity definition and/or attribute definition from the alternative interface to the definition interface and then click the "click to connect" icon to connect it. Alternatively, once the user drags an entity definition and/or attribute definition to the definition interface, the definition interface directly indicates which of the already defined entities the dragged item can be connected to, so that the user can select an entity relationship definition. The entity relationship definition can be customized by the user at connection time or generated automatically after the connection is made.
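Steps 203 and 204 can be pictured as applying a create or delete instruction to the set of definitions. The instruction format (`op`, `kind`, `name`) and the schema layout are hypothetical; the application does not prescribe how a modification instruction is encoded.

```python
def apply_modification(schema: dict, instruction: dict) -> dict:
    """Return a new schema with one definition created or deleted.

    instruction has the keys "op" ("create" or "delete"), "kind"
    ("entity", "relationship", or "attribute"), and "name".
    """
    updated = {kind: set(names) for kind, names in schema.items()}
    op, kind, name = instruction["op"], instruction["kind"], instruction["name"]
    if op == "create":
        updated.setdefault(kind, set()).add(name)
    elif op == "delete":
        updated.get(kind, set()).discard(name)
    else:
        raise ValueError(f"unknown operation: {op}")
    return updated

schema = {"entity": {"user"}, "attribute": {"identity information", "main complaint"}}
schema = apply_modification(schema, {"op": "create", "kind": "attribute", "name": "treatment condition"})
schema = apply_modification(schema, {"op": "delete", "kind": "attribute", "name": "main complaint"})
```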

205: an entity, entity relationship, attribute, and attribute value pointed to by the attribute in the medical visit record are extracted.

206: The entity definitions of the entities, the entity relationship definitions of the entity relationships, and the attribute definitions of the attributes in the medical treatment record are identified.

207: The entities, entity relationships, attributes, and attribute values pointed to by the attributes in the medical treatment record are filled into the graphic database to obtain at least one extraction graph.

208: The at least one extraction graph is integrated into a knowledge graph by means of entity alignment.

209: A visual graphical interface containing the knowledge graph is displayed.

In the embodiment of the application, after the knowledge graph is built, the knowledge graph is displayed on the visual interface, and because the knowledge graph can intuitively display the connection between data, a user can directly view the knowledge graph from the visual interface.

210: Receive a hiding instruction that specifies some of the entities, entity relationships and/or attributes.

In the embodiment of the application, while viewing the knowledge graph on the visual graphical interface, the user can also slide, click, or perform similar operations on the interface to hide entities or attributes that are not of interest. The terminal device thus receives a hiding instruction through the visual graphical interface; the entities, entity relationships and/or attributes contained in the hiding instruction are the content that the user wants to hide.

211: In response to the hiding instruction, hide the specified entities, entity relationships and/or attributes in the knowledge graph.

In the embodiment of the present application, when the hiding instruction is received, the entities, entity relationships and/or attributes specified by the hiding instruction are hidden in the knowledge graph, so that only the remaining content is displayed.
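A minimal sketch of how a hiding instruction might be applied is given below. It assumes the knowledge graph is available as a dictionary of entities with their attributes and a list of relationship triples; the function name `visible_subgraph` and this data layout are hypothetical. The sketch computes a filtered view for display and leaves the stored graph unchanged.

```python
def visible_subgraph(nodes, edges, hidden_entities=(), hidden_relations=(),
                     hidden_attributes=()):
    """Return the part of the graph that remains visible after a hiding instruction.

    nodes: dict mapping entity name -> dict of attribute name -> attribute value
    edges: list of (source entity, relationship, target entity) triples
    """
    hidden_entities = set(hidden_entities)
    hidden_relations = set(hidden_relations)
    hidden_attributes = set(hidden_attributes)

    # Drop hidden entities, and hidden attributes of the entities that remain.
    shown_nodes = {
        name: {k: v for k, v in attrs.items() if k not in hidden_attributes}
        for name, attrs in nodes.items()
        if name not in hidden_entities
    }
    # Drop hidden relationships and any relationship touching a hidden entity.
    shown_edges = [
        (s, r, t) for s, r, t in edges
        if r not in hidden_relations and s in shown_nodes and t in shown_nodes
    ]
    return shown_nodes, shown_edges
```

Because only a view is computed, the hidden content can be shown again simply by rendering the original `nodes` and `edges`.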

In the embodiment of the application, the user can redefine the entity definitions, entity relationship definitions and/or attribute definitions in the graphic database, either by deleting existing definitions or by adding new ones, to obtain a new graphic database, so that a more practical, specialized and personalized knowledge graph can later be established based on the redefined graphic database. In addition, after the knowledge graph is built, it is displayed on the visual graphical interface, and the user can choose to hide the entities, entity relationships and/or attributes that are not of interest and display only the parts of interest, which further improves the flexibility of the knowledge graph built by the embodiment of the application.

Referring to fig. 3, which is a schematic flowchart of another knowledge graph construction method according to an embodiment of the present application, the knowledge graph construction method shown in fig. 3 may include:

301: Obtain a graphic database and a medical treatment record of the user.

302: Obtain medical insurance claim records, family relation information and medical common sense information, and add them to the medical treatment record.

It should be noted that the medical insurance claim records come from the major medical insurance institutions. Because insurance staff fill in the user's insurance records according to a fixed template, these records are structured data. They include the insurance application time and/or the insurance application diseases, that is, the diseases for which the user has applied for insurance and the insurance application time of each such disease.

303: Extract the entities, entity relationships, attributes, and attribute values pointed to by the attributes in the medical treatment record.

304: Identify the entity definitions of the entities, the entity relationship definitions of the entity relationships, and the attribute definitions of the attributes in the medical treatment record.

305: Fill the entities, entity relationships, attributes, and attribute values pointed to by the attributes in the medical treatment record into the graphic database to obtain at least one extraction map.

306: Fuse the at least one extraction map into a knowledge graph by means of entity alignment.

307: Check the knowledge graph for conflicting information.

In the embodiment of the application, when information from different data sources is fused into a knowledge graph, inconsistencies can arise: an entity may belong to two mutually exclusive categories at the same time, or one attribute of an entity may correspond to several values. Thus, after the knowledge graph is established, it must also be checked for inconsistency, that is, checked for mutually exclusive entities, entity relationships and/or attributes.
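The two kinds of inconsistency described above can be detected with a simple pass over the fused facts. The sketch below is illustrative only: it assumes each fact is an (entity, attribute, value, source) tuple, that an entity's category is stored under a hypothetical attribute named "category", and that every other attribute is single-valued. The function name `find_conflicts` and the `exclusive_pairs` parameter are likewise assumptions.

```python
from collections import defaultdict


def find_conflicts(facts, exclusive_pairs=()):
    """Return (entity, attribute, conflicting values) for each inconsistency found.

    facts: iterable of (entity, attribute, value, source) tuples
    exclusive_pairs: iterable of (category_a, category_b) pairs that are
        assumed to be mutually exclusive
    """
    values = defaultdict(set)
    for entity, attribute, value, _source in facts:
        values[(entity, attribute)].add(value)

    conflicts = []
    for (entity, attribute), seen in sorted(values.items()):
        if attribute == "category":
            # Categories conflict only when two of them are mutually exclusive.
            for a, b in exclusive_pairs:
                if a in seen and b in seen:
                    conflicts.append((entity, attribute, {a, b}))
        elif len(seen) > 1:
            # A single-valued attribute that received several distinct values.
            conflicts.append((entity, attribute, seen))
    return conflicts
```

Attributes that legitimately hold several values (for example a list of symptoms) would need to be exempted from the second check; the sketch does not model this.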

308: Obtain the reliability level of each piece of conflicting information.

In this embodiment of the present application, after the conflicting information is detected, the reliability level of each piece of conflicting information is obtained. Specifically, the source of each piece of conflicting information and the reliability level of that source are obtained first; a higher reliability level indicates that information from the source is more reliable. The reliability level of the source is then used as the reliability level of the information from that source.

309: Retain only the information with the highest reliability level among the conflicting information, so as to correct the knowledge graph.

In the embodiment of the application, once the knowledge graph has been checked for mutually exclusive entities, entity relationships and/or attributes, it is corrected according to the result of the check. Specifically, after the reliability level of each piece of conflicting information is obtained, only the information with the highest reliability level is retained, and the other information is discarded, so as to correct the knowledge graph. In other words, the present application decides which of the conflicting pieces of information to keep by considering the reliability of the data source; that is, information extracted from highly reliable data sources, such as encyclopedias or structured data, is used preferentially.
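Steps 308 and 309 can be illustrated with the following sketch, in which every piece of information inherits the reliability level of its source and only the most reliable value is kept. The numeric levels in `SOURCE_RELIABILITY`, the source names, the function name `correct_graph`, and the treatment of unknown sources as level 0 are assumptions for illustration; the application states only that sources such as encyclopedias and structured data are treated as highly reliable.

```python
from collections import defaultdict

# Hypothetical reliability levels per source; a higher number means more reliable.
SOURCE_RELIABILITY = {"structured_record": 3, "encyclopedia": 2, "free_text": 1}


def correct_graph(facts, source_reliability=SOURCE_RELIABILITY):
    """Keep, for each (entity, attribute), only the value from the most reliable source.

    facts: iterable of (entity, attribute, value, source) tuples
    """
    grouped = defaultdict(list)
    for entity, attribute, value, source in facts:
        grouped[(entity, attribute)].append((value, source))

    corrected = {}
    for key, candidates in grouped.items():
        # Each piece of information takes the reliability level of its source;
        # sources missing from the table are assumed to have level 0.
        value, _source = max(
            candidates, key=lambda c: source_reliability.get(c[1], 0)
        )
        corrected[key] = value
    return corrected
```

When two sources share the same level, `max` keeps the first candidate encountered; a real system would need an explicit tie-breaking rule, which the application does not specify.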

In the embodiment of the application, the medical insurance claim records, the medical common sense information and the family relation information are added to the medical treatment records to obtain new medical treatment records. The knowledge graph established based on the new medical treatment records therefore contains not only information on the user's illness, treatment and recovery, but also information on the user's family members, the user's insurance claim information, and medical common sense information. The knowledge graph can also be used in a claims database in the medical insurance field, for example to help staff of a medical insurance company carry out underwriting. The medical common sense information supplements the knowledge graph with general medical knowledge and enriches it, and the family relation information adds the user's family members to the knowledge graph, which makes it convenient to query the user's family genetic medical history, the probability that the user may develop a disease, and the like. It can be seen that the knowledge graph established by the embodiment of the application integrates more reliable information and is more accurate.

It should be noted that, the foregoing descriptions of the various embodiments are intended to emphasize the differences between the various embodiments, and the same or similar features thereof may be referred to each other for brevity and will not be repeated herein.

The embodiment of the application also provides a data processing device, which comprises units for executing the knowledge graph construction method of any one of the above embodiments. Specifically, referring to fig. 7, a schematic block diagram of a data processing apparatus provided in an embodiment of the present application is shown. The data processing apparatus of the present embodiment includes: an acquisition unit 701, an extraction unit 702, an identification unit 703, a filling unit 704, and a fusion unit 705.

an acquisition unit 701, configured to acquire a graphic database and a medical treatment record of a user, where the graphic database includes an entity definition, an entity relationship definition, and an attribute definition;

an extraction unit 702, configured to extract an entity, an entity relationship, an attribute, and an attribute value pointed to by the attribute in the medical treatment record;

an identification unit 703, configured to identify an entity definition of the entity, an entity relationship definition of the entity relationship, and an attribute definition of the attribute in the medical treatment record;

a filling unit 704, configured to fill the entity, the entity relationship, the attribute, and the attribute value pointed to by the attribute in the medical treatment record into the graphic database, so that the entity is filled into the corresponding entity definition, the entity relationship is filled into the corresponding entity relationship definition, and the attribute and the attribute value are filled into the corresponding attribute definition, thereby obtaining at least one extraction map;

and a fusion unit 705, configured to fuse the at least one extraction map into a knowledge graph by means of entity alignment.

Further, the medical record includes identity information, complaint conditions, medical history records, treatment conditions, and/or treatment regimen information of the at least one user.

Further, the graphic database is a NEO4J graphic database.

Further, the acquisition unit 701 is further configured to acquire a medical insurance claim record, family relationship information, and/or medical common sense information, where the medical insurance claim record includes an insurance application time and/or an insurance application disease of at least one user; the data processing apparatus further comprises an adding unit 706 for adding the medical insurance claim record, family relation information and/or medical common sense information to the medical treatment record.

Further, the above data processing apparatus further includes a display unit 707, a receiving unit 708, and a modifying unit 709, specifically: a display unit 707 for displaying a visual graphical interface including the graphical database; a receiving unit 708 for receiving modification instructions, the modification instructions including a create instruction and a delete instruction; a modifying unit 709, configured to modify a part of entity definitions, entity relationship definitions and/or attribute definitions in the graphic database according to the modifying instruction.

Further, the above data processing apparatus further includes a display unit 707 and a receiving unit 708, specifically: a display unit 707 for displaying a visual graphical interface including the knowledge graph; a receiving unit 708, configured to receive a hiding instruction specifying some of the entities, entity relationships, and/or attributes; the display unit 707 is further configured to hide the specified entities, entity relationships, and/or attributes in the knowledge graph in response to the hiding instruction.

Further, the above-mentioned data processing apparatus further includes a checking unit 710 and a correcting unit 711, specifically: the checking unit 710 is configured to check the knowledge graph for conflicting information; the acquisition unit 701 is further configured to acquire the reliability level of each piece of conflicting information; the correcting unit 711 is configured to retain only the information with the highest reliability level among the conflicting information, so as to correct the knowledge graph.

In this embodiment of the present application, the acquisition unit 701 acquires a graphic database defining at least one entity; the extraction unit 702 then extracts the entities, entity relationships and/or attributes in the medical treatment record of each user, and these are filled into the graphic database to obtain an extraction map for each user; finally, the fusion unit 705 fuses the extraction maps of the users to obtain a knowledge graph. The medical treatment records of patients are thus managed by means of a knowledge graph: scattered medical data are extracted and associated, a medical professional knowledge network is constructed, and information islands are broken down. Data can be stored quickly and responses given in real time, and the human thinking process can be simulated to discover, verify, reason, correct errors and the like. Thus, the present application can provide a more flexible and efficient management method for medical data.
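The fusion performed by the fusion unit 705 can be illustrated with a small clustering sketch based on the three similarity rules recited in claim 1 (same description, same attribute-value, same neighbors). The sketch below is not the claimed implementation: it assumes each entity is a dictionary with hypothetical keys `description`, `attributes` and `neighbors`, uses single-link clustering with a union-find structure, and merges two entities only when at least `min_matching_rules` rules agree, a threshold chosen purely for illustration.

```python
from itertools import combinations


def matching_rules(a, b):
    """Count how many of the three similarity rules a pair of entities satisfies."""
    return sum([
        a["description"] == b["description"],
        bool(a["attributes"]) and a["attributes"] == b["attributes"],
        bool(a["neighbors"]) and a["neighbors"] == b["neighbors"],
    ])


def align_entities(entities, min_matching_rules=2):
    """Cluster entities from several extraction maps (single-link, union-find).

    entities: dict mapping entity id -> {"description", "attributes", "neighbors"}
    Returns a list of sets of entity ids judged to denote the same object.
    """
    parent = {eid: eid for eid in entities}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compare every pair once and merge the clusters of sufficiently similar pairs.
    for a, b in combinations(entities, 2):
        if matching_rules(entities[a], entities[b]) >= min_matching_rules:
            parent[find(a)] = find(b)

    clusters = {}
    for eid in entities:
        clusters.setdefault(find(eid), set()).add(eid)
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

The pairwise comparison is quadratic in the number of entities, which is acceptable for a sketch but would typically be replaced by blocking or indexing for large record sets.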

Referring to fig. 8, a schematic block diagram of a data processing apparatus according to another embodiment of the present application is provided. The data processing apparatus in the present embodiment as shown in the drawings may include: one or more processors 810, input devices 820, and output devices 830, and a memory 840. The processor 810 and the memory 840 are connected by a bus 850. The memory 840 is used to store a computer program comprising program instructions, and the processor 810 is used to execute the program instructions stored in the memory 840.

A processor 810, configured to perform the function of the acquisition unit 701, to acquire a graphic database containing entity definitions, entity relationship definitions and attribute definitions, and a medical treatment record of the user; further configured to perform the function of the extraction unit 702, to extract an entity, an entity relationship, an attribute, and an attribute value pointed to by the attribute in the medical treatment record; further configured to perform the function of the identification unit 703, to identify an entity definition of the entity, an entity relationship definition of the entity relationship, and an attribute definition of the attribute in the medical treatment record; further configured to perform the function of the filling unit 704, to fill the entity, the entity relationship, the attribute, and the attribute value pointed to by the attribute in the medical treatment record into the graphic database, so that the entity is filled into the corresponding entity definition, the entity relationship is filled into the corresponding entity relationship definition, and the attribute and the attribute value are filled into the corresponding attribute definition, thereby obtaining at least one extraction map; and further configured to perform the function of the fusion unit 705, to fuse the at least one extraction map into a knowledge graph by means of entity alignment.

Further, the graphic database is a NEO4J graphic database.

Further, the processor 810 is further configured to acquire a medical insurance claim record, family relationship information, and/or medical common sense information, where the medical insurance claim record includes an insurance application time and/or an insurance application disease of at least one user; and is further configured to perform the function of the adding unit 706, to add the medical insurance claim record, family relation information and/or medical common sense information to the medical treatment record.

Further, the output device 830 is configured to perform the function of the display unit 707 for displaying a visual graphical interface including the above-described graphical database.

Accordingly, the input device 820 is configured to perform the functions of the receiving unit 708 and is configured to receive modification instructions, where the modification instructions include a create instruction and a delete instruction.

Correspondingly, the processor 810 is further configured to execute the function of the modifying unit 709, for modifying part of the entity definitions, the entity relationship definitions and/or the attribute definitions in the graphic database according to the modifying instruction.

Further, the output device 830 is further configured to display a visual graphical interface including the knowledge graph.

Accordingly, the input device 820 is further configured to receive a hiding instruction specifying some of the entities, entity relationships and/or attributes; correspondingly, the output device 830 is further configured to hide the specified entities, entity relationships, and/or attributes in the knowledge graph in response to the hiding instruction.

Further, the processor 810 is further configured to perform the functions of the checking unit 710 and the correcting unit 711. Specifically, the processor 810 is configured to perform the function of the checking unit 710, to check the knowledge graph for conflicting information; the processor 810 is also configured to acquire the reliability level of each piece of conflicting information; and the processor 810 is further configured to perform the function of the correcting unit 711, to retain only the information with the highest reliability level among the conflicting information, so as to correct the knowledge graph.

It should be appreciated that in embodiments of the present application, the processor 810 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 840 may include read only memory and random access memory and provide instructions and data to the processor 810. A portion of memory 840 may also include non-volatile random access memory. For example, the memory 840 may also store information of device type.

In a specific implementation, the processor 810 described in the embodiments of the present application may execute the implementation manners described in the first embodiment, the second embodiment, and the third embodiment of the knowledge graph construction method provided in the embodiments of the present application, or may execute the implementation manner of the data processing apparatus described in the embodiments of the present application, which is not described herein again.

In another embodiment of the present application, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program comprising program instructions for execution by a processor.

The computer readable storage medium may be an internal storage unit of the data processing apparatus of any of the foregoing embodiments, such as a hard disk or a memory of the data processing apparatus. The computer readable storage medium may also be an external storage device of the data processing apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the data processing apparatus. Further, the computer readable storage medium may also include both an internal storage unit and an external storage device of the data processing apparatus. The computer readable storage medium is used to store a computer program and other programs and data required by the data processing apparatus. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two, and that the components and steps of the examples have been described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled persons may use different methods to implement the described functionality for each specific application, but such implementation should not be considered beyond the scope of the present application.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the data processing apparatus and unit described above may refer to the corresponding process in the foregoing embodiment of the knowledge graph construction method, which is not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed data processing apparatus and knowledge graph construction method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or units, or may be an electrical, mechanical, or other form of connection.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present application.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a data processing apparatus, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Claims

1. The knowledge graph construction method is characterized by comprising the following steps of:

adding unstructured medical data in the hypertext markup language to the medical treatment record;

displaying a visual graphical interface containing the graphical database, wherein the visual graphical interface comprises a definition interface and an alternative interface, the definition interface displays a defined graphical database, and the alternative interface contains an icon for connecting a 'click connection', and a plurality of attribute definitions and entity definitions for user selection;

receiving a modification instruction, wherein the modification instruction comprises a creation instruction and a deletion instruction, the creation instruction is used for adding at least any one of entity definition, entity relation definition and attribute definition in a graphic database, the deletion instruction is used for deleting at least any one of entity definition, entity relation definition and attribute definition in the graphic database, the modification instruction comprises a creation instruction that a user drags the entity definition and/or the attribute definition of an alternative interface to a definition interface, and then clicks an icon of 'click on a connection' to trigger connection;

Modifying part of entity definitions, entity relationship definitions and/or attribute definitions in the graphic database according to the modification instruction, including: automatically generating partial entity definitions, entity relationship definitions and/or attribute definitions after entity definition and/or attribute definition connection;

extracting an entity, an entity relationship, an attribute and an attribute value pointed by the attribute in the medical treatment record, wherein the extracting comprises the following steps: when the medical treatment record comprises non-formatted data, extracting entities, entity relations, attributes and attribute values pointed by the attributes in the medical treatment record based on an algorithm extraction mode, wherein the algorithm extraction mode is a mode of identifying named entities of texts through a natural language processing technology, identifying proper nouns and meaningful phrases from the non-formatted texts and classifying the proper nouns and the meaningful phrases;

fusing the at least one extraction map into a knowledge graph by adopting an entity alignment mode, wherein the entity alignment is performed by a clustering-based method, and the similarity measure defined by the clustering method follows the following rules: entities with the same description represent the same entity, entities with the same attribute-value represent the same object, and entities with the same neighbors point to the same object;

displaying a visual graphical interface containing the knowledge graph;

hiding part of entities, entity relations and/or attributes in the knowledge graph in response to the hiding instruction;

checking conflicting information in the knowledge graph;

acquiring a source of each piece of information in the conflicting information, and taking the reliability level of the source as the reliability level of the information in the source;

and reserving the information with the highest reliability level in the conflicting information so as to correct the knowledge graph.

2. The method of claim 1, wherein the medical treatment record contains at least one of user identity information, complaint conditions, medical history conditions, treatment conditions, and/or treatment regimen information.

3. The method of claim 1, wherein the graphic database is a NEO4J graphic database.

4. The method of claim 1, wherein prior to extracting the entity, entity relationship, attribute, and attribute value to which the attribute points in the medical visit record, further comprising:

5. A data processing apparatus for performing the method of any of claims 1-4, comprising:

the fusion unit is used for fusing the at least one extraction map into a knowledge map in an entity alignment mode;

the display unit is further used for responding to the hiding instruction and hiding part of entities, entity relations and/or attributes in the knowledge graph;

the acquisition unit is further used for acquiring the source of each piece of information in the conflicted information, and taking the reliability level of the source as the reliability level of the information in the source;

and the correcting unit is used for reserving the information with the highest reliability level in the conflicting information so as to correct the knowledge graph.

6. A data processing apparatus comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-4.

7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-4.