CN115630697B

CN115630697B - Knowledge graph construction method and system capable of distinguishing single-phase and double-phase affective disorder

Info

Publication number: CN115630697B
Application number: CN202211317234.XA
Authority: CN
Inventors: Wang Yang (汪洋)
Original assignee: Luzhou Vocational and Technical College
Current assignee: Luzhou Vocational and Technical College
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2023-04-07
Anticipated expiration: 2042-10-26
Also published as: CN115630697A

Abstract

The invention relates to the field of affective disorder knowledge graph construction, and in particular to a method and system for constructing a knowledge graph capable of distinguishing unipolar (single-phase) and bipolar (double-phase) affective disorders. The method ensures comprehensive data by drawing on multi-source heterogeneous data, judges the correctness of the data by calculating the credibility of relation triples, and thereby screens and updates the data in an affective disorder database in real time, ensuring the accuracy and timeliness of the knowledge graph. The resulting knowledge graph distinguishes unipolar from bipolar affective disorder, has fine-grained content drawn from multiple data sources with thorough data fusion, and can help patients with affective disorders and their families with self-diagnosis and monitoring, as well as assist doctors in clinical decision-making.

Description

Knowledge graph construction method and system capable of distinguishing single-phase and double-phase affective disorder

Technical Field

The invention relates to the field of affective disorder knowledge graph construction, and in particular to a method and system for constructing a knowledge graph capable of distinguishing single-phase (unipolar) and double-phase (bipolar) affective disorders.

Background

Affective disorder (depression) ranks fourth among diseases worldwide, yet its diagnosis and prevention still suffer from a low recognition rate, and as society becomes increasingly information-driven, affective disorder is appearing at ever younger ages. Although techniques for building knowledge graphs with natural language processing and data mining are maturing, affective disorder is clinically characterized by multiple symptoms, high complexity, frequent misdiagnosis and difficult subtype identification. As a result, existing Chinese medical knowledge graphs covering affective disorder are simple and coarse. In particular, unipolar and bipolar affective disorder present similar symptoms and are hard to tell apart, and a knowledge graph that readily leads to misdiagnosis cannot adequately support patients and their families in self-diagnosis and monitoring or assist doctors in clinical decision-making. At present, no fine-grained knowledge graph capable of distinguishing unipolar from bipolar affective disorder has been successfully built, and existing knowledge graphs for pediatric diseases suffer from single data sources and low persuasiveness. There is therefore an urgent need for a method of constructing a fine-grained knowledge graph that distinguishes unipolar from bipolar affective disorder, draws on multiple data sources and uses a sound fusion algorithm.

Disclosure of Invention

In view of the shortcomings of the prior art and practical needs, the invention provides a method for constructing a knowledge graph capable of distinguishing unipolar and bipolar affective disorders, comprising the following steps: acquiring multi-source heterogeneous affective disorder information, which includes unipolar affective disorder information and bipolar affective disorder information, and building a base corpus from it; constructing an initial triple set by extracting entity units and entity relations from the base corpus; integrating the initial triple set to obtain a target triple set; identifying the polarity of the relation triples in the target triple set and calculating their credibility according to that polarity; dynamically updating the target triple set according to the credibility of the relation triples to obtain an affective disorder database; and constructing, from the dynamically updated affective disorder database, a knowledge graph capable of distinguishing unipolar from bipolar affective disorder.
The disclosed method ensures comprehensive data through multi-source heterogeneous data sources, judges the correctness of the data by calculating the credibility of relation triples, screens and updates the data in the affective disorder database in real time, and thereby ensures the accuracy and timeliness of the knowledge graph. The resulting knowledge graph distinguishes unipolar from bipolar affective disorder, has fine-grained content drawn from multiple data sources with thorough data fusion, and can help patients with affective disorders and their families with self-diagnosis and monitoring, as well as assist doctors in clinical decision-making.

Optionally, obtaining the multi-source heterogeneous affective disorder information and building the base corpus from it comprises: acquiring structured, semi-structured and unstructured unipolar and bipolar affective disorder information from authoritative medical books, medical research papers, hospital electronic medical records and internet data resources, respectively; performing data cleaning and format processing on the unipolar and bipolar affective disorder information; and building the base corpus from the cleaned and formatted unipolar and bipolar affective disorder information.

Optionally, constructing an initial triple set by extracting entity units and entity relations from the base corpus comprises: training, with the base corpus, a maximum entropy Markov model by means of gradient descent and a convolutional neural network; extracting the entity units in the base corpus with the maximum entropy Markov model; extracting the entity relations in the base corpus with the convolutional neural network based on the maximum entropy Markov model; extracting attribute features from the entity units and building a binary attribute list from the attribute features and entity units; matching the entity units through the binary attribute list to obtain an attribute triple set; building a relation triple set from the entity units and entity relations; and aggregating the attribute triple set and the relation triple set and normalizing them to obtain the initial triple set.

Optionally, integrating the initial triple set to obtain a target triple set comprises: integrating the entity units in the initial triples through entity alignment to eliminate entity naming conflicts in the initial triples; and integrating the attribute features in the initial triples through attribute alignment to eliminate attribute naming conflicts in the initial triples.

Optionally, identifying the polarity of the relation triples in the target triple set and calculating their credibility according to the polarity comprises: tracing the data source of each relation triple and setting its credibility weight according to that source; grouping the relation triples by entity units to obtain several relation triple subsets, where all relation triples in any one subset share the same entity units; dividing the elements of each subset into positive relation triples and negative relation triples according to the entity relation in each triple; and constructing a credibility function of the relation triples from the number of positive relation triples, the number of negative relation triples and their credibility weights. The credibility function satisfies the following relation:

[Formula not reproduced in this text.]

where $N$ is the total number of relation triples in the target triple set; $r_i$ denotes the $i$-th relation triple; $C(r_i)$ is the credibility of the $i$-th relation triple in the target triple set, $i = 1, 2, \ldots, N$; $P_i$ is the number of positive relation triples in the subset containing the $i$-th relation triple; $p_{ij}$ denotes the $j$-th positive relation triple in that subset and $w(p_{ij})$ its credibility weight, $j = 1, 2, \ldots, P_i$; $Q_i$ is the number of negative relation triples in that subset; and $n_{ik}$ denotes the $k$-th negative relation triple in that subset and $w(n_{ik})$ its credibility weight, $k = 1, 2, \ldots, Q_i$. The credibility of each relation triple is then obtained from the credibility function.

Optionally, dynamically updating the target triple set according to the credibility of the relation triples to obtain the affective disorder database comprises: tracing the data source of each relation triple and setting its credibility threshold according to that source; identifying untrusted relation triples by comparing the credibility with the credibility threshold; and removing the untrusted relation triples from the target triple set, thereby dynamically updating it to obtain the affective disorder database.

Optionally, the credibility threshold satisfies the following formula:

[Formula not reproduced in this text.]

where $T_i$ is the credibility threshold of the $i$-th relation triple, $w^{+}_{\max,i}$ is the maximum credibility weight among the positive relation triples in the subset containing the $i$-th relation triple, and $w^{-}_{\min,i}$ is the minimum credibility weight among the negative relation triples in that subset.

Optionally, identifying untrusted relation triples by comparing the credibility with the credibility threshold comprises: marking relation triples whose credibility is greater than or equal to the credibility threshold as trusted relation triples; and marking relation triples whose credibility is less than the credibility threshold as untrusted relation triples.

Optionally, constructing a knowledge graph capable of distinguishing unipolar from bipolar affective disorder from the dynamically updated affective disorder database comprises: tracing the data sources of the triples in the dynamically updated affective disorder database; dividing the affective disorder database into a unipolar affective disorder sub-database and a bipolar affective disorder sub-database according to the data sources; building a single-layer unipolar affective disorder knowledge graph from the unipolar sub-database; building a single-layer bipolar affective disorder knowledge graph from the bipolar sub-database; extracting the common factors of the two sub-databases, the common factors including the entity units of the triples; building a middle layer between unipolar and bipolar affective disorder from the common factors; and constructing the knowledge graph capable of distinguishing unipolar from bipolar affective disorder from the single-layer unipolar knowledge graph, the single-layer bipolar knowledge graph and the middle layer.
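The layering described in the step above can be illustrated with a minimal Python sketch. It assumes each triple has already been traced to a hypothetical source tag, "unipolar" or "bipolar", and treats the common factors as the entity units shared by both sub-databases; the function name `build_layered_graph` and the record layout are illustrative choices, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (head entity, relation, tail entity, source tag).
Record = Tuple[str, str, str, str]


def build_layered_graph(records: List[Record]) -> Dict[str, Set]:
    """Split triples into unipolar/bipolar layers and derive a shared middle layer."""
    layers: Dict[str, Set[Tuple[str, str, str]]] = defaultdict(set)
    entities: Dict[str, Set[str]] = defaultdict(set)
    for head, relation, tail, source in records:
        layers[source].add((head, relation, tail))
        entities[source].update((head, tail))
    # Common factors: entity units that occur in both sub-databases.
    middle = entities["unipolar"] & entities["bipolar"]
    return {
        "unipolar": set(layers["unipolar"]),
        "bipolar": set(layers["bipolar"]),
        "middle": middle,
    }


records = [
    ("depressed mood", "symptom of", "unipolar depression", "unipolar"),
    ("depressed mood", "symptom of", "bipolar disorder", "bipolar"),
    ("elevated mood", "symptom of", "bipolar disorder", "bipolar"),
]
graph = build_layered_graph(records)
print(sorted(graph["middle"]))  # ['depressed mood']
```

In this toy example the middle layer holds the entity shared by both disorders, while a triple that exists in only one layer (elevated mood in the bipolar layer) is the kind of link that helps tell the two apart.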

In a second aspect, the present invention further provides a system for constructing a knowledge graph capable of distinguishing single-phase and double-phase affective disorders, comprising a processor, an input device, an output device and a memory that are connected to one another. The memory stores a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect for constructing a knowledge graph capable of distinguishing single-phase and double-phase affective disorders.

Drawings

FIG. 1 is a flow chart of a method for constructing a knowledge graph capable of distinguishing single-phase and double-phase affective disorders;

FIG. 2 is a schematic diagram of part of the content of a knowledge graph of the present invention that distinguishes between unipolar and bipolar affective disorders;

FIG. 3 is a diagram of a system for constructing a knowledge graph capable of distinguishing single-phase and double-phase affective disorders in accordance with the present invention.

Detailed Description

Specific embodiments of the present invention are described in detail below. It should be noted that the embodiments described herein are for illustration only and are not intended to limit the present invention. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that these specific details need not be employed to practice the present invention. In other instances, well-known circuits, software or methods have not been described in detail in order to avoid obscuring the present invention.

Throughout the specification, reference to "one embodiment," "an embodiment," "one example" or "an example" means that the particular features, structures or characteristics described in connection with the embodiment or example are included in at least one embodiment of the invention. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," "one example" or "an example" in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures or characteristics may be combined in any suitable combination and/or sub-combination in one or more embodiments or examples. Further, those of ordinary skill in the art will appreciate that the illustrations provided herein are for illustrative purposes and are not necessarily drawn to scale.

Referring to fig. 1, in one embodiment, the present invention provides a method for constructing a knowledge graph capable of distinguishing single-phase and double-phase affective disorders, comprising the following steps:

S1. Obtain multi-source heterogeneous affective disorder information and build a base corpus from it.

The affective disorder information includes unipolar affective disorder information and bipolar affective disorder information. In an optional embodiment, obtaining the multi-source heterogeneous affective disorder information and building the base corpus from it in step S1 comprises: acquiring structured, semi-structured and unstructured unipolar and bipolar affective disorder information from authoritative medical books, medical research papers, hospital electronic medical records and internet data resources, respectively; performing data cleaning and format processing on the unipolar and bipolar affective disorder information; and building the base corpus from the cleaned and formatted unipolar and bipolar affective disorder information.

In this embodiment, the authoritative medical books are selected from textbooks used in medical colleges and medical books published by medical publishing houses. The medical research papers are downloaded from databases such as CNKI, VIP (Weipu) and Wanfang. The hospital electronic medical records are obtained by desensitizing hospital visit data; desensitization transforms the visit data to remove personal private information and thus protects the security of private data. Internet data resources are crawled with a web crawler, specifically a focused web crawler, which selectively crawls predefined medical or popular-science websites related to the topic of affective disorder. A focused crawler greatly reduces hardware and network resource consumption while updating quickly, which suits the information needs of the medical field. Obtaining structured, semi-structured and unstructured unipolar and bipolar affective disorder data through multiple channels ensures rich and comprehensive data sources. Data cleaning means screening the unipolar and bipolar affective disorder data resources and removing erroneous ones; format processing means converting those data resources into a uniform format.
Data cleaning and format processing thus convert the multi-source heterogeneous affective disorder information into a uniform format, and the base corpus built from this uniformly formatted information facilitates subsequent data processing and improves its efficiency.
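As a rough sketch of the data cleaning and format processing described above, the Python below collapses whitespace, discards empty and duplicate texts, and maps every record onto one uniform schema. Treating empty or duplicate texts as the erroneous data to remove, and the field names `source`, `subtype` and `text`, are assumptions; the patent does not specify its cleaning rules.

```python
import re
from typing import Dict, Iterable, List


def clean_and_format(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalize raw multi-source records into a uniform base-corpus format."""
    seen = set()
    corpus: List[Dict[str, str]] = []
    for record in records:
        # Collapse runs of whitespace (spaces, tabs, newlines) into one space.
        text = re.sub(r"\s+", " ", record.get("text", "")).strip()
        if not text or text in seen:  # empty or duplicate text is discarded
            continue
        seen.add(text)
        corpus.append({
            "source": record.get("source", "unknown"),
            "subtype": record.get("subtype", "unknown"),  # "unipolar" or "bipolar"
            "text": text,
        })
    return corpus


raw = [
    {"source": "book", "subtype": "bipolar", "text": "  Lithium  carbonate\n tablets "},
    {"source": "web", "subtype": "bipolar", "text": "Lithium carbonate tablets"},
    {"source": "emr", "subtype": "unipolar", "text": "   "},
]
corpus = clean_and_format(raw)
print(corpus)
# [{'source': 'book', 'subtype': 'bipolar', 'text': 'Lithium carbonate tablets'}]
```

Keeping the first occurrence of a duplicate means the order in which sources are fed in decides which source label survives; a real pipeline might instead prefer the more authoritative source.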

In yet another alternative embodiment, the data sources of the unipolar and bipolar affective disorder information include, but are not limited to, authoritative medical books, medical research papers, hospital electronic medical records and internet data resources; that is, the data sources can be adjusted and selected according to actual circumstances.

S2. Construct an initial triple set by extracting entity units and entity relations from the base corpus.

In an optional embodiment, constructing an initial triple set by extracting entity units and entity relations from the base corpus in step S2 comprises: training, with the base corpus, a maximum entropy Markov model by means of gradient descent and a convolutional neural network; extracting the entity units in the base corpus with the maximum entropy Markov model; extracting the entity relations in the base corpus with the convolutional neural network based on the maximum entropy Markov model; extracting attribute features from the entity units and building a binary attribute list from the attribute features and entity units; matching the entity units through the binary attribute list to obtain an attribute triple set; building a relation triple set from the entity units and entity relations; and aggregating the attribute triple set and the relation triple set and normalizing them to obtain the initial triple set.

The maximum entropy Markov model is commonly used to extract entity units when constructing knowledge graphs, but using it alone to extract the entity units in the base corpus makes extraction slow and costly and scales poorly. The method therefore combines the maximum entropy Markov model with gradient descent and a deep-learning convolutional neural network to extract entity units. The convolutional neural network comprises two hidden convolutional layers, a max pooling layer and a fully connected layer: the convolutional layers extract features from the base corpus, the max pooling layer retains the main features among them to reduce dimensionality, and the fully connected layer classifies the main features. Extracting the entity units of the base corpus with a maximum entropy Markov model that is continuously trained by gradient descent and the convolutional neural network greatly reduces the loss during entity extraction, and increasing the diversity of the features extracted from the base corpus improves the generalization ability of the original maximum entropy Markov model.
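The layer stack described above (two convolutional layers, max pooling, a fully connected classifier) can be sketched in plain Python as a forward pass over a sequence of token feature vectors. This is an untrained toy: the weights are random, the kernel width, hidden size and global max pooling are assumptions, and neither the maximum entropy Markov model nor gradient-descent training is shown.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows are sequence positions, columns are channels


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def conv1d_relu(x: Matrix, kernels: List[Matrix]) -> Matrix:
    """'Valid' 1-D convolution along the sequence, then ReLU.
    Each kernel has shape (width, input_channels) and yields one output channel."""
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        out.append([
            max(0.0, sum(k[d][c] * x[t + d][c]
                         for d in range(width) for c in range(len(x[0]))))
            for k in kernels
        ])
    return out


def max_pool(x: Matrix) -> List[float]:
    """Global max pooling: keep the strongest response of each channel."""
    return [max(row[c] for row in x) for c in range(len(x[0]))]


def dense_softmax(v: List[float], weights: Matrix) -> List[float]:
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(row, v)) for row in weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(tokens: Matrix, n_classes: int, hidden: int = 4, width: int = 2,
            seed: int = 0) -> List[float]:
    """Needs at least 2 * (width - 1) + 1 tokens so both convolutions produce output."""
    rng = random.Random(seed)
    k1 = [random_matrix(width, len(tokens[0]), rng) for _ in range(hidden)]
    k2 = [random_matrix(width, hidden, rng) for _ in range(hidden)]
    features = max_pool(conv1d_relu(conv1d_relu(tokens, k1), k2))
    return dense_softmax(features, random_matrix(n_classes, hidden, rng))


tokens = [[0.1 * i, 0.2, -0.3] for i in range(5)]  # 5 tokens, 3 features each
probs = forward(tokens, n_classes=3)
print(len(probs), round(sum(probs), 6))  # 3 1.0
```

Global max pooling is one common way to realize the dimension reduction the text mentions: whatever the sequence length, each channel is reduced to a single strongest value before classification.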

In this embodiment, entity units include entity terms such as affective disorder types, names of therapeutic drugs and names of clinical symptoms; attribute features include characteristic attribute terms of drugs for treating bipolar affective disorder, such as "precautions", "adverse reactions" and "contraindications"; and entity relations include relational terms, such as actions, between entity nouns, for example "treats", "side effect" and "used for". The attribute features may be represented as an attribute list L = {L1, …, Li}, where Li denotes an attribute feature in the list. A binary attribute list is built from the attribute features and the entity units: each entity unit is paired with an arbitrary attribute feature to form a binary attribute group of the form <entity, attribute>, and the binary attribute groups together make up the binary attribute list. The entity units are then matched again through the binary attribute list to obtain an attribute triple set whose attribute triples take the form <entity, attribute, entity>. The relation triples in the relation triple set built from the entity units and entity relations take the form <entity, relation, entity>.

In an alternative embodiment, entity units and attribute features such as the precautions, adverse reactions and contraindications of drugs for treating bipolar affective disorder are extracted and matched to obtain the corresponding binary attribute list, which contains the binary attribute groups <drugs for treating bipolar affective disorder, precautions>, <drugs for treating bipolar affective disorder, adverse reactions> and <drugs for treating bipolar affective disorder, contraindications>. Matching entity units through this binary attribute list then yields an attribute triple set containing <drugs for treating bipolar affective disorder, precautions, sleep>, <drugs for treating bipolar affective disorder, adverse reactions, emesis> and <drugs for treating bipolar affective disorder, contraindications, alcohol>.
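The pairing and matching in this example can be sketched as follows. The dictionary `found_values` is a hypothetical stand-in for whatever corpus lookup supplies the value for each <entity, attribute> pair; the patent does not describe that lookup, and the helper names are illustrative.

```python
from typing import Dict, List, Tuple


def build_binary_attribute_list(entities: List[str],
                                attributes: List[str]) -> List[Tuple[str, str]]:
    """Pair every entity unit with every attribute feature: <entity, attribute>."""
    return [(entity, attribute) for entity in entities for attribute in attributes]


def match_attribute_triples(pairs: List[Tuple[str, str]],
                            found_values: Dict[Tuple[str, str], str]
                            ) -> List[Tuple[str, str, str]]:
    """Keep only pairs for which a value was found, giving <entity, attribute, value>."""
    return [(e, a, found_values[(e, a)]) for e, a in pairs if (e, a) in found_values]


drug = "drugs for treating bipolar affective disorder"
pairs = build_binary_attribute_list(
    [drug], ["precautions", "adverse reactions", "contraindications"])
found_values = {  # hypothetical values located in the base corpus
    (drug, "precautions"): "sleep",
    (drug, "adverse reactions"): "emesis",
    (drug, "contraindications"): "alcohol",
}
for triple in match_attribute_triples(pairs, found_values):
    print(triple)
```

Pairs with no value found in the corpus are simply dropped, so the binary attribute list can be generated generously without producing empty triples.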

In yet another optional embodiment, the initial triple set obtained by aggregating and normalizing the attribute triple set and the relation triple set includes <lithium carbonate tablets, side effect, diarrhea> and <paroxetine hydrochloride tablets, treats, major depressive disorder>, where <lithium carbonate tablets, side effect, diarrhea> comes from the attribute triple set and <paroxetine hydrochloride tablets, treats, major depressive disorder> comes from the relation triple set.

S3. Integrate the initial triple set to obtain a target triple set.

In an optional embodiment, integrating the initial triple set to obtain the target triple set in step S3 comprises: integrating the entity units in the initial triples through entity alignment to eliminate entity naming conflicts in the initial triples; and integrating the attribute features in the initial triples through attribute alignment to eliminate attribute naming conflicts in the initial triples.

In this embodiment, aligning the entity units and attribute features of the triples in the initial triple set eliminates language-expression problems including, but not limited to, entity naming conflicts and attribute naming conflicts. An entity naming conflict arises, for example, because paroxetine hydrochloride tablets, a drug for treating major depressive disorder, also go by other names, including leter, Le You, sultam and pareoxetine. The alignment operation removes erroneous, repeated, redundant, ambiguous and conflicting data from the initial triple set, which ensures the accuracy of the target triple set and reduces noise in subsequent operations based on it.
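Entity alignment of this kind is often implemented with an alias dictionary that maps every known name to one canonical entity. The Python below is a minimal sketch under that assumption (the patent does not say how alignment is performed); the alias spellings are taken from the example above, and `align_entities` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def align_entities(triples: List[Triple], aliases: Dict[str, str]) -> List[Triple]:
    """Map every alias to its canonical name, then drop the duplicates this exposes."""
    def canonical(name: str) -> str:
        key = name.strip().lower()
        return aliases.get(key, name.strip())

    aligned: List[Triple] = []
    seen = set()
    for head, relation, tail in triples:
        triple = (canonical(head), relation, canonical(tail))
        if triple not in seen:
            seen.add(triple)
            aligned.append(triple)
    return aligned


CANONICAL = "paroxetine hydrochloride tablets"
aliases = {alias: CANONICAL for alias in ("leter", "le you", "sultam", "pareoxetine")}
triples = [
    ("Le You", "treats", "major depressive disorder"),
    ("pareoxetine", "treats", "major depressive disorder"),
]
print(align_entities(triples, aliases))
# [('paroxetine hydrochloride tablets', 'treats', 'major depressive disorder')]
```

Two triples that looked distinct before alignment collapse into one afterwards, which is how alignment also removes the repeated and redundant data mentioned above.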

S4. Identify the polarity of the relation triples in the target triple set and calculate their credibility according to that polarity.

In an optional embodiment, identifying the polarity of the relation triples in the target triple set and calculating their credibility according to the polarity in step S4 comprises: tracing the data source of each relation triple and setting its credibility weight according to that source; grouping the relation triples by entity units to obtain several relation triple subsets, where all relation triples in any one subset share the same entity units; dividing the elements of each subset into positive relation triples and negative relation triples according to the entity relation in each triple; and constructing a credibility function of the relation triples from the number of positive relation triples, the number of negative relation triples and their credibility weights. The credibility function satisfies the following relation:

[Formula not reproduced in this text.]

where $N$ is the total number of relation triples in the target triple set; $r_i$ denotes the $i$-th relation triple; $C(r_i)$ is the credibility of the $i$-th relation triple in the target triple set, $i = 1, 2, \ldots, N$; $P_i$ is the number of positive relation triples in the subset containing the $i$-th relation triple; $p_{ij}$ denotes the $j$-th positive relation triple in that subset and $w(p_{ij})$ its credibility weight, $j = 1, 2, \ldots, P_i$; $Q_i$ is the number of negative relation triples in that subset; and $n_{ik}$ denotes the $k$-th negative relation triple in that subset and $w(n_{ik})$ its credibility weight, $k = 1, 2, \ldots, Q_i$. The credibility of each relation triple is then obtained from the credibility function. In this embodiment, the credibility of any relation triple in the target triple set is obtained through this credibility function, providing a data basis for subsequently using credibility to remove erroneous data.

In an alternative embodiment, a relation triple <entity A, relation, entity B> has two data sources, one a medical paper and the other data crawled from the internet; the credibility weight of the medical-paper source may be set to 1.5 and that of the internet source to 1.2.

In an alternative embodiment, relation triples are either positive or negative. For example, <paroxetine hydrochloride tablets, can treat, major depressive disorder> is a positive relation triple, and <paroxetine hydrochloride tablets, does not treat, major depressive disorder> is a negative relation triple.
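Because the credibility formula itself is not reproduced in this text, the Python below only illustrates the ingredients named above under an assumed form: the credibility of a relation triple is taken to be the summed weight of the triples in its subset (same head and tail entities) that share its polarity, divided by the summed weight of all triples in that subset. The source weights 1.5 and 1.2 follow the example above; the default weight of 1.0, the source labels and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Example weights from the text; other sources fall back to DEFAULT_WEIGHT (assumed).
SOURCE_WEIGHTS: Dict[str, float] = {"medical_paper": 1.5, "internet": 1.2}
DEFAULT_WEIGHT = 1.0

# (head entity, tail entity, is_positive, data source)
Record = Tuple[str, str, bool, str]


def credibility(records: List[Record]) -> List[float]:
    """Assumed form: weighted share of the same-entity subset agreeing with each triple."""
    positive: Dict[Tuple[str, str], float] = defaultdict(float)
    negative: Dict[Tuple[str, str], float] = defaultdict(float)
    for head, tail, is_positive, source in records:
        weight = SOURCE_WEIGHTS.get(source, DEFAULT_WEIGHT)
        if is_positive:
            positive[(head, tail)] += weight
        else:
            negative[(head, tail)] += weight
    scores = []
    for head, tail, is_positive, _ in records:
        key = (head, tail)
        agreeing = positive[key] if is_positive else negative[key]
        scores.append(agreeing / (positive[key] + negative[key]))
    return scores


drug, disease = "paroxetine hydrochloride tablets", "major depressive disorder"
records = [
    (drug, disease, True, "medical_paper"),   # "can treat"
    (drug, disease, True, "internet"),        # "can treat"
    (drug, disease, False, "internet"),       # "does not treat"
]
print([round(c, 3) for c in credibility(records)])  # [0.692, 0.692, 0.308]
```

Under this assumed form, a positive and a negative triple about the same entity pair always have credibilities that sum to 1, so strong, well-sourced agreement automatically pushes the contradicting triple down.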

S5. Dynamically update the target triple set according to the credibility of the relation triples to obtain an affective disorder database.

In an optional embodiment, dynamically updating the target triple set according to the credibility of the relation triples to obtain the affective disorder database in step S5 comprises: tracing the data source of each relation triple and setting its credibility threshold according to that source; identifying untrusted relation triples by comparing each triple's credibility with its credibility threshold; and removing the untrusted relation triples from the target triple set, thereby dynamically updating the target triple set to obtain the affective disorder database. In this embodiment, relation triples whose credibility is greater than or equal to the credibility threshold are marked as trusted relation triples, and those whose credibility is below the threshold are marked as untrusted relation triples. In the invention, the correctness of every relation triple <entity, relation, entity> is judged by its credibility, and triples are screened on that basis so that only correct, trusted relation triples enter the graph construction, which further ensures the accuracy of the knowledge graph.

In a further optional embodiment, building on the credibility function proposed in the optional embodiment of step S4, an adaptive weighted credibility threshold estimation method tied to the data sources of the relation triples is designed: for each relation triple, a corresponding credibility threshold is obtained from the following credibility threshold function:

[Formula not reproduced in this text.]

where $T_i$ is the credibility threshold of the $i$-th relation triple, $w^{+}_{\max,i}$ is the maximum credibility weight among the positive relation triples in the subset containing the $i$-th relation triple, and $w^{-}_{\min,i}$ is the minimum credibility weight among the negative relation triples in that subset. Because the data sources of the invention are dynamically updated, the credibility values of the relation triples and the corresponding credibility thresholds are updated along with the data; that is, as the amount and credibility of the data grow, the contents of the affective disorder database used to create the knowledge graph are continuously updated and iterated, ensuring the accuracy and timeliness of the data.
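The threshold formula is likewise not reproduced in this text. Purely to show how a source-dependent threshold could be computed from the two quantities it uses, the largest credibility weight among a subset's positive triples and the smallest among its negative triples, the sketch below assumes $T = w_{\text{min,neg}} / (w_{\text{max,pos}} + w_{\text{min,neg}})$. This form, its fallbacks for one-sided subsets and the name `adaptive_threshold` are all assumptions.

```python
from typing import List


def adaptive_threshold(positive_weights: List[float],
                       negative_weights: List[float]) -> float:
    """Assumed form: T = w_min_neg / (w_max_pos + w_min_neg).

    Fallbacks (also assumptions): with no negative evidence nothing
    contradicts the subset, so T = 0.0; with no positive evidence T = 1.0.
    """
    if not negative_weights:
        return 0.0
    if not positive_weights:
        return 1.0
    w_max_pos = max(positive_weights)
    w_min_neg = min(negative_weights)
    return w_min_neg / (w_max_pos + w_min_neg)


# Subset with two positive sources (1.5, 1.2) and one negative source (1.2).
print(round(adaptive_threshold([1.5, 1.2], [1.2]), 3))  # 0.444
```

Under this assumed form the threshold rises when even the weakest contradicting source is relatively strong and falls when the strongest supporting source is strong, so the test a triple must pass adapts to the quality of the sources in its subset.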

In yet another optional embodiment, the operations performed on the initial triple set in steps S4 and S5, such as alignment and polarity identification, may be implemented by the following pseudocode:

S41: begin; // start
S42: initialize; // initialization
S43: for each Ei ∈ E: align(Ei); // entity alignment
S44: for each Li ∈ L: align(Li); // attribute alignment
S45: RDG = update(RDF); // update the initial triple set RDF to obtain the target triple set RDG
S46: end;
S47: for each q in RDG: // classify the relation triples in the updated target triple set
S48:     C(q) = credibility(q); // compute the credibility value of relation triple q
S49:     if (C(q) >= T(q)) then // credibility is greater than or equal to the credibility threshold
S410:        q.polarity = 1; // set the polarity of relation triple q to 1
S411:        D1 ← q; // put relation triple q into the trusted library D1
S412:    else if (C(q) < T(q)) then // credibility is less than the credibility threshold
S413:        q.polarity = 0; // set the polarity of relation triple q to 0
S414:        D2 ← q; // put relation triple q into the untrusted library D2
S415:    end if;
S416: end;
S417: if the base corpus Normal is updated with new data Nupdate then // dynamically update the base corpus
S418:     update RDG; // update the target triple set again
S419:     start the iteration process; // iterate
S420:     update D1 and D2; // update the dynamic trusted library D1 and untrusted library D2
S421: end.

Given the initial triple set, the credibility thresholds and the base corpus as input, the above procedure produces a trusted library D1 and an untrusted library D2: the relation triples in the trusted library D1 are credible relation triples, and the relation triples in the untrusted library D2 are untrusted relation triples.
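The control flow of steps S41 to S421 can be sketched in Python as below. It is a hedged illustration under stated assumptions: alignment is reduced to hypothetical alias dictionaries, `credibility` and `threshold` are caller-supplied functions standing in for the formulas of steps S4 and S5 (which are not reproduced here), and each element of `updates` stands for the triples extracted from one batch of new corpus data. All function names are hypothetical.

```python
def align(triples, entity_alias, attribute_alias):
    """S43-S45: map entity and attribute/relation names to canonical forms."""
    return {
        (entity_alias.get(h, h), attribute_alias.get(r, r), entity_alias.get(t, t))
        for h, r, t in triples
    }


def classify(rdg, credibility, threshold):
    """S47-S416: label 1 -> trusted library D1, label 0 -> untrusted library D2."""
    d1, d2, label = set(), set(), {}
    for q in rdg:
        if credibility(q) >= threshold(q):
            label[q] = 1  # the pseudocode calls this the polarity of q
            d1.add(q)
        else:
            label[q] = 0
            d2.add(q)
    return d1, d2, label


def run(initial, updates, entity_alias, attribute_alias, credibility, threshold):
    """Build D1/D2 once, then rebuild them after every corpus update (S417-S420)."""
    rdg = align(initial, entity_alias, attribute_alias)
    d1, d2, _ = classify(rdg, credibility, threshold)
    for batch in updates:
        rdg |= align(batch, entity_alias, attribute_alias)  # S418: update RDG
        d1, d2, _ = classify(rdg, credibility, threshold)   # S420: update D1 and D2
    return d1, d2
```

Recomputing both libraries over the whole of RDG after each update mirrors the text's point that credibility values and thresholds shift as data accumulate, so earlier classifications are not frozen.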

S6, constructing a knowledge graph capable of distinguishing unipolar affective disorder from bipolar affective disorder by using the dynamically updated affective disorder database.

In an optional embodiment, constructing the knowledge graph capable of distinguishing unipolar affective disorder from bipolar affective disorder by using the dynamically updated affective disorder database in step S6 comprises the following steps: tracing the data source of the triples in the dynamically updated affective disorder database; dividing the affective disorder database into a unipolar affective disorder sub-database and a bipolar affective disorder sub-database according to the data source; constructing a single-layer unipolar affective disorder knowledge graph from the unipolar affective disorder sub-database; constructing a single-layer bipolar affective disorder knowledge graph from the bipolar affective disorder sub-database; extracting the common factors of the unipolar and bipolar affective disorder sub-databases, the common factors comprising entity units of the triples; constructing a middle layer between unipolar and bipolar affective disorder from the common factors; and constructing the knowledge graph capable of distinguishing unipolar affective disorder from bipolar affective disorder from the single-layer unipolar affective disorder knowledge graph, the single-layer bipolar affective disorder knowledge graph and the middle layer. In this embodiment, the unipolar affective disorder sub-database, the bipolar affective disorder sub-database and the sub-database formed by the common factors all use the open-source graph database Neo4j as the underlying storage structure.
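The layering described above can be sketched with plain Python sets. This is an in-memory illustration only; the embodiment stores the sub-databases in Neo4j, which the sketch does not touch. It assumes each triple already carries a `"unipolar"` or `"bipolar"` label traced from its data source, takes the common factors to be the entity units appearing in both sub-databases (as the text indicates), and, as an additional assumption, links each common entity in the middle layer to the triples of each single-layer graph that mention it. The function name is hypothetical.

```python
def build_layers(triples):
    """triples: iterable of (head, relation, tail, label), label in {"unipolar", "bipolar"}.

    Returns (unipolar_graph, bipolar_graph, middle_layer).
    """
    sub = {"unipolar": set(), "bipolar": set()}
    for head, rel, tail, label in triples:
        sub[label].add((head, rel, tail))

    def entity_units(graph):
        return {e for h, _, t in graph for e in (h, t)}

    # Common factors: entity units present in both sub-databases.
    common = entity_units(sub["unipolar"]) & entity_units(sub["bipolar"])

    # Middle layer (assumed structure): each common entity mapped to the
    # triples in each single-layer graph that mention it.
    middle = {}
    for e in common:
        middle[e] = {
            layer: {tr for tr in graph if e in (tr[0], tr[2])}
            for layer, graph in sub.items()
        }
    return sub["unipolar"], sub["bipolar"], middle
```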

The method of the invention for constructing a knowledge graph capable of distinguishing unipolar and bipolar affective disorders ensures comprehensive data by drawing on multi-source heterogeneous data sources, and judges the correctness of the data by calculating the credibility of the relation triples, so that the data in the affective disorder database are screened and updated in real time and the accuracy and timeliness of the knowledge graph are ensured. The resulting knowledge graph can distinguish unipolar from bipolar affective disorder; its content is fine-grained, its data are multi-source and fully fused, and it can help patients with affective disorders and their families with self-diagnosis and monitoring, and assist doctors in clinical decision-making.

In an optional embodiment, part of a knowledge graph constructed by the method of the invention is shown in fig. 2, where the entity units of the triple data model are represented by geometric figures of different colors. Specifically, in this embodiment, red circles represent disease types, including bipolar affective disorder and unipolar depression; blue circles represent the clinical psychological conditions corresponding to the disease, including depression, mania, abnormal mood, depressed mood, thought retardation and somatization symptoms; yellow circles represent therapeutic drugs, including lithium carbonate tablets and paroxetine hydrochloride tablets; and green circles represent the clinical physiological conditions corresponding to the disease, including nausea, vomiting and dizziness. In fig. 2, TM denotes a therapeutic relationship, SE denotes a side-effect relationship of a therapeutic drug, SYM denotes a symptom relationship of an entity, and EC denotes a confusable relationship between entities. Representing different entity units with figures of different colors improves the visual legibility of the knowledge graph.
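The legend of fig. 2 can be captured as a small lookup table, for example to drive node coloring in a visualization. The sketch below encodes only the categories, colors, example entities and relation codes stated in the text; the names `CATEGORY_COLOR`, `ENTITY_CATEGORY`, `node_color` and `relation_label` are hypothetical, and no edges of fig. 2 are reproduced because the text does not list them.

```python
# Entity category -> node color, as described for fig. 2.
CATEGORY_COLOR = {
    "disease type": "red",
    "clinical psychological condition": "blue",
    "therapeutic drug": "yellow",
    "clinical physiological condition": "green",
}

# Example entity units named in the text, grouped by category.
ENTITY_CATEGORY = {
    **dict.fromkeys(["bipolar affective disorder", "unipolar depression"], "disease type"),
    **dict.fromkeys(
        ["depression", "mania", "abnormal mood", "depressed mood",
         "thought retardation", "somatization symptoms"],
        "clinical psychological condition",
    ),
    **dict.fromkeys(
        ["lithium carbonate tablets", "paroxetine hydrochloride tablets"],
        "therapeutic drug",
    ),
    **dict.fromkeys(["nausea", "vomiting", "dizziness"], "clinical physiological condition"),
}

# Relation codes used on the edges of fig. 2.
RELATION_CODES = {
    "TM": "therapeutic relationship",
    "SE": "therapeutic drug side-effect relationship",
    "SYM": "symptom relationship of an entity",
    "EC": "confusable relationship between entities",
}


def node_color(entity):
    """Color of an entity node; raises KeyError for entities not in the table."""
    return CATEGORY_COLOR[ENTITY_CATEGORY[entity]]


def relation_label(code):
    """Readable label for an edge code; rejects codes not defined for fig. 2."""
    if code not in RELATION_CODES:
        raise ValueError(f"unknown relation code: {code}")
    return RELATION_CODES[code]
```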

Referring to fig. 3, the invention further provides a system for constructing a knowledge graph capable of distinguishing unipolar and bipolar affective disorders, comprising a processor, an input device, an output device and a memory that are connected to one another. The memory stores a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the method of the invention for constructing a knowledge graph capable of distinguishing unipolar and bipolar affective disorders. The system is highly integrated, compact and stable, and can execute the method efficiently, which further improves its practicality.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders, characterized by comprising the following steps:

acquiring multi-source heterogeneous affective disorder information and constructing a base corpus from the affective disorder information, wherein the affective disorder information comprises unipolar affective disorder information and bipolar affective disorder information;

constructing an initial triple set by extracting the entity units and entity relations in the base corpus;

integrating the initial triple set to obtain a target triple set;

identifying the polarity of the relation triples in the target triple set, and calculating the credibility of the relation triples according to the polarity;

dynamically updating the target triple set according to the credibility of the relation triples to obtain an affective disorder database;

constructing a knowledge graph capable of distinguishing unipolar affective disorder from bipolar affective disorder by using the dynamically updated affective disorder database;

wherein acquiring the multi-source heterogeneous affective disorder information and constructing the base corpus from the affective disorder information comprises the following steps:

acquiring structured, semi-structured and unstructured unipolar affective disorder information and bipolar affective disorder information from authoritative medical books, medical research papers, hospital electronic medical records and Internet data resources, respectively;

performing data cleansing and format processing on the unipolar affective disorder information and the bipolar affective disorder information;

constructing the base corpus from the unipolar affective disorder information and bipolar affective disorder information that have undergone data cleansing and format processing;

wherein constructing the knowledge graph capable of distinguishing unipolar affective disorder from bipolar affective disorder by using the dynamically updated affective disorder database comprises the following steps:

tracing the data source of the triples in the dynamically updated affective disorder database;

dividing the affective disorder database into a unipolar affective disorder sub-database and a bipolar affective disorder sub-database according to the data source;

constructing a single-layer unipolar affective disorder knowledge graph by using the unipolar affective disorder sub-database;

constructing a single-layer bipolar affective disorder knowledge graph by using the bipolar affective disorder sub-database;

extracting common factors of the unipolar affective disorder sub-database and the bipolar affective disorder sub-database, wherein the common factors comprise entity units of the triples;

constructing a middle layer between unipolar affective disorder and bipolar affective disorder from the common factors;

and constructing the knowledge graph capable of distinguishing unipolar affective disorder from bipolar affective disorder by using the single-layer unipolar affective disorder knowledge graph, the single-layer bipolar affective disorder knowledge graph and the middle layer.

2. The knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to claim 1, wherein constructing the initial triple set by extracting the entity units and entity relations in the base corpus comprises the following steps:

training a maximum entropy Markov model and a convolutional neural network on the base corpus by gradient descent;

extracting the entity units in the base corpus by using the maximum entropy Markov model;

extracting the entity relations in the base corpus by using the convolutional neural network on the basis of the maximum entropy Markov model;

extracting attribute features of the entity units, and constructing a binary attribute list from the attribute features and the entity units;

matching the entity units against the binary attribute list to obtain an attribute triple set;

constructing a relation triple set from the entity units and the entity relations;

and merging the attribute triple set and the relation triple set, and normalizing them to obtain the initial triple set.
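As an illustration of the last three steps of claim 2, the sketch below assembles the initial triple set from outputs the extraction models are assumed to have already produced; training the maximum entropy Markov model and the convolutional neural network is out of scope. The shape of the binary attribute list (pairs of an entity unit and an (attribute, value) feature), the whitespace-only normalization and the function name are assumptions.

```python
def build_initial_triples(entities, binary_attribute_list, relations):
    """entities: entity units extracted from the base corpus.
    binary_attribute_list: [(entity_unit, (attribute, value)), ...]
    relations: [(head_entity, relation, tail_entity), ...]
    """
    # Match entity units against the binary attribute list -> attribute triples.
    attribute_triples = {
        (e, attr, value)
        for e, (attr, value) in binary_attribute_list
        if e in entities
    }
    # Relation triples whose two ends are both known entity units.
    relation_triples = {
        (h, r, t) for h, r, t in relations if h in entities and t in entities
    }

    def normalize(triple):
        # Assumed normalization: collapse runs of whitespace in every field.
        return tuple(" ".join(field.split()) for field in triple)

    # Merge both sets and normalize to obtain the initial triple set.
    return {normalize(t) for t in attribute_triples | relation_triples}
```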

3. The knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to claim 1, wherein integrating the initial triple set to obtain the target triple set comprises the following steps:

integrating the entity units in the initial triples through entity alignment, thereby eliminating entity naming and reference conflicts in the initial triples;

and integrating the attribute features in the initial triples through attribute alignment, thereby eliminating attribute feature reference conflicts in the initial triples.
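Entity and attribute alignment as in claim 3 can be sketched with a union-find structure over known synonym pairs, a standard way to merge alias chains into one canonical name. The synonym pairs, the rule of keeping the lexicographically smallest name as canonical, and the names `UnionFind` and `align` are assumptions for illustration; the claim does not specify how conflicts are detected.

```python
class UnionFind:
    """Disjoint-set forest mapping each name to a canonical representative."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Assumed rule: keep the lexicographically smaller name as canonical.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def align(triples, entity_synonyms, attribute_synonyms):
    """Rewrite triples so synonymous entities and attributes share one name.

    The middle field is aligned with the attribute synonyms; relation names
    that are not listed as synonyms pass through unchanged.
    """
    ents, attrs = UnionFind(), UnionFind()
    for a, b in entity_synonyms:
        ents.union(a, b)
    for a, b in attribute_synonyms:
        attrs.union(a, b)
    return {(ents.find(h), attrs.find(r), ents.find(t)) for h, r, t in triples}
```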

4. The knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to claim 1, wherein identifying the polarity of the relation triples in the target triple set and calculating the credibility of the relation triples according to the polarity comprises the following steps:

tracing the data source of each relation triple, and setting the credibility weight of the relation triple according to the data source;

classifying the relation triples according to their entity units to obtain a plurality of relation triple subsets, wherein all relation triples in any one subset have the same entity units;

dividing the elements of each relation triple subset into positive relation triples and negative relation triples according to the entity relations in the relation triples;

constructing a credibility function of the relation triples from the number of positive relation triples, the number of negative relation triples and the credibility weights of the relation triples, wherein the credibility function satisfies the following relation:

[Formula not preserved in the source text.]

wherein $N$ denotes the total number of relation triples in the target triple set; $r_i$ denotes the $i$-th relation triple; $C(r_i)$ denotes the credibility of the $i$-th relation triple in the target triple set, $i = 1, 2, \ldots, N$; $p_i$ denotes the number of positive relation triples in the relation triple subset containing the $i$-th relation triple; $r_{ij}^{+}$ denotes the $j$-th positive relation triple in that subset, and $w_{ij}^{+}$ denotes the credibility weight of $r_{ij}^{+}$, $j = 1, 2, \ldots, p_i$; $q_i$ denotes the number of negative relation triples in the relation triple subset containing the $i$-th relation triple; $r_{ik}^{-}$ denotes the $k$-th negative relation triple in that subset, and $w_{ik}^{-}$ denotes the credibility weight of $r_{ik}^{-}$;

and obtaining the credibility of each relation triple from the credibility function.
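The evidence bookkeeping behind the credibility function of claim 4 can be sketched as follows: triples are grouped into subsets by their entity units, and the credibility weights of the positive and negative triples in each subset are summed. Because the formula itself is not preserved in the source text, the final score used here (the weighted share of evidence agreeing with a triple's polarity) is a hypothetical stand-in, not the patent's function; the function name is likewise hypothetical.

```python
from collections import defaultdict


def credibility(triples):
    """triples: list of (head, relation, tail, polarity, weight), polarity +1 or -1,
    weight = credibility weight set from the triple's data source.
    Returns one credibility value per triple, in input order.
    """
    pos_w = defaultdict(float)  # summed weights of positive triples per subset
    neg_w = defaultdict(float)  # summed weights of negative triples per subset
    for h, _, t, pol, w in triples:
        if pol > 0:
            pos_w[(h, t)] += w
        else:
            neg_w[(h, t)] += w

    scores = []
    for h, _, t, pol, _w in triples:
        key = (h, t)  # subset = triples sharing the same entity units
        total = pos_w[key] + neg_w[key]
        agree = pos_w[key] if pol > 0 else neg_w[key]
        # HYPOTHETICAL formula: weighted share of evidence agreeing with this
        # triple's polarity (the patent's own formula is not preserved).
        scores.append(agree / total if total > 0 else 0.0)
    return scores
```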

5. The knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to claim 4, wherein dynamically updating the target triple set according to the credibility of the relation triples to obtain the affective disorder database comprises the following steps:

tracing the data source of each relation triple, and setting the credibility threshold of the relation triple according to the data source;

identifying untrusted relation triples by comparing the credibility with the credibility threshold;

and removing the untrusted relation triples from the target triple set, thereby dynamically updating the target triple set to obtain the affective disorder database.

6. The knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to claim 5, wherein the credibility threshold satisfies the following formula:

[Formula not preserved in the source text.]

wherein $T_i$ denotes the credibility threshold of the $i$-th relation triple; $w_{i,\max}^{+}$ denotes the maximum credibility weight of the positive relation triples in the relation triple subset containing the $i$-th relation triple; and $w_{i,\min}^{-}$ denotes the minimum credibility weight of the negative relation triples in the relation triple subset containing the $i$-th relation triple.

7. The knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to claim 5, wherein identifying the untrusted relation triples by comparing the credibility with the credibility threshold comprises the following steps:

marking relation triples whose credibility is greater than or equal to the credibility threshold as credible relation triples;

and marking relation triples whose credibility is less than the credibility threshold as untrusted relation triples.

8. A system for constructing a knowledge graph capable of distinguishing unipolar and bipolar affective disorders, comprising a processor, an input device, an output device and a memory that are connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the knowledge graph construction method capable of distinguishing unipolar and bipolar affective disorders according to any one of claims 1 to 7.