CN112579790A

CN112579790A - Method and device for constructing severe disease knowledge base, storage medium and electronic equipment

Info

Publication number: CN112579790A
Application number: CN202011497501.7A
Authority: CN
Inventors: 徐艳军; 孙永樯; 李东; 张春龙
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2021-03-30

Abstract

The present disclosure relates to a method and an apparatus for constructing a severe disease knowledge base, a storage medium, and an electronic device, so as to construct a severe disease knowledge base having a high reference value in clinical diagnosis. The construction method comprises the following steps: acquiring relevant data of multiple types of severe diseases, wherein the relevant data comprises first structural data and non-structural data; inputting the non-structural data into a preprocessing model to obtain second structural data output by the preprocessing model, wherein the preprocessing model is obtained by training labeled non-structural sample data based on the severe disease; and constructing a storage structure based on the attribute items included in the first structural data and the second structural data, and filling the attribute values included in the first structural data and the second structural data into corresponding positions in the storage structure to obtain the severe disease knowledge base, wherein the structural data includes the mapping relationship between the attribute items and the attribute values.

Description

Method and device for constructing severe disease knowledge base, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of critical illness knowledge bases, and in particular, to a method and an apparatus for constructing a critical illness knowledge base, a storage medium, and an electronic device.

Background

The important role of critical care medicine in the treatment of critically ill patients is widely accepted. In the last decade in particular, critical care medicine has developed rapidly and has become one of the most important clinical specialties. It integrates advanced techniques from multiple disciplines, such as internal medicine, surgery, anesthesia, nursing, and nutrition. This places higher demands not only on the medical staff engaged in critical care medicine, but also on the information systems that assist critical care diagnosis and treatment, and in particular on the knowledge bases that support those information systems. At present, although many knowledge bases for drugs and clinical medicine are available, none is specific to severe diseases. A method for constructing a severe disease knowledge base is therefore needed.

Disclosure of Invention

The purpose of the present disclosure is to provide a method, an apparatus, a storage medium, and an electronic device for constructing a critical illness knowledge base, so that the constructed knowledge base has a high reference value in clinical diagnosis.

In order to achieve the above object, a first aspect of the present disclosure provides a method for constructing a severe disease knowledge base, including:

acquiring relevant data of multiple types of severe diseases, wherein the relevant data comprises first structural data and non-structural data;

inputting the non-structural data into a preprocessing model to obtain second structural data output by the preprocessing model, wherein the preprocessing model is obtained by training labeled non-structural sample data based on the severe disease;

and constructing a storage structure based on the attribute items included in the first structural data and the second structural data, and filling the attribute values included in the first structural data and the second structural data into corresponding positions in the storage structure to obtain the severe disease knowledge base, wherein the structural data includes the mapping relationship between the attribute items and the attribute values.

Optionally, the preprocessing model includes a labeling module and an output module, and accordingly, the training process of the preprocessing model includes:

dividing non-structural sample data in a preset database into training sample data and verification sample data;

taking the training sample data as an input parameter and the labeled result of the training sample data as an output parameter, training the labeling module until the labeling accuracy of the labeling module on the verification sample data reaches a preset value, and then ending the training to obtain the labeling module;

carrying out sequence labeling on the non-structural sample data through a trained labeling module;

and determining a conversion parameter of the output module according to the sequence labeling result output by the labeling module, the predefined association relation and second structural data corresponding to the non-structural sample data, so that the second structural data can be obtained after the sequence labeling result and the predefined association relation pass through the conversion parameter.

Optionally, the performing, by the trained labeling module, sequence labeling on the non-structural sample data includes:

labeling the critical medical knowledge class attribute items and the disease-related class attribute items of the non-structural sample data through the trained labeling module;

wherein the critical medical knowledge class attribute items include at least one of: a clinical medical concept, medical terms belonging to the clinical medical concept, an association between medical terms under the same clinical medical concept, an association between medical terms under different clinical medical concepts;

the disease-related class attribute items include an item of an attribute related to a severe disease and/or an item of an attribute related to an individual patient.

Optionally, the building a storage structure based on the attribute items included in the first structural data and the second structural data includes:

constructing a data dictionary storage structure based on the critical medical knowledge class attribute items included in the first structural data and the second structural data;

and constructing a tree-shaped storage structure based on the disease-related class attribute items included in the first structural data and the second structural data.

Optionally, the constructing a tree storage structure based on the disease-related class attribute items included in the first structural data and the second structural data includes:

for each type of severe disease, a tree-shaped storage structure is constructed according to the disease-related class attribute items included in the first structural data and the second structural data of the type of severe disease.

Optionally, the disease-related class attribute items include an attribute item related to severe disease and an attribute item related to individual patients, and the populating attribute values included in the first structural data and the second structural data into corresponding positions in the storage structure to obtain the severe disease knowledge base includes:

for each type of severe disease, determining a high-risk factor of the severe disease according to attribute items related to individual patients and included in the first structural data and the second structural data of the type of severe disease, and filling the high-risk factor into a corresponding position in the tree-shaped storage structure to obtain a severe disease model of the type of severe disease;

the critical illness knowledge base is obtained according to the critical illness model of each type of critical illness.

Optionally, before the building a storage structure based on the attribute items included in the first structural data and the second structural data, the method further includes:

and updating the identifiers of the attribute items included in the first structural data and the second structural data according to a preset updating relationship so as to update different identifiers representing the same attribute item into the same identifier.

The second aspect of the present disclosure also provides an apparatus for constructing a severe disease knowledge base, including:

a first acquisition module, used for acquiring relevant data of multiple types of severe diseases, wherein the relevant data comprises first structural data and non-structural data;

the second acquisition module is used for inputting the non-structural data into a preprocessing model to obtain second structural data output by the preprocessing model, and the preprocessing model is obtained by training labeled non-structural sample data based on the severe disease;

and the generation module is used for constructing a storage structure based on the attribute items included in the first structural data and the second structural data, and filling the attribute values included in the first structural data and the second structural data into corresponding positions in the storage structure to obtain the severe disease knowledge base, wherein the structural data includes the mapping relation between the attribute items and the attribute values.

The third aspect of the present disclosure also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided by the first aspect of the present disclosure.

The fourth aspect of the present disclosure also provides an electronic device, including:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of any of the methods provided by the first aspect of the disclosure.

Through the technical scheme, the non-structural data in the related data of multiple types of severe diseases can be converted into the second structural data through the preprocessing model, then the storage structure is constructed according to the attribute items included in the related data including the first structural data and the second structural data, and the attribute values included in the first structural data and the second structural data are filled in the corresponding positions in the storage structure, so that the purpose of constructing the severe disease knowledge base can be achieved. In addition, the established severe disease knowledge base comprises multiple types of severe diseases, provides rich models for scientific research and teaching of severe diseases, and has high reference value in clinical diagnosis.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow chart illustrating a method of constructing a critical illness knowledge base according to an exemplary embodiment.

FIG. 2 is a flow diagram illustrating a method of training a pre-processing model in accordance with an exemplary embodiment.

FIG. 3 is a diagram illustrating a clinical medical concept and medical terms subordinate to the clinical medical concept according to an exemplary embodiment.

FIG. 4 is a schematic diagram illustrating a severe disease model of the severe disease ARDS according to an exemplary embodiment.

FIG. 5 is a schematic diagram illustrating an apparatus for constructing a severe disease knowledge base according to an exemplary embodiment.

FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

In the related art, some software systems (e.g., doctor mai tong) include an intensive care knowledge base, but they lack modeling of severe diseases, lack disease-related information, cover few types of severe diseases, and cannot be used directly by other information systems. To address the lack of a severe disease knowledge base and the lack of support for the diagnosis, treatment, scientific research, and teaching of severe diseases in the related art, the present disclosure provides a method and a device for constructing a severe disease knowledge base, a storage medium, and electronic equipment.

The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

FIG. 1 is a flow chart illustrating a method of constructing a critical illness knowledge base according to an exemplary embodiment. As shown in fig. 1, the construction method may include the following steps.

In step 101, relevant data of multiple types of severe diseases is obtained, wherein the relevant data comprises first structural data and non-structural data.

In the present disclosure, the relevant data of the multiple types of severe diseases used to generate the severe disease knowledge base may be collected in various ways, including but not limited to: collection from NMDIC (Neusoft Medical Data-sets for Intensive Care), collection from the intensive care clinical data of medical institutions, collection from medical literature, and capture from the network. NMDIC is a Chinese data set developed by the Neusoft clinical medicine intensive care research and development group. It includes data related to about 10,000 intensive care unit visits, which may include demographics, vital signs, laboratory examinations, medications, and the like. All data resources in NMDIC are strictly de-identified.

It should be noted that, whichever of the above ways is used, the acquired data on severe diseases includes both first structural data and non-structural data. For example, data related to severe diseases obtained from NMDIC generally contains medical order information, which is non-structural data.

In addition, data acquired through different routes may be of different data types. In order to acquire severe disease data of different data types, in one possible embodiment, the electronic device that executes the method for constructing the severe disease knowledge base is provided with multiple types of data interfaces, for example, an interface for acquiring character type (String) data, an interface for acquiring stream type (Stream) data, an interface for acquiring file type (File) data, an interface for acquiring JavaScript Object Notation (JSON) data, an interface for acquiring binary type data, and the like. In another possible embodiment, the electronic device is provided with only an interface for acquiring target type data. In this embodiment, non-target type data is first converted into target type data by a data type conversion technique of the related art, and the converted target type data is then acquired through the interface for acquiring target type data. The target type data may be any of the above types of data. The present disclosure does not limit the specific manner of obtaining the relevant data of multiple types of severe diseases.
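The second embodiment above, in which non-target type data is converted before it reaches a single interface, can be sketched as follows. This is only an illustrative sketch: the function name to_target and the choice of parsed JSON as the target type are assumptions made for the example and are not specified by the disclosure.

```python
import json
from pathlib import Path
from typing import Any, Union


def to_target(data: Union[str, bytes, bytearray, Path, dict, list]) -> Any:
    """Convert several input types into one target type (here: parsed JSON).

    Hypothetical helper: the disclosure only says that a related-art type
    conversion is applied, not which one.
    """
    if isinstance(data, (dict, list)):
        # Already in the target form, nothing to convert.
        return data
    if isinstance(data, Path):
        # File type data: read the raw bytes of the file.
        data = data.read_bytes()
    if isinstance(data, (bytes, bytearray)):
        # Binary or stream content: decode to text (UTF-8 is assumed).
        data = data.decode("utf-8")
    if isinstance(data, str):
        # Character type data: parse the text as JSON.
        return json.loads(data)
    raise TypeError(f"unsupported input type: {type(data).__name__}")


record_from_text = to_target('{"patient_id": "001", "heart_rate": 112}')
record_from_bytes = to_target(b'{"patient_id": "001", "heart_rate": 112}')
```

With a single conversion function of this kind, the acquisition interface itself only ever has to handle one data type.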

After the relevant data of the severe disease is acquired, the structural data and non-structural data in the relevant data can be further identified. For example, whether data is structural or non-structural may be determined by identifying the format of the data. For example, if data related to a severe disease is acquired from a database of a medical institution, entity attributes such as table names (or data set names) and field names can be identified by data pattern extraction on the database data (for example, JSON-format data). Some fields of non-numeric types among the identified entity attributes are then identified as non-structural data, or some data sourced from medical papers is identified as non-structural data. JSON data pattern extraction is a mature technology; for example, Google's Gson or Alibaba's fastjson can be used for fast and accurate extraction, and the present disclosure does not specifically limit this.
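As a rough illustration of the identification step above, the following sketch parses a JSON record and separates its fields into structural and non-structural candidates. The rule used here (numeric fields and short strings count as structural, longer free text counts as non-structural) and the threshold MAX_CODED_LENGTH are assumptions for the example; the disclosure only states that some non-numeric fields are identified as non-structural data.

```python
import json
from typing import Any, Dict, Tuple

# Assumed threshold: strings longer than this are treated as free text.
MAX_CODED_LENGTH = 20


def classify_fields(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one record into (structural, non_structural) fields."""
    structural: Dict[str, Any] = {}
    non_structural: Dict[str, Any] = {}
    for name, value in record.items():
        if isinstance(value, str) and len(value) > MAX_CODED_LENGTH:
            # Long non-numeric field: candidate non-structural data.
            non_structural[name] = value
        else:
            # Numbers, booleans, nulls and short coded strings.
            structural[name] = value
    return structural, non_structural


raw = json.loads(
    '{"patient_id": "001", "heart_rate": 112, '
    '"order_text": "Continue mechanical ventilation and recheck blood gas"}'
)
structural_fields, non_structural_fields = classify_fields(raw)
```

In practice the result of such a rule would still be shown to the user for confirmation, as the following embodiment describes.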

In one embodiment, the result of the recognition may be output to allow the user to determine whether the recognition of the structural data and the non-structural data is accurate, and if not, to receive user modifications to accurately recognize the structural data and the non-structural data, and so on.

In addition, relationships between data may be identified through relationship identification on the database. The identified relationships are used to preserve the uniqueness of the data, so that the subject to which the data belongs remains unchanged after subsequent processing.

In the present disclosure, relationship identification on a database is performed by reading the records of tables, fields, and types in the database. Different databases record this information in different locations. For a MySQL database, tables, fields, and types are recorded in INFORMATION_SCHEMA.COLUMNS and need to be read from INFORMATION_SCHEMA.COLUMNS. For a PostgreSQL database, table information is recorded in PG_CLASS, field information in PG_ATTRIBUTE, field type information in PG_TYPE, and comment information in PG_DESCRIPTION, so the tables, fields, and types need to be read from the corresponding locations. For an Oracle database, tables, fields, and types are recorded in USER_TAB_COLUMNS and need to be read from USER_TAB_COLUMNS. For a SQL Server database, table information is recorded in sysobjects, field information in syscolumns, field type information in systypes, and comment information in syscomm, so the tables, fields, and types are read from the corresponding locations. For a DB2 database, tables, fields, and types are recorded in sysibm. The tables, fields, and types are read in the above manner, and the relationships are identified according to the tables, fields, and types that are read.
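The idea of reading tables, fields, types, and relationships from a database's own catalog can be sketched with SQLite, which is used here only because it ships with Python; it is a stand-in and is not one of the databases named above. The table and column names (patient, exam_report) are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def read_schema(conn: sqlite3.Connection) -> Dict[str, List[Tuple[str, str]]]:
    """Return {table: [(field, declared type), ...]} from the SQLite catalog."""
    schema: Dict[str, List[Tuple[str, str]]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


def read_relations(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str, str]]:
    """Return [(field, referenced table, referenced field), ...] for one table."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    return [(row[3], row[2], row[4]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id TEXT PRIMARY KEY, age INTEGER)")
conn.execute(
    "CREATE TABLE exam_report ("
    "report_id INTEGER PRIMARY KEY, "
    "patient_id TEXT REFERENCES patient(patient_id), "
    "report_text TEXT)"
)
schema = read_schema(conn)
relations = read_relations(conn, "exam_report")
```

Here the relation found on exam_report.patient_id is what ties a free-text report back to its patient, which is the kind of relationship the method keeps attached to the non-structural data.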

After the structural data, the non-structural data, and the relationships are identified in the above manner, additional uniqueness information can be maintained for the non-structural data. That is, the identified relationships are associated with the non-structural data, so that the non-structural data keeps its relationship with the original subject after subsequent processing. For example, this ensures that the examination report data of patient No. 001 still belongs to patient No. 001 after subsequent processing.

In step 102, the unstructured data is input into a preprocessing model, which is trained based on labeled unstructured sample data of the severe disease, to obtain second structured data output by the preprocessing model.

In order to distinguish from data which is already structural in the related data, structural data included in the related data is referred to as first structural data, and structural data obtained by converting non-structural data is referred to as second structural data.

In the present disclosure, only structural data can construct the critical illness knowledge base, and in practical applications, the data related to the critical illness usually also includes non-structural data, so that the non-structural data needs to be converted into structural data. For example, unstructured data may be transformed through a pre-processing model. The preprocessing model is used to convert the non-structural data into the structural data, so that only the non-structural data needs to be input into the preprocessing model for conversion, and the specific training mode of the preprocessing model will be described in detail below.

In step 103, a storage structure is constructed based on the attribute items included in the first structural data and the second structural data, and the attribute values included in the first structural data and the second structural data are filled in corresponding positions in the storage structure, so as to obtain a severe disease knowledge base, wherein the structural data includes the mapping relationship between the attribute items and the attribute values.

By adopting the technical scheme, the non-structural data in the related data of multiple types of severe diseases can be converted into the second structural data through the preprocessing model, then the storage structure is constructed according to the attribute items included in the first structural data and the second structural data included in the related data, and the attribute values included in the first structural data and the second structural data are filled in the corresponding positions in the storage structure, so that the purpose of constructing the severe disease knowledge base can be realized. In addition, the established severe disease knowledge base comprises multiple types of severe diseases, provides rich models for scientific research and teaching of severe diseases, and has high reference value in clinical diagnosis.
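Step 103 can be pictured with a minimal sketch in which each piece of structural data is a mapping from attribute items to attribute values. The flat dictionary used as the storage structure and the function name build_storage are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def build_storage(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Build a storage structure from attribute items, then fill in the values."""
    # Step 1: construct the structure, one empty slot per attribute item.
    storage: Dict[str, List[Any]] = {}
    for record in records:
        for item in record:
            storage.setdefault(item, [])
    # Step 2: fill each attribute value into the slot of its attribute item.
    for record in records:
        for item, value in record.items():
            if value not in storage[item]:
                storage[item].append(value)
    return storage


# Hypothetical first and second structural data for one severe disease.
first_structural = [{"disease category": "ARDS", "age": 67}]
second_structural = [{"disease category": "ARDS", "past medical history": "pneumonia"}]
knowledge_base = build_storage(first_structural + second_structural)
```

The sketch only shows the two phases, structure first and values second; it does not attempt to reproduce any particular storage layout.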

In order to better understand the method for constructing the severe disease knowledge base provided by the present disclosure, the method is described below as a complete example.

First, a training method of the preprocessing model will be described. Fig. 2 is a flowchart illustrating a training method of a preprocessing model according to an exemplary embodiment, where the preprocessing model may include a labeling module for performing sequence labeling on input unstructured data and an output module for converting the unstructured data into structured data according to a sequence labeling result. As shown in fig. 2, the training method may include the following steps.

In step 201, the non-structural sample data in the preset database is divided into training sample data and verification sample data. The training sample data is used for training the labeling module, and the verification sample data is used for verifying the accuracy of labeling of the labeling module.

For example, the labeling module can be trained and verified using non-structural sample data in NMDIC. For example, the non-structural sample data of 6000 of the 10,000 severe disease patients in NMDIC is used as training sample data, and the non-structural sample data of the remaining 4000 patients is used as verification sample data.
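The 6000/4000 division described above could be carried out as in the following sketch. Shuffling with a fixed seed is an assumption added here for reproducibility; the disclosure does not say how the patients are assigned to the two groups.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_count: int, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Divide samples into (training, verification) without overlap."""
    shuffled = list(samples)
    # Fixed seed so the same split is produced on every run (assumption).
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_count], shuffled[train_count:]


# Hypothetical patient identifiers standing in for the sample data.
patient_ids = [f"{n:05d}" for n in range(1, 10001)]
training, verification = split_samples(patient_ids, train_count=6000)
```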

In step 202, training the labeling module by using training sample data as an input parameter and using the labeled result of the training sample data as an output parameter until the labeling accuracy of the labeling module on the verification sample data reaches a preset value, ending the training, and obtaining the labeling module.

In the present disclosure, the training sample data may be sequence-labeled in advance. Sequence labeling refers to classifying each element of an input sequence. It is a common method in Natural Language Processing (NLP) and can be used for word segmentation, part-of-speech tagging, named entity recognition, relation extraction, and the like. In the present disclosure, each element in the sequence can be labeled as "B-X", "I-X", or "O" by the BIO labeling method. "B-X" indicates that the fragment containing the element belongs to type X and the element is at the beginning of the fragment; "I-X" indicates that the fragment containing the element belongs to type X and the element is inside the fragment, after its beginning; and "O" indicates that the element does not belong to any type. Here X is a label class. For example, if the non-structural sample data is "Xiaoming lives in Shenyang City" and the information of main interest in this text is the person and the place, the labels "Person" (P) and "Location" (L) may be established in advance. The labels are assigned to the elements of the original Chinese sentence, which is why eight labels appear below, and the labeling result of the non-structural sample data is as follows:

xiaoming living in Shenyang city

B-P I-P O O O B-L I-L I-L

The labeling may be layered, with each layer representing one type of processing. For example, when the second layer performs part-of-speech processing, X in the second-layer labels denotes the part of speech, such as Noun Phrase (NP) and Verb Phrase (VP), and the two-layer labeling result of the non-structural sample data is as follows:

xiaoming living in Shenyang city

B-P I-P O O O B-L I-L I-L

B-NP I-NP B-VP I-VP O B-NP I-NP I-NP

It should be noted that the number of layers labeled by the labeling module is not limited in the present disclosure.
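To make the BIO scheme concrete, the following sketch turns one layer of labels back into typed fragments, given as (type, start, end) positions with an exclusive end. The function name decode_bio and the choice to ignore an "I-X" label that does not continue a fragment of the same type are assumptions of this example.

```python
from typing import List, Optional, Tuple


def decode_bio(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return fragments as (type, start index, end index exclusive)."""
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and label == tag[2:]:
            # The current fragment continues.
            continue
        if label is not None:
            # Any other tag closes the fragment that was open.
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            # A new fragment begins at this element.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


first_layer = decode_bio("B-P I-P O O O B-L I-L I-L".split())
second_layer = decode_bio("B-NP I-NP B-VP I-VP O B-NP I-NP I-NP".split())
```

Applied to the two label rows above, the first layer yields a person fragment over elements 0 to 1 and a location fragment over elements 5 to 7, and the second layer yields two noun phrases and one verb phrase.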

Bi-LSTM (Bi-directional Long Short-Term Memory neural network) is strong at sequence modeling: it can capture long-distance context information and has the nonlinear fitting capability of a neural network. CRF (Conditional Random Field) focuses more on the linear weighted combination of local features over the whole sentence (the whole non-structural data is scanned through feature templates). Therefore, in the present disclosure, the labeling module employs a joint Bi-LSTM + CRF model to label the non-structural data.
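In a joint model of this kind, the network typically supplies a score for each label at each position and the CRF supplies scores for label-to-label transitions; the best label sequence is then commonly found with the Viterbi algorithm. The sketch below shows only that decoding step in plain Python. All scores are made-up numbers, and the dictionaries stand in for values that a trained model would produce; none of this is taken from the disclosure.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for per-position emission scores."""
    best = [{t: emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = []
    for pos in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Best previous tag for reaching `cur` at this position.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[pos][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


TAGS = ["B-D", "I-D", "O"]
# Assumed transition score: "I-D" directly after "O" is strongly discouraged.
TRANSITIONS = {("O", "I-D"): -10.0}
EMISSIONS = [
    {"B-D": 0.9, "I-D": 0.0, "O": 1.0},
    {"B-D": 0.0, "I-D": 2.0, "O": 0.5},
]
decoded = viterbi(EMISSIONS, TRANSITIONS, TAGS)
```

Taken position by position, the emission scores alone would pick "O" followed by "I-D", which is not a valid BIO sequence; with the transition score included, the decoded sequence is "B-D" followed by "I-D".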

In the present disclosure, the labeling module is trained by using the training sample data. The labeling module can be trained with reference to model training methods in the related art, and the specific training process is not described in detail in the present disclosure. During training, after each round of training with the training sample data, the labeling module is verified by using the verification sample data. That is, the labeling module performs sequence labeling on the verification sample data; when the labeling accuracy determined from the labeling result reaches the preset value, the training ends and the labeling module is obtained. When the labeling accuracy does not reach the preset value, training continues with the training sample data until the labeling accuracy reaches the preset value.
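The train-then-verify loop described above has roughly the following shape. The two callables are placeholders for whatever training and evaluation routines are actually used, and the max_rounds safety limit is an assumption added for the sketch; the disclosure itself only describes training until the preset accuracy is reached.

```python
from typing import Callable


def train_until(
    train_one_round: Callable[[], None],
    labeling_accuracy: Callable[[], float],
    preset_accuracy: float,
    max_rounds: int = 100,
) -> int:
    """Train round by round; stop when verification accuracy reaches the preset value.

    Returns the number of rounds that were needed.
    """
    for round_index in range(1, max_rounds + 1):
        train_one_round()  # one pass over the training sample data
        if labeling_accuracy() >= preset_accuracy:  # checked on verification data
            return round_index
    raise RuntimeError("preset accuracy not reached within max_rounds")


# Stand-in evaluation that reports a higher accuracy after each round.
simulated_accuracies = iter([0.50, 0.70, 0.92])
rounds_needed = train_until(
    train_one_round=lambda: None,
    labeling_accuracy=lambda: next(simulated_accuracies),
    preset_accuracy=0.9,
)
```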

In step 203, sequence labeling is performed on the non-structural sample data through the trained labeling module.

In the present disclosure, when the labeling module is trained, the labeled result of the training sample data may include a label for an attribute item of an intensive medical knowledge class and/or a label for an attribute item of a disease-related class. Accordingly, the trained labeling module can perform sequence labeling on the critical medical knowledge type attribute items and/or disease related type attribute items of the non-structural sample data.

Therefore, in a possible manner, performing sequence labeling on the non-structural sample data by the trained labeling module may further include: labeling the critical medical knowledge class attribute items and the disease-related class attribute items of the non-structural sample data through the trained labeling module.

Wherein the critical medical knowledge class attribute items include at least one of: clinical medical concepts, medical terms belonging to clinical medical concepts, associative relations between medical terms under the same clinical medical concept, associative relations between medical terms under different clinical medical concepts; the disease-related class attribute items include an attribute item related to a severe disease and/or an attribute item related to an individual patient.

The clinical medical concept can be an object which exists objectively, such as a breathing machine, or an abstract thing, such as a diagnosis and treatment idea.

Medical terms belonging to a clinical medical concept are the linguistic expressions of that concept. Multiple terms may exist under one clinical medical concept, and each term is represented by a unique identifier. There is one and only one first term under each concept; the first term is the term most commonly used clinically to express the clinical medical concept, and the other terms under the concept are defined as available terms. Illustratively, FIG. 3 is a schematic diagram illustrating a clinical medical concept and the medical terms subordinate to it according to an exemplary embodiment. As shown in FIG. 3, the identifier of the clinical medical concept is "10123547", and the medical terms subordinate to the clinical medical concept include "glucagon tumor", "α cell adenoma", "alpha cell adenoma", and "glucagon tumor". "Glucagon tumor" is the first term, and the other terms are available terms.

The association relationship includes an inheritance relationship and an attribute relationship. An inheritance relationship is a hierarchical relationship: one clinical medical concept may be a subclass of one or more other clinical medical concepts. For example, the disease "congenital plagiocephaly with pelvic tilt" is a subclass of the disease "congenital plagiocephaly of skull" and also of the disease "plagiocephaly". Attribute relationships are used to characterize clinical medical concepts. For example, the pathological process of the disease "congenital plagiocephaly with pelvic tilt" is the "pathological development process", the anatomical site where it occurs is the "skull", and the morphological change is a "deformity".
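The concepts, terms, and association relationships described above can be represented with a small data structure such as the one sketched here. The class and field names are illustrative, and the identifiers "C1" to "C3" are hypothetical placeholders; only the disease names and attribute values come from the example in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    identifier: str
    first_term: str                                            # most commonly used term
    available_terms: List[str] = field(default_factory=list)   # other terms
    parents: List[str] = field(default_factory=list)           # inheritance relationships
    attributes: Dict[str, str] = field(default_factory=dict)   # attribute relationships


def is_subclass_of(concepts: Dict[str, Concept], child: str, ancestor: str) -> bool:
    """Follow inheritance relationships upward from `child`."""
    pending = list(concepts[child].parents)
    seen = set()
    while pending:
        current = pending.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            pending.extend(concepts[current].parents)
    return False


concepts = {
    "C1": Concept("C1", "plagiocephaly"),
    "C2": Concept("C2", "congenital plagiocephaly of skull"),
    "C3": Concept(
        "C3",
        "congenital plagiocephaly with pelvic tilt",
        parents=["C1", "C2"],
        attributes={
            "pathological process": "pathological development process",
            "anatomical site": "skull",
            "morphological change": "deformity",
        },
    ),
}
```

A concept may list several parents, which matches the statement that one clinical medical concept can be a subclass of more than one other concept.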

In the labeling process, the association relationships are represented by a pre-established classification set, so the association relationships do not need to be identified separately during labeling. For example, the non-structural data is "pneumonia and progression to sepsis" from the "past medical history", the relationship class is Disease (D), terms are denoted by T (Term), and the labeling result is as follows:

pneumonia and development of sepsis

B-D I-D O O O O B-D I-D I-D

B-T I-T O O O O B-T I-T I-T

From this annotation, it can be seen that "pneumonia" and "sepsis" are both the next level of relationship to "disease", i.e., "pneumonia" and "sepsis" are subclasses of "disease".

For another example, the non-structural data is "small patch shadow visible in the left lung", and the labeling result is as follows, where the first layer is the attribute relationship (position) labeling with P denoting the examination position, the second layer is the term labeling, and the third layer is the attribute relationship (examination result) labeling with R denoting the examination result:

In addition, in practical applications, the critical medical knowledge class attribute items also include term mapping, which refers to establishing a relationship between a code, clinical medical concept, or term in one term set and a code, clinical medical concept, or term with the same or similar meaning in another term set. Through mapping, a term set can be linked to other term sets. Each mapping may be represented by a unique identifier. It is worth mentioning that the term mapping may be established manually, which is not specifically limited in the present disclosure.

The severe disease related attribute items may include, but are not limited to: disease category, disease characteristics, pathogenic cause of disease, clinical manifestation of disease, examination of disease, diagnosis of disease, differential diagnosis of disease, and treatment of disease. The patient-individual related attribute items may include, but are not limited to: patient identification, height, weight, age, blood type, permanent location, type of illness, past medical history, eating habits, living habits, prognosis.

In step 204, a conversion parameter of the output module is determined according to the sequence labeling result output by the labeling module, the predefined association relationship and the second structural sample data corresponding to the non-structural sample data of the training sample, so that the second structural sample data can be obtained after the sequence labeling result and the predefined association relationship pass through the conversion parameter.

In the present disclosure, the labeling module may be trained first. After its training is finished, the trained labeling module is used to perform sequence labeling on the non-structural sample data, and the output module is then trained according to the sequence labeling result and the predefined association relationship.

For example, the sequence labeling result and the predefined association relationship may be used as input parameters, and the second structural sample data corresponding to the non-structural sample data may be used as output parameters, to train the output module and determine its conversion parameters. A conversion parameter is a parameter used for converting the sequence labeling result and the predefined association relationship into the second structural data.

For example, consider the case where the labeling module labels the critical care medical knowledge class attribute items of the non-structural sample data. The non-structural data is "the small spot shadow visible in the left lung", the labeling result is as described above, and the second structural data obtained by conversion is shown in Table 1. The determined conversion parameters may include a term identification field, an association relationship category identification field, and an association relationship identification field. The term "left lung" is identified as "21104537", the term "small spot shadow" is identified as "22016744", and the attribute relationship is identified as "11102312". The attribute relationship is further subdivided into a scan position relationship, identified as "R0102186", and an examination result relationship, identified as "R0102195". That is, if the predefined association relationship is the scan position relationship, the corresponding conversion parameters are the attribute relationship identifier "11102312" and the scan position relationship identifier "R0102186"; if the predefined association relationship is the examination result relationship, the corresponding conversion parameters are the attribute relationship identifier "11102312" and the examination result relationship identifier "R0102195".

TABLE 1

T_ID	CATEGORY_ID	RELATION_ID	TERM
21104537	11102312	R0102186	Left lung
22016744	11102312	R0102195	Small spot shadow
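The conversion from labeled terms to the rows of Table 1 can be illustrated with a simple lookup. This is only a sketch: in the disclosure the conversion parameters are determined by training the output module, whereas here they are hard-coded from the example, and the names `TERM_IDS`, `RELATION_IDS` and `to_rows` are hypothetical.

```python
# Conversion parameters copied from the example in the text.
TERM_IDS = {"Left lung": "21104537", "Small spot shadow": "22016744"}
ATTRIBUTE_RELATION_ID = "11102312"
RELATION_IDS = {
    "scan_position": "R0102186",
    "examination_result": "R0102195",
}

def to_rows(labeled_terms):
    """Turn (term, predefined relation) pairs into Table 1 style rows."""
    rows = []
    for term, relation in labeled_terms:
        rows.append({
            "T_ID": TERM_IDS[term],
            "CATEGORY_ID": ATTRIBUTE_RELATION_ID,
            "RELATION_ID": RELATION_IDS[relation],
            "TERM": term,
        })
    return rows

# Sequence labeling result for the example sentence, reduced to term/relation pairs.
labeled = [
    ("Left lung", "scan_position"),
    ("Small spot shadow", "examination_result"),
]
table_1 = to_rows(labeled)
```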

By adopting the scheme, the purpose of training the labeling module and the output module in the preprocessing model can be realized, and the trained preprocessing model can have the function of converting non-structural data into second structural data.

After the non-structural data is converted into the second structural data according to the trained preprocessing model, the severe disease knowledge base is constructed based on the first structural data and the second structural data.

It is worth noting that in the present disclosure, a critical medicine knowledge system model and a critical disease model can be established separately.

First, the critical medicine knowledge system model will be explained. In the present disclosure, the critical medicine knowledge system model may be characterized with a structured data dictionary. Illustratively, the critical medicine knowledge system model is established as follows: first, a data dictionary storage structure is constructed based on the critical care medical knowledge class attribute items included in the first structural data and the second structural data; then, the attribute values corresponding to those attribute items are filled into the corresponding positions in the data dictionary storage structure.

For example, Table 2 shows structural data corresponding to a clinical medical concept.

TABLE 2

ID	CONCEPT
10101001	Severe disease

Table 3 shows structural data corresponding to a medical term belonging to the clinical medical concept.

TABLE 3

T_ID	CONCEPT_ID	TERM
10112315	10101001	ARDS

Table 4-1 shows structural data corresponding to the inheritance relationship. Table 4-2 shows the second structural data corresponding to the association relationship in the non-structural data "small cell carcinoma belongs to neuroendocrine tumor". The inheritance relationship is represented using "parent" and "child" attributes. In Tables 4-1 and 4-2, the identifier "R0102113" denotes the inheritance relationship, and the term "small cell carcinoma" is a subclass of the term "neuroendocrine tumor".

TABLE 4-1

R_ID	RELATION
R0102113	Inheritance relationship

TABLE 4-2

T_ID	RELATION_ID	PARENT_ID	TERM
19008764	R0102113	10101001	Neuroendocrine tumors
19008785	R0102113	19008764	Small cell carcinoma
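To show how the PARENT_ID column encodes the inheritance relationship, the following sketch walks from a term up to its root concept using the rows of Tables 2 and 4-2. It assumes that the PARENT_ID "10101001" refers to the concept in Table 2 (the identifiers match); the names `CONCEPTS`, `TERMS` and `ancestors` are hypothetical.

```python
# Table 2: concept ID -> concept name.
CONCEPTS = {"10101001": "Severe disease"}

# Table 4-2: T_ID -> (PARENT_ID, TERM).
TERMS = {
    "19008764": ("10101001", "Neuroendocrine tumors"),
    "19008785": ("19008764", "Small cell carcinoma"),
}

def ancestors(term_id):
    """List the names of all ancestors of a term, nearest parent first."""
    names = []
    parent = TERMS[term_id][0]
    while parent is not None:
        if parent in TERMS:
            # The parent is itself a term, so keep climbing.
            names.append(TERMS[parent][1])
            parent = TERMS[parent][0]
        else:
            # The parent is a root concept (or an unknown ID); stop here.
            names.append(CONCEPTS.get(parent, parent))
            parent = None
    return names

chain = ancestors("19008785")
```

A production version would also guard against cycles in the parent links, which this sketch does not do.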

Table 5-1 shows structural data corresponding to the attribute relationship. Table 5-2 shows the second structural data corresponding to the association relationship in the non-structural data "small patch shadow visible in the left lung". The attribute relationship is represented using a "composition" attribute. In Tables 5-1 and 5-2, the identifiers "R0102186" and "R0102195" both belong to the attribute relationship, where "R0102186" represents the scan position relationship and "R0102195" represents the examination result relationship. That is, the term "left lung" is the scan position identified by "R0102186", and the term "small patch shadow" is the examination result identified by "R0102195".

TABLE 5-1

R_ID	RELATION
R0102186	Scanning position
R0102195	Examination results

TABLE 5-2

In the above tables, the first row contains the critical care medical knowledge class attribute items, and the remaining rows contain the attribute values corresponding to those attribute items.

Thus, in the above manner, the critical medicine knowledge system model can be constructed and stored in the severe disease knowledge base in the form of a data dictionary. It should be noted that the critical medicine knowledge system model includes the clinical medical concepts, the medical terms belonging to each clinical medical concept, the relationships between medical terms under the same clinical medical concept, and the relationships between medical terms under different clinical medical concepts. Therefore, even if the same severe disease is expressed by different medical terms, it can be determined that those terms refer to the same disease. When a certain type of severe disease is analyzed, the disease-related class attributes of all the medical terms belonging to the clinical medical concept of that severe disease can thus be analyzed together to enrich the disease-related information.

Next, the severe disease model will be explained. In the present disclosure, a tree structure can be used to characterize the severe disease model. Illustratively, the severe disease model is established as follows: first, a tree-shaped storage structure is constructed based on the disease-related class attribute items included in the first structural data and the second structural data; then, the attribute values corresponding to those attribute items are filled into the corresponding positions in the storage structure to obtain the severe disease knowledge base.

It should be noted that, in order to facilitate disease classification and subsequent search for relevant knowledge of a certain type of severe disease, in the present disclosure, for each type of severe disease, a tree-shaped storage structure may be constructed according to the disease-related attribute items included in the first structural data and the second structural data of the type of severe disease, and the attribute values corresponding to the disease-related attribute items included in the first structural data and the second structural data of the type of severe disease are filled into corresponding positions in the tree-shaped storage structure, so as to construct a severe disease model of the type of severe disease.

Illustratively, the non-structural data is first converted into second structural data by using the predefined association relationship and the sequence labeling result of the disease-related class attribute items output by the labeling module. For example, the non-structural data is "lower right lung: small cell carcinoma, bronchial margin (-), metastatic carcinoma in lymph nodes (6/17). IHC TTF-1 +++, Cg-A +, Syn ++, CD56 +++, Ki-67 +++ 70%, ALK (V) +". After the labeling module labels relationships such as "diseased position" and "immunohistochemistry", the data can be converted into the second structural data shown in Table 6:

TABLE 6

Properties	Results
Type of pathology	Histopathology
Pathological results	Neuroendocrine tumor, small cell carcinoma
Tumor location	Lower right lung
Nerve invasion	Negative
Tumor residue	Negative
Bronchial stump	Negative
Number of metastases	6
Number of samples	17
TTF-1	+++
Cg-A	+
Syn	++
CD56	+++
Ki-67	+++ 70%
ALK(V)	+
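To make the mapping from the free-text report to attribute/value pairs concrete, the sketch below extracts a few of the Table 6 entries with regular expressions. This is a deliberately simple stand-in: the disclosure performs this conversion with the trained preprocessing model, not with hand-written patterns, and the function name `parse_report` and the chosen keys are assumptions.

```python
import re

report = (
    "lower right lung: small cell carcinoma, bronchial margin (-), "
    "metastatic carcinoma in lymph nodes (6/17). "
    "IHC TTF-1 +++, Cg-A +, Syn ++, CD56 +++, Ki-67 +++ 70%, ALK (V) +"
)

def parse_report(text):
    """Extract location, lymph node counts and IHC markers from one report."""
    record = {}
    # Everything before the first colon is treated as the tumor location.
    location, _, rest = text.partition(":")
    record["Tumor location"] = location.strip().capitalize()
    # "(6/17)" means 6 metastatic nodes out of 17 sampled nodes.
    counts = re.search(r"\((\d+)/(\d+)\)", rest)
    if counts:
        record["Number of metastases"] = int(counts.group(1))
        record["Number of samples"] = int(counts.group(2))
    # Immunohistochemistry markers follow the "IHC " prefix, comma separated.
    _, _, ihc = rest.partition("IHC ")
    for item in ihc.split(","):
        match = re.match(r"\s*(.+?)\s*(\++)\s*(\d+%)?\s*$", item)
        if match:
            name, grade, percent = match.groups()
            record[name] = grade + (" " + percent if percent else "")
    return record

parsed = parse_report(report)
```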

Then, a tree-shaped storage structure is constructed according to the second structural data obtained by the conversion.

Illustratively, a tree structure example is first constructed according to preset disease-related class attribute items. The tree structure example is then filled layer by layer according to the second structural data obtained by the conversion and the first structural data of the severe disease. When each layer is filled, the attribute items in the second structural data and the first structural data may be compared with the attribute items in the tree structure example: attribute items that do not exist in the second structural data or the first structural data are deleted from the tree structure example, and attribute items that exist in the second structural data and/or the first structural data but not in the tree structure example are added to the layer. The tree-shaped storage structure for this type of severe disease is finally obtained.
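The per-layer comparison described above amounts to keeping the template items that appear in the data and appending the data items the template lacks. A minimal sketch for a single layer, with a hypothetical function name `reconcile_layer` and made-up attribute items, might look like this:

```python
def reconcile_layer(template_items, data_items):
    """Align one layer of the tree structure example with the data."""
    present = set(data_items)
    # Delete template items that occur in neither structural data set.
    kept = [item for item in template_items if item in present]
    # Add items that occur in the data but not in the template.
    added = [item for item in data_items if item not in template_items]
    # dict.fromkeys preserves order while dropping duplicates.
    return list(dict.fromkeys(kept + added))

# Preset layer of the tree structure example (illustrative values).
template = ["disease category", "pathogenic cause",
            "clinical manifestation", "treatment"]
# Attribute items found in the first and second structural data.
first_items = ["disease category", "treatment"]
second_items = ["clinical manifestation", "examination"]

layer = reconcile_layer(template, first_items + second_items)
```

Here "pathogenic cause" is dropped because neither data set contains it, and "examination" is appended because the template lacked it.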

Finally, the attribute values corresponding to the disease-related class attribute items included in the first structural data and the second structural data are filled into the corresponding positions in the storage structure to obtain the severe disease model of the severe disease, represented by the tree structure.

It should be noted that the second structural data is obtained by converting non-structural data, so some attribute items in the second structural data may have no attribute value, that is, the attribute value is null. For example, blood type data is already structural data and is therefore not processed by the preprocessing model, so the attribute value corresponding to the blood type attribute item in the second structural data is empty. In this case, the attribute value corresponding to the blood type attribute item in the first structural data of the patient needs to be filled into the position of the blood type attribute item in the tree-shaped storage structure.
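The null-filling rule can be sketched as a merge that prefers whichever source has a non-null value. The function name `merge_attribute_values` and the sample records are assumptions for illustration only.

```python
def merge_attribute_values(first, second):
    """Fill null or missing items of the second structural data from the first."""
    merged = dict(second)
    for item, value in first.items():
        # Only overwrite when the second structural data has no value.
        if merged.get(item) is None:
            merged[item] = value
    return merged

# Blood type arrives as structural data, so it is null after preprocessing.
first_structural = {"blood type": "A", "age": 67}
second_structural = {"blood type": None, "past medical history": "pneumonia"}

node_values = merge_attribute_values(first_structural, second_structural)
```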

In addition, in practical applications, different medical institutions may identify the same attribute item of the same disease differently. For example, medical institution A uses identifier 1 to represent blood type information, while medical institution B uses identifier 2. Therefore, before the storage structure is built based on the attribute items included in the first structural data and the second structural data, the identifiers of the attribute items need to be updated.

Illustratively, the identifiers of the attribute items included in the first structural data and the second structural data are updated according to a preset updating relationship, so that different identifiers representing the same attribute item are updated to the same identifier. For example, the preset updating relationship may be to update identifier 1 and identifier 2, which both represent blood type information, to identifier 3. In this way, the same attribute item has the same identifier in the first structural data and the second structural data, which prevents it from being mistaken for two different attribute items when the storage structure is constructed.
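A preset updating relationship of this kind can be represented as a plain mapping from old identifiers to the unified one. The sketch below uses the identifiers from the example; `UPDATE_RELATION`, `update_identifiers` and the sample records are hypothetical.

```python
# Preset updating relationship: both institution-specific identifiers
# for blood type are rewritten to the shared identifier 3.
UPDATE_RELATION = {
    "identifier 1": "identifier 3",
    "identifier 2": "identifier 3",
}

def update_identifiers(record):
    """Rename attribute item identifiers; unknown identifiers are kept as-is."""
    return {UPDATE_RELATION.get(key, key): value for key, value in record.items()}

institution_a = {"identifier 1": "A", "height": 172}
institution_b = {"identifier 2": "O", "height": 165}

updated_a = update_identifiers(institution_a)
updated_b = update_identifiers(institution_b)
```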

In one embodiment, the disease-related class attribute items include attribute items related to the severe disease and attribute items related to the individual patient. First, a tree frame is created using the attribute items related to the severe disease; each node of the tree frame is the name of one such attribute item, and some attribute items are further divided into several sub-categories. The frame is then filled with the attribute values corresponding to the attribute items related to the severe disease. In addition, it can be populated with attribute items related to the individual patient. For example, if an attribute item related to the severe disease is the health degree and one sub-category of the health degree is BMI (Body Mass Index), the calculation formula BMI = body weight (kg) / height (m)² needs to be added to the child node of BMI, and the attribute value of that child node is calculated by the formula.
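As a small illustration of a node whose value is computed rather than stored, the sketch below attaches the BMI formula to a node and evaluates it for one patient. The nested-dictionary tree layout, the key names and the sample patient are assumptions, not the storage format of the disclosure.

```python
def bmi(weight_kg, height_m):
    """BMI = body weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2

# A fragment of the tree frame: "health degree" has the sub-category "BMI",
# whose node stores the formula and, once evaluated, the value.
tree = {"health degree": {"BMI": {"formula": bmi, "value": None}}}

# Patient-related attribute values (illustrative).
patient = {"weight_kg": 70.0, "height_m": 1.75}

node = tree["health degree"]["BMI"]
node["value"] = round(node["formula"](patient["weight_kg"], patient["height_m"]), 1)
```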

In another embodiment, the disease-related attribute items include an attribute item related to the severe disease and an attribute item related to the individual patient, and the populating the attribute values included in the first structural data and the second structural data into corresponding positions in the storage structure to obtain the severe disease knowledge base may further include:

for each type of severe disease, determining a high-risk factor of the severe disease according to attribute items related to individual patients and included in the first structural data and the second structural data of the type of severe disease, filling the high-risk factor into a corresponding position in the tree-shaped storage structure to obtain a severe disease model of the type of severe disease, and obtaining a severe disease knowledge base according to the severe disease model of each type of severe disease.

For example, for sepsis, statistics are compiled on the attribute items related to individual sepsis patients. The height, blood type, sex, age, treatment location, etc. of 500 sepsis patients are analyzed, extracted and organized to obtain the high-risk group for sepsis. For instance, men older than 65 are more likely to suffer from sepsis, i.e., being male and being older than 65 are high-risk factors for sepsis. The high-risk factors may therefore be translated into the formula "person.gen = 1 AND person.age > 65" and added to the corresponding location in the tree-shaped storage structure.
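The high-risk formula can be read as a predicate over a patient record. The sketch below assumes that `gen` is a coded sex field in which 1 denotes male, which is consistent with the example but is not stated explicitly; the `Person` class and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Person:
    gen: int  # assumption: coded sex, 1 = male
    age: int

def is_high_risk_sepsis(person: Person) -> bool:
    """Evaluate the rule: male AND older than 65."""
    return person.gen == 1 and person.age > 65

# Three illustrative patients.
older_man = Person(gen=1, age=70)
boundary_man = Person(gen=1, age=65)
older_woman = Person(gen=2, age=80)
```

Storing the rule as an expression rather than as a list of patients lets the same node be evaluated against any new patient record.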

It should be noted that the analysis of the attribute items related to individual patients usually operates on discrete data. Therefore, before the analysis, it is determined whether each attribute item related to the individual patient is of a continuous type, and if so, it is converted into a discrete type.

Illustratively, the attribute items related to the individual patient for this type of severe disease are first acquired. Then, for each attribute item, it is determined whether the attribute item is discrete data. For example, if the attribute item is stored in the data dictionary, it is discrete data; otherwise, it is continuous data. For an attribute item with continuous data, such as height, WaveCluster (a grid-based clustering algorithm) can be used for classification, and the classification result replaces the continuous values of the attribute item, which is equivalent to discretizing the continuous values first. Thereafter, all data are classified using the CPAR algorithm (a classification algorithm based on association rules) to determine the high-risk factors of the severe disease.
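The discretization step can be sketched as follows. Note the substitution: the text names WaveCluster for this step, whereas the sketch uses plain equal-width binning only to show where a continuous attribute is replaced by class labels; the CPAR classification step is not shown. The data dictionary contents, function names and sample values are assumptions.

```python
# Assumption: attribute items listed in the data dictionary are discrete.
DATA_DICTIONARY = {"blood type", "sex"}

def is_discrete(attribute_item):
    """An attribute item is discrete if the data dictionary contains it."""
    return attribute_item in DATA_DICTIONARY

def equal_width_bins(values, n_bins):
    """Replace each continuous value with the index of its equal-width bin."""
    low, high = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (high - low) / n_bins or 1.0
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]

patients = {
    "height": [150, 158, 165, 171, 180, 190],
    "blood type": ["A", "B", "O", "A", "AB", "O"],
}

# Continuous attribute items are discretized; discrete ones pass through.
discretized = {
    item: (values if is_discrete(item) else equal_width_bins(values, 4))
    for item, values in patients.items()
}
```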

For example, Fig. 4 is a schematic diagram illustrating a severe disease model of the severe disease ARDS according to an exemplary embodiment. As shown in Fig. 4, the severe disease model is characterized using a tree structure, in which each "+" node can be further expanded. A severe disease model like the one shown in Fig. 4 can be generated for each type of severe disease, and the severe disease knowledge base is constructed from the severe disease models of all types of severe disease.

By adopting the scheme, the development process of the severe disease can be shown by utilizing the tree structure, and the high-risk factor of the severe disease can be determined according to the attribute items of the severe disease, which are related to the individual patient, so that the reference value in clinical diagnosis is further improved.

Based on the same inventive concept, the present disclosure also provides an apparatus for constructing the severe disease knowledge base. Fig. 5 is a schematic diagram illustrating an apparatus for constructing a severe disease knowledge base according to an exemplary embodiment. As shown in Fig. 5, the severe disease knowledge base construction apparatus 500 may include:

a first obtaining module 501, configured to obtain related data of multiple types of severe diseases, where the related data includes first structural data and non-structural data;

a second obtaining module 502, configured to input the non-structural data into a preprocessing model, so as to obtain second structural data output by the preprocessing model, where the preprocessing model is obtained by training based on labeled non-structural sample data of the severe disease;

a creating module 503, configured to construct a storage structure based on the attribute items included in the first structural data and the second structural data, and fill the attribute values included in the first structural data and the second structural data in corresponding positions in the storage structure to obtain the severe disease knowledge base, where the structural data includes a mapping relationship between the attribute items and the attribute values.

Optionally, the preprocessing model includes a labeling module and an output module, and accordingly, the apparatus further includes:

the classification module is used for classifying the non-structural sample data in the preset database into training sample data and verification sample data;

the training module is used for taking the training sample data as an input parameter and the labeling result of the training sample data as an output parameter, and training the labeling module until the labeling accuracy of the labeling module on the verification sample data reaches a preset value, at which point the training ends and the trained labeling module is obtained;

the sequence labeling module is used for performing sequence labeling on the non-structural sample data through the trained labeling module;

and the determining module is used for determining the conversion parameters of the output module according to the sequence labeling result output by the labeling module, the predefined association relation and the second structural data corresponding to the non-structural sample data, so that the second structural data can be obtained after the sequence labeling result and the predefined association relation pass through the conversion parameters.

Optionally, the sequence labeling module is configured to: label the critical care medical knowledge class attribute items and the disease-related class attribute items of the non-structural sample data through the trained labeling module.

Optionally, the creating module 503 includes:

a first creating sub-module for constructing a data dictionary storage structure based on the critical medical knowledge class attribute items included in the first structural data and the second structural data;

and the second creating submodule is used for constructing a tree-shaped storage structure based on the attribute items of the disease-related classes included in the first structural data and the second structural data.

Optionally, the second creating sub-module is configured to: for each type of severe disease, a tree-shaped storage structure is constructed according to the disease-related class attribute items included in the first structural data and the second structural data of the type of severe disease.

Optionally, the disease-related class attribute items include an attribute item related to a severe disease and an attribute item related to an individual patient, and the creating module 503 includes:

a first determining sub-module, configured to determine, for each type of severe disease, a high-risk factor of the severe disease according to an attribute item related to an individual patient included in the first structural data and the second structural data of the type of severe disease, and fill the high-risk factor into a corresponding position in the tree-shaped storage structure to obtain a severe disease model of the type of severe disease;

and a creating sub-module, configured to obtain the severe disease knowledge base according to the severe disease model of each type of severe disease.

Optionally, the apparatus further comprises:

and the updating module is used for updating the identifiers of the attribute items included in the first structural data and the second structural data according to a preset updating relationship so as to update different identifiers representing the same attribute item into the same identifier.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Based on the same inventive concept, the present disclosure also provides an electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of any of the methods provided by the present disclosure.

Illustratively, Fig. 6 is a block diagram illustrating an electronic device 600 according to an exemplary embodiment. As shown in Fig. 6, the electronic device 600 may include: a processor 601 and a memory 602. The electronic device 600 may also include one or more of a multimedia component 603, an input/output (I/O) interface 604, and a communication component 605.

The processor 601 is configured to control the overall operation of the electronic device 600 to complete all or part of the steps in the method for constructing the severe disease knowledge base. The memory 602 is used to store various types of data to support operation of the electronic device 600, such as instructions for any application or method operating on the electronic device 600 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The memory 602 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The multimedia component 603 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 602 or transmitted through the communication component 605. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 604 provides an interface between the processor 601 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 605 is used for wired or wireless communication between the electronic device 600 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 605 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.

In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for executing the above-mentioned method for constructing the severe disease knowledge base.

In another exemplary embodiment, there is also provided a computer-readable storage medium including program instructions which, when executed by a processor, implement the steps of the method of constructing the severe disease knowledge base described above. For example, the computer readable storage medium may be the memory 602 comprising program instructions executable by the processor 601 of the electronic device 600 to perform the method for constructing the critical illness knowledge base described above.

In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned method for constructing the severe disease knowledge base when executed by the programmable apparatus.

The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.

It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.

In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure as long as they do not depart from its spirit.

Claims

1. A method for constructing a severe disease knowledge base, comprising:

2. The method of claim 1, wherein the pre-processing model comprises a labeling module and an output module, and accordingly, the training process of the pre-processing model comprises:

3. The method according to claim 2, wherein the performing sequence labeling on the unstructured sample data by the trained labeling module comprises:

4. A method according to any one of claims 1-3, wherein said building a storage structure based on attribute items included in said first structural data and said second structural data comprises:

5. The method according to claim 4, wherein constructing a tree-like storage structure based on the disease-related class attribute items included in the first structural data and the second structural data comprises:

6. The method according to claim 5, wherein the disease-related class attribute items include an item of an attribute related to a severe disease and an item of an attribute related to an individual patient, and wherein the populating the attribute values included in the first structural type data and the second structural type data to corresponding locations in the storage structure to obtain the severe disease knowledge base includes:

7. The method of claim 1, wherein prior to said building a storage structure based on attribute items included in said first structural data and said second structural data, said method further comprises:

8. An apparatus for constructing a critical illness knowledge base, comprising:

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.