CN116469570A - Malignant tumor complication analysis method based on electronic medical record - Google Patents

Info

Publication number
CN116469570A
Authority
CN
China
Prior art keywords
disease
complication
malignant tumor
data
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310370759.8A
Other languages
Chinese (zh)
Inventor
王黎明
覃桂敏
史彦芊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202310370759.8A
Publication of CN116469570A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/22 Social work or social welfare, e.g. community support activities or counselling services
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Child & Adolescent Psychology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiments of the application relate to the technical field of electronic medical record data mining, and in particular to a malignant tumor complication analysis method based on electronic medical records. The method comprises the following steps: preprocessing malignant tumor electronic medical record data to obtain processed medical record data; constructing a complication network model based on the processed medical record data; sequentially performing topology analysis and node centrality analysis on the complication network model, and extracting key diseases using a measurement algorithm based on degree and betweenness centrality; analyzing, based on a community detection algorithm, the complication patterns related to malignant tumors in different communities, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result; and visualizing the analysis result. By mining the information in electronic medical records and applying complex network analysis to malignant tumor complications, the method can improve the working efficiency of doctors, enrich the methods available for complication research, and increase the application value of medical information data.

Description

Malignant tumor complication analysis method based on electronic medical record
Technical Field
The embodiments of the application relate to the technical field of electronic medical record data mining, and in particular to a malignant tumor complication analysis method based on electronic medical records.
Background
A malignant tumor is a heterogeneous disease that arises outside normal tissues and organs, significantly affects their function and structure, and seriously endangers human health. Malignant tumors can occur not only in different organs but also at multiple sites in the body, causing serious complications that differ from patient to patient, and research shows that malignant tumors develop markedly faster than other tumors.
Research on malignant tumor complications is carried out widely at home and abroad, covering the prevention, diagnosis, and treatment of these complications. Numerous scholars have studied the complications of cancer treatment and have formulated various strategies to alleviate them. However, because malignant tumors are complex diseases, and because complication patterns are still described largely on the basis of doctors' empirical judgment and psychometric methods, no effective measure exists, and research on malignant tumor complications still requires more effort.
Clinical diagnosis generates a large amount of diagnosis and treatment data and medical record data, but these data have not been fully utilized. Electronic medical record data capture the experience of many seasoned physicians and cover a more comprehensive range of malignant tumor types. Therefore, how to use the abundant electronic medical record data to mine the complication patterns of malignant tumors, analyze these patterns at different levels of detail, and display them visually with complex network methods has become an important topic worthy of research.
Disclosure of Invention
The embodiments of the application provide a malignant tumor complication analysis method based on electronic medical records, which addresses the problems in existing research of insufficient utilization of electronic medical records, unclear malignant tumor complication patterns, and a limited range of methods for studying malignant tumor complications.
To solve the above technical problems, an embodiment of the present application provides a method for analyzing malignant tumor complications based on electronic medical records, comprising the following steps: first, preprocessing malignant tumor electronic medical record data to obtain processed medical record data; next, constructing a complication network model based on the processed medical record data; then, sequentially performing topology analysis and node centrality analysis on the complication network model, and extracting key diseases using a measurement algorithm based on degree and betweenness centrality; finally, analyzing, based on a community detection algorithm, the complication patterns related to malignant tumors in different communities, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result; and visualizing the analysis result.
In some exemplary embodiments, preprocessing the malignant tumor electronic medical record data to obtain processed medical record data comprises: acquiring malignant tumor electronic medical record data and desensitizing the data to extract malignant tumor patient data; according to the attribute features, matching control records one-to-one to the malignant tumor patient data from the data of patients without malignant tumors, to obtain control group patient data; preprocessing the malignant tumor patient data and the control group patient data respectively, filling in missing values, removing redundant data, and normalizing the ICD codes in the data; and selecting the attribute features used for complication research in the electronic medical record, and marking and extracting the required features to obtain the processed medical record data.
In some exemplary embodiments, desensitizing the data comprises: deleting records with quality problems and removing sensitive patient information. The attribute features include sex, age, year of admission, season of admission, and disease diagnosis.
In some exemplary embodiments, constructing a complication network model based on the processed medical record data comprises: extracting disease nodes and the association relations between disease nodes based on the processed medical record data, where each disease node is a malignant tumor complication; and constructing the complication network model based on the disease nodes and the association relations between them.
In some exemplary embodiments, extracting disease nodes and the association relations between disease nodes based on the processed medical record data comprises: extracting the ICD codes of all diseases from the processed medical record data, where the ICD codes are taken from the disease diagnosis attribute; calculating the relations among all disease nodes, where each disease node is represented by its ICD code; calculating the number of occurrences of each disease node; filtering out disease nodes whose number of occurrences is below a threshold; counting the number of times each disease pair co-occurs; defining an association strength index for measuring a disease pair; and calculating the association strength between different disease pairs based on the association strength index, and using the association strength as the edge weight for constructing the complication network.
In some exemplary embodiments, the association strength index of a disease pair is given by the following formula:
where $C_{ij}$ denotes the number of co-occurrences of the disease pair; $ij$ denotes a disease pair; $i$ and $j$ denote disease $i$ and disease $j$, respectively; $P_i$ denotes the number of occurrences of disease $i$; and $P_j$ denotes the number of occurrences of disease $j$.
In some exemplary embodiments, after the association strength between different disease pairs is obtained, the method further comprises: obtaining the association relations of different disease nodes from the association strength between different disease pairs; and abstracting the extracted diseases as the nodes of the complication network and the association relations between different disease nodes as its edges, thereby constructing the complication network model.
In some exemplary embodiments, sequentially performing topology analysis and node centrality analysis on the complication network model, and extracting key diseases using a measurement algorithm based on degree and betweenness centrality, comprises: representing the network, based on the complication network model, with an adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{n \times n}$; calculating the degree and the betweenness centrality of each disease node; calculating the average degree and the average betweenness centrality of all disease nodes in the network; calculating the neighbor node set of each disease node; defining the association strength between different disease nodes and the importance index of a disease node, respectively; and ranking the disease nodes by their importance index values, obtaining the maximum importance index value, and taking the disease node corresponding to that maximum as a key disease.
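The two centrality measures named above can be sketched as follows. This is a minimal illustration, not the method of the application: it assumes an unweighted, undirected graph stored as a dict that maps every node to its set of neighbors, it uses Brandes' algorithm for betweenness, and the function names `degree`, `betweenness`, and `average` are hypothetical. The importance index that combines these values with association strength is not defined in the text, so it is not reproduced here.

```python
from collections import deque


def degree(graph):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every path is counted from both endpoints.
    return {v: b / 2.0 for v, b in score.items()}


def average(values):
    """Arithmetic mean of a dict's values (0.0 for an empty dict)."""
    return sum(values.values()) / len(values) if values else 0.0
```

On a three-node chain A, B, C, only B lies on a shortest path between two other nodes, so it is the only node with non-zero betweenness.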
In some exemplary embodiments, analyzing the complication patterns related to malignant tumors in different communities based on a community detection algorithm, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result comprises: learning vector representations of the nodes in the complication network using modularity-based graph embedding; determining a range for the number of communities based on modularity; determining the optimal number of communities using the Gap statistic algorithm; dividing all nodes into different communities with the KMeans++ algorithm, based on the optimal number of communities, to obtain a community partition of the nodes; mapping the node partition back to the complication network to obtain the community detection result of the complication network; and analyzing, based on the community detection result, the complication patterns related to malignant tumors in the different communities to obtain the analysis result.
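Modularity, which the steps above rely on, can be computed for any candidate partition. The sketch below uses the standard Newman definition, $Q = \frac{1}{2m}\sum_{i,j}\left[a_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, and is only an illustration: how the application uses modularity inside its graph embedding is not specified in the text, and the function name `modularity` and the dict-of-dicts graph layout are assumptions.

```python
def modularity(graph, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph.

    graph is a dict of dicts {u: {v: weight}} that stores each edge in both
    directions; community_of maps each node to a community label.
    """
    # 2m: total weighted degree (every edge is seen twice).
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    if two_m == 0:
        return 0.0
    internal = {}  # twice the intra-community edge weight, per community
    strength = {}  # sum of weighted degrees, per community
    for u, nbrs in graph.items():
        c = community_of[u]
        strength[c] = strength.get(c, 0.0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if community_of[v] == c:
                internal[c] = internal.get(c, 0.0) + w
    return sum(
        internal.get(c, 0.0) / two_m - (strength[c] / two_m) ** 2
        for c in strength
    )
```

A higher Q means more edge weight falls inside communities than a random graph with the same degrees would give, which is why it can serve to bound the plausible number of communities before a clustering step.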
In some exemplary embodiments, visualizing the analysis result comprises: visualizing the complication network, the key diseases, and the community detection results, respectively.
The technical scheme provided by the embodiment of the application has at least the following advantages:
The embodiments of the application provide a malignant tumor complication analysis method based on electronic medical records, comprising the following steps: first, preprocessing malignant tumor electronic medical record data to obtain processed medical record data; next, constructing a complication network model based on the processed medical record data; then, sequentially performing topology analysis and node centrality analysis on the complication network model, and extracting key diseases using a measurement algorithm based on degree and betweenness centrality; finally, analyzing, based on a community detection algorithm, the complication patterns related to malignant tumors in different communities, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result; and visualizing the analysis result.
In the malignant tumor complication analysis method based on electronic medical records provided by the application, information in the electronic medical records is mined and complex network analysis is used to analyze malignant tumor complications, which can improve the working efficiency of doctors, enrich the methods of complication research, and increase the application value of medical information data. On the one hand, the method extracts the complications related to malignant tumors, and the relations between them, from the electronic medical records and abstracts them into a complex network model; explaining complication patterns with complex network analysis saves a great deal of the time and resources spent on complication research; and the proposed method for mining key nodes of a complex network is used to mine the key diseases among the complications, making the complication patterns of malignant tumors clearer. On the other hand, the community detection algorithm based on graph embedding combined with modularity can partition the complication network according to the type of malignant tumor, which makes the complication patterns easier to analyze; mining malignant tumor complication patterns with a community detection algorithm also fills gaps in the study of complication patterns for some types of malignant tumors. In addition, visualization techniques are used to display the complication network and the community detection results; node colors, node shapes, and edge thicknesses represent the rich information in the network, which makes the detection results more readable and improves working efficiency.
Drawings
One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, which are not to be construed as limiting the embodiments unless specifically indicated otherwise.
FIG. 1 is a flow chart of a method for analyzing malignant tumor complications based on an electronic medical record according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for analyzing malignant tumor complications based on an electronic medical record according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of preprocessing electronic medical record data of malignant tumor according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of constructing a network model of complications according to an embodiment of the present application;
FIG. 5 is a flow chart of a key node extraction algorithm of a complication network according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of complication network community detection according to an embodiment of the present application.
Detailed Description
As noted in the Background, existing research suffers from the technical problems that electronic medical records are not fully utilized, malignant tumor complication patterns are unclear, and the methods for studying malignant tumor complications are limited.
Clinical diagnosis generates a large amount of diagnosis and treatment data and medical record data, but these data have not been fully utilized. Electronic medical record data capture the experience of many seasoned physicians and cover a more comprehensive range of malignant tumor types. How to use the abundant electronic medical record data to mine the complication patterns of malignant tumors, analyze these patterns at different levels of detail, and display them visually with complex network methods has become an important topic worthy of research. At present, doctors rely mainly on their own expertise and accumulated experience in clinical diagnosis and treatment. With the rapid development of medical knowledge and the increasing complexity of diagnosing and treating related diseases, the rate at which doctors acquire expertise falls far behind the growth of medical knowledge, and the complexity of medicine together with the subjectivity of doctors' own diagnostic process poses great challenges in current clinical practice.
With the development of science and technology, traditional ways of keeping medical records have been replaced by electronic medical records. Electronic medical records are now developing rapidly in informatized hospitals. A great deal of research based on electronic medical records has been carried out at home and abroad, including clinical auxiliary diagnosis, electronic medical record text mining, and differences in medication records.
The current definition of complications is the presence of multiple, mutually independent diseases in the same patient. Complications are not a simple arithmetic sum of diseases. Complication research currently focuses mainly on Alzheimer's disease, cardiovascular disease, depression, diabetes, and similar fields. Most methods for studying complications are statistical, and the approaches to network construction and analysis vary.
In recent years, with the development of artificial intelligence, some scholars at home and abroad have used related techniques to assist in diagnosing malignant tumors and have discussed the application of artificial intelligence to such diagnosis. Other scholars have studied the early diagnosis of malignant tumors based on biomarkers and imaging. At present, the problems that remain to be solved are the insufficient utilization of electronic medical record data, incomplete malignant tumor complication patterns, limited complication analysis methods, and the difficulty of visually displaying complications based on complex network analysis.
To solve the above technical problems, an embodiment of the present application provides a method for analyzing malignant tumor complications based on electronic medical records, comprising the following steps: first, preprocessing malignant tumor electronic medical record data to obtain processed medical record data; next, constructing a complication network model based on the processed medical record data; then, sequentially performing topology analysis and node centrality analysis on the complication network model, and extracting key diseases using a measurement algorithm based on degree and betweenness centrality; finally, analyzing, based on a community detection algorithm, the complication patterns related to malignant tumors in different communities, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result; and visualizing the analysis result. By mining the information in electronic medical records and applying complex network analysis to malignant tumor complications, the method can improve the working efficiency of doctors, enrich the methods of complication research, and increase the application value of medical information data.
Embodiments of the present application are described in detail below with reference to the accompanying drawings. As those of ordinary skill in the art will appreciate, numerous technical details are set forth in the various embodiments to provide a better understanding of the present application. Nevertheless, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
Referring to FIG. 1, an embodiment of the present application provides a method for analyzing malignant tumor complications based on electronic medical records, comprising the following steps:
Step S1, preprocessing the malignant tumor electronic medical record data to obtain processed medical record data.
Step S2, constructing a complication network model based on the processed medical record data.
Step S3, sequentially performing topology analysis and node centrality analysis on the complication network model, and extracting key diseases using a measurement algorithm based on degree and betweenness centrality.
Step S4, analyzing, based on a community detection algorithm, the complication patterns related to malignant tumors in different communities, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result; and visualizing the analysis result.
For the study of malignant tumor complications, and building on clinical experiments and genomics research, the application provides an electronic-medical-record-driven method for analyzing malignant tumor complications that mines malignant tumor complication patterns and thereby supplements existing research. As shown in FIG. 2, first, a data preprocessing model and algorithm are used to construct the electronic medical record data of malignant tumor patients for analysis, missing values are handled, the data are discretized and normalized, and a data warehouse is constructed; next, a statistical model is used to extract and evaluate the complications related to malignant tumors and the relations between them, and a malignant tumor complication network model is constructed; then, topology analysis and node centrality analysis are performed on the complication network model, and key diseases are extracted by combining the centrality measures; finally, complication patterns are mined using machine learning, graph embedding, and community detection algorithms, and the complication network model is visualized.
By making full use of clinical data and electronic medical record data and combining complex network methods with machine learning, the method provides a visualization system for malignant tumor complication patterns. Through electronic medical records, it identifies the complication patterns of different categories of malignant tumors and analyzes the complication patterns of malignant tumors in different age groups, and it presents the analysis results visually as graphs to assist doctors in diagnosis and treatment decisions and to promote medical progress.
In some embodiments, preprocessing the malignant tumor electronic medical record data in step S1 to obtain processed medical record data comprises:
Step S101, acquiring malignant tumor electronic medical record data and desensitizing the data to extract malignant tumor patient data.
Step S102, according to the attribute features, matching control records one-to-one to the malignant tumor patient data from the data of patients without malignant tumors, to obtain control group patient data.
Step S103, preprocessing the malignant tumor patient data and the control group patient data respectively, filling in missing values, removing redundant data, and normalizing the ICD codes in the data.
Step S104, selecting the attribute features used for complication research in the electronic medical record, and marking and extracting the required features to obtain the processed medical record data.
Specifically, step S1 is mainly the process of extracting and preprocessing malignant-tumor-related data and of selecting and extracting the feature attributes relevant to complication research. ICD codes are disease codes. The ICD classifies diseases according to four main characteristics: etiology, anatomical site, pathology, and clinical manifestation (including symptoms and signs, stage, type, sex, age, acute or chronic onset, and so on). Each characteristic constitutes a classification criterion and forms a classification axis.
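As an illustration of what normalizing ICD codes might involve, the sketch below cleans raw code strings and optionally truncates them to the three-character category. The text does not state the ICD revision or the normalization rules, so the ICD-10-style pattern, the function name `normalize_icd`, and the `level` parameter are all assumptions.

```python
import re

# Assumed shape of an ICD-10 style code: one letter, two digits,
# then an optional subcategory of 1-4 alphanumeric characters.
ICD10_PATTERN = re.compile(r"^([A-Z]\d{2})\.?([0-9A-Z]{1,4})?$")


def normalize_icd(raw_code, level="category"):
    """Return a normalized ICD-10 style code, or None if it cannot be parsed.

    level="category" keeps only the 3-character category (e.g. "C34");
    level="full" keeps the subcategory as well (e.g. "C34.1").
    """
    if raw_code is None:
        return None
    code = raw_code.strip().upper().replace(" ", "")
    match = ICD10_PATTERN.match(code)
    if match is None:
        return None
    category, sub = match.groups()
    if level == "category" or not sub:
        return category
    return f"{category}.{sub}"
```

Collapsing codes to the category level reduces the number of distinct disease nodes, which may or may not match the granularity the application intends.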
In some embodiments, desensitizing the data in step S101 comprises: deleting records with quality problems and removing sensitive patient information. The attribute features in step S102 include sex, age, year of admission, season of admission, and disease diagnosis.
The extraction and preprocessing of malignant-tumor-related data and the selection and extraction of the relevant feature attributes are shown in FIG. 3 and mainly comprise the following steps: first, all malignant tumor electronic medical record data are acquired and desensitized, records with quality problems are deleted, sensitive patient information is removed, and the data of patients with malignant tumors are extracted; next, the malignant tumor patient data are matched one-to-one by age, sex, year of admission, season of admission, and so on to obtain a control group population; the sample data of the malignant tumor patients and the control group are then preprocessed by filling in missing values, removing redundant data, and standardizing the ICD codes; finally, attribute features from the front page of each sample's medical record, such as sex, age, year of admission, season of admission, and disease diagnosis, are acquired and selected.
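The one-to-one control matching described above could be sketched as follows. This is a simplified greedy illustration under stated assumptions: the record fields (`id`, `sex`, `age`, `year`, `season`), the function name `match_controls`, and the age tolerance `max_age_gap` are hypothetical, and the text does not say whether age is matched exactly or within a range.

```python
from collections import defaultdict


def match_controls(cases, controls, max_age_gap=2):
    """Greedily pair each case with one unused control.

    A control is eligible when sex, admission year, and admission season
    are identical and the age difference is at most max_age_gap years.
    Returns a dict mapping case id -> control id; unmatched cases are omitted.
    """
    # Index controls by the exactly-matched attributes.
    pool = defaultdict(list)
    for ctrl in controls:
        pool[(ctrl["sex"], ctrl["year"], ctrl["season"])].append(ctrl)

    pairs = {}
    used = set()
    for case in cases:
        key = (case["sex"], case["year"], case["season"])
        candidates = [
            c for c in pool[key]
            if c["id"] not in used and abs(c["age"] - case["age"]) <= max_age_gap
        ]
        if not candidates:
            continue
        # Prefer the eligible control closest in age.
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        pairs[case["id"]] = best["id"]
        used.add(best["id"])
    return pairs
```

Greedy matching is order-dependent; an optimal assignment would need a bipartite matching algorithm, which is beyond this sketch.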
In some embodiments, constructing a complication network model based on the processed medical record data in step S2 comprises:
Step S201, extracting disease nodes and the association relations between disease nodes based on the processed medical record data, where each disease node is a malignant tumor complication.
Step S202, constructing the complication network model based on the disease nodes and the association relations between them.
Specifically, step S2 is mainly the process of extracting the malignant tumor complications and the association relations between them and constructing the network model.
FIG. 4 is a schematic flow chart of constructing a complication network model according to an embodiment of the present application. Referring to FIG. 4: first, the ICD codes of the diagnosed diseases are extracted from the preprocessed electronic medical records; then, the relations between ICD codes are calculated; next, the number of occurrences $P_i$ of each ICD code is calculated; then, the ICD codes with $P_i \geq 10$ are retained; next, the number of co-occurrences $C_{ij}$ of each disease pair is counted; then, the association strength index $SCI_{ij}$ of each disease pair (ICD code pair) is calculated; finally, the ICD code pairs are abstracted into a complex network model.
In some embodiments, extracting the association between the disease node and the disease node in step S201 based on the processed medical record data includes:
Step S2011, extracting the ICD codes of all diseases from the processed medical record data, wherein the attribute feature corresponding to the ICD codes is the disease diagnosis data.
Step S2012, calculating the relation among all disease nodes; the disease node is represented in ICD code.
Step S2013, calculating the number of occurrences P_i of each disease node.
Step S2014, filtering out the disease nodes whose number of occurrences is smaller than the threshold.
Wherein the threshold may be set to 10. To exclude sporadic factors and in combination with medical knowledge, disease codes with P_i < 10 are filtered out, leaving only codes that occur at least 10 times.
Step S2015, counting the number of co-occurrences C_ij of each disease pair. If diseases i and j both occur in the same patient, the co-occurrence count of the disease pair is increased by 1.
Step S2016, defining an association strength index SCI_ij for measuring the disease pair.
Step S2017, based on the association strength index SCI_ij, calculating the association strength between different disease pairs as the weight of the edges for constructing the complication network.
In some embodiments, the association strength index SCI_ij of the disease pair in step S2016 is given by the following formula:
wherein C_ij represents the number of co-occurrences of each disease pair; ij represents a disease pair; i and j represent disease i and disease j, respectively; P_i represents the number of occurrences of disease i; and P_j represents the number of occurrences of disease j.
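Steps S2013 to S2017 can be sketched as follows. The counting of P_i and C_ij and the threshold of 10 follow the text. The index itself is an assumption: the formula is not reproduced in this text, so the sketch uses a cosine-style normalization, C_ij / sqrt(P_i · P_j), purely as an illustrative stand-in built from the same three quantities. The function name and the input layout are also hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def association_strengths(patients, min_count=10):
    """Return {(i, j): strength} for disease pairs that co-occur at least once.

    patients: iterable of collections of ICD codes, one collection per patient.
    """
    records = [set(codes) for codes in patients]

    # P_i: number of patients in whom code i appears.
    occurrences = Counter(code for codes in records for code in codes)

    # Drop codes that occur fewer than `min_count` times.
    kept = {code for code, n in occurrences.items() if n >= min_count}

    # C_ij: number of patients in whom both i and j appear.
    co_occurrences = Counter()
    for codes in records:
        for pair in combinations(sorted(codes & kept), 2):
            co_occurrences[pair] += 1

    # ASSUMPTION: cosine-style index C_ij / sqrt(P_i * P_j); the source
    # formula is not available here and may differ.
    return {
        (i, j): c / sqrt(occurrences[i] * occurrences[j])
        for (i, j), c in co_occurrences.items()
    }
```

Each pair is stored once with its codes in sorted order, which suits an undirected network.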
In some embodiments, after deriving the strength of association between different pairs of diseases, further comprising: obtaining association relations of different disease nodes according to association strength between different disease pairs; abstracting the extracted diseases as nodes of the complication network, abstracting the association relation of different disease nodes as continuous edges of the complication network, and constructing a complication network model.
Specifically, after the association strength between different disease pairs is obtained, the extracted and calculated results are abstracted into an undirected weighted complication network according to the association relationship between the extracted disease nodes and the different disease nodes.
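As a minimal sketch of this abstraction, the pairwise strengths can be stored as a symmetric adjacency mapping. The `{(i, j): weight}` input layout and the function name are assumptions made for illustration, not part of the source.

```python
from collections import defaultdict


def build_network(strengths):
    """Build an undirected weighted graph as {node: {neighbour: weight}}.

    strengths: {(i, j): weight} with one entry per disease pair.
    """
    graph = defaultdict(dict)
    for (i, j), weight in strengths.items():
        if i == j or weight <= 0:
            continue  # ignore self-loops and empty associations
        graph[i][j] = weight
        graph[j][i] = weight  # undirected: store both directions
    return dict(graph)
```

A dictionary of dictionaries keeps neighbor lookup and edge-weight lookup constant time without requiring a graph library.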
Step S3 is mainly a topology analysis of the complication network, in which key diseases are extracted based on network centrality. The extraction process of the key diseases of malignant tumors is shown in FIG. 5; node importance is measured with SDB, a measurement algorithm based on degree and betweenness centrality. The algorithm reflects both local and global features: it integrates degree and betweenness centrality through the connection strength between nodes, and the importance values of the nodes are calculated and ranked.
In some embodiments, in step S3, topology analysis, node centrality analysis, and measurement algorithm based on degree and betweenness centrality are sequentially performed on the complication network model, and the extracting key diseases includes:
step S301, based on the complication network model, representing the network G by an adjacency matrix A = [a_ij] ∈ R^(n×n).
If node i is connected to node j, then a_ij = 1; otherwise, a_ij = 0.
Step S302, calculating the degree k_i and the betweenness centrality b_i of each disease node. The calculation formulas are, respectively:
k_i = Σ_j a_ij
b_i = Σ_{s≠i≠t} g_st^i / g_st
wherein g_st is the number of shortest paths from node s to node t, and g_st^i is the number of those shortest paths that pass through node i.
Step S303, calculating the average degree k_avg and the average betweenness centrality b_avg of all disease nodes in the network G.
Step S304, calculating the neighbor node set N(i) of each disease node.
Step S305, defining the association strength S_ij between different disease nodes and the importance index SDB_i of each disease node, respectively.
Wherein, the calculation formula is as follows:
and step S306, sorting the disease nodes according to the importance index values of the disease nodes, obtaining the maximum value of the importance index of the disease node, and taking the disease node corresponding to the maximum value as a key disease.
Specifically, the nodes are sorted in descending order of SDB value, and the top 10% of nodes are extracted as key nodes and analyzed.
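The centrality quantities in steps S302 to S306 can be sketched as follows. Degree and betweenness are computed on the unweighted adjacency structure, with betweenness following Brandes' algorithm. The SDB formula is not reproduced in this text, so `placeholder_importance` is explicitly a stand-in (degree and betweenness each divided by the network average, then summed) and should not be read as the SDB index. All function names are hypothetical.

```python
from collections import deque
from math import ceil


def degrees(graph):
    """k_i: number of neighbours of each node in {node: neighbours}."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness(graph):
    """Unnormalised betweenness on an unweighted, undirected graph (Brandes)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order = []
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}   # shortest-path counts from source
        dist = {node: -1 for node in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                          # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):             # accumulate dependencies
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair (s, t) was counted once per direction.
    return {node: value / 2.0 for node, value in score.items()}


def placeholder_importance(graph):
    """Stand-in score, NOT the SDB index (whose formula is unavailable here)."""
    k, b = degrees(graph), betweenness(graph)
    k_avg = sum(k.values()) / len(k)
    b_avg = sum(b.values()) / len(b)
    return {
        n: (k[n] / k_avg if k_avg else 0.0) + (b[n] / b_avg if b_avg else 0.0)
        for n in graph
    }


def top_fraction(scores, fraction=0.10):
    """Return the highest-scoring nodes (at least one), best first."""
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:max(1, ceil(fraction * len(ranked)))]
```

Ties in `top_fraction` are broken by node name so that the ranking is reproducible.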
In some embodiments, in step S4, based on the community detection algorithm, the complication patterns related to the malignant tumor in different communities are analyzed, and the complication patterns of different types of malignant tumors are mined to obtain an analysis result, which includes:
step S401, learning vector representations of the nodes in the complication network by graph embedding, in combination with modularity.
Step S402, determining the range of the number of communities based on the modularity.
Step S403, determining the optimal number of communities by using the Gap Statistic algorithm.
And step S404, dividing all the nodes into different communities based on the optimal community number by using a KMeans++ algorithm to obtain community division of the nodes.
Step S405, node classification is mapped back to the complication network to obtain a community detection result of the complication network; based on the community detection result, analyzing the complication modes related to malignant tumors of different communities to obtain an analysis result.
To analyze the patterns of malignant tumor complications, a community detection algorithm is used to partition the malignant tumor complication network into communities, forming communities clustered by malignant tumor type. The complication patterns of different types of malignant tumors are mined by analyzing the complications related to malignant tumors in the different communities and are compared with clinical studies and experiments. The analysis flow chart of this module is shown in FIG. 6.
First, vector representations of the nodes in the complication network are learned using DeepWalk. The steps of the DeepWalk algorithm are as follows:
(1) Inputting a graph G(V, E), a window size w, an output dimension d, the number of walks γ started from each node, and a walk length t;
(2) Randomly initializing a representation matrix Φ ∈ R^(|V|×d);
(3) Randomly shuffling the vertices, O = shuffle(V);
For each node v_i ∈ O, first sampling from the node with a random walk algorithm to obtain a walk sequence, then performing skip-gram training on the sampled data to obtain the vector representations of the nodes;
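The walk-generation part of these steps can be sketched as follows. This is a minimal illustration that assumes the next node is chosen uniformly among the neighbors and ignores edge weights; the function and parameter names are hypothetical, and skip-gram training is left out.

```python
import random


def deepwalk_walks(graph, walks_per_node, walk_length, seed=0):
    """Generate truncated random walks over {node: iterable of neighbours}."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # O = shuffle(V)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = list(graph[walk[-1]])
                if not neighbours:
                    break  # isolated node: the walk cannot continue
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks
```

Each pass over the shuffled vertex list starts one walk per node, so the result holds `walks_per_node × |V|` sequences.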
Specifically, the steps of sampling each node with the RandomWalk algorithm are as follows:
(1) Let f(x) be a multivariate function of n variables, where x = (x_1, x_2, …, x_n) is an n-dimensional vector;
(2) Giving an initial iteration point x, an initial walking step length λ, and a control precision ε (used to terminate the algorithm);
giving the iteration control number N, with k the current iteration number, and setting k = 1;
(3) When k < N, randomly generating an n-dimensional vector u = (u_1, u_2, …, u_n) with components in (-1, 1), and normalizing it to obtain the unit vector u′ = u / ‖u‖; letting x_1 = x + λu′ completes the first step of the walk;
(4) Calculating the function value: if f(x_1) < f(x), that is, a point better than the current one has been found, resetting k to 1, replacing x with x_1, and returning to step (2); otherwise, letting k = k + 1 and returning to step (3);
(5) If no better value is found N consecutive times, the optimal solution is considered to lie within the n-dimensional sphere centered on the current best point with the current step length as its radius. At this time, if λ < ε, the algorithm ends; otherwise, the step length λ is reduced and the algorithm returns to step (1) to start a new round of walking.
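The iteration in steps (1) to (5) can be sketched as follows. The rule for shrinking the step length is not reproduced in this text, so halving λ is an assumption, as are the function and parameter names.

```python
import random
from math import sqrt


def random_walk_minimise(f, x0, step, eps, max_tries, seed=0):
    """Random-walk search for a minimum of f, starting from x0.

    step is the initial step length (lambda), eps the control precision,
    max_tries the iteration control number N.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    while True:
        k = 1
        while k < max_tries:
            # Random direction with components in (-1, 1), scaled to unit length.
            u = [rng.uniform(-1.0, 1.0) for _ in x]
            norm = sqrt(sum(c * c for c in u)) or 1.0
            candidate = [xi + step * ui / norm for xi, ui in zip(x, u)]
            f_candidate = f(candidate)
            if f_candidate < fx:
                x, fx = candidate, f_candidate
                k = 1          # improvement found: restart the counter
            else:
                k += 1
        if step < eps:
            return x, fx       # step already below the control precision
        step /= 2.0            # ASSUMPTION: halve the step for the next round


def sphere(v):
    """Example objective: sum of squares, minimised at the origin."""
    return sum(c * c for c in v)
```

A call such as `random_walk_minimise(sphere, [3.0, -2.0], step=1.0, eps=1e-3, max_tries=50)` returns the best point found and its function value; the value never exceeds that of the starting point because only improving moves are accepted.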
Specifically, skip-gram training on the sampled data proceeds as follows:
for each random walk, each node v_i is paired with every node u_k within its window (positions i - w to i + w) to form pairs (v_i, u_k), which serve as the input to skip-gram; the node embeddings are learned from the frequency with which the pairs (v_i, u_k) occur;
given the embedding of v_i, the probability of its neighbors occurring in the walk is maximized, with the objective function:
Based on the node vectors obtained in the above steps, the value range [minK, maxK] of the number of communities K is determined in combination with the modularity.
The Gap Statistic algorithm is then applied over this range of K to obtain the optimal value, optimalK.
Using the KMeans++ algorithm with the obtained optimal K value, each sample is assigned to a cluster so that all nodes are divided into different communities. The specific steps are as follows:
(1) First, randomly selecting a first cluster center μ_1 from all samples;
(2) Recording the distance D(x) from each sample to its nearest cluster center;
(3) Calculating the probability of selecting each non-center sample point as the next cluster center as follows:
(4) Repeating steps (2) and (3) until K cluster centers have been selected.
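The seeding steps above can be sketched as follows. The selection probability is not reproduced in this text, so the sketch assumes the standard k-means++ rule, in which a sample is chosen with probability proportional to D(x)²; the function name and the tuple-based point layout are hypothetical.

```python
import random


def kmeans_pp_centres(points, k, seed=0):
    """Choose k initial cluster centres from a list of coordinate tuples."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]  # step (1): first centre chosen uniformly
    while len(centres) < k:
        # Step (2): squared distance from each sample to its nearest centre.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
            for p in points
        ]
        if sum(d2) == 0:
            # Every sample coincides with a centre; fall back to uniform choice.
            centres.append(rng.choice(points))
            continue
        # Step (3): ASSUMED rule, probability proportional to D(x)^2.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres
```

Samples that are already centers have weight zero and are therefore not drawn again, which spreads the initial centers across the embedding space before the usual KMeans iterations run.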
Then, the node classification is mapped back to the complication network to obtain a community detection result of the complication network; and finally, mining and analyzing different types of malignant tumor complication modes according to community detection results.
In some embodiments, the visualizing the analysis result in step S4 includes: and respectively visualizing the complication network, the key diseases and the community detection results.
The application provides a malignant tumor complication analysis method based on electronic medical records, which is used for mining the complication mode of malignant tumor and supplementing the existing research. The method comprises the following steps: first, data are preprocessed: the method comprises data cleaning, association of medical records coding and characteristic data, redundancy removal, missing value supplement and ICD coding standardization; then, extracting and evaluating complications and association relations, and constructing a complications network model; then, extracting key diseases based on the centrality of the network nodes; next, mining a complication pattern of the malignancy using a community detection algorithm; finally, the complication pattern and complication network outcome are visualized.
On the one hand, the present application constructs a complication network from the diagnostic ICD codes in electronic medical record data: diseases are abstracted as the nodes of a complex network and the relations among diseases as its edges, yielding an undirected weighted complication network. On the other hand, the present application uses SDB, an algorithm based on degree and betweenness centrality that combines local and global features, to extract key diseases in the complication network. The algorithm integrates degree and betweenness centrality through the connection strength between nodes, so that the importance of each node is calculated and the key diseases in the network are obtained. Furthermore, the present application proposes a community detection algorithm based on graph embedding combined with modularity and uses it to analyze the complication patterns of different types of malignant tumors. Dividing the network by malignant tumor type through community detection makes the complication patterns of the different types clearer.
By the above technical scheme, the embodiment of the application provides a malignant tumor complication analysis method based on electronic medical records, which comprises the following steps: firstly, preprocessing malignant tumor electronic medical record data to obtain processed medical record data; next, constructing a complication network model based on the processed medical record data; then, sequentially carrying out topology analysis and node centrality analysis on the complication network model, and extracting key diseases based on a measurement algorithm of degree and betweenness centrality; finally, based on a community detection algorithm, analyzing the complication patterns related to malignant tumors in different communities, mining the complication patterns of different types of malignant tumors, and obtaining an analysis result; and carrying out visualization processing on the analysis result.
The malignant tumor complication analysis method based on electronic medical records addresses the insufficient utilization of medical electronic medical record data, the incomplete understanding of malignant tumor complication patterns, the limited range of complication analysis methods, and the difficulty of visually presenting complications analyzed with complex networks. By mining the information in electronic medical records and analyzing malignant tumor complications with complex network methods, the method can improve doctors' working efficiency, enrich the methods available for studying complications, and increase the application value of medical information data.
Firstly, the complications related to malignant tumors and the relations between them are extracted from electronic medical records and abstracted into a complex network model; explaining the complication patterns with complex network analysis saves much of the time and resources needed to study complications; and a method for mining the key nodes of a complex network is proposed for mining the key diseases among the complications, so that the complication patterns of malignant tumors become clearer.
Secondly, the community detection algorithm based on graph embedding combined with modularity can divide the complication network according to the type of malignant tumor, so that the complication patterns are easier to analyze; mining malignant tumor complication patterns with the community detection algorithm fills gaps in the research on the complication patterns of some types of malignant tumors. In addition, visualization techniques are used to display the complication network and the community detection results, with node colors, node shapes, and edge thicknesses representing the rich information in the network, so that the results are more readable and working efficiency is improved.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of implementing the present application and that various changes in form and details may be made therein without departing from the spirit and scope of the present application. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be defined by the appended claims.

Claims (10)

1. An electronic medical record-based malignant tumor complication analysis method is characterized by comprising the following steps:
preprocessing the malignant tumor electronic medical record data to obtain processed medical record data;
constructing a complication network model based on the processed medical record data;
sequentially carrying out topology analysis and node centrality analysis on the complication network model, and extracting key diseases based on a measurement algorithm of degree and betweenness centrality;
based on a community detection algorithm, analyzing complication modes related to malignant tumors in different communities, mining the complication modes of malignant tumors of different types, and obtaining analysis results; and carrying out visualization processing on the analysis result.
2. The method for analyzing malignant tumor complications based on electronic medical records according to claim 1, wherein the preprocessing of malignant tumor electronic medical record data to obtain processed medical record data comprises:
acquiring malignant tumor electronic medical record data, and performing desensitization treatment on the data to extract malignant tumor patient data;
according to the attribute characteristics, matching the data of the control group one by one from the data of patients not suffering from malignant tumor according to the data of the malignant tumor patients to obtain the data of the diseased crowd of the control group;
preprocessing the data of the malignant tumor patient and the data of the control group diseased crowd respectively, supplementing the missing value in the data, removing redundant data, and normalizing ICD codes in the data;
and selecting attribute features for complication research in the electronic medical record, marking and extracting required features to obtain processed medical record data.
3. The method for analyzing malignant tumor complications based on electronic medical records according to claim 2, wherein the desensitizing the data comprises: deleting records with quality problems from the data and removing the sensitive information of the patient;
the attribute features include gender, age, year of admission, season of admission, and disease diagnosis.
4. The method for analyzing malignant tumor complications based on electronic medical records according to claim 1, wherein the constructing a complications network model based on the processed medical record data comprises:
extracting disease nodes and association relations among the disease nodes based on the processed medical record data; the disease node is a malignant tumor complication;
and constructing a complication network model based on the disease nodes and the association relation between the disease nodes.
5. The method for analyzing malignant tumor complications based on electronic medical records according to claim 4, wherein extracting the association between the disease node and the disease node based on the processed medical record data comprises:
extracting all ICD codes of diseases from the processed medical record data, wherein the attribute characteristics of the ICD codes are disease diagnosis data;
calculating the relation among all disease nodes; the disease node is represented by ICD code;
calculating the occurrence frequency of each disease node;
filtering disease nodes with occurrence times smaller than a threshold value;
calculating the number of times each disease pair co-occurs;
defining an associated intensity index for measuring the disease pair;
and calculating the association strength between different disease pairs based on the association strength index, and taking the association strength as a weight of an edge for constructing a complication network.
6. The method for analyzing malignant tumor complications based on electronic medical records according to claim 5, wherein the correlation strength index of the disease pair is represented by the following formula:
wherein C_ij represents the number of co-occurrences of each disease pair; ij represents a disease pair; i and j represent disease i and disease j, respectively; P_i represents the number of occurrences of disease i; and P_j represents the number of occurrences of disease j.
7. The method for analyzing malignant tumor complications based on electronic medical records according to claim 5, further comprising, after obtaining the correlation strength between different pairs of diseases:
obtaining association relations of different disease nodes according to association strength between different disease pairs;
abstracting the extracted diseases as nodes of the complication network, abstracting the association relation of different disease nodes as continuous edges of the complication network, and constructing a complication network model.
8. The method for analyzing malignant tumor complications based on electronic medical records according to claim 1, wherein sequentially performing topology analysis and node centrality analysis on the complications network model, and extracting key diseases based on a measurement algorithm of degree and betweenness centrality comprises:
based on the complication network model, representing the network by an adjacency matrix A = [a_ij] ∈ R^(n×n);
calculating the degree and the betweenness centrality of each disease node;
calculating the average degree and the average betweenness centrality of all disease nodes in the network;
calculating a neighbor node set of each disease node;
respectively defining the association strength between different disease nodes and the importance index of the disease node;
and sequencing the disease nodes according to the importance index values of the disease nodes, obtaining the maximum value of the importance index of the disease node, and taking the disease node corresponding to the maximum value as a key disease.
9. The method for analyzing malignant tumor complications based on electronic medical records according to claim 8, wherein analyzing the complications patterns related to malignant tumor in different communities based on a community detection algorithm, mining the complications patterns of different types of malignant tumor, and obtaining the analysis result comprises:
learning vector representations of the nodes in the complication network by graph embedding, in combination with modularity;
determining the range of the number of communities based on the modularity;
determining the optimal number of communities by using the Gap Statistic algorithm;
dividing all nodes into different communities based on the optimal community number by using a KMeans++ algorithm to obtain community division of the nodes;
the node classification is mapped back to the complication network to obtain a community detection result of the complication network;
based on the community detection result, analyzing the complication modes related to malignant tumors of different communities to obtain an analysis result.
10. The method for analyzing malignant tumor complications based on electronic medical records according to claim 9, wherein the step of performing a visualization process on the analysis result comprises the steps of:
and visualizing the complication network, the key diseases and the community detection results respectively.
CN202310370759.8A 2023-04-07 2023-04-07 Malignant tumor complication analysis method based on electronic medical record Pending CN116469570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310370759.8A CN116469570A (en) 2023-04-07 2023-04-07 Malignant tumor complication analysis method based on electronic medical record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310370759.8A CN116469570A (en) 2023-04-07 2023-04-07 Malignant tumor complication analysis method based on electronic medical record

Publications (1)

Publication Number Publication Date
CN116469570A true CN116469570A (en) 2023-07-21

Family

ID=87174589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310370759.8A Pending CN116469570A (en) 2023-04-07 2023-04-07 Malignant tumor complication analysis method based on electronic medical record

Country Status (1)

Country Link
CN (1) CN116469570A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116959692A (en) * 2023-09-18 2023-10-27 北方健康医疗大数据科技有限公司 Electronic medical record quality control method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination