CN116860739A - Severe medical big data processing system and method - Google Patents
- Publication number
- CN116860739A (application number CN202310932050.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- patient
- processing
- database
- carrying
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/217—Database tuning
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/20—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Biomedical Technology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
A system and method for processing critical care big data. The method comprises: acquiring multi-source, high-dimensional raw critical care data; performing unified standardization on the raw data to obtain basic data; and fusing the basic data to obtain analysis data. By applying these two processing stages, unified standardization and data fusion, to both the content and the format of the raw data, the application extracts the data that meet the requirements and can therefore rapidly provide comprehensive data to assist clinical diagnosis in critical care.
Description
Technical Field
The application relates to a system and method for processing critical care big data.
Background
Critical care medicine is an indispensable link in saving lives and is the last line of defense. Modern critical care generates large volumes of data related to life and health, and how to process this raw critical care data so that it provides guidance and reference for subsequent critical care tasks is an urgent problem. With the development of computer technology, the digitization of critical care has become feasible. However, most existing processing methods are limited to search and do not mine the data in depth, so the accumulated critical care data is effectively wasted.
Disclosure of Invention
To solve these problems, the application discloses a critical care big data processing method comprising the following steps:
acquiring multi-source, high-dimensional raw critical care data;
performing unified standardization on the raw data to obtain basic data;
and obtaining analysis data by fusing the basic data. By applying two processing stages, unified standardization and data fusion, to both the content and the format of the raw data, the application extracts the data that meet the requirements and can therefore rapidly provide comprehensive data to assist clinical diagnosis in critical care.
Preferably, by application scenario, the raw data are divided into hospital information system data, clinical information system data, electronic medical record data, and data from vital sign monitoring and support equipment; by storage form, the data are divided into unstructured data and structured data;
the vital sign monitoring equipment includes a multi-parameter monitor, a hemodynamic monitor, a blood gas analyzer, an intracranial pressure monitor, an electroencephalogram monitor, a urodynamic monitor and a fetal heart monitor;
the vital sign support equipment includes a ventilator, a continuous hemodialysis machine, an infusion pump, an intra-aortic balloon counterpulsation pump and an extracorporeal membrane oxygenation (ECMO) system;
the unstructured data include medical images, videos, voice recordings and documents;
the structured data include tables.
Preferably, the unified standardization processing comprises data computation and data storage;
the data computation comprises primary computation and advanced computation; the data storage uses a relational database, a key-value database, a document database, a columnar database and a graph database;
the primary computation comprises addition, subtraction, de-duplication and merging operations;
the advanced computation applies calculus, mathematical statistics and data classification to the data;
the relational database stores the patients' basic information and clinical information;
the key-value database caches data that users query frequently, improving query efficiency and supporting highly concurrent requests;
the document database stores and manages document data, i.e. structured data in JSON, XML or BSON format, to match the data formats required for transmission, storage and computation;
the columnar database stores the large volumes of critical care data generated within short periods and supports efficient batch queries;
the graph database models diseases, symptoms, examination items, physical conditions, treatment means and post-treatment care as entities, models the relations between symptoms and diseases, diseases and examination items, diseases and treatment means, and physical conditions and treatment means, and displays the complex network of relations among critical care data.
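To make the graph-database idea concrete, the following is a minimal in-memory sketch, assuming entities are plain strings and relations are labelled directed edges. The class name `MedicalGraph`, the relation labels (`indicates`, `treated_by`) and the example entities are hypothetical; a real deployment would use an actual graph database rather than this toy structure.

```python
from collections import defaultdict


class MedicalGraph:
    """Toy in-memory stand-in for a graph database of critical care entities (hypothetical)."""

    def __init__(self):
        # (relation, head entity) -> set of tail entities
        self._edges = defaultdict(set)

    def add(self, head, relation, tail):
        """Store one directed, labelled edge such as (symptom) -indicates-> (disease)."""
        self._edges[(relation, head)].add(tail)

    def neighbors(self, head, relation):
        """Sorted tail entities reachable from `head` via `relation`."""
        return sorted(self._edges.get((relation, head), set()))

    def diseases_for_symptoms(self, symptoms):
        """Diseases linked to every given symptom by an 'indicates' edge."""
        candidate_sets = [set(self.neighbors(s, "indicates")) for s in symptoms]
        if not candidate_sets:
            return []
        return sorted(set.intersection(*candidate_sets))
```

For instance, after adding the edges fever -> sepsis, hypotension -> sepsis and hypotension -> hemorrhagic shock, `diseases_for_symptoms(["fever", "hypotension"])` returns `["sepsis"]`. Intersecting neighbour sets is only one simple way to query symptom-disease relations; a graph database would express the same lookup in its own query language.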
Preferably, the data computation comprises unified data conversion and unified data processing;
unified data conversion is performed before unified data processing so that the data are easy for the computer to process and recognize; it comprises data format conversion, e.g. converting TXT files into CSV files, CSV into JSON, and JSON into Parquet, and data type conversion, e.g. converting strings into numeric types and converting range (segmented) data into categories;
unified data processing addresses data quality problems, including missing data, duplicate data and abnormal data;
for missing data, one of the following operations is applied as needed: deleting the entire record that contains the missing value; filling with the mean or median; inferring a plausible value from the known data, verifying it against the remaining data, and filling it in; binning the data and filling with the mode of the bin; training a machine learning algorithm on complete records and filling with its predicted value; filling by collaborative filtering over patient data; or filling by regression;
for duplicate data, if records are completely identical, any one of them is kept; if most of the content is duplicated, the duplicates are compared and the higher-quality record is kept; if only a small part of the content is duplicated, the records are all retained; because some data of critically ill patients are abnormal compared with those of healthy people, typical abnormal values are retained as data with distinctive characteristics and used in comparative experiments to explore the relations between these characteristics and diseases, between symptoms and diseases, and between diseases and treatments;
data storage classifies data as hot, warm or cold according to access frequency and stores each class in a different database;
hot data are cached in a Redis database, warm data are stored in MySQL, MongoDB or HBase, and cold data are stored in HDFS;
the data in each database are extracted, converted, and then stored and managed in an Elasticsearch database, which holds numeric, string, text, voice and image/video data and makes the data easy to share, i.e. to consult, query and use; the raw data before conversion are retained as a traceable data source for later use; the converted data are stored in Elasticsearch as the data source for subsequent tasks, and intermediate data produced during processing are also stored if needed. The application aims to process the data effectively, repairing or discarding problematic data to avoid misleading results during use, and to store the data in a standardized way so as to make query and retrieval as fast as possible.
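Two of the data-quality rules above, mean/median and binning-plus-mode filling for missing values, and quality-based retention for duplicates, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: records are dicts, `None` marks a missing value, and the duplicate key fields and the `completeness` quality score (count of non-missing fields) are hypothetical choices. No outlier removal is performed, in keeping with the retention of typical abnormal values.

```python
from statistics import mean, median, mode


def fill_missing(values, strategy="mean"):
    """Replace None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to estimate from; leave the gaps
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def fill_by_bin_mode(records, bin_key, target_key, bin_width):
    """Binning + mode filling: a missing target takes the most common observed
    target among records whose bin_key falls in the same bin."""
    pools = {}
    for r in records:
        if r.get(target_key) is not None:
            pools.setdefault(r[bin_key] // bin_width, []).append(r[target_key])
    filled = []
    for r in records:
        r = dict(r)  # copy so the input records are not mutated
        if r.get(target_key) is None:
            pool = pools.get(r[bin_key] // bin_width)
            if pool:
                r[target_key] = mode(pool)
        filled.append(r)
    return filled


def completeness(record):
    """Hypothetical quality score: number of non-missing fields."""
    return sum(v is not None for v in record.values())


def deduplicate(records, key_fields, quality=completeness):
    """Keep one record per key; among duplicates keep the highest-quality one.
    Records that differ in any key field are all retained."""
    best = {}
    for r in records:
        key = tuple(r[f] for f in key_fields)
        if key not in best or quality(r) > quality(best[key]):
            best[key] = r
    return list(best.values())
```

The choice of `key_fields` is what decides whether two records count as "mostly repeated"; it is a policy decision that the text leaves open.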
Preferably, the unified standardization processing further comprises converting unstructured data into structured data and extracting the patient's characteristic attributes.
Preferably, the unstructured data are retrieved from Elasticsearch and processed as follows: speech is converted into text with DeepSpeech or Whisper, so that text and speech can be processed uniformly as text data;
text data are processed as follows:
using word segmentation and syntactic parsing to identify the subject, predicate, object, attributive, adverbial and complement of each sentence;
using a BERT or GPT model to analyze the meaning of the text and summarize it, i.e. expressing the core meaning of a sentence or passage with a few content words;
summarizing the text keywords of multiple critically ill patients and classifying and coding them;
extracting the required patient features and structuring the data;
for image and video data, YOLO, U-Net and SAT models are used for object detection, semantic segmentation and summary extraction, respectively, to obtain the objects and medical descriptions in the images or videos; these are converted into text and then processed as text data as described above. This text processing method suits the critical care field mainly because doctors' diagnoses and medication lists are highly terminological and use consistent wording, so text data can be extracted quickly by exploiting these characteristics.
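The keyword classification coding and feature extraction steps can be illustrated with a minimal dictionary-based sketch. Everything here is an assumption for illustration: the regex tokenizer stands in for a real word-segmentation and parsing step, the BERT/GPT summarization step is omitted, and the term dictionary `TERM_CODES` and its codes are hypothetical, not a real clinical coding system.

```python
import re

# Hypothetical term dictionary: keyword -> (category, code). Codes are illustrative only.
TERM_CODES = {
    "fever": ("symptom", "S001"),
    "dyspnea": ("symptom", "S002"),
    "pneumonia": ("diagnosis", "D001"),
    "sepsis": ("diagnosis", "D002"),
    "norepinephrine": ("medication", "M001"),
}


def tokenize(text):
    """Crude stand-in for word segmentation: lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def code_keywords(text):
    """Return the codes of known keywords found in the text, grouped by category."""
    coded = {}
    for tok in tokenize(text):
        if tok in TERM_CODES:
            category, code = TERM_CODES[tok]
            codes = coded.setdefault(category, [])
            if code not in codes:
                codes.append(code)
    return coded


def structure_note(patient_id, text):
    """Turn one free-text note into a flat structured row (one example numeric feature)."""
    row = {"patient_id": patient_id}
    row.update(code_keywords(text))
    m = re.search(r"temperature\s+(\d+(?:\.\d+)?)", text.lower())
    row["temperature_c"] = float(m.group(1)) if m else None
    return row
```

For example, `structure_note("P01", "Fever and dyspnea; temperature 39.2. Suspected pneumonia.")` yields symptom codes `["S001", "S002"]`, diagnosis code `["D001"]` and `temperature_c` 39.2. The consistent terminology noted above is what makes such a dictionary approach plausible at all.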
Preferably, data fusion comprises logical judgment and reasoning over the basic data and analysis of the data by machine learning and deep learning, helping the relevant personnel discover the regularities, associations and patterns in the data, and enabling data prediction, data anomaly detection, event early warning and alarms, and patient condition assessment;
data prediction forecasts the trend of a waveform from its preceding segment, so that doctors can make accurate judgments in advance;
data anomaly detection identifies abnormal points or segments in time-series waveforms so that deviations can be corrected promptly and the waveform is expected to return to its normal range;
event early warning and alarms add industry rule logic or a verified, mature algorithm on top of data prediction and data anomaly detection, to warn of abnormal events that endanger life;
patient condition assessment is an overall evaluation of the patient's condition made by the computer from the patient's basic information, relevant past medical history, current symptoms and post-admission treatment information, combined with the extensive experience of clinicians that the computer has learned.
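The prediction, anomaly detection and early-warning functions can be sketched with simple, well-known techniques that are swapped in here for illustration and are not necessarily what the application uses: a least-squares linear trend for prediction, a rolling z-score for anomaly detection, and a threshold rule layered on both for early warning. Function names, window sizes and the 90% SpO2 threshold are hypothetical and are not clinical guidance.

```python
from statistics import mean, stdev


def predict_next(values, window=5, steps=1):
    """Fit a least-squares line to the last `window` samples and extrapolate `steps` ahead."""
    ys = list(values[-window:])
    n = len(ys)
    if n < 2:
        return [ys[-1]] * steps if ys else []
    x_bar, y_bar = (n - 1) / 2, mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys)) / sum(
        (x - x_bar) ** 2 for x in range(n)
    )
    intercept = y_bar - slope * x_bar
    return [intercept + slope * (n - 1 + k) for k in range(1, steps + 1)]


def detect_anomalies(values, window=10, z_threshold=3.0):
    """Indices whose value is more than `z_threshold` standard deviations
    from the mean of the preceding `window` samples (rolling z-score)."""
    flagged = []
    for i in range(window, len(values)):
        ref = values[i - window:i]
        sd = stdev(ref)
        # skip flat reference windows (sd == 0), where the z-score is undefined
        if sd > 0 and abs(values[i] - mean(ref)) / sd > z_threshold:
            flagged.append(i)
    return flagged


def spo2_alarm(values, low=90.0):
    """Hypothetical rule layered on prediction and anomaly detection."""
    if values and values[-1] < low:
        return "ALARM: current SpO2 below threshold"
    predicted = predict_next(values, steps=3)
    if predicted and min(predicted) < low:
        return "WARNING: SpO2 predicted to fall below threshold"
    anomalies = detect_anomalies(values)
    if anomalies and anomalies[-1] == len(values) - 1:
        return "WARNING: latest sample is anomalous"
    return "OK"
```

For a steadily falling series such as `[99, 97, 95, 93, 91]`, the current value is still above 90 but the extrapolated trend (89, 87, 85) is not, so the rule raises an early warning before the alarm threshold is crossed.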
Preferably, data fusion is performed as follows:
querying the patient's basic information to obtain the patient's identification number, gathering by that number all of the patient's information scattered across the data sources, and extracting it into a temporary intermediate layer;
cleaning the acquired data and checking which data forms it contains; a patient's data may include text, voice, images/videos, tables and time-series waveforms, and different patients may have different data forms; unless there is a special requirement, the complete and clear data are integrated;
fusing data of different forms freely as required, by default fusing all acquired data; specifically, the patient's basic information is obtained from the structured database and used as the base point from which the fused data are progressively expanded, and other structured data are aligned to the patient's basic information; unstructured text, voice and image/video data are first structured using the unstructured-data structuring method and then aligned; for time-series waveform data, an analysis report covering a period of time is generated, e.g. by calculating flow indices of respiratory waveforms; text data are summarized and structured, the calculated values are structurally aligned with their indices, and the positions of the original data are provided;
following the structured data alignment principle, unstructured data are first structured and then aligned, so that multi-source data are integrated into a clearly organized structured data table that is easy to manage and use later;
a data flow diagram is drawn that traces every step from the source data to the structured data elements in the table, making clear where each item comes from and where it goes, which facilitates tracing and checking the data when problems occur; by combining these techniques, multi-source data fusion is achieved, laying a data foundation for data analysis and model training.
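The first fusion steps, gathering a patient's records by identification number into an intermediate layer and aligning them on the basic information, can be sketched as follows. The function `fuse_patient`, the dict-of-lists input and the `_source`/`_source_row` lineage fields are assumptions for illustration; the lineage fields loosely mimic the data-flow tracing described above by recording where each fused row came from.

```python
def fuse_patient(patient_id, sources):
    """Gather one patient's rows from every source into a temporary
    intermediate layer, aligned on the patient's basic information.

    `sources` maps a source name to a list of dict rows; the
    'basic_info' source is assumed to hold one row per patient."""
    basic = next(
        (r for r in sources.get("basic_info", []) if r.get("patient_id") == patient_id),
        None,
    )
    if basic is None:
        raise KeyError(f"no basic information for patient {patient_id}")
    fused = []
    for source_name, rows in sources.items():
        if source_name == "basic_info":
            continue
        for row_index, row in enumerate(rows):
            if row.get("patient_id") != patient_id:
                continue
            aligned = dict(basic)  # basic information is the base point
            # source fields are added on top; on a name clash they overwrite basic fields
            aligned.update({k: v for k, v in row.items() if k != "patient_id"})
            aligned["_source"] = source_name    # lineage: which source ...
            aligned["_source_row"] = row_index  # ... and which row it came from
            fused.append(aligned)
    return fused
```

Each output row is self-describing: it carries the patient's basic information, the source's own fields, and a pointer back to the original row, which is the minimum needed to reconstruct a data flow diagram later.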
Preferably, the method further comprises displaying the analysis data in views that clearly show their distribution, changes and associations; visualizing the data helps the relevant personnel understand and use them better;
according to specific functional requirements, the system interacts with the user and displays the required data, including the vital sign data of a specific patient at a specific time, a data report for a patient over a period of time, statistics for all or some of the monitoring devices, and the influence of a given monitoring device on other life support devices;
when source data are displayed, clinical data are shown; when an exploratory experiment is run on an assumed parameter value, simulated data are shown, obtained by simulating the data trend the patient would exhibit under that parameter.
In another aspect, a critical care big data processing system is also disclosed, comprising the following modules:
a data module for acquiring multi-source, high-dimensional raw critical care data;
a unified processing module for performing unified standardization on the raw data to obtain basic data;
an analysis module for obtaining analysis data by fusing the basic data;
and a display module for displaying the analysis data in views.
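The four-module structure can be sketched as a small pipeline whose stages are injected as functions. The `Pipeline` class and the toy stage functions in the usage note are hypothetical; they only show how the data, unified processing, analysis and display modules hand data to one another.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Pipeline:
    """Hypothetical wiring of the four modules; each stage is a plain function."""

    acquire: Callable[[], List[Any]]         # data module: raw data
    standardize: Callable[[List[Any]], Any]  # unified processing module: basic data
    fuse: Callable[[Any], Any]               # analysis module: analysis data
    display: Callable[[Any], str]            # display module: rendered view

    def run(self) -> str:
        raw = self.acquire()
        basic = self.standardize(raw)
        analysis = self.fuse(basic)
        return self.display(analysis)
```

For example, a pipeline whose stages acquire `[" 80 ", "82", None]`, standardize by dropping `None` and converting to float, fuse by averaging, and display with `f"mean HR {value:.1f}"` produces `"mean HR 81.0"`. Keeping the stages as injected functions lets each module be replaced independently.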
The application has the following beneficial effects:
1. By applying unified standardization and data fusion to the raw data, the application extracts data that meet the requirements in both content and format, and can therefore rapidly provide comprehensive data to assist clinical diagnosis in critical care;
2. The application processes the data effectively, repairing or discarding problematic data to avoid misleading results during use, and stores the data in a standardized way so as to make query and retrieval as fast as possible;
3. The text processing method suits the critical care field mainly because doctors' diagnoses and medication lists are highly terminological and use consistent wording, so text data can be extracted quickly by exploiting these characteristics.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of Example 1;
FIG. 2 is a schematic diagram of text data processing;
FIG. 3 is a schematic diagram of data fusion;
FIG. 4 is a schematic diagram of Example 2.
Detailed Description
In order to clearly illustrate the technical characteristics of the scheme, the application is explained in detail by the following specific embodiments.
In a first embodiment, as shown in FIG. 1, a critical care big data processing method comprises the following steps:
S101: acquiring multi-source, high-dimensional raw critical care data;
by application scenario, the raw data are divided into hospital information system data, clinical information system data, electronic medical record data, and data from vital sign monitoring and support equipment; by storage form, the data are divided into unstructured data and structured data;
the vital sign monitoring equipment includes a multi-parameter monitor, a hemodynamic monitor, a blood gas analyzer, an intracranial pressure monitor, an electroencephalogram monitor, a urodynamic monitor and a fetal heart monitor;
the vital sign support equipment includes a ventilator, a continuous hemodialysis machine, an infusion pump, an intra-aortic balloon counterpulsation pump and an extracorporeal membrane oxygenation (ECMO) system;
the unstructured data include medical images, videos, voice recordings and documents; the medical images include X-ray images, radionuclide images, critical care ultrasound images, magnetic resonance images, pathological images and the like, and the documents include vital sign data reports, medical diagnosis certificates, medical reference books, examination reports and the like;
the structured data include tables, such as patient basic information tables and clinical information tables.
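As a rough illustration of dividing raw data by storage form, the sketch below classifies files by extension. The extension-to-category mapping and the function name are hypothetical assumptions; real sources such as imaging archives or device feeds would need content and metadata inspection rather than file names alone.

```python
from pathlib import PurePath

# Hypothetical extension -> category maps, for illustration only.
UNSTRUCTURED = {
    ".dcm": "medical image", ".png": "medical image", ".jpg": "medical image",
    ".mp4": "video", ".avi": "video",
    ".wav": "voice", ".mp3": "voice",
    ".pdf": "document", ".docx": "document", ".txt": "document",
}
STRUCTURED = {".csv": "table", ".xlsx": "table", ".parquet": "table"}


def classify_by_storage(filename):
    """Return (storage form, category) for a file name, or ('unknown', None)."""
    ext = PurePath(filename).suffix.lower()
    if ext in STRUCTURED:
        return ("structured", STRUCTURED[ext])
    if ext in UNSTRUCTURED:
        return ("unstructured", UNSTRUCTURED[ext])
    return ("unknown", None)
```

Unknown extensions are reported rather than guessed, so they can be routed to manual review instead of being silently mis-filed.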
S102: performing unified standardization on the raw data to obtain basic data;
the unified standardization processing comprises data computation and data storage;
the data computation comprises primary computation and advanced computation; the data storage uses a relational database, a key-value database, a document database, a columnar database and a graph database;
the primary computation comprises addition, subtraction, de-duplication and merging operations;
the advanced computation applies calculus, mathematical statistics and data classification to the data;
the relational database stores the patients' basic information and clinical information;
the key-value database caches data that users query frequently, improving query efficiency and supporting highly concurrent requests;
the document database stores and manages document data, i.e. structured data in JSON, XML or BSON format, to match the data formats required for transmission, storage and computation;
the columnar database stores the large volumes of critical care data generated within short periods and supports efficient batch queries;
the graph database models diseases, symptoms, examination items, physical conditions, treatment means and post-treatment care as entities, models the relations between symptoms and diseases, diseases and examination items, diseases and treatment means, and physical conditions and treatment means, and displays the complex network of relations among critical care data.
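The key-value database's role of caching high-frequency queries can be illustrated with an in-process least-recently-used cache with expiry. This is a stand-in sketch, not Redis: the `QueryCache` class, its capacity and time-to-live defaults, and the `loader` callback (which represents a query against a slower backing database) are all hypothetical.

```python
import time
from collections import OrderedDict


class QueryCache:
    """In-process LRU cache with expiry, standing in for a key-value database."""

    def __init__(self, loader, capacity=1024, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader        # key -> value, called on a cache miss
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            self._items.move_to_end(key)  # mark as most recently used
            return entry[1]
        self.misses += 1
        value = self._loader(key)  # fall back to the slower backing store
        self._items[key] = (now + self._ttl, value)
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

A short time-to-live keeps frequently read values (hot data) fast to serve while bounding how stale a cached vital sign or patient record can become; the injectable `clock` exists only to make the expiry behaviour testable.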
The data computation comprises unified data conversion and unified data processing;
unified data conversion is performed before unified data processing so that the data are easy for the computer to process and recognize; it comprises data format conversion, e.g. converting TXT files into CSV files, CSV into JSON, and JSON into Parquet, and data type conversion, e.g. converting strings into numeric types and converting range (segmented) data into categories;
unified data processing addresses data quality problems, including missing data, duplicate data and abnormal data;
for missing data, one of the following operations is applied as needed: deleting the entire record that contains the missing value; filling with the mean or median; inferring a plausible value from the known data, verifying it against the remaining data, and filling it in; binning the data and filling with the mode of the bin; training a machine learning algorithm on complete records and filling with its predicted value; filling by collaborative filtering over patient data; or filling by regression;
for duplicate data, if records are completely identical, any one of them is kept; if most of the content is duplicated, the duplicates are compared and the higher-quality record is kept; if only a small part of the content is duplicated, the records are all retained; because some data of critically ill patients are abnormal compared with those of healthy people, typical abnormal values are retained as data with distinctive characteristics and used in comparative experiments to explore the relations between these characteristics and diseases, between symptoms and diseases, and between diseases and treatments;
data storage classifies data as hot, warm or cold according to access frequency and stores each class in a different database;
hot data are cached in a Redis database, warm data are stored in MySQL, MongoDB or HBase, and cold data are stored in HDFS;
the data in each database are extracted, converted, and then stored and managed in an Elasticsearch database, which holds numeric, string, text, voice and image/video data and makes the data easy to share, i.e. to consult, query and use; the raw data before conversion are retained as a traceable data source for later use; the converted data are stored in Elasticsearch as the data source for subsequent tasks, and intermediate data produced during processing are also stored if needed.
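The format and type conversions listed above can be sketched with the Python standard library. Assumptions: TXT input is tab-separated with a header row, the function names and the heart-rate bins in the usage note are hypothetical, and the final JSON-to-Parquet step is omitted because it requires a columnar library such as pyarrow.

```python
import csv
import io
import json


def txt_to_csv(txt, delimiter="\t"):
    """Convert delimiter-separated text (assumed tab-separated) into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in txt.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(line.split(delimiter))
    return out.getvalue()


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False)


def to_number(value):
    """String -> int or float; empty, missing or non-numeric values become None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        pass
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def to_category(value, bins):
    """Map a number onto labelled half-open ranges [lower, upper)."""
    for lower, upper, label in bins:
        if value is not None and lower <= value < upper:
            return label
    return None
```

For example, with hypothetical bins `[(0, 60, "bradycardia"), (60, 100, "normal"), (100, 300, "tachycardia")]`, `to_category(to_number("88"), bins)` returns `"normal"`, while an empty string converts to `None` and stays uncategorized instead of raising an error.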
The unified standardization processing further comprises converting unstructured data into structured data and extracting the patient's characteristic attributes.
The unstructured data are retrieved from Elasticsearch and processed as follows: speech is converted into text with DeepSpeech or Whisper, so that text and speech can be processed uniformly as text data;
as shown in FIG. 2, text data are processed as follows:
S1021: using word segmentation and syntactic parsing to identify the subject, predicate, object, attributive, adverbial and complement of each sentence;
S1022: using a BERT or GPT model to analyze the meaning of the text and summarize it, i.e. expressing the core meaning of a sentence or passage with a few content words;
S1023: summarizing the text keywords of multiple critically ill patients and classifying and coding them;
S1024: extracting the features required for the patient and structuring the data;
for image and video data, YOLO, U-Net and SAT models are used for object detection, semantic segmentation and summary extraction, respectively, to obtain the objects and medical descriptions in the images or videos; these are converted into text and then processed as text data as described above.
S103, obtaining analysis data by carrying out data fusion on the basic data.
Data fusion applies logical judgment and reasoning, machine learning and deep learning to the basic data in order to analyze it and help relevant personnel discover the rules, associations and patterns it contains; it supports data prediction, data anomaly detection, event early warning and alarming, and assessment of the patient's physical condition;
data prediction forecasts how an existing data waveform will develop based on its earlier course, so that a physician can anticipate changes accurately and in advance;
data anomaly detection identifies abnormal points or segments in time-series waveforms so that deviations can be addressed promptly, with the aim of returning the waveform to its normal range;
event early warning and alarming adds industry rule logic or proven, mature algorithms on top of data prediction and anomaly detection in order to warn of abnormal conditions that endanger the patient's life;
patient assessment combines the patient's basic information, relevant medical history, current symptoms and post-admission treatment information with clinical experience that the computer has learned from clinicians, and the computer then produces an overall assessment of the patient's physical condition.
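The relationship between prediction, anomaly detection and early warning described above can be sketched as follows. This is a minimal illustration under stated assumptions: prediction is swapped in as a least-squares linear trend, anomaly detection as a z-score against a trailing window, and the alarm rule (forecast below a lower limit, or latest sample anomalous) plus all thresholds and readings are hypothetical, not clinical guidance or the application's own algorithms.

```python
from statistics import mean, pstdev
from typing import List


def predict_linear(history: List[float], steps_ahead: int) -> float:
    """Data prediction: least-squares line y = a + b*t through the history, extrapolated."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


def detect_anomalies(series: List[float], window: int = 5, z_limit: float = 3.0) -> List[int]:
    """Anomaly detection: flag samples far from the mean of the preceding window (z-score)."""
    flagged = []
    for i in range(window, len(series)):
        reference = series[i - window:i]
        mu, sigma = mean(reference), pstdev(reference)
        deviation = abs(series[i] - mu)
        # A perfectly flat window has sigma == 0, so any change at all is flagged.
        if (sigma == 0 and deviation > 0) or (sigma > 0 and deviation / sigma > z_limit):
            flagged.append(i)
    return flagged


def early_warning(series: List[float], lower_limit: float, steps_ahead: int = 3) -> List[str]:
    """Early warning: a hypothetical rule layered on the two functions above."""
    reasons = []
    forecast = predict_linear(series, steps_ahead)
    if forecast < lower_limit:
        reasons.append(f"forecast {forecast:.1f} below limit {lower_limit}")
    if (len(series) - 1) in detect_anomalies(series):
        reasons.append("latest sample is anomalous")
    return reasons


if __name__ == "__main__":
    spo2 = [97, 96, 95, 94, 93, 92]  # illustrative SpO2 readings, one per minute
    print(early_warning(spo2, lower_limit=90))
```

Keeping the alarm rule separate from the two detectors mirrors the text: the rule layer can be swapped for stricter clinical logic without touching prediction or detection.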
As shown in Fig. 3, data fusion is performed as follows:
S1031: query the patient's basic information to obtain the patient's identification number, use that number to collect all of the patient's information scattered across the data sources, and extract it into a temporary intermediate layer;
S1032: clean the acquired data and check which data forms it contains; a patient's data may include text, speech, images and video, tables and time-series waveforms, and different patients may have different forms; unless there is a special requirement, complete and clear data are integrated;
S1033: fuse data of different forms as needed, fusing all acquired data by default; specifically, the patient's basic information is obtained from the structured database and used as the anchor from which the fused data is progressively extended, and other structured data is aligned to it; unstructured text, speech, image and video data is first converted with the unstructured-data structuring method and then aligned; for time-series waveform data, an analysis report covering a period of time is generated, for example by calculating the flow-index of the respiratory wave; text data is summarized and structured, the calculated values are aligned in structured form with an index, and the location of the original data is recorded;
S1034: following the structured-data alignment principle, in which unstructured data is first structured and then aligned, integrate the multi-source data into a clearly organized structured data table that is convenient for later management and use;
S1035: draw a data flow chart tracing each structured data element in the table back to its source data, so that it is clear where each item comes from and where it goes, which makes tracing and checking easier when problems arise; through the fusion of these techniques, multi-source data fusion is achieved, laying the data foundation for data analysis and model training.
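Steps S1031 to S1035 can be sketched in a few functions. This is a minimal illustration, assuming a hypothetical record layout (`patient_id`, `form`, `name`, `values`) and two hypothetical sources; the waveform mean and maximum stand in for the unspecified respiratory flow-index report, and lineage is kept as (source name, record index) pairs rather than a drawn flow chart.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

Record = dict


def gather_by_patient(sources: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """S1031: collect every record for each patient ID into a temporary intermediate layer,
    tagging each copy with (source name, record index) so it can be traced later (S1035)."""
    layer: Dict[str, List[Record]] = defaultdict(list)
    for source_name, records in sources.items():
        for index, record in enumerate(records):
            layer[record["patient_id"]].append({**record, "_origin": (source_name, index)})
    return layer


def fuse_patient(patient_id: str, records: List[Record]) -> Optional[Tuple[dict, dict]]:
    """S1032-S1034: check which forms exist, anchor on basic info, align a waveform summary."""
    basic = next((r for r in records if r["form"] == "basic"), None)
    if basic is None:
        return None  # no basic information to anchor the fused row on
    row = {
        "patient_id": patient_id,
        "name": basic["name"],
        "forms": sorted({r["form"] for r in records}),
    }
    lineage = {"name": [basic["_origin"]]}  # S1035: where each fused field came from
    waves = [r for r in records if r["form"] == "waveform"]
    if waves:
        samples = [v for r in waves for v in r["values"]]
        # Stand-in period summary; the flow-index calculation itself is not specified.
        row["wave_mean"] = round(mean(samples), 2)
        row["wave_max"] = max(samples)
        origins = [r["_origin"] for r in waves]
        lineage["wave_mean"] = origins
        lineage["wave_max"] = origins
    return row, lineage


if __name__ == "__main__":
    sources = {
        "his": [{"patient_id": "P001", "form": "basic", "name": "Zhang"}],
        "monitor": [
            {"patient_id": "P001", "form": "waveform", "values": [18, 20, 22]},
            {"patient_id": "P002", "form": "waveform", "values": [30]},
        ],
    }
    for pid, recs in gather_by_patient(sources).items():
        print(pid, fuse_patient(pid, recs))
```

Returning `None` for a patient without basic information reflects the text's choice of basic information as the base point for alignment.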
S104: visual presentation of the analysis data.
The visual presentation displays the distribution, changes and associations of the analysis data clearly and intuitively, so that relevant personnel can better understand and use the data;
according to specific functional requirements, the system interacts with the user and displays the required data, such as the vital-sign data of a specific patient at a specific time, a report of a patient's data over a period of time, statistics for all or some of the monitoring equipment, and the influence of a given piece of monitoring equipment on other life-support equipment;
when source data is requested, clinical data is displayed; when an exploratory experiment is run on a hypothetical parameter value, simulated data is displayed, obtained by simulating the data trend the patient would exhibit under that parameter.
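Two of the display queries above, a patient's sign value at a specific time and a report over a period, can be sketched with binary search over a time-sorted series. This is a minimal sketch assuming samples are stored as (timestamp in seconds, value) pairs sorted by time; the "latest sample at or before t" rule and the report fields (count, min, max, mean) are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value), sorted by timestamp


def value_at(samples: List[Sample], t: int) -> Optional[float]:
    """Sign data at a specific time: the latest sample recorded at or before t."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i else None


def period_report(samples: List[Sample], start: int, end: int) -> dict:
    """Data report over the closed interval [start, end]: count, min, max and mean."""
    times = [ts for ts, _ in samples]
    window = [v for _, v in samples[bisect_left(times, start):bisect_right(times, end)]]
    if not window:
        return {"count": 0}
    return {
        "count": len(window),
        "min": min(window),
        "max": max(window),
        "mean": sum(window) / len(window),
    }


if __name__ == "__main__":
    heart_rate = [(0, 80.0), (60, 84.0), (120, 90.0), (180, 86.0)]  # illustrative data
    print(value_at(heart_rate, 100), period_report(heart_rate, 60, 120))
```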
In a second embodiment, as shown in Fig. 4, a critical-care medical big data processing system comprises the following modules:
the data module 201, used for acquiring multi-source, high-dimensional critical-care raw data;
the unified processing module 202, configured to apply unified standardization processing to the raw data to obtain basic data;
the analysis module 203, configured to obtain analysis data by fusing the basic data;
and the display module 204, configured to present the analysis data visually.
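The four-module structure of the second embodiment can be sketched as a simple pipeline. Everything here is a hypothetical stand-in chosen for illustration: sources are plain callables returning dicts, module 202 only converts types, drops records missing key fields and removes exact duplicates, module 203 fuses by averaging each patient's sign values, and module 204 renders text lines instead of charts.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


class DataModule:
    """201: acquires raw records from several hypothetical source callables."""

    def __init__(self, sources: List[Callable[[], Iterable[Record]]]):
        self.sources = sources

    def acquire(self) -> List[Record]:
        return [record for source in self.sources for record in source()]


class UnifiedProcessingModule:
    """202: type conversion, dropping incomplete records, exact-duplicate removal."""

    def normalize(self, raw: List[Record]) -> List[Record]:
        seen, basic = set(), []
        for record in raw:
            if any(record.get(k) is None for k in ("patient_id", "sign", "value")):
                continue  # incomplete records are simply dropped in this sketch
            item = {
                "patient_id": str(record["patient_id"]),
                "sign": str(record["sign"]),
                "value": float(record["value"]),
            }
            key = tuple(sorted(item.items()))
            if key not in seen:
                seen.add(key)
                basic.append(item)
        return basic


class AnalysisModule:
    """203: fuses basic data per (patient, sign); a mean stands in for real analysis."""

    def fuse(self, basic: List[Record]) -> Dict[Tuple[str, str], float]:
        grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
        for item in basic:
            grouped[(item["patient_id"], item["sign"])].append(item["value"])
        return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


class DisplayModule:
    """204: renders analysis data as text lines; a real system would draw charts."""

    def render(self, analysis: Dict[Tuple[str, str], float]) -> List[str]:
        return [f"{pid} {sign}: {value:.1f}" for (pid, sign), value in sorted(analysis.items())]


def run_pipeline(sources: List[Callable[[], Iterable[Record]]]) -> List[str]:
    raw = DataModule(sources).acquire()
    basic = UnifiedProcessingModule().normalize(raw)
    analysis = AnalysisModule().fuse(basic)
    return DisplayModule().render(analysis)


if __name__ == "__main__":
    monitor = lambda: [{"patient_id": "P001", "sign": "HR", "value": "80"},
                       {"patient_id": "P001", "sign": "HR", "value": 90}]
    print(run_pipeline([monitor]))
```

Because each module exposes a single method, any one of them can be replaced (for example, a real fusion model in module 203) without changing the others.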
The foregoing is merely an exemplary embodiment of the present application and is not intended to limit it. Various modifications and variations will be apparent to those skilled in the art, and any modification, equivalent replacement or improvement made within the spirit and principles of the application shall fall within the scope of its claims.
Claims (10)
1. A critical-care medical big data processing method, characterized in that it comprises the following steps:
acquiring multi-source, high-dimensional critical-care raw data;
applying unified standardization processing to the raw data to obtain basic data;
and obtaining analysis data by fusing the basic data.
2. The critical-care medical big data processing method according to claim 1, wherein: by application scenario, the raw data is divided into hospital information system data, clinical information system data, electronic medical record data, and vital-sign monitoring and support equipment data; by storage form, the data is divided into unstructured data and structured data;
the vital-sign monitoring equipment comprises a multi-parameter monitor, a hemodynamic monitor, a blood gas analyzer, an intracranial pressure monitor, an electroencephalogram monitor, a urodynamic monitor and a fetal heart monitor;
the vital-sign support equipment comprises a ventilator, a continuous hemodialysis machine, an infusion pump, an intra-aortic balloon counterpulsation pump and an extracorporeal membrane oxygenation system;
the unstructured data comprises medical images, videos, speech and documents;
the structured data comprises tables.
3. The critical-care medical big data processing method according to claim 1, wherein: the unified standardization processing comprises data calculation and data storage;
the data calculation comprises primary calculation and advanced calculation; the data storage comprises a relational database, a key-value database, a document database, a column-oriented database and a graph database;
the primary calculation comprises addition, subtraction, deduplication and merge operations;
the advanced calculation applies calculus, mathematical statistics and data classification operations to the data;
the relational database stores the patients' basic information and clinical information;
the key-value database caches data that users query frequently, improving query efficiency and supporting highly concurrent requests;
the document database stores and manages document data, i.e., structured data in JSON, XML or BSON format, to suit the data formats required for data transmission, storage and calculation;
the column-oriented database stores the large volumes of critical-care data generated in short periods and supports efficient batch queries;
the graph database abstracts entities such as diseases, symptoms, examination items, physical conditions, treatments and post-recovery care, together with the relations between symptoms and diseases, diseases and examination items, diseases and treatments, and physical conditions and treatments, and displays the complex relationship network among critical-care data.
4. The critical-care medical big data processing method according to claim 1, wherein: the data calculation comprises unified data conversion and unified data processing;
unified data conversion is performed before unified data processing so that the computer can easily process and recognize the data; it comprises data format conversion, such as converting TXT files to CSV, CSV to JSON and JSON to Parquet, and data type conversion, such as converting character strings to numeric types and converting binned data into categories;
the unified data processing addresses data quality problems, including missing data, duplicate data and abnormal data;
for missing data, one of the following operations is performed as needed: deleting the entire record containing the missing data; filling with the mean or median; inferring a plausible value from the known data and verifying it against the remaining data before filling; binning the data and filling with the mode; training a machine learning algorithm on complete records and filling with its predicted values; filling by collaborative filtering across patient data; or filling by regression;
for duplicate data, if records are completely repeated, any one record is kept; if most of the content is repeated, the duplicates are compared and the higher-quality record is kept; if only a small part of the content is repeated, the duplicated records are retained; because some data from critically ill patients are abnormal values compared with healthy people, typical abnormal values are retained and used as data with distinct characteristics in comparative experiments, in order to explore the relationships between characteristics and diseases, between symptoms and diseases, and between diseases and treatments;
the data storage divides data into hot, warm and cold data according to access frequency and stores each in different databases;
hot data is cached in a Redis database, warm data is stored in MySQL, MongoDB or HBase, and cold data is stored in HDFS;
data from each database is extracted, converted, and stored and managed in an Elasticsearch database, which holds numeric, string, text, speech and image/video data to facilitate data sharing, including browsing, querying and use; the original data before conversion is retained as a traceable source, the converted data is stored in Elasticsearch as the data source for subsequent tasks, and intermediate data from the processing is also stored if needed.
5. The critical-care medical big data processing method according to claim 4, wherein:
the unified data standardization process further comprises converting unstructured data into structured data and extracting the patients' characteristic attributes.
6. The critical-care medical big data processing method according to claim 5, wherein: unstructured data is retrieved from Elasticsearch and processed as follows: speech is transcribed into text with DeepSpeech or Whisper, so that text and transcribed speech can be processed uniformly as text data;
the text data is processed according to the following flow:
identifying the subject, predicate, object, attributive, adverbial and complement of each sentence using word segmentation and syntactic analysis;
analyzing the meaning of the text with BERT or GPT and summarizing it, i.e., expressing the core meaning of a sentence or passage in a few content words;
collecting the text keywords of multiple critically ill patients and assigning them classification codes;
extracting the features required for each patient and structuring the data;
and for image and video data, using YOLO, U-Net and SAT models for object detection, semantic segmentation and summary extraction to obtain the objects and medical descriptions in the images or videos, converting them into text, and processing that text with the text flow above.
7. The critical-care medical big data processing method according to claim 1, wherein: the data fusion applies logical judgment and reasoning, machine learning and deep learning to the basic data in order to analyze it and help relevant personnel discover the rules, associations and patterns it contains, and supports data prediction, data anomaly detection, event early warning and alarming, and assessment of the patient's physical condition;
the data prediction forecasts how an existing data waveform will develop based on its earlier course, so that a physician can anticipate changes accurately and in advance;
the data anomaly detection identifies abnormal points or segments in time-series waveforms so that deviations can be addressed promptly, with the aim of returning the waveform to its normal range;
the event early warning and alarming adds industry rule logic or proven, mature algorithms on top of data prediction and anomaly detection in order to warn of abnormal conditions that endanger the patient's life;
the patient assessment combines the patient's basic information, relevant medical history, current symptoms and post-admission treatment information with clinical experience that the computer has learned from clinicians, and the computer then produces an overall assessment of the patient's physical condition.
8. The critical-care medical big data processing method according to claim 7, wherein the data fusion is performed as follows:
querying the patient's basic information to obtain the patient's identification number, using that number to collect all of the patient's information scattered across the data sources, and extracting it into a temporary intermediate layer;
cleaning the acquired data and checking which data forms it contains, the patient's data forms including text, speech, images and video, tables and time-series waveforms, with different patients possibly having different forms; unless there is a special requirement, complete and clear data are integrated;
fusing data of different forms as needed, with all acquired data fused by default; specifically, the patient's basic information is obtained from the structured database and used as the anchor from which the fused data is progressively extended, and other structured data is aligned to it; unstructured text, speech, image and video data is first converted with the unstructured-data structuring method and then aligned; for time-series waveform data, an analysis report covering a period of time is generated, for example by calculating the flow-index of the respiratory wave; text data is summarized and structured, the calculated values are aligned in structured form with an index, and the location of the original data is recorded;
following the structured-data alignment principle, in which unstructured data is first structured and then aligned, integrating the multi-source data into a clearly organized structured data table that is convenient for later management and use;
drawing a data flow chart for every step the data passes through, from the source data to the structured data elements in the data table, so that it is clear where each item comes from and where it goes, which makes tracing and checking easier when problems arise; through the fusion of these techniques, multi-source data fusion is achieved, laying the data foundation for data analysis and model training.
9. The critical-care medical big data processing method according to claim 1, further comprising visual display of the analysis data, wherein the visual display presents the distribution, changes and associations of the analysis data clearly and intuitively, so that relevant personnel can better understand and use the data;
according to specific functional requirements, the method interacts with the user and displays the required data, such as the vital-sign data of a specific patient at a specific time, a report of a patient's data over a period of time, statistics for all or some of the monitoring equipment, and the influence of a given piece of monitoring equipment on other life-support equipment;
when source data is requested, clinical data is displayed; when an exploratory experiment is run on a hypothetical parameter value, simulated data is displayed, obtained by simulating the data trend the patient would exhibit under that parameter.
10. A critical-care medical big data processing system, characterized in that it comprises the following modules:
a data module, used for acquiring multi-source, high-dimensional critical-care raw data;
a unified processing module, used for applying unified standardization processing to the raw data to obtain basic data;
an analysis module, used for obtaining analysis data by fusing the basic data;
and a display module, used for presenting the analysis data visually.
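Several of the conversion and data-quality operations in claim 4 can be illustrated together. This is a minimal sketch under stated assumptions: it shows CSV-to-JSON-ready conversion, string-to-number conversion, exact-duplicate removal, median imputation, binning into categories, and a hot/warm/cold split by access frequency. The column names, bin cut points and tier thresholds are hypothetical, and Parquet output is left out because it would need an extra library such as pyarrow.

```python
import csv
import io
import json
from statistics import median
from typing import Dict, List, Optional, Tuple


def csv_to_records(csv_text: str) -> List[Dict[str, str]]:
    """Format conversion: CSV text to a list of dicts, which json.dumps can serialize."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def drop_exact_duplicates(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Completely repeated records: keep the first copy only."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def to_float(text: str) -> Optional[float]:
    """Type conversion: string to number; empty or non-numeric text is treated as missing."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def impute_median(values: List[Optional[float]]) -> List[Optional[float]]:
    """Missing data: fill gaps with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if observed else None
    return [fill if v is None else v for v in values]


def to_category(value: float, bins: List[Tuple[float, str]]) -> str:
    """Binning: label of the first bin whose upper bound is >= value (bins sorted ascending)."""
    for upper, label in bins:
        if value <= upper:
            return label
    raise ValueError(f"{value} is above every bin")


def storage_tier(accesses_per_day: float, hot: float = 100, warm: float = 1) -> str:
    """Hot/warm/cold split by access frequency; the thresholds are hypothetical."""
    if accesses_per_day >= hot:
        return "hot"  # e.g. Redis cache
    if accesses_per_day >= warm:
        return "warm"  # e.g. MySQL, MongoDB or HBase
    return "cold"  # e.g. HDFS


if __name__ == "__main__":
    raw = "patient_id,hr\nP001,80\nP002,\nP003,100\nP003,100\n"
    records = drop_exact_duplicates(csv_to_records(raw))
    heart_rates = impute_median([to_float(r["hr"]) for r in records])
    bins = [(59, "low"), (100, "normal"), (float("inf"), "high")]  # illustrative cut points
    for record, hr in zip(records, heart_rates):
        record["hr"] = hr
        record["hr_band"] = to_category(hr, bins)
    print(json.dumps(records))
```

Median imputation is only one of the options the claim lists; it is used here because it is simple and less sensitive than the mean to the abnormal values the claim deliberately retains.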
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310932050.2A CN116860739A (en) | 2023-07-27 | 2023-07-27 | Severe medical big data processing system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116860739A (en) | 2023-10-10 |
Family ID: 88235879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310932050.2A Pending CN116860739A (en) | 2023-07-27 | 2023-07-27 | Severe medical big data processing system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116860739A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117171176A (en) * | 2023-11-03 | 2023-12-05 | 北京格蒂智能科技有限公司 | Electricity consumption big data self-upgrading supervision platform based on artificial intelligence |
CN117171176B (en) * | 2023-11-03 | 2024-02-02 | 北京格蒂智能科技有限公司 | Electricity consumption big data self-upgrading supervision platform based on artificial intelligence |
CN117272395A (en) * | 2023-11-21 | 2023-12-22 | 江西曼荼罗软件有限公司 | Patient medical data processing method and system |
CN117272395B (en) * | 2023-11-21 | 2024-01-26 | 江西曼荼罗软件有限公司 | Patient medical data processing method and system |
CN117648289A (en) * | 2024-01-22 | 2024-03-05 | 北京梦天门科技股份有限公司 | Unified integration method for county-domain medical co-body multi-type data |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |