CN116860739A - Severe medical big data processing system and method - Google Patents

Severe medical big data processing system and method


Publication number
CN116860739A
Authority
CN
China
Prior art keywords
data
patient
processing
database
carrying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310932050.2A
Other languages
Chinese (zh)
Inventor
苏龙翔
刘宪龙
李颖川
潘纯
李友章
刘伟明
白振峰
崔培存
李先涛
王启星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shumu Medical Technology Co ltd
Original Assignee
Shanghai Shumu Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shumu Medical Technology Co ltd filed Critical Shanghai Shumu Medical Technology Co ltd
Priority to CN202310932050.2A priority Critical patent/CN116860739A/en
Publication of CN116860739A publication Critical patent/CN116860739A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A critical care big data processing system and method, the method comprising the following steps: acquiring multi-source, high-dimensional critical care raw data; performing unified standardization processing on the raw data to obtain basic data; and performing data fusion on the basic data to obtain analysis data. By applying these two processing stages, unified standardization and data fusion, to the raw data, the application extracts data that meets requirements in both content and format, and can therefore quickly provide comprehensive data to assist clinical diagnosis in critical care.

Description

Severe medical big data processing system and method
Technical Field
The application relates to a critical care big data processing system and method.
Background
Critical care is an indispensable link in saving lives and is the last line of defense. Modern critical care generates large volumes of health-related data, and how to process this raw data so that it can guide and inform subsequent critical care work is an urgent problem. Advances in computer technology have made the digitization of critical care feasible. However, most existing processing methods are limited to search; the data is not mined in depth, so the accumulated critical care data is effectively wasted.
Disclosure of Invention
To solve the above problems, the application discloses a critical care big data processing method, comprising the following steps:
acquiring multi-source, high-dimensional critical care raw data;
performing unified standardization processing on the raw data to obtain basic data;
and performing data fusion on the basic data to obtain analysis data. By applying these two processing stages, unified standardization and data fusion, to the raw data, the application extracts data that meets requirements in both content and format, and can therefore quickly provide comprehensive data to assist clinical diagnosis in critical care.
Preferably, the raw data is divided by application scenario into hospital information system data, clinical information system data, electronic medical record data, and data from vital sign monitoring and support equipment; by storage form, the data is divided into unstructured data and structured data;
the vital sign monitoring equipment comprises a multi-parameter monitor, a hemodynamic monitor, a blood gas analyzer, an intracranial pressure monitor, an electroencephalogram monitor, a urodynamic monitor and a fetal heart monitor;
the vital sign support equipment comprises a ventilator, a continuous hemodialysis machine, an infusion pump, an intra-aortic balloon pump and an extracorporeal membrane oxygenation (ECMO) system;
the unstructured data comprises medical images, videos, voice recordings and files;
the structured data comprises tables.
Preferably, the unified standardization processing includes data computation and data storage;
the data computation includes primary computation and advanced computation; the data storage uses a relational database, a key-value database, a document database, a column-oriented database and a graph database;
the primary computation comprises addition, subtraction, deduplication and merging operations;
the advanced computation performs calculus, mathematical statistics and data classification operations on the data;
the relational database stores patients' basic information and clinical information;
the key-value database caches data that users query frequently, which improves query efficiency and supports highly concurrent requests;
the document database stores and manages document-type data, that is, structured data in JSON, XML or BSON format, so as to match the data formats required for data transmission, storage and computation;
the column-oriented database stores the large volumes of critical care data generated within short periods and supports efficient batch queries;
the graph database abstracts entities such as diseases, symptoms, examination items, physical conditions, treatment means and post-recovery care, together with the relationships between symptoms and diseases, diseases and examination items, diseases and treatment means, and physical conditions and treatment means, and presents the complex network of relationships within critical care data.
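The entity and relationship abstraction described for the graph database can be sketched with a plain in-memory adjacency map. This is a minimal illustration only and not the application's storage layer: the `RelationGraph` class, the relation names and the sample entities are all hypothetical.

```python
from collections import defaultdict


class RelationGraph:
    """Tiny in-memory stand-in for a graph database (illustrative only)."""

    def __init__(self):
        # (source entity, relation name) -> set of target entities
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        """Store one directed, typed relationship."""
        self._edges[(source, relation)].add(target)

    def targets(self, source, relation):
        """Return the sorted targets of a relation, or [] if none exist."""
        return sorted(self._edges.get((source, relation), set()))


# Hypothetical sample entities; the relation types mirror those in the text
graph = RelationGraph()
graph.add("fever", "symptom_of", "sepsis")
graph.add("hypotension", "symptom_of", "sepsis")
graph.add("sepsis", "examined_by", "blood culture")
graph.add("sepsis", "treated_by", "antibiotics")

print(graph.targets("sepsis", "treated_by"))
```

A real deployment would replace the dictionary with a graph database, but the typed-edge shape of the data stays the same.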
Preferably, the data computation comprises unified data conversion and unified data processing;
the unified data conversion is performed before the unified data processing so that the data is easy for a computer to recognize and process; it comprises data format conversion, in which TXT files are converted to CSV files, CSV files to JSON, and JSON to Parquet, and data type conversion, in which character strings are converted to numeric types and segmented data is converted to categories;
the unified data processing addresses data quality problems, including missing data, duplicate data and abnormal data;
for missing data, one of the following operations is chosen according to specific needs: deleting the entire record that contains the missing data; filling with the mean or median; inferring a plausible value from known data and verifying the inferred fill value against the remaining data; binning the data and filling with the mode; training a machine learning algorithm on the complete records and filling with the predicted value; filling by collaborative filtering over patient data; and filling by regression;
for duplicate data, if records are exact duplicates, any one of them is retained; if most of the content is duplicated, the duplicates are compared and the higher-quality record is retained; if only a small part of the content is duplicated, the duplicated records are retained; for abnormal data, because some values from critically ill patients are abnormal relative to healthy people, typical abnormal values are retained and used as data with salient features in comparative experiments, in order to explore the relationships between features and diseases, between symptoms and diseases, and between diseases and treatment methods;
the data storage divides data into hot data, warm data and cold data according to access frequency and stores them in different databases;
hot data is cached in a Redis database, warm data is stored in MySQL, MongoDB or HBase, and cold data is stored in HDFS;
data is extracted from each database and, after conversion, stored and managed in an Elasticsearch database, which holds numeric, string, text, voice and image/video data and facilitates data sharing, including browsing, querying and use; the raw data from before conversion is retained as a traceable data source for later use; the converted data is stored in Elasticsearch as the data source for subsequent tasks, and intermediate data from processing is also stored if needed. The application aims to process the data effectively, repairing or discarding problematic data to avoid misleading users, while storing the data in a standardized way so as to make querying and retrieval as fast as possible.
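The hot, warm and cold split by access frequency can be sketched as a small routing function. The text gives no numeric thresholds, so the two cut-offs below are assumptions chosen purely for illustration, and `classify_tier` and `route` are hypothetical names; only the store names come from the description.

```python
# Hypothetical thresholds in accesses per day; the source gives no numbers
HOT_MIN_ACCESSES_PER_DAY = 100
WARM_MIN_ACCESSES_PER_DAY = 1

# Tier -> candidate stores named in the description
TIER_STORES = {
    "hot": ["Redis"],
    "warm": ["MySQL", "MongoDB", "HBase"],
    "cold": ["HDFS"],
}


def classify_tier(accesses_per_day: float) -> str:
    """Map an access frequency to a storage tier."""
    if accesses_per_day >= HOT_MIN_ACCESSES_PER_DAY:
        return "hot"
    if accesses_per_day >= WARM_MIN_ACCESSES_PER_DAY:
        return "warm"
    return "cold"


def route(dataset: str, accesses_per_day: float) -> dict:
    """Describe where a dataset would be stored under this scheme."""
    tier = classify_tier(accesses_per_day)
    return {"dataset": dataset, "tier": tier, "stores": TIER_STORES[tier]}


print(route("bedside_vitals_latest", 5000))
print(route("archived_waveforms_2019", 0.01))
```

In practice the thresholds would be tuned from observed query logs rather than fixed constants.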
Preferably, the unified standardization processing further comprises converting unstructured data into structured data and extracting patients' characteristic attributes.
Preferably, the unstructured data is retrieved from Elasticsearch and processed as follows: voice data is converted to text using DeepSpeech or Whisper, after which text and voice can be processed uniformly as text data;
the text data is processed according to the following flow:
using word segmentation and syntactic analysis to obtain the subjects, predicates, objects, attributives, adverbials and complements of sentences,
analyzing the meaning expressed by the text with a BERT or GPT tool and summarizing it, that is, expressing the core meaning of a sentence or passage with a few content words,
summarizing the text keywords of multiple critically ill patients and assigning classification codes,
extracting the required patient features and structuring the data;
and for image and video data, performing object detection, semantic segmentation and summary extraction with YOLO, U-Net and SAT models to obtain the objects and medical descriptions in the images and videos, converting them to text, and then applying the text-data operations above. This text processing method suits the critical care field mainly because the diagnoses and medication orders written by doctors use strongly specialized and highly consistent terminology, which allows text data to be extracted quickly.
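The keyword summarization, classification coding and structuring steps can be illustrated with a deliberately simplified sketch. It swaps the BERT or GPT analysis named in the text for plain word-frequency counting on English text, so it shows only the shape of the output record, not the semantic analysis itself. The stop-word list, the code table and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list and classification code table
STOP_WORDS = {"the", "a", "an", "of", "and", "with", "to", "was", "is", "on", "for"}
CATEGORY_CODES = {
    "ventilator": "SUPPORT-01",
    "sepsis": "DX-01",
    "norepinephrine": "DRUG-01",
}


def keywords(text, top_n=5):
    """Return the most frequent non-stop-word tokens (stand-in for a language model)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(top_n)]


def structure_note(patient_id, text):
    """Turn a free-text note into a structured record with keyword codes."""
    found = keywords(text, top_n=10)
    codes = sorted(CATEGORY_CODES[w] for w in found if w in CATEGORY_CODES)
    return {"patient_id": patient_id, "keywords": found, "codes": codes}


note = "Patient with sepsis was started on norepinephrine and placed on the ventilator."
record = structure_note("P001", note)
print(record)
```

Because clinical wording is consistent, even a fixed code table like this one covers a useful share of notes, which is the property the text relies on.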
Preferably, the data fusion comprises applying logical judgment and reasoning, machine learning and deep learning to the basic data, analyzing the data and helping the relevant personnel discover rules, associations and patterns in it, thereby realizing data prediction, data anomaly detection, event early warning and alarm, and patient physical assessment;
the data prediction predicts the future trend of a data waveform from its preceding waveform, so that doctors can make accurate judgments in advance;
the data anomaly detection identifies abnormal points or segments in time-series waveforms so that deviations can be corrected in time and the waveform can return to its normal pattern;
the event early warning and alarm adds industry rule logic or a verified, mature algorithm on top of data prediction and data anomaly detection, and raises early warnings and alarms for abnormal events that endanger life;
the patient physical assessment is an overall assessment of the patient's physical condition made by the computer, based on the patient's basic information, relevant medical history, current symptoms and post-admission treatment information, combined with clinicians' experience as learned by the computer.
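Anomaly detection followed by a rule-based alarm can be sketched as below. The text does not name a detection algorithm, so a trailing-window z-score is used here as one simple possibility, and the alarm rule is a fixed value range layered on top of it. The window size, threshold, range limits and sample readings are all assumptions for illustration, not clinical guidance.

```python
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def alarm_indices(values, anomaly_indices, low, high):
    """Rule layered on detection: alarm only when an anomalous value
    also falls outside the allowed [low, high] range."""
    return [i for i in anomaly_indices if not (low <= values[i] <= high)]


# Hypothetical heart-rate samples and hypothetical rule limits
heart_rate = [80, 81, 79, 80, 82, 81, 80, 140, 145, 81]
anomalies = detect_anomalies(heart_rate)
alarms = alarm_indices(heart_rate, anomalies, low=40, high=130)
print(anomalies, alarms)
```

Note that once an outlier enters the trailing window it inflates the standard deviation, so the reading that follows it is not flagged; a production detector would typically use a more robust baseline.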
Preferably, the data fusion is performed as follows:
querying the patient's basic information data to obtain the patient's identification number, gathering all of the patient's information scattered across the data sources according to the identification number, and extracting it to a temporary intermediate layer;
cleaning the acquired data and checking which data forms it contains; a patient's data forms include text, voice, image/video, tables and time-series waveforms, and different patients may have different data forms; unless there are special requirements, complete and clear data is integrated;
fusing data of different forms freely as required, with all acquired data fused by default; specifically, the patient's basic information is obtained from the structured database and used as the base point from which the fused data is gradually expanded, and other structured data is aligned to the patient's basic information; unstructured text, voice and image/video data is first structured with the unstructured-data structuring method and then aligned; for time-series waveform data, an analysis report is generated for a period of time, including computing the flow-index of the respiratory waveform, the text data is summarized and structured, the computed index values are structurally aligned, and the location of the raw data is provided;
following the structured-data alignment principle, unstructured data is first structured and then aligned, and the multi-source data is integrated into a clearly organized structured data table, which facilitates later management and use;
drawing a data flow chart covering every step the data passes through, from the source data to the structured data elements in the data table, so that where the data comes from and where it goes can be seen clearly, which makes it easy to trace and check the data when problems occur; by combining these techniques, multi-source data fusion is achieved and a data foundation is laid for data analysis and model training.
Preferably, the method further comprises a view display of the analysis data, which presents the distribution, changes and associations of the analysis data intuitively and clearly through visualization, so that the relevant personnel can better understand and use the data;
interacting with the user according to specific functional requirements and displaying the required data, which includes the vital sign data of a specific patient at a specific time, a data report for the patient over a period of time, statistics for all or some of the monitoring equipment, and the factors by which a given monitoring device influences other life support equipment;
and displaying clinical data when source data is shown, and displaying simulated data when an exploratory experiment is run on an assumed value, the simulated data being obtained by simulating the data trend the patient would present under that parameter.
In another aspect, a critical care big data processing system is also disclosed, comprising the following modules:
the data module, configured to acquire multi-source, high-dimensional critical care raw data;
the unified processing module, configured to perform unified standardization processing on the raw data to obtain basic data;
the analysis module, configured to obtain analysis data by performing data fusion on the basic data;
and the display module, configured to display views of the analysis data.
The application has the following beneficial effects:
1. by applying unified standardization processing and data fusion to the raw data, the application extracts data that meets requirements in both content and format, and can therefore quickly provide comprehensive data to assist clinical diagnosis in critical care;
2. the application processes the data effectively, repairing or discarding problematic data to avoid misleading users, while storing the data in a standardized way so as to make querying and retrieval as fast as possible;
3. the text processing method suits the critical care field mainly because the diagnoses and medication orders written by doctors use strongly specialized and highly consistent terminology, which allows text data to be extracted quickly.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of Embodiment 1;
FIG. 2 is a schematic diagram of text data processing;
FIG. 3 is a schematic diagram of data fusion;
FIG. 4 is a schematic diagram of Embodiment 2.
Detailed Description
To illustrate the technical features of the scheme clearly, the application is explained in detail through the following specific embodiments.
In a first embodiment, as shown in FIG. 1, a critical care big data processing method includes the following steps:
S101, acquiring multi-source, high-dimensional critical care raw data;
the raw data is divided by application scenario into hospital information system data, clinical information system data, electronic medical record data, and data from vital sign monitoring and support equipment; by storage form, the data is divided into unstructured data and structured data;
the vital sign monitoring equipment comprises a multi-parameter monitor, a hemodynamic monitor, a blood gas analyzer, an intracranial pressure monitor, an electroencephalogram monitor, a urodynamic monitor and a fetal heart monitor;
the vital sign support equipment comprises a ventilator, a continuous hemodialysis machine, an infusion pump, an intra-aortic balloon pump and an extracorporeal membrane oxygenation (ECMO) system;
the unstructured data comprises medical images, videos, voice recordings and files; the medical images include X-ray images, radionuclide images, critical care ultrasound images, magnetic resonance images, pathological images and the like, and the files include vital sign data reports, medical diagnosis certificates, medical reference books, examination report sheets and the like;
the structured data comprises tables, including patient basic information tables, clinical information tables and the like.
S102, performing unified standardization processing on the raw data to obtain basic data;
the unified standardization processing includes data computation and data storage;
the data computation includes primary computation and advanced computation; the data storage uses a relational database, a key-value database, a document database, a column-oriented database and a graph database;
the primary computation comprises addition, subtraction, deduplication and merging operations;
the advanced computation performs calculus, mathematical statistics and data classification operations on the data;
the relational database stores patients' basic information and clinical information;
the key-value database caches data that users query frequently, which improves query efficiency and supports highly concurrent requests;
the document database stores and manages document-type data, that is, structured data in JSON, XML or BSON format, so as to match the data formats required for data transmission, storage and computation;
the column-oriented database stores the large volumes of critical care data generated within short periods and supports efficient batch queries;
the graph database abstracts entities such as diseases, symptoms, examination items, physical conditions, treatment means and post-recovery care, together with the relationships between symptoms and diseases, diseases and examination items, diseases and treatment means, and physical conditions and treatment means, and presents the complex network of relationships within critical care data.
The data computation comprises unified data conversion and unified data processing;
the unified data conversion is performed before the unified data processing so that the data is easy for a computer to recognize and process; it comprises data format conversion, in which TXT files are converted to CSV files, CSV files to JSON, and JSON to Parquet, and data type conversion, in which character strings are converted to numeric types and segmented data is converted to categories;
the unified data processing addresses data quality problems, including missing data, duplicate data and abnormal data;
for missing data, one of the following operations is chosen according to specific needs: deleting the entire record that contains the missing data; filling with the mean or median; inferring a plausible value from known data and verifying the inferred fill value against the remaining data; binning the data and filling with the mode; training a machine learning algorithm on the complete records and filling with the predicted value; filling by collaborative filtering over patient data; and filling by regression;
for duplicate data, if records are exact duplicates, any one of them is retained; if most of the content is duplicated, the duplicates are compared and the higher-quality record is retained; if only a small part of the content is duplicated, the duplicated records are retained; for abnormal data, because some values from critically ill patients are abnormal relative to healthy people, typical abnormal values are retained and used as data with salient features in comparative experiments, in order to explore the relationships between features and diseases, between symptoms and diseases, and between diseases and treatment methods;
the data storage divides data into hot data, warm data and cold data according to access frequency and stores them in different databases;
hot data is cached in a Redis database, warm data is stored in MySQL, MongoDB or HBase, and cold data is stored in HDFS;
data is extracted from each database and, after conversion, stored and managed in an Elasticsearch database, which holds numeric, string, text, voice and image/video data and facilitates data sharing, including browsing, querying and use; the raw data from before conversion is retained as a traceable data source for later use; the converted data is stored in Elasticsearch as the data source for subsequent tasks, and intermediate data from processing is also stored if needed.
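The conversion and data-quality steps in S102 can be sketched end to end with the Python standard library. The sketch covers CSV to JSON conversion, string to number conversion, mapping segmented values to categories, removing exact duplicates, and median filling, which is only one of the fill strategies listed. The sample rows, field names and age bands are hypothetical, and the Parquet step is left out because it needs a third-party library.

```python
import csv
import io
import json
from statistics import median

# Hypothetical raw export: one missing heart rate, one exact duplicate row
RAW_CSV = """patient_id,heart_rate,age
P001,82,67
P002,,45
P003,110,81
P003,110,81
"""


def csv_to_records(text):
    """CSV text -> list of dicts (all values are still strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_number(value):
    """String -> float, or None when the field is empty."""
    return float(value) if value not in ("", None) else None


def age_band(age):
    """Segmented data -> category (band edges are hypothetical)."""
    if age < 18:
        return "child"
    return "adult" if age < 65 else "senior"


def drop_exact_duplicates(records):
    """Keep the first of any group of fully identical records."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def fill_missing_with_median(records, field):
    """Replace None in `field` with the median of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    for r in records:
        if r[field] is None:
            r[field] = fill
    return records


records = drop_exact_duplicates(csv_to_records(RAW_CSV))
for r in records:
    r["heart_rate"] = to_number(r["heart_rate"])
    r["age_band"] = age_band(to_number(r["age"]))
records = fill_missing_with_median(records, "heart_rate")
print(json.dumps(records, indent=2))  # CSV rows re-emitted as JSON
```

Deduplication runs before type conversion here so that rows are compared on their original string values.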
The unified standardization processing further comprises converting unstructured data into structured data and extracting patients' characteristic attributes.
The unstructured data is retrieved from Elasticsearch and processed as follows: voice data is converted to text using DeepSpeech or Whisper, after which text and voice can be processed uniformly as text data;
as shown in FIG. 2, text data is processed according to the following procedure:
S1021, using word segmentation and syntactic analysis to obtain the subjects, predicates, objects, attributives, adverbials and complements of sentences,
S1022, analyzing the meaning expressed by the text with a BERT or GPT tool and summarizing it, that is, expressing the core meaning of a sentence or passage with a few content words,
S1023, summarizing the text keywords of multiple critically ill patients and assigning classification codes,
S1024, extracting the required patient features and structuring the data;
and for image and video data, performing object detection, semantic segmentation and summary extraction with YOLO, U-Net and SAT models to obtain the objects and medical descriptions in the images and videos, converting them to text, and then applying the text-data operations above.
S103, performing data fusion on the basic data to obtain analysis data.
The data fusion comprises applying logical judgment and reasoning, machine learning and deep learning to the basic data, analyzing the data and helping the relevant personnel discover rules, associations and patterns in it, thereby realizing data prediction, data anomaly detection, event early warning and alarm, and patient physical assessment;
the data prediction predicts the future trend of a data waveform from its preceding waveform, so that doctors can make accurate judgments in advance;
the data anomaly detection identifies abnormal points or segments in time-series waveforms so that deviations can be corrected in time and the waveform can return to its normal pattern;
the event early warning and alarm adds industry rule logic or a verified, mature algorithm on top of data prediction and data anomaly detection, and raises early warnings and alarms for abnormal events that endanger life;
the patient physical assessment is an overall assessment of the patient's physical condition made by the computer, based on the patient's basic information, relevant medical history, current symptoms and post-admission treatment information, combined with clinicians' experience as learned by the computer.
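Trend prediction from a preceding waveform can be illustrated with the simplest possible model, a least-squares straight line extrapolated forward. The text does not specify a prediction model, so this choice is an assumption made only to show the input and output shape; `fit_line`, `forecast` and the sample values are hypothetical, and at least two samples are required.

```python
def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(values, steps):
    """Extrapolate the fitted line `steps` samples past the end of `values`."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical, steadily rising respiratory-rate samples
history = [90, 92, 94, 96]
predicted = forecast(history, steps=2)
print(predicted)
```

Real physiological waveforms are rarely linear, so a deployed system would more plausibly use a time-series or learned model; the interface, history in and future samples out, would look the same.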
As shown in FIG. 3, the data fusion is performed as follows:
S1031, querying the patient's basic information data to obtain the patient's identification number, gathering all of the patient's information scattered across the data sources according to the identification number, and extracting it to a temporary intermediate layer;
S1032, cleaning the acquired data and checking which data forms it contains; a patient's data forms include text, voice, image/video, tables and time-series waveforms, and different patients may have different data forms; unless there are special requirements, complete and clear data is integrated;
S1033, fusing data of different forms freely as required, with all acquired data fused by default; specifically, the patient's basic information is obtained from the structured database and used as the base point from which the fused data is gradually expanded, and other structured data is aligned to the patient's basic information; unstructured text, voice and image/video data is first structured with the unstructured-data structuring method and then aligned; for time-series waveform data, an analysis report is generated for a period of time, including computing the flow-index of the respiratory waveform, the text data is summarized and structured, the computed index values are structurally aligned, and the location of the raw data is provided;
S1034, following the structured-data alignment principle, structuring unstructured data first and then aligning it, and integrating the multi-source data into a clearly organized structured data table, which facilitates later management and use;
S1035, drawing a data flow chart covering every step the data passes through, from the source data to the structured data elements in the data table, so that where the data comes from and where it goes can be seen clearly, which makes it easy to trace and check the data when problems occur; by combining these techniques, multi-source data fusion is achieved and a data foundation is laid for data analysis and model training.
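Steps S1031 to S1035 can be sketched as a merge keyed on the patient identifier that also records which source each field came from, a minimal form of the traceability the data flow chart provides. The source names, fields and sample values are hypothetical, and the sketch assumes every source has already been structured.

```python
# Hypothetical per-source records keyed by a shared patient identifier
SOURCES = {
    "his": [{"patient_id": "P001", "name": "anon-1", "age": 67}],
    "monitor": [
        {"patient_id": "P001", "heart_rate": 82},
        {"patient_id": "P002", "heart_rate": 75},
    ],
    "notes": [{"patient_id": "P001", "keywords": ["sepsis"]}],
}


def fuse(patient_id, sources, base_source="his"):
    """Align every source's fields onto the patient's basic information and
    record, per field, which source supplied it."""
    row, lineage = {"patient_id": patient_id}, {}
    # Start from the basic information, then expand with the other sources
    ordered = [base_source] + [s for s in sources if s != base_source]
    for name in ordered:
        for rec in sources[name]:
            if rec["patient_id"] != patient_id:
                continue
            for field, value in rec.items():
                if field == "patient_id" or field in row:
                    continue  # keep the first value seen for a field
                row[field] = value
                lineage[field] = name
    return row, lineage


row, lineage = fuse("P001", SOURCES)
print(row)
print(lineage)
```

The `lineage` map is what makes a fused value traceable back to its source when a problem is found.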
S104, displaying views of the analysis data.
The view display presents the distribution, changes and associations of the analysis data intuitively and clearly through visualization, so that the relevant personnel can better understand and use the data;
interacting with the user according to specific functional requirements and displaying the required data, which includes the vital sign data of a specific patient at a specific time, a data report for the patient over a period of time, statistics for all or some of the monitoring equipment, and the factors by which a given monitoring device influences other life support equipment;
and displaying clinical data when source data is shown, and displaying simulated data when an exploratory experiment is run on an assumed value, the simulated data being obtained by simulating the data trend the patient would present under that parameter.
In a second embodiment, as shown in fig. 4, a severe medical big data processing module includes the following modules:
the data module 201 is used for acquiring multi-element high-dimensional severe medical original data;
the unified processing module 202 is configured to perform unified normalization processing on the original data to obtain basic data;
the analysis module 203 is configured to obtain analysis data by performing data fusion on the basic data;
and the display module 204 is used for performing view display on the analysis data.
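The four modules can be pictured as a simple chain of functions. The sketch below is only an illustration under stated assumptions: the sample rows, the heart-rate threshold of 100 and all function names are hypothetical and do not come from the application.

```python
def data_module():
    """Module 201: acquire raw data (hard-coded sample rows stand in for real sources)."""
    return [
        {"patient_id": "P001", "heart_rate": "88"},
        {"patient_id": "P001", "heart_rate": "88"},  # exact duplicate
        {"patient_id": "P002", "heart_rate": "121"},
    ]


def unified_processing_module(raw):
    """Module 202: normalize types and drop exact duplicates to obtain basic data."""
    seen, basic = set(), []
    for r in raw:
        row = {"patient_id": r["patient_id"], "heart_rate": int(r["heart_rate"])}
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            basic.append(row)
    return basic


def analysis_module(basic):
    """Module 203: derive analysis data (assumed rule: heart rate above 100 is flagged)."""
    return [{**r, "tachycardia": r["heart_rate"] > 100} for r in basic]


def display_module(analysis):
    """Module 204: render analysis data as plain-text lines."""
    return [
        f'{r["patient_id"]}: HR={r["heart_rate"]} flag={r["tachycardia"]}'
        for r in analysis
    ]


def run_pipeline():
    """Chain modules 201 to 204 in order."""
    return display_module(analysis_module(unified_processing_module(data_module())))
```

Each stage takes the previous stage's output as its only input, which mirrors the raw data, basic data, analysis data progression of the embodiment.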
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A severe medical big data processing method, characterized in that the method comprises the following steps:
acquiring multi-element high-dimensional severe medical raw data;
unified standardization processing is carried out on the original data to obtain basic data;
and obtaining analysis data by carrying out data fusion on the basic data.
2. The method for processing severe medical big data according to claim 1, wherein: the original data are divided, by application scenario, into hospital information system data, clinical information system data, electronic medical record data, and vital sign monitoring and support equipment data, and, by storage form, into unstructured data and structured data;
the vital sign monitoring equipment comprises a multi-parameter monitor, a hemodynamic monitor, a blood gas analyzer, an intracranial pressure monitor, an electroencephalogram monitor, a urodynamic monitor and a fetal heart monitor;
the vital sign support equipment comprises a ventilator, a continuous hemodialysis machine, an infusion pump, an intra-aortic balloon pump and an extracorporeal membrane oxygenation (ECMO) system;
the unstructured data comprises medical images, videos, voices and files;
the structured data comprises a table.
3. The method for processing severe medical big data according to claim 1, wherein: the unified standardization processing comprises data calculation and data storage;
the data calculation includes primary calculation and advanced calculation; the data storage comprises a relational database, a key-value database, a document database, a column-oriented database and a graph database;
the primary calculation comprises addition, subtraction, deduplication and merging operations;
the advanced calculation performs calculus, mathematical statistics and data classification operations on the data;
the relational database stores basic information and clinical information of patients;
the key-value database caches data that users query frequently, which improves query efficiency and supports highly concurrent requests;
the document database stores and manages document-type data, namely structured data in JSON, XML or BSON format, so as to suit the data formats required for data transmission, storage and calculation;
the column-oriented database stores the large volume of severe data generated in a short time and provides efficient batch queries;
the graph database abstracts diseases, symptoms, examination items, physical conditions, treatment means and post-recovery care as entities, and abstracts the relations between symptoms and diseases, diseases and examination items, diseases and treatment means, and physical conditions and treatment means, so as to display the complex relationship network among severe data.
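The entity and relation abstraction described for the graph database can be sketched as a list of (head, relation, tail) triples. This is a minimal in-memory illustration, not a graph database client; the class `TripleGraph`, the relation names and the sample triples are assumptions chosen for the example.

```python
from collections import defaultdict


class TripleGraph:
    """Tiny in-memory store of (head, relation, tail) triples."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, head, relation, tail):
        """Record a directed, labeled edge from head to tail."""
        self._out[head].append((relation, tail))

    def neighbors(self, head, relation=None):
        """Return tails reachable from head, optionally filtered by relation."""
        return [t for r, t in self._out[head] if relation is None or r == relation]


g = TripleGraph()
g.add("fever", "symptom_of", "sepsis")           # symptom -> disease
g.add("sepsis", "examined_by", "blood culture")  # disease -> examination item
g.add("sepsis", "treated_by", "antibiotics")     # disease -> treatment means
```

Querying `g.neighbors("sepsis", "treated_by")` then walks one labeled edge, which is the kind of lookup a dedicated graph database would serve at scale.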
4. The method for processing severe medical big data according to claim 1, wherein: the data calculation comprises unified data conversion and unified data processing;
the unified data conversion is performed before the unified data processing so that the computer can readily identify and process the data; the unified data conversion comprises data format conversion, in which TXT files are converted into CSV files, CSV files into JSON, and JSON into Parquet, and data type conversion, in which character strings are converted into numeric types and binned data are converted into categories;
the unified data processing is used for solving data quality problems, including missing data, duplicate data and abnormal data;
for missing data, one of the following operations is carried out according to specific needs: deleting the whole record containing the missing data; filling with the mean or the median; inferring a reasonable value from known data, verifying it against the remaining data and then filling; binning the data and filling with the mode; training a machine learning algorithm on the complete data records and filling with the predicted value; filling by collaborative filtering over patient data; and filling by data regression;
for duplicate data, if the records are completely identical, any one record is retained; if most of the content is repeated, the repeated records are compared and the higher-quality record is retained; if only a small part of the content is repeated, the repeated records are retained; for abnormal data, because some data of severe patients are abnormal relative to healthy people, typical abnormal values are retained and used as data with distinctive features in comparative experiments, so as to explore the relations between features and diseases, between symptoms and diseases, and between diseases and treatment modes;
the data storage divides the data into hot data, warm data and cold data according to the frequency of the data access, and stores the hot data, the warm data and the cold data into different databases respectively;
hot data is cached by adopting a Redis database, warm data is stored by adopting MySQL or MongoDB or HBase, and cold data is stored by adopting HDFS;
data are extracted from each database, converted, and then stored and managed in an Elasticsearch database, which holds numeric, string, text, voice and image/video data so that the data can be shared, including consulted, queried and used; the original data before conversion are retained as a traceable data source; the converted data are stored in Elasticsearch as the data source for subsequent tasks, and intermediate data of the processing are also stored if needed.
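Several of the operations in this claim (string-to-number conversion, dropping fully repeated records, median filling, and hot/warm/cold classification) can be sketched with the standard library alone. The field names, sample rows and the access-frequency thresholds below are assumptions made for illustration; the claim does not specify thresholds.

```python
from statistics import median


def to_number(value):
    """Data type conversion: numeric strings become floats, blanks become None."""
    if value is None or str(value).strip() == "":
        return None
    return float(value)


def drop_exact_duplicates(rows):
    """Keep one copy of records that are completely repeated."""
    seen, kept = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def fill_missing_with_median(rows, field):
    """Fill missing values of one field with the median of the known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def storage_tier(reads_per_day, hot=100, warm=1):
    """Classify data by access frequency (thresholds are assumed, not from the claim)."""
    if reads_per_day >= hot:
        return "hot"   # e.g. cached in Redis
    if reads_per_day >= warm:
        return "warm"  # e.g. MySQL, MongoDB or HBase
    return "cold"      # e.g. HDFS


raw = [
    {"patient_id": "P001", "spo2": "97"},
    {"patient_id": "P001", "spo2": "97"},  # completely repeated record
    {"patient_id": "P002", "spo2": ""},    # missing value
    {"patient_id": "P003", "spo2": "91"},
]
typed = [{**r, "spo2": to_number(r["spo2"])} for r in raw]
clean = fill_missing_with_median(drop_exact_duplicates(typed), "spo2")
```

Duplicates are dropped before filling so that a repeated record does not skew the median; median filling is only one of the filling options the claim lists.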
5. The method for processing severe medical big data according to claim 4, wherein:
the method also comprises the process of converting unstructured data into structured data and extracting characteristic attributes of patients in the unified data standardization process.
6. The method for processing severe medical big data according to claim 5, wherein: the unstructured data are acquired from Elasticsearch and the following operations are performed: voice is converted into text-type data by DeepSpeech or Whisper, so that text and voice can be processed uniformly as text-type data;
the text data is processed according to the following flow:
obtaining the subjects, predicates, objects, attributives, adverbials and complements in sentences by using word segmentation and grammar analysis technology,
analyzing the meaning expressed by the text with a BERT or GPT tool and producing a summary of that meaning, namely expressing the core meaning of a sentence or passage with a few content words,
summarizing text keywords of a plurality of severe patients and performing classified coding,
extracting the required characteristics of a patient to perform data structuring;
and for image/video data, performing target detection, semantic segmentation and summary extraction with YOLO, U-Net and SAT models to obtain the objects and medical descriptions in the image or video, converting them into text, and then performing the above text-data operations.
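The keyword summarizing and classified coding steps of the text flow can be sketched as follows. The sketch deliberately swaps in a word-frequency heuristic for the BERT or GPT summarization named in the claim, so it shows only the coding and structuring part; the stop-word list, sample notes and function names are assumptions.

```python
import re
from collections import Counter

# Assumed stop-word list; a real system would use a proper tokenizer and lexicon.
STOPWORDS = {"the", "a", "an", "and", "of", "with", "was", "is", "on", "to", "patient"}


def keywords(text, top_n=3):
    """Pick the most frequent non-stop-words (a stand-in for model-based summarization)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    # Sort by descending count, then alphabetically, so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]


def build_codebook(notes):
    """Collect keywords across all patients and assign each an integer code."""
    vocab = sorted({k for text in notes.values() for k in keywords(text)})
    return {word: code for code, word in enumerate(vocab)}


def structure(notes, codebook):
    """Turn each patient's free text into a sorted list of keyword codes."""
    return {pid: sorted(codebook[k] for k in keywords(text)) for pid, text in notes.items()}


notes = {
    "P001": "Patient intubated, fever and hypotension, fever persists",
    "P002": "Hypotension with oliguria",
}
codebook = build_codebook(notes)
features = structure(notes, codebook)
```

Because the codebook is built over all patients, the same keyword maps to the same code in every record, which is what makes the coded text usable as structured features.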
7. The method for processing severe medical big data according to claim 1, wherein: the data fusion comprises the steps of carrying out logic judgment and reasoning, machine learning and deep learning on basic data, analyzing the data, and helping related personnel to find rules, associations and modes in the data; realizing data prediction, data anomaly detection, event early warning and alarming and patient body assessment;
the data prediction predicts the future trend of a data waveform from its previous waveform, so that a doctor can make an accurate judgment in advance;
the data anomaly detection identifies abnormal points or segments in time-series waveforms, so that deviations can be intervened on in time and the waveform can be expected to return to its normal pattern;
the event early warning alarm adds industry rule logic or a verified, mature algorithm on top of data prediction and data anomaly detection, and issues early warnings and alarms for abnormal events that endanger life safety;
the patient body evaluation is based on the patient's basic information, relevant past medical history, current symptoms and treatment information after admission, combined with the experience of clinicians as learned by the computer, from which the computer makes an overall evaluation of the patient's physical condition.
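The layering of an alarm rule on top of anomaly detection can be illustrated with a minimal sketch. It assumes a z-score test against an initial baseline window and a "two consecutive anomalies" rule; neither the statistic, the thresholds nor the sample values come from the claim, which leaves the algorithm open.

```python
from statistics import mean, pstdev


def detect_anomalies(series, baseline=5, z_threshold=3.0):
    """Return indices after the baseline window whose z-score exceeds the threshold."""
    reference = series[:baseline]  # assumed to be a normal period
    mu, sigma = mean(reference), pstdev(reference)
    anomalies = []
    for i in range(baseline, len(series)):
        if sigma == 0:
            if series[i] != mu:
                anomalies.append(i)
        elif abs(series[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


def should_alarm(anomalies, min_consecutive=2):
    """Rule layer: alarm only when anomalies occur at consecutive sample indices."""
    run = 1
    for prev, cur in zip(anomalies, anomalies[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= min_consecutive:
            return True
    return min_consecutive <= 1 and bool(anomalies)


# Illustrative heart-rate samples, not clinical data.
heart_rate = [80, 82, 81, 79, 80, 81, 140, 142, 80]
abnormal = detect_anomalies(heart_rate)
```

Requiring consecutive anomalies before alarming is one simple way to suppress single-sample artifacts; a production rule set would be defined clinically.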
8. The method for processing severe medical big data according to claim 7, wherein: the data fusion is performed as follows:
inquiring basic information data of a patient, acquiring an identification number of the patient, summarizing all information of the patient dispersed in each data source according to the identification number, and extracting the information to a temporary intermediate layer;
data cleaning is carried out on the acquired data and the data forms they contain are checked, wherein the data forms of a patient comprise text, voice, image/video, tables and time-series waveforms, and different patients may have different data forms; absent special requirements, complete and clean data are integrated;
data in different forms are freely fused as required, with all acquired data fused by default; specifically, basic information of a patient is acquired from a structured database, the fused data are gradually expanded with the basic information as the base point, and other structured data are aligned to the basic information of the patient; unstructured text, speech and image/video data are first structured by an unstructured-data structuring method and then aligned; for time-series waveform data, an analysis report is generated for a period of time, including calculating the flow-index of respiratory waves; the text data are summarized and structured, and the calculated values are structurally aligned together with the index; and the location of the original data is provided;
according to the structured data alignment principle, unstructured data are first structured and then aligned, and the multi-source data are integrated into a clearly organized structured data table, so that later management and application are convenient;
a data flow chart is drawn covering every step from the source data to the structured data elements in the data table, so that where the data come from and where they go can be clearly seen, which makes tracing and checking convenient when a data problem occurs; by combining these techniques, multi-element fusion of the data is achieved, and a data foundation is laid for data analysis and model training.
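The first and third steps of this claim, collecting a patient's records by identification number and expanding outward from the basic information, can be sketched as a keyed merge. The function `fuse_by_patient`, the source names and the sample rows are hypothetical; prefixing each field with its source name is an assumed convention for keeping provenance visible.

```python
def fuse_by_patient(base_rows, *sources):
    """Start from basic patient information and align every other source on patient_id."""
    fused = {row["patient_id"]: dict(row) for row in base_rows}
    for name, rows in sources:
        for row in rows:
            target = fused.get(row["patient_id"])
            if target is None:
                continue  # no base record to align to; a real system might log this
            for field, value in row.items():
                if field != "patient_id":
                    # Prefix with the source name so each column shows its origin.
                    target[f"{name}.{field}"] = value
    return list(fused.values())


base = [{"patient_id": "P001", "age": 67}, {"patient_id": "P002", "age": 54}]
labs = [{"patient_id": "P001", "lactate": 3.1}]
notes = [
    {"patient_id": "P001", "summary": "septic shock"},
    {"patient_id": "P003", "summary": "post-op"},  # no matching base record
]
table = fuse_by_patient(base, ("labs", labs), ("notes", notes))
```

The result is one row per patient in the base table, with columns from other sources added only where the identification number matches.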
9. The method for processing severe medical big data according to claim 1, wherein: the method further comprises view display of the analysis data, wherein the view display presents the distribution, change and association of the analysis data visually and clearly, so that the relevant personnel can better understand and use the data;
according to specific functional requirements, the system interacts with the user and displays the required data, including the physical sign data of a specific patient at a specific time, a data report of the patient over a period of time, statistics for all or part of the monitoring equipment, and the influence of a given monitoring device on other life support equipment;
if source data are shown, clinical data are displayed; if an exploratory experiment is performed on an assumed value, simulation data are displayed, wherein the simulation data are obtained by simulation according to the data trend presented by the patient under that parameter.
10. A severe medical big data processing module, characterized in that it comprises the following modules:
the data module is used for acquiring multi-element high-dimensional severe medical original data;
the unified processing module is used for carrying out unified standardization processing on the original data to obtain basic data;
the analysis module is used for obtaining analysis data by carrying out data fusion on the basic data;
and the display module is used for carrying out view display on the analysis data.
CN202310932050.2A 2023-07-27 2023-07-27 Severe medical big data processing system and method Pending CN116860739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310932050.2A CN116860739A (en) 2023-07-27 2023-07-27 Severe medical big data processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310932050.2A CN116860739A (en) 2023-07-27 2023-07-27 Severe medical big data processing system and method

Publications (1)

Publication Number Publication Date
CN116860739A true CN116860739A (en) 2023-10-10

Family

ID=88235879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310932050.2A Pending CN116860739A (en) 2023-07-27 2023-07-27 Severe medical big data processing system and method

Country Status (1)

Country Link
CN (1) CN116860739A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171176A (en) * 2023-11-03 2023-12-05 北京格蒂智能科技有限公司 Electricity consumption big data self-upgrading supervision platform based on artificial intelligence
CN117171176B (en) * 2023-11-03 2024-02-02 北京格蒂智能科技有限公司 Electricity consumption big data self-upgrading supervision platform based on artificial intelligence
CN117272395A (en) * 2023-11-21 2023-12-22 江西曼荼罗软件有限公司 Patient medical data processing method and system
CN117272395B (en) * 2023-11-21 2024-01-26 江西曼荼罗软件有限公司 Patient medical data processing method and system
CN117648289A (en) * 2024-01-22 2024-03-05 北京梦天门科技股份有限公司 Unified integration method for county-domain medical co-body multi-type data

Similar Documents

Publication Publication Date Title
US10818397B2 (en) Clinical content analytics engine
CN109299239B (en) ES-based electronic medical record retrieval method
US11823798B2 (en) Container-based knowledge graphs for determining entity relations in non-narrative text
CN116860739A (en) Severe medical big data processing system and method
CN111801741B (en) Adverse drug reaction analysis
US8949108B2 (en) Document processing, template generation and concept library generation method and apparatus
CN112863630A (en) Personalized accurate medical question-answering system based on data and knowledge
CN109346169A (en) A kind of artificial intelligence assisting in diagnosis and treatment system and its construction method, equipment and storage medium
US20140181128A1 (en) Systems and Methods for Processing Patient Data History
CN109241257A (en) A kind of the wisdom question answering system and its method of knowledge based map
US20180096103A1 (en) Verification of Clinical Hypothetical Statements Based on Dynamic Cluster Analysis
Pereira et al. ICD9-based text mining approach to children epilepsy classification
CN113688255A (en) Knowledge graph construction method based on Chinese electronic medical record
CN112562808B (en) Patient portrait generation method, apparatus, electronic device and storage medium
CN112466462B (en) EMR information association and evolution method based on deep learning of image
CN110532367A (en) A kind of information cuing method and system
CN114191665A (en) Method and device for classifying man-machine asynchronous phenomena in mechanical ventilation process
JP2017167738A (en) Diagnostic processing device, diagnostic processing system, server, diagnostic processing method, and program
Chen et al. Automatically structuring on Chinese ultrasound report of cerebrovascular diseases via natural language processing
Ming et al. AI assisted clinical diagnosis & treatment and development strategy
Gu et al. Strokepeo: Construction of a clinical ontology for physical examination of stroke
Jiang et al. MMDA: A Multimodal Dataset for Depression and Anxiety Detection
US20230053429A1 (en) System and method for automatic analysis of texts in psychotherapy, counseling, and other mental health management activities
CN117194677B (en) Method and system for constructing, expanding and evaluating clinical practice guideline ontology
EP4191607A1 (en) Computer implemented method for analyzing medical data, system for analyzing medical data and computer readable medium storing software

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination