CN116860739A

CN116860739A - Severe medical big data processing system and method

Info

Publication number: CN116860739A
Application number: CN202310932050.2A
Authority: CN
Inventors: 苏龙翔; 刘宪龙; 李颖川; 潘纯; 李友章; 刘伟明; 白振峰; 崔培存; 李先涛; 王启星
Original assignee: Shanghai Shumu Medical Technology Co ltd
Current assignee: Shanghai Shumu Medical Technology Co ltd
Priority date: 2023-07-27
Filing date: 2023-07-27
Publication date: 2023-10-10

Abstract

A critical care medical big data processing system and method. The method comprises: acquiring multi-source, high-dimensional critical care medical raw data; performing unified standardization processing on the raw data to obtain basic data; and performing data fusion on the basic data to obtain analysis data. The application applies two processing flows, unified standardization processing and data fusion, to the raw data, covering both data content and data format. This extracts the data that meet the requirements, so comprehensive data to assist clinical diagnosis in critical care can be provided rapidly.

Description

Severe medical big data processing system and method

Technical Field

The application relates to a big data processing system and method for critical care medicine.

Background

Critical care is an indispensable link in saving lives and is the last line of defense for a patient's life. Modern critical care generates large volumes of data related to life and health, and processing this raw data so that it can guide and inform subsequent critical care tasks is an urgent problem. With the development of computer technology, the digitization of critical care has become feasible. However, most existing processing methods are limited to searching and do not mine the data in depth, so much of the accumulated critical care data is effectively wasted.

Disclosure of Invention

To solve the above problems, the application discloses a critical care medical big data processing method, comprising the following steps:

acquiring multi-source, high-dimensional critical care medical raw data;

performing unified standardization processing on the raw data to obtain basic data;

and performing data fusion on the basic data to obtain analysis data. The application applies two processing flows, unified standardization processing and data fusion, to the raw data, covering both data content and data format. This extracts the data that meet the requirements, so comprehensive data to assist clinical diagnosis in critical care can be provided rapidly.

Preferably, the raw data are divided by application scenario into hospital information system (HIS) data, clinical information system (CIS) data, electronic medical record (EMR) data, and data from vital sign monitoring and support equipment. They are also divided by storage form into unstructured data and structured data;

the vital sign monitoring equipment includes multi-parameter monitors, hemodynamic monitors, blood gas analyzers, intracranial pressure monitors, electroencephalogram (EEG) monitors, urodynamic monitors and fetal heart monitors;

the vital sign support equipment includes ventilators, continuous hemodialysis machines, infusion pumps, intra-aortic balloon pumps (IABP) and extracorporeal membrane oxygenation (ECMO) systems;

the unstructured data include medical images, videos, voice recordings and documents;

the structured data include tables.

Preferably, the unified standardization processing comprises data computation and data storage;

the data computation includes primary computation and advanced computation. The data storage uses a relational database, a key-value database, a document database, a columnar database and a graph database;

the primary computation includes addition, subtraction, deduplication and merging operations;

the advanced computation applies calculus, mathematical statistics and data classification to the data;

the relational database stores patients' basic information and clinical information;

the key-value database caches data that users query frequently, which improves query efficiency and supports highly concurrent requests;

the document database stores and manages document-type data, i.e. structured data in JSON, XML or BSON format, to match the data formats required for data transmission, storage and computation;

the columnar database stores the large volumes of critical care data generated in short periods and supports efficient batch queries;

the graph database abstracts entities such as diseases, symptoms, examination items, physical conditions, treatment means and post-cure care. It also abstracts the relations between symptoms and diseases, diseases and examination items, diseases and treatment means, and physical conditions and treatment means, and presents the complex network of relations among critical care data.
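The entity-relation idea behind the graph database can be illustrated with a small in-memory stand-in. The sketch below is not a graph database client and assumes no particular product. The class `RelationGraph`, the relation labels `indicates`, `examined_by` and `treated_by`, and the toy entities are all hypothetical, and they carry no clinical meaning.

```python
from collections import defaultdict


class RelationGraph:
    """Hypothetical in-memory stand-in for a graph database of typed relations."""

    def __init__(self):
        # entity -> list of (relation type, target entity)
        self._out = defaultdict(list)

    def add(self, source, relation, target):
        """Store one directed edge: source --relation--> target."""
        self._out[source].append((relation, target))

    def neighbors(self, entity, relation):
        """Return targets linked from `entity` by edges of type `relation`."""
        return [t for r, t in self._out.get(entity, []) if r == relation]

    def diseases_for_symptoms(self, symptoms):
        """Diseases that every symptom in `symptoms` points to (set intersection)."""
        sets = [set(self.neighbors(s, "indicates")) for s in symptoms]
        return set.intersection(*sets) if sets else set()


# Toy entities for illustration only; not clinical knowledge.
g = RelationGraph()
g.add("fever", "indicates", "sepsis")
g.add("hypotension", "indicates", "sepsis")
g.add("hypotension", "indicates", "cardiogenic shock")
g.add("sepsis", "examined_by", "blood culture")
g.add("sepsis", "treated_by", "antibiotics")

print(g.diseases_for_symptoms(["fever", "hypotension"]))  # {'sepsis'}
print(g.neighbors("sepsis", "treated_by"))                # ['antibiotics']
```

A real system would delegate storage and traversal to an actual graph database. The intersection query shows how a set of symptoms narrows the candidate diseases.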

Preferably, the data computation comprises unified data conversion and unified data processing;

unified data conversion must precede unified data processing so that the data are easy for a computer to process and recognize. It comprises two kinds of conversion. Data format conversion converts TXT files to CSV, CSV to JSON, and JSON to Parquet. Data type conversion converts strings to numeric types and converts segmented data into categories;
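The conversion chain and the two type conversions can be sketched with the Python standard library. The sketch assumes the TXT input is tab-separated with a header row. The function names and the heart-rate cut points are illustrative assumptions and are not values taken from the application.

```python
import csv
import io
import json


def txt_to_csv(txt: str, delimiter: str = "\t") -> str:
    """Format conversion TXT -> CSV (assumes one record per line, fixed delimiter)."""
    rows = [line.split(delimiter) for line in txt.strip().splitlines()]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def csv_to_json(csv_text: str) -> str:
    """Format conversion CSV -> JSON: one object per row, keyed by the header."""
    return json.dumps(list(csv.DictReader(io.StringIO(csv_text))))


def to_number(value: str):
    """Type conversion string -> int or float; returns None when not numeric."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return None


def to_category(value, cut_points, labels):
    """Type conversion of a continuous value into a category.

    `cut_points` is ascending; `labels` has one more entry than `cut_points`.
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


txt = "patient_id\theart_rate\nP001\t72\nP002\t118"
records = json.loads(csv_to_json(txt_to_csv(txt)))
for r in records:
    r["heart_rate"] = to_number(r["heart_rate"])
    # Illustrative cut points only.
    r["hr_band"] = to_category(r["heart_rate"], [60, 100], ["low", "normal", "high"])
print(records)
```

The final JSON-to-Parquet step is omitted because it needs a third-party library. With pandas and pyarrow (or fastparquet) installed, one option is `pandas.DataFrame(records).to_parquet(path)`.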

unified data processing addresses data quality problems, including missing data, duplicate data and abnormal data;

for missing data, one of the following operations is applied according to specific needs:
- deleting the whole record that contains missing data;
- filling with the mean or median;
- inferring a plausible value from the known data and verifying the inferred fill against the remaining data;
- binning the data and filling with the mode;
- training a machine learning algorithm on complete records and filling with its predicted value;
- filling by collaborative filtering over patient data;
- filling by regression;
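Four of the listed missing-data strategies are sketched below using the standard library: record deletion, mean or median filling, bin-and-mode filling, and regression filling. The field names (`age`, `sbp`) and the single-predictor least-squares line used for the regression fill are illustrative assumptions. The reasoning-based, machine-learning and collaborative-filtering strategies are not shown.

```python
import statistics
from collections import Counter


def drop_incomplete(records, field):
    """Delete every record in which `field` is missing (None)."""
    return [r for r in records if r.get(field) is not None]


def fill_central(records, field, how="median"):
    """Fill missing `field` values with the median (default) or mean of observed values."""
    observed = [r[field] for r in records if r.get(field) is not None]
    value = statistics.median(observed) if how == "median" else statistics.mean(observed)
    return [{**r, field: value} if r.get(field) is None else r for r in records]


def fill_bin_mode(records, field, by, bin_width):
    """Bin records on `by`; fill missing `field` with the most common value in the same bin."""
    bins = {}
    for r in records:
        if r.get(field) is not None:
            bins.setdefault(r[by] // bin_width, []).append(r[field])
    out = []
    for r in records:
        candidates = bins.get(r[by] // bin_width)
        if r.get(field) is None and candidates:  # gap stays if the bin has no data
            r = {**r, field: Counter(candidates).most_common(1)[0][0]}
        out.append(r)
    return out


def fill_regression(records, field, by):
    """Fit field = a + b * by by least squares on complete records, then predict gaps."""
    pairs = [(r[by], r[field]) for r in records if r.get(field) is not None]
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    return [{**r, field: a + b * r[by]} if r.get(field) is None else r for r in records]


# Hypothetical records: systolic blood pressure (sbp) missing for patient 3.
recs = [
    {"id": 1, "age": 30, "sbp": 120},
    {"id": 2, "age": 35, "sbp": 120},
    {"id": 3, "age": 38, "sbp": None},
    {"id": 4, "age": 60, "sbp": 140},
    {"id": 5, "age": 70, "sbp": 150},
]
print(fill_bin_mode(recs, "sbp", by="age", bin_width=10)[2])
```

Each function returns new records and leaves the input list unchanged. This makes it easy to compare strategies side by side before choosing one.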

for duplicate data, the handling depends on the extent of duplication:
- if records are completely duplicated, any one of them is kept;
- if most of the content is duplicated, the duplicates are compared and the higher-quality record is kept;
- if only a small part of the content is duplicated, the records are kept.

For abnormal data, some values of critically ill patients are abnormal compared with those of healthy people. Typical abnormal values are therefore retained as data with distinctive features and used in comparative experiments. These experiments explore the relations between features and diseases, between symptoms and diseases, and between diseases and treatment modes;
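The duplicate-handling rules can be sketched as follows. The application does not define the similarity measure, the threshold or the quality criterion. This sketch therefore uses hypothetical choices for all three: the share of fields on which two records agree, a threshold of 0.8 for "mostly duplicated", and the number of non-missing fields as the quality score. No outlier filtering is applied, which is consistent with retaining typical abnormal values.

```python
def similarity(a: dict, b: dict) -> float:
    """Share of fields on which two records agree (missing values never agree)."""
    keys = set(a) | set(b)
    same = sum(1 for k in keys if a.get(k) is not None and a.get(k) == b.get(k))
    return same / len(keys) if keys else 1.0


def completeness(r: dict) -> int:
    """Hypothetical quality score: number of non-missing fields."""
    return sum(v is not None for v in r.values())


def deduplicate(records, mostly_threshold=0.8):
    """Apply the three duplicate rules; abnormal values are deliberately not filtered."""
    kept = []
    for rec in records:
        for i, other in enumerate(kept):
            if rec == other:  # completely duplicated: keep any one copy
                break
            if similarity(rec, other) >= mostly_threshold:  # mostly duplicated
                if completeness(rec) > completeness(other):
                    kept[i] = rec  # keep the higher-quality record
                break
        else:
            kept.append(rec)  # little or no overlap: keep the record
    return kept


a = {"id": "P1", "time": "08:00", "hr": 110, "spo2": 92, "rr": None}
b = dict(a)                # exact duplicate of a
c = {**a, "rr": 24}        # mostly duplicated, but more complete
d = {"id": "P2", "time": "09:00", "hr": 110, "spo2": 97, "rr": 18}
result = deduplicate([a, b, c, d])
print(result)
```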

the data storage divides data into hot, warm and cold data according to access frequency and stores each tier in a different database;

hot data are cached in a Redis database, warm data are stored in MySQL, MongoDB or HBase, and cold data are stored in HDFS;
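The routing of records to the three tiers can be sketched as follows. The application gives no numbers, so the access-frequency thresholds `HOT_MIN` and `WARM_MIN` are invented for illustration. Only the back-end names are taken from the text. The code only labels a placement and does not connect to any database.

```python
from dataclasses import dataclass

# Hypothetical thresholds in accesses per day; the application gives no numbers.
HOT_MIN = 100
WARM_MIN = 1

# Back ends named in the text for each tier.
BACKENDS = {
    "hot": ["Redis"],
    "warm": ["MySQL", "MongoDB", "HBase"],
    "cold": ["HDFS"],
}


@dataclass
class Dataset:
    name: str
    accesses_per_day: float


def tier(ds: Dataset) -> str:
    """Classify a dataset as hot, warm or cold by access frequency."""
    if ds.accesses_per_day >= HOT_MIN:
        return "hot"
    if ds.accesses_per_day >= WARM_MIN:
        return "warm"
    return "cold"


def placement(datasets):
    """Map each dataset name to (tier, candidate back ends for that tier)."""
    return {ds.name: (tier(ds), BACKENDS[tier(ds)]) for ds in datasets}


print(placement([Dataset("live_vitals", 5000), Dataset("lab_results", 12),
                 Dataset("archived_2019", 0.01)]))
```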

the data in each database are extracted, converted, and then stored and managed in an Elasticsearch database. Elasticsearch holds numeric, string, text, voice, image and video data, which facilitates data sharing, including browsing, querying and use. The raw data before conversion are retained as a traceable source for later reference. The converted data are stored in Elasticsearch as the data source for subsequent tasks, and intermediate data from the processing may also be stored if needed. The aim is to treat the data effectively, repairing or discarding problematic data to avoid misleading results in use. The data are also stored in a standardized way so that query and retrieval are as timely as possible.

Preferably, the unified data standardization processing further comprises converting unstructured data into structured data and extracting patients' characteristic attributes.

Preferably, the unstructured data are obtained from Elasticsearch and processed as follows: voice data can be converted into text by DeepSpeech or Whisper and then processed together with text as text-type data;

text data are processed according to the following flow:

word segmentation and syntactic analysis are used to identify the subject, predicate, object, attributive, adverbial and complement of each sentence;

a tool such as BERT or GPT is used to analyze the meaning expressed by the text and summarize it, i.e. to express the core meaning of a sentence or passage in a few content words;

text keywords from many critically ill patients are collected and given classification codes;

the features required for each patient are extracted and structured;
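The keyword, coding and structuring steps can be approximated by a much simpler stand-in. The sketch below matches a hypothetical term dictionary against an English note and produces one structured row of codes. It replaces the BERT/GPT summarization step with plain dictionary matching. The codes such as `DX001` are placeholders. Chinese clinical text would also need a word segmenter (for example the `jieba` library) before matching.

```python
import re

# Hypothetical term dictionary: keyword -> (category, code). The application does
# not specify a coding scheme; a real system might map to ICD or an internal set.
TERM_CODES = {
    "septic shock": ("diagnosis", "DX001"),
    "pneumonia": ("diagnosis", "DX002"),
    "norepinephrine": ("medication", "MED001"),
    "mechanical ventilation": ("treatment", "TX001"),
}


def extract_keywords(text: str):
    """Return dictionary terms found in `text` (case-insensitive, whole words)."""
    lowered = text.lower()
    return [term for term in TERM_CODES
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)]


def structure_note(patient_id: str, text: str) -> dict:
    """Turn one free-text note into a structured row of classification codes."""
    row = {"patient_id": patient_id, "diagnosis": [], "medication": [], "treatment": []}
    for term in extract_keywords(text):
        category, code = TERM_CODES[term]
        row[category].append(code)
    return row


note = ("Patient with pneumonia progressing to septic shock; "
        "started norepinephrine and mechanical ventilation.")
row = structure_note("P001", note)
print(row)
```

Because clinical notes use fairly consistent terminology, even exact matching recovers useful features. A learned model would mainly help with paraphrases and negations, which this sketch does not handle.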

for image and video data, YOLO, U-Net and SAT models perform object detection, semantic segmentation and summary extraction, respectively. This yields the objects and medical descriptions in the images or videos, which are converted into text and then processed as text data. This text processing method suits critical care because the diagnoses and medication lists written by doctors use specialized terminology and consistent wording. Text data can therefore be extracted quickly by exploiting these characteristics.

Preferably, the data fusion comprises performing logical judgment and reasoning on the basic data and applying machine learning and deep learning to analyze the data. This helps relevant personnel discover regularities, associations and patterns in the data, and supports data prediction, data anomaly detection, event early warning and alarming, and patient physical condition assessment;

data prediction forecasts the trend of a waveform from its previous values, so that doctors can make accurate judgments in advance;

data anomaly detection identifies abnormal points or segments in time-series waveforms, so that deviations can be addressed in time and the waveform can be expected to return to normal;

event early warning and alarming adds industry rule logic or validated, mature algorithms on top of data prediction and data anomaly detection, and warns of abnormal conditions that endanger life;

patient physical condition assessment uses the patient's basic information, relevant past medical history, current symptoms and post-admission treatment information. Combining these with clinicians' experience learned by the computer, the computer makes an overall assessment of the patient's physical condition.
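Data prediction, anomaly detection and rule-based early warning can be combined in a few lines. This is a sketch under stated assumptions and not the application's algorithm:
- prediction is a least-squares straight-line extrapolation;
- anomaly detection is a z-score against the preceding window;
- the alarm limits `low` and `high` are caller-supplied placeholders for the "industry rule logic" mentioned above.

```python
import statistics


def predict_next(values, horizon=1):
    """Forecast `horizon` future points by extrapolating a least-squares line (n >= 2)."""
    n = len(values)
    mx, my = (n - 1) / 2, statistics.fmean(values)
    sxx = sum((x - mx) ** 2 for x in range(n))
    b = sum((x - mx) * (y - my) for x, y in enumerate(values)) / sxx
    a = my - b * mx
    return [a + b * (n - 1 + h) for h in range(1, horizon + 1)]


def detect_anomalies(values, window=5, z=3.0):
    """Flag indices whose z-score against the preceding `window` points exceeds `z`."""
    flagged = []
    for i in range(window, len(values)):
        ref = values[i - window:i]
        mu, sd = statistics.fmean(ref), statistics.pstdev(ref)
        if sd > 0 and abs(values[i] - mu) / sd > z:  # constant windows are skipped
            flagged.append(i)
    return flagged


def early_warning(values, low, high, window=5):
    """Combine a limit rule with prediction and anomaly detection into alarm messages."""
    alarms = []
    if values[-1] < low or values[-1] > high:
        alarms.append("limit breached")
    if any(p < low or p > high for p in predict_next(values, horizon=3)):
        alarms.append("predicted limit breach")
    if len(values) - 1 in detect_anomalies(values, window):
        alarms.append("abrupt change")
    return alarms


spo2 = [97, 96, 95, 94, 93]  # hypothetical steadily falling series
print(early_warning(spo2, low=91, high=100))  # ['predicted limit breach']
```

A real early-warning model would use validated methods and clinically set limits. The straight line here only illustrates how a forecast can raise an alarm before a limit is actually crossed.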

Preferably, the data fusion is performed as follows:

query the patient's basic information to obtain the patient's identification number. Use that number to gather all of the patient's information scattered across the data sources, and extract it to a temporary intermediate layer;

clean the acquired data and check which data forms it contains. A patient's data forms may include text, voice, images and video, tables and time-series waveforms, and different patients may have different forms. Absent special requirements, complete and clear data are integrated;

fuse data of different forms as required, fusing all acquired data by default. Specifically:
- the patient's basic information is obtained from a structured database and used as the base point from which the fused data are progressively expanded, and other structured data are aligned to it;
- unstructured text, voice, image and video data are first structured using the unstructured-data structuring method and then aligned;
- for time-series waveform data, an analysis report is generated for a period of time. This includes calculating the flow index of respiratory waves, summarizing and structuring the text data, structurally aligning the calculated values with the index, and recording the location of the original data;

following the structured-data alignment principle, unstructured data are structured first and then aligned. The multi-source data are integrated into one clearly organized structured data table, which facilitates later management and application;

for each step the data pass through, a data flow chart is drawn from the source data to the structured data elements in the data table. This makes clear where each datum comes from and where it goes, which facilitates tracing and checking when problems occur. Through the fusion of these techniques, multi-source data fusion is achieved, laying a data foundation for data analysis and model training.
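The fusion steps above can be sketched as one function. It takes the base point from the basic-information table, aligns other sources on the patient identification number, summarizes waveforms before aligning them, and records where each column came from. The function names, the `source.field` column naming and the mean/min/max waveform summary are assumptions. The application does not specify its respiratory "flow index", so it is not reproduced.

```python
import statistics


def summarize_waveform(samples):
    """Hypothetical waveform summary; stands in for the unspecified analysis report."""
    return {"mean": statistics.fmean(samples), "min": min(samples), "max": max(samples)}


def fuse_patient(patient_id, basic_info, other_sources, waveforms):
    """Fuse one patient's data into a single row plus a lineage map (column -> origin)."""
    # Base point: the patient's row in the structured basic-information table.
    base = next((r for r in basic_info if r["patient_id"] == patient_id), None)
    if base is None:
        raise KeyError(patient_id)
    row = dict(base)
    lineage = {k: ("basic_info", k) for k in base}
    # Align other (already structured) sources on the patient ID.
    # A later record from the same source overwrites an earlier one in this sketch.
    for source, records in other_sources.items():
        for rec in records:
            if rec["patient_id"] != patient_id:
                continue
            for field, value in rec.items():
                if field != "patient_id":
                    col = f"{source}.{field}"
                    row[col] = value
                    lineage[col] = (source, field)
    # Time-series waveforms are summarized first, then aligned like other columns.
    for name, samples in waveforms.get(patient_id, {}).items():
        for stat, value in summarize_waveform(samples).items():
            col = f"{name}.{stat}"
            row[col] = value
            lineage[col] = ("waveform:" + name, stat)
    return row, lineage


basic = [{"patient_id": "P001", "age": 67, "sex": "F"},
         {"patient_id": "P002", "age": 54, "sex": "M"}]
others = {"labs": [{"patient_id": "P001", "lactate": 3.1},
                   {"patient_id": "P002", "lactate": 1.2}],
          "notes": [{"patient_id": "P001", "diagnosis": ["DX001"]}]}
waves = {"P001": {"resp_flow": [0.0, 0.5, 1.0, 0.5]}}
row, lineage = fuse_patient("P001", basic, others, waves)
print(row)
```

The lineage map plays the role of the data flow chart. For any column in the fused table, it names the source and field the value came from.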

Preferably, the method further comprises displaying the analysis data in views. The views present the distribution, changes and associations of the analysis data visually and clearly, helping relevant personnel better understand and use the data;

the system interacts with the user according to specific functional requirements and displays the required data. This includes the vital sign data of a specific patient at a specific time, a report of the patient's data over a period, mathematical statistics for all or some of the monitoring equipment, and the influence of a given monitoring device on other life support equipment;

when source data are displayed, clinical data are shown. When an exploratory experiment is performed on an assumed value, simulated data are shown; these are obtained by simulating the data trend the patient would present under that parameter.

In another aspect, a critical care medical big data processing device is disclosed, comprising the following modules:

a data module for acquiring multi-source, high-dimensional critical care medical raw data;

a unified processing module for performing unified standardization processing on the raw data to obtain basic data;

an analysis module for performing data fusion on the basic data to obtain analysis data;

and a display module for displaying the analysis data in views.
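The four-module structure maps naturally onto a chain of small components. The classes below are a hypothetical sketch of that wiring, with each module's behaviour supplied as a plain function. They do not reflect any published implementation of the application.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class DataModule:
    """Acquires raw data from a list of zero-argument loader functions."""
    loaders: List[Callable[[], list]]

    def acquire(self) -> list:
        raw = []
        for load in self.loaders:
            raw.extend(load())
        return raw


@dataclass
class UnifiedProcessingModule:
    """Runs standardization steps (conversion, cleaning, ...) in order."""
    steps: List[Callable[[list], list]]

    def process(self, raw: list) -> list:
        data = raw
        for step in self.steps:
            data = step(data)
        return data


@dataclass
class AnalysisModule:
    """Fuses basic data into analysis data."""
    fuse: Callable[[list], Any]

    def analyze(self, basic: list) -> Any:
        return self.fuse(basic)


@dataclass
class DisplayModule:
    """Renders analysis data into a view (here just a string)."""
    render: Callable[[Any], str]

    def show(self, analysis: Any) -> str:
        return self.render(analysis)


def run_pipeline(data, unify, analysis, display) -> str:
    """Chain the four modules: acquire -> standardize -> fuse -> display."""
    return display.show(analysis.analyze(unify.process(data.acquire())))


view = run_pipeline(
    DataModule([lambda: [{"hr": "72"}], lambda: [{"hr": "118"}]]),
    UnifiedProcessingModule([lambda rs: [{"hr": int(r["hr"])} for r in rs]]),
    AnalysisModule(lambda rs: {"max_hr": max(r["hr"] for r in rs)}),
    DisplayModule(lambda a: f"max HR {a['max_hr']}"),
)
print(view)  # max HR 118
```

Supplying each module's behaviour as a function keeps the module boundaries fixed while the processing inside each stage can be swapped.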

The application has the following beneficial effects:

1. By performing unified standardization processing and data fusion on the raw data, covering both data content and data format, the application extracts the data that meet the requirements. Comprehensive data to assist clinical diagnosis in critical care can therefore be provided rapidly;

2. The application treats the data effectively, repairing or discarding problematic data to avoid misleading results in use. It also stores the data in a standardized way so that query and retrieval are as timely as possible;

3. The text processing method suits critical care because the diagnoses and medication lists written by doctors use specialized terminology and consistent wording. Text data can therefore be extracted quickly by exploiting these characteristics.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a schematic diagram of Embodiment 1;

FIG. 2 is a schematic diagram of text-type data processing;

FIG. 3 is a schematic diagram of data fusion;

FIG. 4 is a schematic diagram of Embodiment 2.

Detailed Description

To illustrate the technical features of the solution clearly, the application is explained in detail through the following specific embodiments.

In a first embodiment, as shown in FIG. 1, a critical care medical big data processing method includes the following steps:

S101, acquiring multi-source, high-dimensional critical care medical raw data;

the raw data are divided by application scenario into hospital information system data, clinical information system data, electronic medical record data, and data from vital sign monitoring and support equipment. They are also divided by storage form into unstructured data and structured data;

the unstructured data include medical images, videos, voice recordings and documents. The medical images include X-ray images, radionuclide images, critical care ultrasound images, magnetic resonance images, pathological images and the like. The documents include vital sign data reports, diagnostic certificates, medical reference books, examination reports and the like;

the structured data include tables, such as patient basic information tables and clinical information tables.

S102, performing unified standardization processing on the raw data to obtain basic data;

the unified standardization processing comprises data computation and data storage;

the data computation comprises unified data conversion and unified data processing;

the data in each database are extracted, converted, and then stored and managed in an Elasticsearch database. Elasticsearch holds numeric, string, text, voice, image and video data, which facilitates data sharing, including browsing, querying and use. The raw data before conversion are retained as a traceable source for later reference. The converted data are stored in Elasticsearch as the data source for subsequent tasks, and intermediate data from the processing may also be stored if needed.

The unified data standardization processing further comprises converting unstructured data into structured data and extracting patients' characteristic attributes.

The unstructured data are obtained from Elasticsearch and processed as follows: voice data can be converted into text by DeepSpeech or Whisper and then processed together with text as text-type data;

As shown in FIG. 2, text data are processed according to the following procedure:

S1021, using word segmentation and syntactic analysis to identify the subject, predicate, object, attributive, adverbial and complement of each sentence;

S1022, using a tool such as BERT or GPT to analyze the meaning expressed by the text and summarize it, i.e. to express the core meaning of a sentence or passage in a few content words;

S1023, collecting text keywords from many critically ill patients and assigning classification codes;

S1024, extracting the features required for each patient and structuring the data;

for image and video data, YOLO, U-Net and SAT models perform object detection, semantic segmentation and summary extraction, respectively. This yields the objects and medical descriptions in the images or videos, which are converted into text and then processed as text data.

S103, performing data fusion on the basic data to obtain analysis data.

The data fusion comprises performing logical judgment and reasoning on the basic data and applying machine learning and deep learning to analyze the data. This helps relevant personnel discover regularities, associations and patterns in the data, and supports data prediction, data anomaly detection, event early warning and alarming, and patient physical condition assessment.

As shown in FIG. 3, the data fusion is performed as follows:

S1031, querying the patient's basic information to obtain the patient's identification number, using that number to gather all of the patient's information scattered across the data sources, and extracting it to a temporary intermediate layer;

S1032, cleaning the acquired data and checking which data forms it contains. A patient's data forms may include text, voice, images and video, tables and time-series waveforms, and different patients may have different forms. Absent special requirements, complete and clear data are integrated;

S1033, fusing data of different forms as required, fusing all acquired data by default. Specifically:
- the patient's basic information is obtained from a structured database and used as the base point from which the fused data are progressively expanded, and other structured data are aligned to it;
- unstructured text, voice, image and video data are first structured using the unstructured-data structuring method and then aligned;
- for time-series waveform data, an analysis report is generated for a period of time. This includes calculating the flow index of respiratory waves, summarizing and structuring the text data, structurally aligning the calculated values with the index, and recording the location of the original data;

S1034, following the structured-data alignment principle, structuring unstructured data first and then aligning it, and integrating the multi-source data into one clearly organized structured data table, which facilitates later management and application;

S1035, drawing a data flow chart from the source data to the structured data elements in the data table. This makes clear where each datum comes from and where it goes, which facilitates tracing and checking when the data have problems. Through the fusion of these techniques, multi-source data fusion is achieved, laying a data foundation for data analysis and model training.

S104, displaying the analysis data in views.

The views present the distribution, changes and associations of the analysis data visually and clearly, helping relevant personnel better understand and use the data.

In a second embodiment, as shown in FIG. 4, a critical care medical big data processing device includes the following modules:

a data module 201 for acquiring multi-source, high-dimensional critical care medical raw data;

a unified processing module 202 for performing unified standardization processing on the raw data to obtain basic data;

an analysis module 203 for performing data fusion on the basic data to obtain analysis data;

and a display module 204 for displaying the analysis data in views.

The foregoing is merely an example of the present application and is not intended to limit it. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application shall fall within the scope of the claims of the present application.

Claims

1. A critical care medical big data processing method, characterized in that the method comprises the following steps:

acquiring multi-source, high-dimensional critical care medical raw data;

performing unified standardization processing on the raw data to obtain basic data;

and performing data fusion on the basic data to obtain analysis data.

2. The critical care medical big data processing method according to claim 1, wherein the raw data are divided by application scenario into hospital information system data, clinical information system data, electronic medical record data, and data from vital sign monitoring and support equipment, and are divided by storage form into unstructured data and structured data;

the unstructured data include medical images, videos, voice recordings and documents;

the structured data include tables.

3. The critical care medical big data processing method according to claim 1, wherein the unified standardization processing comprises data computation and data storage;

4. The critical care medical big data processing method according to claim 1, wherein the data computation comprises unified data conversion and unified data processing;

the unified data conversion is performed before the unified data processing so that the data are easy for a computer to process and recognize. The unified data conversion comprises data format conversion, in which TXT files are converted to CSV, CSV to JSON, and JSON to Parquet, and data type conversion, in which strings are converted to numeric types and segmented data are converted into categories;

the data in each database are extracted, converted, and then stored and managed in an Elasticsearch database, which holds numeric, string, text, voice, image and video data so as to facilitate data sharing, including browsing, querying and use. The raw data before conversion are retained as a traceable source for later reference. The converted data are stored in Elasticsearch as the data source for subsequent tasks, and intermediate data from the processing may also be stored if needed.

5. The critical care medical big data processing method according to claim 4, wherein:

6. The critical care medical big data processing method according to claim 5, wherein the unstructured data are obtained from Elasticsearch and processed as follows: voice data can be converted into text by DeepSpeech or Whisper and then processed together with text as text-type data;

the text data are processed according to the following flow:

7. The critical care medical big data processing method according to claim 1, wherein the data fusion comprises performing logical judgment and reasoning, machine learning and deep learning on the basic data to analyze the data. This helps relevant personnel discover regularities, associations and patterns in the data, and supports data prediction, data anomaly detection, event early warning and alarming, and patient physical condition assessment;

8. The critical care medical big data processing method according to claim 7, wherein the data fusion is performed as follows:

fusing data of different forms as required, fusing all acquired data by default. Specifically, the patient's basic information is obtained from a structured database and used as the base point from which the fused data are progressively expanded, and other structured data are aligned to it. Unstructured text, voice, image and video data are first structured using the unstructured-data structuring method and then aligned. For time-series waveform data, an analysis report is generated for a period of time, including calculating the flow index of respiratory waves; the text data are summarized and structured, and the calculated values are structurally aligned with the index; and the location of the original data is provided;

9. The critical care medical big data processing method according to claim 1, further comprising displaying the analysis data in views, wherein the views present the distribution, changes and associations of the analysis data visually and clearly, helping relevant personnel better understand and use the data;

if source data are displayed, clinical data are shown; if an exploratory experiment is performed on an assumed value, simulated data are shown, the simulated data being obtained by simulating the data trend the patient would present under that parameter.

10. A critical care medical big data processing device, characterized in that the device comprises the following modules: