CN116525124B - Data standardized management method and system for medical big data - Google Patents

Data standardized management method and system for medical big data

Info

Publication number
CN116525124B
Authority
CN
China
Prior art keywords
data
medical
standard
data source
thematic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310799572.XA
Other languages
Chinese (zh)
Other versions
CN116525124A (en)
Inventor
汪榕
高山
简义鹏
胡丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC Big Data Research Institute Co Ltd
Original Assignee
CETC Big Data Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC Big Data Research Institute Co Ltd
Priority to CN202310799572.XA
Publication of CN116525124A
Application granted
Publication of CN116525124B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data management, and in particular to a data standardized management method and system for medical big data. The method comprises the following steps. S100: acquiring original medical data of each type from data sources through different data acquisition modes, and storing the original medical data in a medical original database. S200: standardizing the original medical data to obtain standard medical data, and storing the standard medical data in a medical standard database. S300: determining a medical theme object and its associated dimensions, generating a medical theme object table, and retrieving the standard medical data of those associated dimensions from the medical standard database to fill the medical theme object table. S400: acquiring a medical thematic scene of a data service, generating a medical thematic table, determining the medical theme objects of interest according to the medical thematic scene, retrieving the related standard medical data from the medical theme object tables, fusing the data, and adding the fused data to the medical thematic table.

Description

Data standardized management method and system for medical big data
Technical Field
The invention relates to the technical field of data management, in particular to a data standardized management method and system for medical big data.
Background
Data resources in the medical field are classified into image data, form data, text data, series data, self-reported (filled-in) data, wearable-device data, database data and the like. Such medical data is not only scattered across information islands within hospitals, but also suffers from diverse data types, uneven data quality, inconsistent data standards, a lack of association between data, and similar problems.
Thematic application scenarios in the medical field need different data from different data sources. Medical data is stored in various medical and health institutions with differing sources, types and formats, and these institutions are not interconnected, so the medical data is discontinuous and incomplete. In the prior art, although the data of the institutions is integrated, it is only simply classified after integration, for example by data type, by time of origin or by originating party. However, a specific application scenario usually needs data from several parties, for example the historical medical records, CT images and body index data of a certain patient. These data differ in type and in time of origin, so they must be retrieved from a massive volume of data each time they are needed.
Disclosure of Invention
The invention aims to provide a data standardized management method for medical big data, which can collect and integrate medical data and improve the efficiency with which the medical data is used.
The basic scheme provided by the invention is as follows: a data standardized management method for medical big data, comprising the following steps:
S100: acquiring original medical data of each type from data sources through different data acquisition modes, and storing the original medical data in a medical original database;
S200: standardizing the original medical data to obtain standard medical data, and storing the standard medical data in a medical standard database;
S300: determining a medical theme object and its associated dimensions, generating a medical theme object table, retrieving the standard medical data of those associated dimensions from the medical standard database to fill the medical theme object table, and storing the medical theme object table in a medical theme database;
S400: acquiring a medical thematic scene of a data service, generating a medical thematic table, determining the medical theme objects of interest according to the medical thematic scene, retrieving the related standard medical data from the medical theme object tables, fusing the data, adding the fused data to the medical thematic table, and adding the medical thematic table to a medical thematic database;
S500: determining the data sources of the standard medical data in the medical thematic table, wherein the data sources comprise primary, secondary and tertiary data sources with successively lower trust levels;
S501: after the standard medical data is added to the medical thematic table, determining the proportions of the standard medical data in the medical thematic table that come from the primary, secondary and tertiary data sources respectively;
S502: when the proportion of standard medical data from the primary data source in the medical thematic table is lower than that of either the secondary data source or the tertiary data source, acquiring the number of times the standard medical data of that secondary or tertiary data source appears in other medical thematic tables, and judging the data credibility of the secondary or tertiary data source from that number, wherein the more occurrences, the higher the data credibility;
S503: when the data credibility of the secondary or tertiary data source is high, retaining its standard medical data; when the data credibility is low, retrieving standard medical data of the same associated dimension from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database, and adding it to the medical thematic table.
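The source-trust check in S500 to S503 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the row layout, the source names and the `min_occurrences` threshold are all assumptions, since the text only states that more occurrences in other thematic tables mean higher credibility.

```python
from collections import Counter

def source_proportions(rows):
    """Share of rows from primary (1), secondary (2) and tertiary (3) sources (S501)."""
    counts = Counter(r["level"] for r in rows)
    total = len(rows)
    return {lvl: (counts[lvl] / total if total else 0.0) for lvl in (1, 2, 3)}

def needs_credibility_check(proportions):
    """S502 trigger: the primary share is below the secondary or the tertiary share."""
    return proportions[1] < proportions[2] or proportions[1] < proportions[3]

def low_credibility_sources(rows, other_tables, min_occurrences=2):
    """Secondary/tertiary sources that appear fewer than min_occurrences times
    in other thematic tables (the threshold is a hypothetical cut-off)."""
    seen_elsewhere = Counter(r["source"] for t in other_tables for r in t)
    candidates = {r["source"] for r in rows if r["level"] in (2, 3)}
    return {s for s in candidates if seen_elsewhere[s] < min_occurrences}

# Hypothetical thematic table: one primary row, two secondary rows, one tertiary row.
table = [
    {"source": "hospital_a", "level": 1, "dimension": "blood_oxygen"},
    {"source": "clinic_b", "level": 2, "dimension": "blood_oxygen"},
    {"source": "clinic_b", "level": 2, "dimension": "sleep"},
    {"source": "wearable_c", "level": 3, "dimension": "exercise"},
]
others = [[{"source": "clinic_b", "level": 2}], [{"source": "clinic_b", "level": 2}]]

props = source_proportions(table)
to_replace = low_credibility_sources(table, others) if needs_credibility_check(props) else set()
print(props, to_replace)
```

Rows from the sources in `to_replace` would then be swapped for data of the same dimension from a more credible source, as S503 describes.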
The principle and advantages of the invention are as follows. First, original medical data is acquired from the data sources through various data acquisition modes. Original medical data is data obtained directly from the data source in which it is stored, and its structure and content are consistent with those of the data source. Because the original medical data may contain defects, and the types and formats of data obtained from different data sources differ, the original medical data is standardized to obtain standard medical data, which is stored in the medical standard database. The original data is retained in the medical original database so that the data can be traced. A medical theme database is then built from the standard medical data in the medical standard database by determining the medical theme objects and their associated dimensions. A medical theme object is a subject around which data is collected in the medical big data field, such as a patient theme or a hospital theme; the associated dimensions under the patient theme may be the patient's historical medical records, physical examination reports and the like. The data required by a medical theme object is extracted from the medical standard database according to its associated dimensions, so that data is collected theme by theme and the various data under each theme is obtained. Finally, the medical theme objects required by a medical thematic scene are determined, and the medical data in the medical theme object tables that is associated with that scene is filled into the medical thematic table.
Compared with the prior art, medical data is collected along different dimensions: the association dimension of an individual's medical data is embodied by the medical theme object, and the association dimension of the overall medical data is embodied by the medical thematic scene. The medical data is thus collected to serve specific medical scenes, completing the organization and application of massive data.
Further, the step S200 includes the steps of:
S210: performing metadata design, field specification design, field mapping design and snowflake-type architecture design on the original medical data to obtain standard medical data;
S220: performing defect identification on the standard medical data, repairing the standard medical data identified as defective, and storing the repaired standard medical data in the medical standard database.
Metadata design, field specification design, field mapping design and snowflake-type architecture design are carried out on the original medical data. Metadata design defines the data information of the table structure fields and serves as a summary of the table structure information; it specifically comprises metadata entry and management. Once built, the metadata can be used for medical data catalog retrieval, medical data tracing and medical data asset statistics. A field specification is a data unit whose definition, identification, representation and permissible values are specified by a set of attributes; on the one hand it specifies the data information stored in the field, and on the other hand its identifier, data type, representation format and value domain form the basis of data exchange. Field mapping design addresses the case in which original tables describe the same service attribute inconsistently, so that the original table data values must be uniformly mapped and replaced according to standard naming rules. In snowflake-type architecture design, one or more medical dimension tables are not connected directly to the medical fact table but are connected to it through other medical dimension tables, forming a snowflake-type architecture. Standard medical data is obtained through these four design methods; defects in the standard medical data are then identified and handled, and the repaired data is stored in the medical standard database.
Further, the step S300 includes the steps of:
S301: determining the classification dimensions of the medical theme object, and constructing a label system for each classification dimension;
S302: acquiring standard medical data from the medical standard database according to the label system, calculating the standard medical data according to the preset calculation logic of the label system, and filling the result into the medical theme object table.
Different themes have different classification dimensions. For example, the classification dimensions under a patient theme may include examination records, admission records, medical image reports, outpatient and emergency medical records, sleep data, blood oxygen data, exercise data and the like. Labels are the concrete representation of the data under each classification dimension. Standard medical data is extracted from the medical standard database according to the label system, calculated according to the preset calculation logic of the label system, and then filled into the medical theme object table. Medical data under different label systems has different calculation logic, for example taking the average over a month, taking the latest value, or taking only data whose source is a Grade III Class A hospital. Collecting the medical data under the various theme objects in this way completes the data fusion.
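Two of the calculation logics named above, the monthly average and the latest value, can be sketched as follows. This is an illustrative sketch only; the record layout (`date`, `value`) and the sample readings are assumptions rather than anything specified in the text.

```python
from datetime import date
from statistics import mean

def monthly_average(records, year, month):
    """Average of the values recorded in the given month, or None if there are none."""
    values = [r["value"] for r in records
              if r["date"].year == year and r["date"].month == month]
    return mean(values) if values else None

def latest_value(records):
    """Value of the most recent record, or None for an empty list."""
    return max(records, key=lambda r: r["date"])["value"] if records else None

# Hypothetical blood oxygen readings for one patient.
blood_oxygen = [
    {"date": date(2023, 5, 20), "value": 95},
    {"date": date(2023, 6, 3), "value": 96},
    {"date": date(2023, 6, 18), "value": 98},
]

theme_row = {
    "blood_oxygen_june_avg": monthly_average(blood_oxygen, 2023, 6),
    "blood_oxygen_latest": latest_value(blood_oxygen),
}
print(theme_row)
```

Each label in a label system would map to one such function, so that filling the theme object table amounts to applying the label's function to the matching standard medical data.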
Further, the step S400 includes the steps of:
S401: determining the theme objects associated with a medical service scene;
S402: acquiring, according to the theme objects associated with the service scene, the standard medical data related to the medical service scene from the medical theme object tables of those theme objects, and filling the standard medical data into the medical thematic table.
A medical service scene is a specific application scene of medical data, and a medical thematic table is established for it according to the application requirements. For example, when a certain disease of a certain patient needs to be analyzed, the theme objects associated with the medical service scene comprise the patient theme and the disease theme. The associated standard medical data is then the patient's body indexes and historical medical records in the medical theme object table of the patient theme, together with the medication, treatment methods and the like of the disease in the medical theme object table of the disease theme. The associated standard medical data is acquired from the medical theme object tables and filled into the medical thematic table.
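The patient-and-disease example can be sketched as a simple fusion of two theme object tables into one thematic row. The table layouts, identifiers and field names below are hypothetical and chosen only for illustration.

```python
def fuse_for_scene(patient_id, disease_code, patient_table, disease_table,
                   patient_fields, disease_fields):
    """Build one thematic-table row from the patient and disease theme object tables."""
    patient = patient_table[patient_id]
    disease = disease_table[disease_code]
    row = {"patient_id": patient_id, "disease_code": disease_code}
    row.update({f: patient.get(f) for f in patient_fields})
    row.update({f: disease.get(f) for f in disease_fields})
    return row

# Hypothetical theme object tables keyed by patient ID and disease code.
patient_theme = {"P001": {"body_mass_index": 24.1, "history": ["2021 admission"], "sleep_hours": 6.5}}
disease_theme = {"D42": {"medication": "drug X", "treatment": "outpatient follow-up"}}

thematic_row = fuse_for_scene(
    "P001", "D42", patient_theme, disease_theme,
    patient_fields=["body_mass_index", "history"],
    disease_fields=["medication", "treatment"],
)
print(thematic_row)
```

Only the fields relevant to the scene are copied, so unrelated dimensions such as `sleep_hours` stay in the theme object table.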
Further, the method also comprises the following steps:
S600: when the acquired original medical data is updated, updating the data in the medical original database, the medical standard database, the medical theme database and the medical thematic database.
Updating the data in every database whenever the original medical data is updated ensures the timeliness of the data.
The invention also discloses a data standardized management system for medical big data, which comprises a data acquisition module, a data processing module, a data association module, a data fusion module, a medical original database, a medical standard database, a medical theme database, a data source determining module, a proportion determining module, an occurrence count acquisition module and a data updating module;
Data acquisition module: used for acquiring original medical data of each type from data sources through different data acquisition modes, and storing the original medical data in the medical original database;
Data processing module: used for standardizing the original medical data to obtain standard medical data, and storing the standard medical data in the medical standard database;
Data association module: used for determining a medical theme object and its associated dimensions, generating a medical theme object table, retrieving the standard medical data of those associated dimensions from the medical standard database to fill the medical theme object table, and storing the medical theme object table in the medical theme database;
Data fusion module: used for acquiring a medical thematic scene of a data service, generating a medical thematic table, determining the medical theme objects of interest according to the medical thematic scene, retrieving the related standard medical data from the medical theme object tables, fusing the data, adding the fused data to the medical thematic table, and adding the medical thematic table to a medical thematic database;
Data source determining module: used for determining the data sources of the standard medical data in the medical thematic table, wherein the data sources comprise primary, secondary and tertiary data sources with successively lower trust levels;
Proportion determining module: used for determining, after the standard medical data is added to the medical thematic table, the proportions of the standard medical data in the medical thematic table that come from the primary, secondary and tertiary data sources respectively;
Occurrence count acquisition module: used for acquiring, when the proportion of standard medical data from the primary data source in the medical thematic table is lower than that of either the secondary data source or the tertiary data source, the number of times the standard medical data of that secondary or tertiary data source appears in other medical thematic tables, and judging the data credibility of the secondary or tertiary data source from that number, wherein the more occurrences, the higher the data credibility;
Data updating module: used for retaining the standard medical data of the secondary or tertiary data source when its data credibility is high, and, when its data credibility is low, retrieving standard medical data of the same associated dimension from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database, and adding it to the medical thematic table.
Further, the data processing module comprises a data design module and a defect repair module;
Data design module: used for performing metadata design, field specification design, field mapping design and snowflake-type architecture design on the original medical data to obtain standard medical data;
Defect repair module: used for performing defect identification on the standard medical data, repairing the standard medical data identified as defective, and storing the repaired standard medical data in the medical standard database.
Further, the data association module comprises a theme dimension module and a data calculation module;
Theme dimension module: used for determining the classification dimensions of the medical theme object and constructing a label system for each classification dimension;
Data calculation module: used for acquiring standard medical data from the medical standard database according to the label system, calculating the standard medical data according to the preset calculation logic of the label system, and filling the result into the medical theme object table.
Further, the data fusion module comprises a scene object module and an object fusion module;
Scene object module: used for determining the theme objects associated with a medical service scene;
Object fusion module: used for acquiring, according to the theme objects associated with the service scene, the standard medical data related to the medical service scene from the medical theme object tables of those theme objects, and filling the standard medical data into the medical thematic table.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the data standardized management method for medical big data according to the present invention.
Detailed Description
The following is a further detailed description of the embodiments:
An embodiment is substantially as shown in FIG. 1:
The data standardized management method for medical big data comprises the following steps:
S100: acquiring original medical data of each type from data sources through different data acquisition modes, and storing the original medical data in a medical original database.
Specifically, in this embodiment, the original medical data includes image data, form data, text data, series data, self-reported data, wearable-device data and database data, scattered across the medical informatization systems of hospitals, clinics and health stations. The data acquisition modes comprise an interface data acquisition mode, a file data acquisition mode, a JDBC data acquisition mode and a crawler data acquisition mode. The interface mode is mainly used to acquire API interface data and Kafka streaming data in real time. The file mode preprocesses files with OCR (optical character recognition) and NLP (natural language processing): scanned items such as medical images are converted into digitized text, and the content is then analyzed to extract key information and form structured data. The JDBC mode uses DataX to acquire data from the transactional databases underlying the medical informatization systems. The crawler mode uses crawler technology to crawl, parse and store medical Internet data resources.
After the original medical data is acquired, it is stored in the medical original database, with its structure and content kept consistent with those in the medical informatization system from which it was acquired. The original medical data includes medical service data acquired in static batches (such as examination records, admission records, medical image reports, and outpatient and emergency medical records), health sensor device data acquired in real time (such as sleep data, blood oxygen data and exercise data), and important health data self-reported by individuals. The original medical data is stored in the form of original tables.
An original table name consists of six parts, A, B, C, D, E and F. Each part is written in uppercase letters, and the parts are joined by underscores, as follows:
Database name_Data source category_Hospital information system name_Health care service name_Original table name_Incremental/Full
A is the name of the database into which the data is loaded; the medical original database is named ODS.
B is the data source category, such as medical health, public health, wearable devices or Internet medical treatment.
C is the name of the hospital information system, such as the electronic medical record system, the hospital infection management system, the clinical examination system or the medical imaging system, and is formed from the initials of the system name.
D is the health care service name, such as population information data, medical service data, hospital operation data, public health data or disease prevention and control data.
E is the original table name.
F distinguishes incremental from full acquisition: incremental acquisition is written ADD and full acquisition ALL.
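The six-part naming rule lends itself to a small helper. The sketch below assumes hypothetical abbreviations for parts B to E (the text fixes only `ODS`, `ADD` and `ALL`), and it simply upper-cases and joins the parts.

```python
def raw_table_name(source_category, system_initials, service_name,
                   original_table, incremental):
    """Join the six name parts A-F with underscores, in uppercase."""
    mode = "ADD" if incremental else "ALL"   # part F: incremental vs. full acquisition
    parts = ["ODS", source_category, system_initials, service_name, original_table, mode]
    return "_".join(p.upper() for p in parts)

# Hypothetical abbreviations: MH = medical health, EMR = electronic medical record
# system, MS = medical service data.
print(raw_table_name("mh", "emr", "ms", "patient", incremental=True))
print(raw_table_name("mh", "emr", "ms", "patient", incremental=False))
```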
S200: standardizing the original medical data to obtain standard medical data, and storing the standard medical data in a medical standard database. S200 specifically comprises the following steps:
S210: performing metadata design, field specification design, field mapping design and snowflake-type architecture design on the original medical data to obtain standard medical data.
Specifically, metadata design defines the data information of the table structure fields and serves as a summary of the table structure information. It comprises metadata entry and management and, once built, can be used for medical data catalog retrieval, medical data tracing and medical data asset statistics. The specific specification is shown in Table 1.
TABLE 1
In field specification design, a field specification is a data unit whose definition, identification, representation and permissible values are specified by a set of attributes. On the one hand it specifies the data information stored in the field; on the other hand its identifier, data type, representation format and value domain form the basis of data exchange. The specific specification design is shown in Table 2.
Table 2: Patient age field constraints
Standard-defined field    Limiting value
Value type                INT
Value range               [0,150]
Nullable                  No
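The patient age constraints in Table 2 can be expressed as a small validation routine. This is a sketch under the assumption that a field specification is held as a plain dictionary; the names `AGE_SPEC` and `check_field` are illustrative.

```python
# Patient age specification from Table 2: INT, range [0, 150], not nullable.
AGE_SPEC = {"type": int, "range": (0, 150), "nullable": False}

def check_field(value, spec):
    """Return True if the value satisfies the type, range and nullability rules."""
    if value is None:
        return spec["nullable"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, spec["type"]):
        return False
    low, high = spec["range"]
    return low <= value <= high

for candidate in (35, 151, None, "35"):
    print(repr(candidate), check_field(candidate, AGE_SPEC))
```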
Field mapping design addresses the case in which original tables describe the same service attribute inconsistently; the data values of the original tables must be uniformly mapped and replaced according to standard naming rules.
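As an illustration of field mapping, the sketch below maps inconsistent field names and gender values from different source tables onto one standard form. The standard field name `gender_code` and the codes "1" and "2" are assumptions made for the example, not values given in the text.

```python
# Hypothetical mappings from source-specific names and values to a standard form.
FIELD_NAME_MAP = {"sex": "gender_code", "xb": "gender_code", "gender": "gender_code"}
GENDER_VALUE_MAP = {"m": "1", "male": "1", "男": "1", "f": "2", "female": "2", "女": "2"}

def standardize_record(record):
    """Rename known fields and replace gender values with the standard codes."""
    result = {}
    for name, value in record.items():
        std_name = FIELD_NAME_MAP.get(name.lower(), name)
        if std_name == "gender_code":
            # Unknown values map to None so they can be flagged for review.
            value = GENDER_VALUE_MAP.get(str(value).strip().lower())
        result[std_name] = value
    return result

print(standardize_record({"sex": "Male", "age": 40}))
print(standardize_record({"XB": "女", "age": 52}))
```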
In snowflake-type architecture design, one or more medical dimension tables are not connected directly to the medical fact table but are connected to it through other medical dimension tables. The resulting diagram resembles several snowflakes joined together, hence the name snowflake-type architecture.
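A minimal snowflake layout can be demonstrated with Python's built-in `sqlite3` module. The table and column names here are hypothetical: the hospital dimension is reached from the visit fact table only through the department dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_hospital (
    hospital_id INTEGER PRIMARY KEY,
    hospital_name TEXT
);
-- The department dimension links to the hospital dimension.
CREATE TABLE dim_department (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT,
    hospital_id INTEGER REFERENCES dim_hospital(hospital_id)
);
-- The fact table references only the department dimension.
CREATE TABLE fact_visit (
    visit_id INTEGER PRIMARY KEY,
    department_id INTEGER REFERENCES dim_department(department_id),
    fee REAL
);
INSERT INTO dim_hospital VALUES (1, 'Hospital A');
INSERT INTO dim_department VALUES (10, 'Cardiology', 1);
INSERT INTO fact_visit VALUES (100, 10, 35.0), (101, 10, 15.0);
""")

# Total fees per hospital require joining through the intermediate dimension.
rows = conn.execute("""
SELECT h.hospital_name, SUM(f.fee)
FROM fact_visit AS f
JOIN dim_department AS d ON d.department_id = f.department_id
JOIN dim_hospital AS h ON h.hospital_id = d.hospital_id
GROUP BY h.hospital_name
""").fetchall()
print(rows)
```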
The names of tables stored in the medical standard database consist of five parts, A, B, C, D and E. Each part is written in uppercase letters, and the parts are joined by underscores, as follows:
Database name_Hospital name_Department name_Information system name_Original table name
A is the name of the database that stores the loaded data; the medical standard database is named STD.
B is the hospital name.
C is the department unit within the hospital, formed from the initials of the department unit name.
D is the name of the medical informatization system.
E is the original table name used in each informatization system before data acquisition.
The following requirements should be satisfied for medical standard database construction:
1. The tables in the medical original database are stored in the medical standard database one by one after cleaning, conversion, field screening and standardization.
2. Table names in the medical standard database follow the naming convention.
3. Each table field in the medical standard database has a unique data mapping to a table field in the medical original database.
4. The data of the medical standard database should be derived from the data in the medical original database through data operations such as direct copying, cleaning and conversion, or data standardization.
5. When a new standard database table is created, the metadata of the medical standard database is maintained.
6. When the data of the medical standard database is maintained, field data quality is audited, and the data operations are readjusted for fields that do not meet the quality requirements.
S220: performing defect identification on the standard medical data, repairing the standard medical data identified as defective, and storing the repaired standard medical data in the medical standard database.
Specifically, in this embodiment, data defects include missing data, erroneous data and association errors.
A record whose attribute value is blank or "-" has a missing value. Missing values are handled in the following steps:
Step 1: Determine the extent of the missing values: calculate the missing ratio of each field of the medical informatization system, and adopt a different strategy according to the missing ratio and the importance of the field.
Step 2: For data of high importance and low missing ratio, such as missing values in daily hospital operation data, fill the missing values by the mean value method.
Step 3: If the related data cannot be obtained, fill the missing values manually.
Step 4: For data of low importance and low missing ratio, such as missing attendance data of hospital staff on duty, fill the missing values in a simple way or leave them untreated.
Step 5: For data of low importance and high missing ratio, such as hospital patient vaccination data, back up the current data and delete the unnecessary fields.
The methods for filling missing values are as follows:
1. Fill the missing values by inference from medical business knowledge or expert experience.
2. Fill the missing values with calculation results (mean, median, mode, etc.) of the same medical index.
3. Fill the missing values with calculation results of different medical indexes. For example, if the patient's age field is missing from a hospital record but the citizen identification card number is present, the age can be derived from the citizen identification card number.
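The third method can be illustrated with the age example. The sketch below assumes the 18-character citizen identification number layout, in which characters 7 to 14 encode the birth date as YYYYMMDD. The function name is hypothetical, and the check digit is not validated here.

```python
from datetime import date


def age_from_citizen_id(id_number, today=None):
    """Derive age in full years from an 18-character citizen ID number.

    Characters 7-14 (1-based) hold the birth date as YYYYMMDD.
    Returns None when the number cannot be parsed.
    """
    today = today or date.today()
    if len(id_number) != 18 or not id_number[6:14].isdigit():
        return None
    try:
        birth = date(int(id_number[6:10]), int(id_number[10:12]), int(id_number[12:14]))
    except ValueError:
        return None  # e.g. month 13 or day 32
    if birth > today:
        return None
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)
```

Passing `today` explicitly keeps the calculation reproducible, for example when the age should be computed as of the admission date rather than the current date.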
Erroneous data includes format and content errors and logical errors.
Format and content errors are handled as follows:
When display formats such as time, date, numerical values, and full-width or half-width characters are inconsistent, the formats are unified by manual collection or user entry.
When the content contains characters that should not be present, the problem should be located by a combination of semi-automatic checking and manual review, and the unnecessary characters removed.
When the data content does not match what the field should contain, different processing methods are adopted according to the type of problem.
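The embodiment describes format unification as a largely manual step. As a hedged illustration of how the semi-automatic part might look, the sketch below converts full-width characters to half-width with Unicode NFKC normalization, strips control characters, and rewrites dates into a single format. The list of accepted date formats and all function names are assumptions.

```python
import unicodedata
from datetime import datetime

# Assumed input variants; a real system would configure these per field.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d", "%Y%m%d")


def to_half_width(text):
    """NFKC normalization maps full-width letters, digits and punctuation to ASCII forms."""
    return unicodedata.normalize("NFKC", text)


def strip_unwanted(text):
    """Drop control and format characters, then trim surrounding whitespace."""
    return "".join(ch for ch in text if unicodedata.category(ch)[0] != "C").strip()


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    cleaned = strip_unwanted(to_half_width(text))
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

NFKC also rewrites other compatibility characters (for example ligatures), so values that return `None` or change unexpectedly would still go to manual review.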
Logical errors are handled as follows:
De-duplication: after format and content cleaning, duplicate values should be identified using field similarity recognition and then removed.
Outliers: identify and remove outliers whose values fall outside the value range of the data.
Correcting contradictory content: judge the reliability of the field information based on the source of the data field, and remove or reconstruct unreliable fields.
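De-duplication by field similarity and range-based outlier removal can be sketched as below. The embodiment does not name a similarity measure; `difflib.SequenceMatcher` from the Python standard library is swapped in here, and the threshold and function names are assumptions.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumed cut-off for "similar enough"


def is_duplicate(a, b, fields):
    """Treat two records as duplicates when every compared field is similar enough."""
    return all(
        SequenceMatcher(None, str(a[f]), str(b[f])).ratio() >= SIMILARITY_THRESHOLD
        for f in fields
    )


def deduplicate(records, fields):
    """Keep the first record of each group of near-duplicates (pairwise comparison)."""
    kept = []
    for rec in records:
        if not any(is_duplicate(rec, k, fields) for k in kept):
            kept.append(rec)
    return kept


def remove_outliers(records, field, low, high):
    """Drop records whose value lies outside the allowed value range [low, high]."""
    return [r for r in records if low <= r[field] <= high]
```

The pairwise comparison is quadratic in the number of records, so a larger data set would normally be grouped by a blocking key (such as birth date) before comparison.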
Association errors are handled as follows:
Data from multiple medical informatization systems should undergo data association verification. For data that does not satisfy the integrity constraints, the relationships among the data dictionary, the metadata, and the data are analyzed in order to select a treatment. The association-error data cleaning method is shown in Table 3.
Table 3 Medical data association cleaning method
S300: determining a medical theme object and the related dimension thereof, generating a medical theme object table, calling standard medical data of the related dimension of the medical theme object from a medical standard database, filling the medical theme object table, and storing the medical theme object table into the medical theme database.
The medical theme database aggregates and integrates medical service data (such as examination records, admission records, medical image reports, and outpatient and emergency medical records), health sensor data acquired in real time (such as sleep data, blood oxygen data, and exercise data), important health and medical data self-reported by individuals in real time, and the like, thereby realizing the sharing, integration, storage, updating, and service of multiple medical data object models. Specifically, in this embodiment, the medical theme objects include a patient theme, a hospital theme, a medicine theme, a doctor theme, and a disease theme.
S300 includes the following steps:
S301: Determine the classification dimensions of the medical theme object, and construct a label system for each classification dimension.
S302: Obtain standard medical data from the medical standard database according to the label system, calculate the standard medical data according to the preset calculation logic of the label system, and then fill it into the medical theme object table.
Specifically, different theme objects have different classification dimensions; the patient theme is taken as an example here. Once the patient theme is determined, the classification dimensions under this theme object may include historical medical records, blood sugar data, blood oxygen data, sleep data, and exercise data. Data of different classification dimensions comes from different data sources and is stored in different medical standard data tables. After the medical theme object and its classification dimensions are determined, the label system under each classification dimension is acquired. A label system specifies how the medical data of that dimension is obtained and has its own preset calculation logic; standard medical data is calculated according to this logic and then filled into the medical theme object table. Different medical data is calculated in different ways. For example, for outpatient and emergency medical records, when the data sources include both a Grade III Class A hospital and a health station, the standard medical data from the Grade III Class A hospital is used. As further examples, exercise data acquired by health sensors is taken as the average over the most recent month, and blood oxygen data acquired by health sensors is taken as the latest value at the time node.
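The three calculation rules given as examples can be sketched as follows. The source labels, the record layout (`source`, `day`, `value` keys), and the reading of "the last month" as the 30 days up to the time node are assumptions made for illustration only.

```python
from datetime import date, timedelta
from statistics import mean

# Smaller number = preferred source (labels are hypothetical).
SOURCE_PRIORITY = {"grade_3a_hospital": 0, "health_station": 1}


def pick_emergency_records(records):
    """Keep only the records from the most preferred source that is present."""
    if not records:
        return []
    best = min(SOURCE_PRIORITY[r["source"]] for r in records)
    return [r for r in records if SOURCE_PRIORITY[r["source"]] == best]


def exercise_last_month_mean(samples, as_of):
    """Average of samples taken in the 30 days up to and including `as_of`."""
    start = as_of - timedelta(days=30)
    recent = [s["value"] for s in samples if start < s["day"] <= as_of]
    return mean(recent) if recent else None


def latest_blood_oxygen(samples, as_of):
    """Most recent sample at or before the time node `as_of`."""
    eligible = [s for s in samples if s["day"] <= as_of]
    return max(eligible, key=lambda s: s["day"])["value"] if eligible else None
```

Each function corresponds to one label's preset calculation logic, so a label system could be represented as a mapping from dimension name to such a function.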
The name of the medical theme object table consists of four parts, A, B, C, and D, which represent the theme hierarchy. Each part is written in uppercase letters and the parts are joined by underscores, as follows:
Library name_topic classification_topic object_implementation table name.
A represents the fixed prefix of the medical theme table and is uniformly named DWR.
B represents the topic classification and is named by an English abbreviation.
C represents the specific business object entity, such as patient, disease, or hospital, and is named by an English abbreviation.
D represents the topic name.
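A small helper can assemble names that follow this A_B_C_D convention. Only the prefix DWR comes from the text; the function name, the validation rule for each part, and the abbreviations in the usage note are hypothetical.

```python
import re

THEME_PREFIX = "DWR"  # fixed prefix A given in the text
_PART = re.compile(r"[A-Za-z][A-Za-z0-9]*")  # assumed rule: letters and digits only


def theme_table_name(topic_class, topic_object, table):
    """Join the four parts A_B_C_D in uppercase, connected by underscores."""
    parts = [THEME_PREFIX, topic_class, topic_object, table]
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(part.upper() for part in parts)
```

With the made-up abbreviations `med`, `pat`, and `sleep`, the helper yields `DWR_MED_PAT_SLEEP`.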
S400: Acquire the medical thematic scenario of a data service, generate a medical thematic table, determine the medical theme objects of interest according to the medical thematic scenario, call the related standard medical data from the medical theme object tables, perform data fusion, and add the fused data to the medical thematic table.
S400 specifically comprises the following steps:
S401: Determine the theme objects associated with the medical service scenario;
S402: According to the theme objects associated with the service scenario, obtain the standard medical data related to the medical service scenario from the medical theme object tables of the associated theme objects, and fill it into the medical thematic table.
Specifically, a medical service scenario is a specific application scenario of medical data, and a medical thematic table is established according to the application requirements. For example, when a certain disease of a patient needs to be analyzed, the theme objects associated with the medical service scenario include the patient theme and the disease theme. The associated standard medical data comprises the patient's physical indexes and the historical medical records about the disease in the medical theme object table of the patient theme, together with the medication mode, treatment mode, and the like of the disease in the disease theme table. The associated standard medical data is obtained from the medical theme object tables and filled into the medical thematic table.
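The disease-analysis example can be sketched as a simple fusion of two theme object tables. The dictionary layouts, key names, and function name below are assumptions chosen for illustration; the embodiment does not prescribe a storage format.

```python
def build_thematic_table(patient_theme, disease_theme, disease_code):
    """Fuse patient-theme and disease-theme data for one disease into thematic rows."""
    disease = disease_theme.get(disease_code)
    if disease is None:
        return []
    rows = []
    for patient_id, patient in patient_theme.items():
        # Historical medical records of this patient that concern the disease.
        history = [h for h in patient["history"] if h["disease"] == disease_code]
        if not history:
            continue  # the patient has no record of this disease
        rows.append({
            "patient_id": patient_id,
            "body_indexes": patient["body_indexes"],
            "history": history,
            "medication": disease["medication"],
            "treatment": disease["treatment"],
        })
    return rows
```

Each resulting row combines patient-side data (physical indexes, disease history) with disease-side data (medication and treatment modes), which is the content the text assigns to the medical thematic table.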
The name of the medical thematic table consists of three parts, A, B, and C, which represent the thematic hierarchy. Each part is written in uppercase letters and the parts are joined by underscores, as follows:
Library name_thematic group_service table name.
A represents the fixed prefix of the medical thematic table and is uniformly named DM.
B represents the thematic group and is named by an English name or abbreviation.
C represents the service table name and consists of an appropriate English name or abbreviation.
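Names following the A_B_C convention can be checked and split with a regular expression. The sketch assumes that parts B and C contain only uppercase letters and digits (no internal underscores); the function name and the example table names are hypothetical.

```python
import re

# A = fixed prefix DM, B = thematic group, C = service table name.
_THEMATIC_NAME = re.compile(r"DM_(?P<group>[A-Z][A-Z0-9]*)_(?P<table>[A-Z][A-Z0-9]*)")


def parse_thematic_table_name(name):
    """Return {'group': B, 'table': C} for a valid name, otherwise None."""
    match = _THEMATIC_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

Such a check could run when a thematic table is created, so that tables in the thematic library are not confused with theme object tables carrying the DWR prefix.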
S600: When the acquired original medical data is updated, update the data in the medical original database, the medical standard database, the medical theme database, and the medical thematic database.
This embodiment also discloses a data standardized management system for medical big data, which comprises a data acquisition module, a data processing module, a data association module, a data fusion module, a medical original database, a medical standard database, a medical theme database, and a medical thematic database;
Data acquisition module: used for acquiring original medical data of various types from data sources through different data acquisition modes, and storing the original medical data in the medical original database;
Data processing module: used for standardizing the original medical data to obtain standard medical data, and storing the standard medical data in the medical standard database;
Data association module: used for determining a medical theme object and its related dimensions, generating a medical theme object table, calling standard medical data of the related dimensions of the medical theme object from the medical standard database, filling the medical theme object table, and storing the medical theme object table in the medical theme database;
Data fusion module: used for acquiring the medical thematic scenario of a data service, generating a medical thematic table, determining the medical theme objects of interest according to the medical thematic scenario, calling the related standard medical data from the medical theme object tables, performing data fusion, and adding the fused data to the medical thematic table.
The data processing module comprises a data design module and a defect repair module;
The data design module is used for performing metadata design, field specification design, field mapping design, and snowflake architecture design on the original medical data to obtain standard medical data;
The defect repair module is used for performing defect identification on the standard medical data, repairing the standard medical data identified as defective, and storing the repaired standard medical data in the medical standard database.
The data association module comprises a theme dimension module and a data calculation module;
The theme dimension module is used for determining the classification dimensions of the medical theme object and constructing a label system for each classification dimension;
The data calculation module is used for obtaining standard medical data from the medical standard database according to the label system, calculating the standard medical data according to the preset calculation logic of the label system, and filling it into the medical theme object table.
The data fusion module comprises a scene object module and an object fusion module;
The scene object module is used for determining the theme objects associated with the medical service scenario;
The object fusion module is used for obtaining, according to the theme objects associated with the service scenario, the standard medical data related to the medical service scenario from the medical theme object tables of the associated theme objects, and filling it into the medical thematic table.
Embodiment Two
This embodiment differs from Embodiment One in that the method further includes the following steps:
S500: Determine the data sources of the standard medical data in the medical thematic table, the data sources comprising a primary data source, a secondary data source, and a tertiary data source with successively decreasing trust levels;
S501: After the standard medical data is added to the medical thematic table, determine the respective proportions of standard medical data from the primary, secondary, and tertiary data sources in the medical thematic table;
S502: When the proportion of standard medical data from the primary data source in the medical thematic table is lower than that of either the secondary data source or the tertiary data source, obtain the number of times standard medical data from that secondary or tertiary data source appears in other medical thematic tables, and judge the data credibility of the secondary or tertiary data source according to that number; the more occurrences, the higher the data credibility;
S503: When the data credibility of the secondary or tertiary data source is high, retain its standard medical data; when the data credibility is low, call standard medical data of the same related dimension from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database, and add it to the medical thematic table.
In this embodiment, the primary data source is a Grade III Class A hospital, the secondary data source is a medical and health station, and the tertiary data source is a health device (such as a smart bracelet). Data from Grade III Class A hospitals has the highest trust level, followed by data from medical and health stations, and data from health devices has the lowest trust level.
When the medical thematic table is established, the proportions of medical data in the table are determined, namely the proportion of standard medical data from the primary data source, from the secondary data source, and from the tertiary data source, respectively. The proportion for the primary data source is then compared with those for the secondary and tertiary data sources to judge whether either of the latter is higher, for example, whether the proportion of data from health devices in the medical thematic table exceeds the proportion of data from Grade III Class A hospitals. If so, the number of times the health device data appears in other medical thematic tables is obtained. Here, the other medical thematic tables are medical thematic tables that were obtained directly from system data processing and have not been modified by step S503. The credibility of the data from the tertiary or secondary data source is judged according to the number of times it appears in the other medical thematic tables: the more occurrences, the higher the credibility.
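The proportion check and the occurrence count can be sketched as follows. The row layout (`source_id`, `source_level` with 1, 2, 3 for primary, secondary, tertiary) and the function names are assumptions. The sketch also counts the number of other thematic tables in which a source appears, which is one possible reading of "number of times".

```python
from collections import Counter


def source_shares(rows):
    """Share of rows in a thematic table contributed by each trust level (1, 2, 3)."""
    counts = Counter(r["source_level"] for r in rows)
    total = sum(counts.values())
    return {lvl: (counts.get(lvl, 0) / total if total else 0.0) for lvl in (1, 2, 3)}


def sources_to_review(rows):
    """Secondary/tertiary sources whose level outweighs the primary level in this table."""
    shares = source_shares(rows)
    levels = [lvl for lvl in (2, 3) if shares[lvl] > shares[1]]
    return {r["source_id"] for r in rows if r["source_level"] in levels}


def occurrences_in_other_tables(source_id, other_tables):
    """Number of other thematic tables that contain data from `source_id`."""
    return sum(any(r["source_id"] == source_id for r in table) for table in other_tables)
```

Only sources returned by `sources_to_review` need their occurrence count looked up, which keeps the check cheap when the primary source already dominates the table.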
The preset trust levels of the primary, secondary, and tertiary data sources decrease in that order. When data from a secondary or tertiary data source makes up a larger share of the medical thematic table than data from the primary data source, this may be due to an error in data retrieval, or the secondary or tertiary data source may have recently generated a large amount of new data. The number of times standard data from the secondary or tertiary data source appears in other medical thematic tables is therefore obtained. If the number of occurrences is small, or data from that source has not appeared recently, an error may have occurred during data retrieval. If the source already contributes a certain amount of data, its credibility can be judged from the number of occurrences. When standard medical data from the secondary or tertiary data source appears in many other medical thematic tables, it has a certain credibility and can be used directly.
Meanwhile, if the secondary or tertiary data source is newly added, its data appears only rarely in other medical thematic tables, so its data is not used in the initial stage after it is added. If the data it generates is sufficient in quantity, high enough in quality, and broad enough in coverage, the data will gradually appear in other medical thematic tables; the number of occurrences will gradually increase, and the credibility will gradually improve.
Through this step, errors in the data retrieval process are detected, and the credibility of standard medical data from secondary and tertiary data sources is identified. Newly added secondary and tertiary data sources are given a buffer period and are used only once their data proves sufficient in quantity, high enough in quality, and broad enough in coverage.
When the data credibility of the secondary or tertiary data source is high, its standard medical data is retained. When the data credibility is low, standard medical data of the same related dimension is called from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database and added to the medical thematic table.
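The retain-or-replace decision can be sketched as below. The embodiment does not state a numeric credibility threshold, so `min_occurrences` is an assumption, as are the row layout and function names. Dropping a row when no credible replacement exists is also an assumed behavior that the text does not specify.

```python
def low_credibility_sources(occurrences, min_occurrences=3):
    """Sources whose occurrence count in other thematic tables is below the assumed threshold."""
    return {src for src, n in occurrences.items() if n < min_occurrences}


def repair_thematic_table(rows, low_sources, standard_rows):
    """Keep credible rows; replace the others with same-dimension data from a credible source."""
    result = []
    for row in rows:
        if row["source_id"] not in low_sources:
            result.append(row)  # credible source: data is retained
            continue
        candidates = [
            c for c in standard_rows
            if c["dimension"] == row["dimension"] and c["source_id"] not in low_sources
        ]
        if candidates:
            # Prefer the candidate with the highest trust level (1 = primary).
            result.append(min(candidates, key=lambda c: c["source_level"]))
        # Assumption: with no credible replacement, the row is left out.
    return result
```

Here `occurrences` maps each reviewed source to its count in other medical thematic tables, and `standard_rows` stands for the candidate records available in the medical standard database.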
This embodiment also discloses a data standardized management system for medical big data, which comprises:
a data source determining module, used for determining the data sources of the standard medical data in the medical thematic table, the data sources comprising a primary data source, a secondary data source, and a tertiary data source with successively decreasing trust levels;
a proportion determining module, used for determining, after the standard medical data is added to the medical thematic table, the respective proportions of standard medical data from the primary, secondary, and tertiary data sources in the medical thematic table;
a frequency acquisition module, used for obtaining, when the proportion of standard medical data from the primary data source in the medical thematic table is lower than that of either the secondary data source or the tertiary data source, the number of times standard medical data from that secondary or tertiary data source appears in other medical thematic tables, and judging the data credibility of the secondary or tertiary data source according to that number, wherein the more occurrences, the higher the data credibility;
a data updating module, used for retaining the standard medical data of the secondary or tertiary data source when its data credibility is high, and, when its data credibility is low, calling standard medical data of the same related dimension from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database and adding it to the medical thematic table.
The foregoing is merely an embodiment of the present application. Specific structures and features that are well known in the art are not described in detail here; a person skilled in the art is aware of the prior art and the common general knowledge of the technical field to which the present application pertains, and can practice the present application with the aid of its teaching. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the structure of the present application; these should also be regarded as falling within the protection scope of the present application, and they do not affect the effect of implementing the present application or the utility of the patent. The protection scope of the present application is subject to the content of the claims, and the description of the specific embodiments and the like in the specification may be used to interpret the content of the claims.

Claims (9)

1. A data standardized governance method for medical big data, characterized by comprising the following steps:
S100: acquiring original medical data of each type from data sources through different data acquisition modes, and storing the original medical data in a medical original database;
S200: standardizing the original medical data to obtain standard medical data, and storing the standard medical data in a medical standard database;
S300: determining a medical theme object and its related dimensions, generating a medical theme object table, calling standard medical data of the related dimensions of the medical theme object from the medical standard database, filling the medical theme object table, and storing the medical theme object table in a medical theme database;
S400: acquiring a medical thematic scenario of a data service, generating a medical thematic table, determining the medical theme objects of interest according to the medical thematic scenario, calling related standard medical data from the medical theme object table, performing data fusion, adding the fused data to the medical thematic table, and adding the medical thematic table to a medical thematic database;
S500: determining the data sources of the standard medical data in the medical thematic table, the data sources comprising a primary data source, a secondary data source, and a tertiary data source with successively decreasing trust levels;
S501: after the standard medical data is added to the medical thematic table, determining the respective proportions of standard medical data from the primary, secondary, and tertiary data sources in the medical thematic table;
S502: when the proportion of standard medical data from the primary data source in the medical thematic table is lower than that of either the secondary data source or the tertiary data source, obtaining the number of times standard medical data from that secondary or tertiary data source appears in other medical thematic tables, and judging the data credibility of the secondary or tertiary data source according to that number, wherein the more occurrences, the higher the data credibility;
S503: when the data credibility of the secondary or tertiary data source is high, retaining its standard medical data; when the data credibility is low, calling standard medical data of the same related dimension from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database and adding it to the medical thematic table.
2. The data standardized governance method for medical big data of claim 1, wherein the step S200 includes the following steps:
S210: performing metadata design, field specification design, field mapping design, and snowflake architecture design on the original medical data to obtain standard medical data;
S220: performing defect identification on the standard medical data, repairing the standard medical data identified as defective, and storing the repaired standard medical data in the medical standard database.
3. The data standardized governance method for medical big data of claim 1, wherein the step S300 includes the following steps:
S301: determining the classification dimensions of the medical theme object, and constructing a label system for each classification dimension;
S302: obtaining standard medical data from the medical standard database according to the label system, calculating the standard medical data according to the preset calculation logic of the label system, and then filling it into the medical theme object table.
4. The data standardized governance method for medical big data of claim 1, wherein the step S400 includes the following steps:
S401: determining the theme objects associated with the medical service scenario;
S402: according to the theme objects associated with the service scenario, obtaining the standard medical data related to the medical service scenario from the medical theme object tables of the associated theme objects, and filling it into the medical thematic table.
5. The data standardized governance method for medical big data of claim 1, wherein the method further comprises the following step:
S600: when the acquired original medical data is updated, updating the data in the medical original database, the medical standard database, the medical theme database, and the medical thematic database.
6. A data standardized governance system for medical big data, characterized in that the system comprises a data acquisition module, a data processing module, a data association module, a data fusion module, a medical original database, a medical standard database, a medical theme database, a data source determining module, a proportion determining module, a frequency acquisition module, and a data updating module;
the data acquisition module is used for acquiring original medical data of various types from data sources through different data acquisition modes, and storing the original medical data in the medical original database;
the data processing module is used for standardizing the original medical data to obtain standard medical data, and storing the standard medical data in the medical standard database;
the data association module is used for determining a medical theme object and its related dimensions, generating a medical theme object table, calling standard medical data of the related dimensions of the medical theme object from the medical standard database, filling the medical theme object table, and storing the medical theme object table in the medical theme database;
the data fusion module is used for acquiring a medical thematic scenario of a data service, generating a medical thematic table, determining the medical theme objects of interest according to the medical thematic scenario, calling related standard medical data from the medical theme object table, performing data fusion, adding the fused data to the medical thematic table, and adding the medical thematic table to a medical thematic database;
the data source determining module is used for determining the data sources of the standard medical data in the medical thematic table, the data sources comprising a primary data source, a secondary data source, and a tertiary data source with successively decreasing trust levels;
the proportion determining module is used for determining, after the standard medical data is added to the medical thematic table, the respective proportions of standard medical data from the primary, secondary, and tertiary data sources in the medical thematic table;
the frequency acquisition module is used for obtaining, when the proportion of standard medical data from the primary data source in the medical thematic table is lower than that of either the secondary data source or the tertiary data source, the number of times standard medical data from that secondary or tertiary data source appears in other medical thematic tables, and judging the data credibility of the secondary or tertiary data source according to that number, wherein the more occurrences, the higher the data credibility;
the data updating module is used for retaining the standard medical data of the secondary or tertiary data source when its data credibility is high, and, when its data credibility is low, calling standard medical data of the same related dimension from the primary data source, or from another secondary or tertiary data source with high data credibility, in the medical standard database and adding it to the medical thematic table.
7. The data standardized governance system for medical big data of claim 6, wherein the data processing module comprises a data design module and a defect repair module;
the data design module is used for performing metadata design, field specification design, field mapping design, and snowflake architecture design on the original medical data to obtain standard medical data;
the defect repair module is used for performing defect identification on the standard medical data, repairing the standard medical data identified as defective, and storing the repaired standard medical data in the medical standard database.
8. The data standardized governance system for medical big data of claim 7, wherein the data association module comprises a theme dimension module and a data calculation module;
the theme dimension module is used for determining the classification dimensions of the medical theme object and constructing a label system for each classification dimension;
the data calculation module is used for obtaining standard medical data from the medical standard database according to the label system, calculating the standard medical data according to the preset calculation logic of the label system, and filling it into the medical theme object table.
9. The data standardized governance system for medical big data of claim 8, wherein the data fusion module comprises a scene object module and an object fusion module;
the scene object module is used for determining the theme objects associated with the medical service scenario;
the object fusion module is used for obtaining, according to the theme objects associated with the service scenario, the standard medical data related to the medical service scenario from the medical theme object tables of the associated theme objects, and filling it into the medical thematic table.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310799572.XA CN116525124B (en) 2023-07-03 2023-07-03 Data standardized management method and system for medical big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310799572.XA CN116525124B (en) 2023-07-03 2023-07-03 Data standardized management method and system for medical big data

Publications (2)

Publication Number Publication Date
CN116525124A (en) 2023-08-01
CN116525124B (en) 2023-08-29

Family

ID=87406694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310799572.XA Active CN116525124B (en) 2023-07-03 2023-07-03 Data standardized management method and system for medical big data

Country Status (1)

Country Link
CN (1) CN116525124B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272395B (en) * 2023-11-21 2024-01-26 江西曼荼罗软件有限公司 Patient medical data processing method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004017164A2 (en) * 2002-08-16 2004-02-26 Hx Technologies, Inc. Methods and systems for managing distributed digital medical data and access thereto
CN110335647A (en) * 2019-06-21 2019-10-15 上海市精神卫生中心(上海市心理咨询培训中心) A kind of clinical data standards system and standardized data acquisition method
CN112349369A (en) * 2020-11-27 2021-02-09 广州瀚信通信科技股份有限公司 Medical image big data intelligent analysis method, system and storage medium
CN113641659A (en) * 2021-08-30 2021-11-12 平安医疗健康管理股份有限公司 Medical characteristic database construction method, device, equipment and storage medium
WO2022155607A1 (en) * 2021-01-15 2022-07-21 F. Hoffmann-La Roche Ag Oncology workflow for clinical decision support
CN115599840A (en) * 2022-10-17 2023-01-13 中电科大数据研究院有限公司(Cn) Complex service data management method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11295867B2 (en) * 2018-06-05 2022-04-05 Koninklijke Philips N.V. Generating and applying subject event timelines

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
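An intelligent governance system based on health and medical big data (一种基于健康医疗大数据的智能治理系统); Huang Shoumeng (黄寿孟) et al.; Modern Information Technology (《现代信息科技》); Vol. 7, No. 01; pp. 14-17+22 *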

Also Published As

Publication number Publication date
CN116525124A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US20240203599A1 (en) Method and system of for predicting disease risk based on multimodal fusion
KR101873926B1 (en) Method for providing medical counseling service between insurance organization and specialist based on bigdata
CN102947832B (en) The identities match of patient's record
CN100449531C (en) Patient data mining
CN110335647A (en) A kind of clinical data standards system and standardized data acquisition method
WO2022116430A1 (en) Big data mining-based model deployment method, apparatus and device, and storage medium
CN112164460A (en) Intelligent disease auxiliary diagnosis system based on medical knowledge map
CN116525124B (en) Data standardized management method and system for medical big data
CN112349369A (en) Medical image big data intelligent analysis method, system and storage medium
CN114003791B (en) Depth map matching-based automatic classification method and system for medical data elements
CN103370629B (en) Clinical quality analytics system
CN114121295A (en) Construction method of knowledge graph driven liver cancer diagnosis and treatment scheme recommendation system
CN117275660B (en) Full-link AI auxiliary method for inquiry to prescription
CN110910991A (en) Medical automatic image processing system
CN113921122A (en) Medical care resource distribution system based on intelligent medical treatment
CN114242194A (en) Natural language processing device and method for medical image diagnosis report based on artificial intelligence
US20200135308A1 (en) Expression of clinical logic with positive and negative explainability
CN116884612A (en) Intelligent analysis method, device, equipment and storage medium for disease risk level
CN116110542A (en) Data analysis method based on trusted multi-view
Zamora et al. Characterizing chronic disease and polymedication prescription patterns from electronic health records
Jin et al. Research on the construction and application of breast cancer-specific database system based on full data lifecycle
CN114944209A (en) Integrated computing method and system for medical similar medical records
CN114647737A (en) Medical rule completion method and device
Fang et al. Abnormal event health-status monitoring based on multi-dimensional and multi-level association rules constraints in nursing information system
CN112699669B (en) Natural language processing method, device and storage medium for epidemiological survey report

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant