CN115221152A - Distributed node sharing method and system for biological sample database data - Google Patents

Distributed node sharing method and system for biological sample database data

Info

Publication number
CN115221152A
Authority
CN
China
Prior art keywords
data
processing
sample
database
biological sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210840621.5A
Other languages
Chinese (zh)
Inventor
黄杰玞
黄晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bioit Guangzhou Biological Information Technology Co ltd
Original Assignee
Bioit Guangzhou Biological Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bioit Guangzhou Biological Information Technology Co ltd
Priority to CN202210840621.5A
Publication of CN115221152A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/20Heterogeneous data integration

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Bioethics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a distributed node sharing method and system for biological sample database data. The method comprises the following steps: synchronizing each biological sample library to a cloud server and merging them to obtain a merged database; standardizing the data in the merged database to obtain standardized data; synchronizing the standardized data to a cloud distributed node database; modeling the standardized data and performing feature engineering on it to automatically generate a medical sample analysis result; and visualizing the analysis result. By performing deep learning through modeling, the invention makes artificial intelligence processing of large-sample biological sample data from different institutions possible, reduces labor cost and the error rate of manual judgment, ensures data accuracy, improves the quality of medical research, and effectively ensures that every piece of data and every node in the biological sample library sharing system is publicly verifiable, traceable and highly transparent.

Description

Distributed node sharing method and system for biological sample database data
This application is a divisional application of the patent application entitled 'a distributed node sharing method, a system and a device for data of a biological sample library', filed on November 29, 2018, with application number 201811447402.0.
Technical Field
The invention relates to the technical field of data processing, in particular to a distributed node sharing method and system for biological sample database data.
Background
A biological sample library, also called a biobank, provides standardized management of the collection, processing, storage and application of various biological samples, and manages the information related to those samples, such as clinical information, patient follow-up information and sample quality management information. Since the last century, more and more biological sample libraries have been established, and they play an increasingly important role in genomics research and precision medicine research.
As biological sample libraries continue to develop, managing biological samples becomes increasingly difficult. The traditional manual management mode can hardly meet the management requirements of biological sample libraries, and the acquisition and processing of data information also face great challenges. Large samples and big data are defining characteristics of modern life science research. Biological sample data comes from different medical institutions and scientific research institutions and is stored in various offline databases, forming many isolated data islands. The traditional way of transferring data depends on manual operation and manual interpretation, which easily leads to lost, erroneous or unusable data, prevents sharing, and hinders the effective use of biological samples.
Disclosure of Invention
To solve the above technical problems, an object of the present invention is to provide a distributed node sharing method and system for biological sample database data that obtains biological sample information from databases of various sources, such as medical institutions and scientific research institutions, while ensuring the authenticity, integrity and availability of that information. Through modeling and deep learning, the invention makes artificial intelligence processing of large-sample, big-data biological sample information possible, greatly reduces labor cost and the error rate of manual judgment, ensures the accuracy of the biological sample information, enables that information to be shared, and promotes precision medicine research on large samples and big data. The invention also performs pathological feature engineering through modeling, so that medical sample analysis results can be generated automatically, which reduces the workload of medical researchers and greatly improves research quality and efficiency.
The technical scheme adopted by the invention is as follows:
a distributed node sharing method for biological sample library data comprises the following steps:
synchronizing each biological sample library into a cloud server, and merging to obtain a merged database;
standardizing the data in the merged database to obtain processed standardized data;
synchronizing the standardized data to a cloud distributed node database;
modeling the standardized data and performing feature engineering on it to obtain an analysis result;
visualizing the analysis result;
wherein the standardized data is modeled based on the different database types and on the medical research purposes, with respect to diseases, of the accessing users; different targeted problems are formulated, and dedicated data variables are captured and processed for those problems; and a classification algorithm is used during modeling, the classification algorithm comprising naive Bayes, the AdaBoost iterative algorithm or a support vector machine algorithm.
Optionally, standardizing the data in the merged database to obtain the processed standardized data comprises:
converting the data in the merged database into text data to obtain sample data, and importing the sample data into a back end for processing;
performing data cleaning on the sample data to obtain standardized data;
the sample data comprises fixed-value coded data and free-text data;
the cleaning types for the fixed-value coded data are data anomalies and missing data;
when the cleaning type is missing data, if the object is random numerical data, the missing values are filled using the mean, the median, and the sum of the mean and a random standard deviation;
if the object is categorical data, the missing values are assigned and filled by category according to frequency of occurrence;
when the cleaning type is data anomaly, if the object is a unit anomaly in a numerical value, unit conversion is performed on the object using naive Bayes and a binary decision tree;
if the object is outlier data, the outliers are removed using a kernel density estimation algorithm and principal component analysis;
the free-text data is processed as follows: preliminary keyword capture is performed first, a new variable column is created, and a preliminary coded value is assigned to the new variable column.
Optionally, the step of performing data cleaning on the sample data to obtain standardized data specifically comprises:
performing feature engineering on the sample data to obtain processed sample data;
and performing data cleaning in a corresponding manner according to the data type of the processed sample data and the type of cleaning required.
In order to achieve the above object, the present invention further provides a distributed node sharing system for biological sample database data, including:
the merging unit is used for synchronizing all the biological sample libraries to the cloud server and merging the biological sample libraries to obtain a merged database;
the standardization unit is used for standardizing the data in the merged database to obtain processed standardized data;
the node synchronization unit is used for synchronizing the standardized data to the cloud distributed node database;
the analysis unit is used for modeling the standardized data and performing feature engineering on it to obtain an analysis result;
the visualization unit is used for visualizing the analysis result;
wherein the standardized data is modeled based on the different database types and on the medical research purposes, with respect to diseases, of the accessing users; different targeted problems are formulated, and dedicated data variables are captured and processed for those problems; and a classification algorithm is used during modeling, the classification algorithm comprising naive Bayes, the AdaBoost iterative algorithm or a support vector machine algorithm.
Optionally, the standardization unit specifically includes:
the conversion unit is used for converting the data in the merged database into text data to obtain sample data and importing the sample data into a back end for processing;
the cleaning unit is used for performing data cleaning on the sample data to obtain standardized data;
the sample data comprises fixed-value coded data and free-text data;
the cleaning types for the fixed-value coded data are data anomalies and missing data;
when the cleaning type is missing data, if the object is random numerical data, the missing values are filled using the mean, the median, and the sum of the mean and a random standard deviation;
if the object is categorical data, the missing values are assigned and filled by category according to frequency of occurrence;
when the cleaning type is data anomaly, if the object is a unit anomaly in a numerical value, unit conversion is performed on the object using naive Bayes and a binary decision tree;
if the object is outlier data, the outliers are removed using a kernel density estimation algorithm and principal component analysis;
the free-text data is processed as follows: preliminary keyword capture is performed first, a new variable column is created, and a preliminary coded value is assigned to the new variable column.
Optionally, the cleaning unit specifically includes:
the feature processing unit is used for performing unsupervised sample and pathological feature engineering on the sample data to obtain processed sample data;
and the data cleaning unit is used for performing data cleaning in a corresponding manner according to the data type of the processed sample data and the type of cleaning required.
The invention has the beneficial effects that:
with the distributed node sharing method and system for biological sample library data, deep learning is performed through modeling, which makes artificial intelligence possible in the life science field, greatly reduces labor cost and the error rate of manual judgment, and ensures data accuracy. Pathological feature engineering can also be performed through modeling so that medical sample analysis results are generated automatically, which reduces the workload of medical researchers and greatly improves research quality and efficiency. In addition, a blockchain distributed deployment mode is adopted, which effectively ensures that every piece of data and every node in the biological sample library sharing system is publicly verifiable, traceable and highly transparent.
Drawings
FIG. 1 is a flowchart illustrating steps of a distributed node sharing method for data of a biological sample database according to the present invention;
FIG. 2 is a block diagram of a distributed node sharing system for biological sample database data according to the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings:
referring to fig. 1, the distributed node sharing method for biological sample database data of the present invention includes the following steps:
synchronizing each biological sample library to a cloud server, and merging to obtain a merged database; the merged objects are a plurality of biological sample banks disposed in different hospitals and research institutions;
standardizing the data in the merged database to obtain processed standardized data;
and synchronizing the standardized data to a cloud distributed node database.
The biological sample data comprises the sample type, collection site, collection time, freezing conditions, and analysis data obtained for scientific research using new technical methods, where the analysis data includes sequencing data and proteomics data. The biological sample libraries located in hospitals and research institutions are synchronized to a cloud server and merged, which prepares unified data for subsequent processing. This step is a prerequisite for sharing across the biological sample library system: because the database nodes are deployed uniformly for data acquisition, the consistency of the database tables and the data utilization rate are greatly improved, and the space and time cost of subsequent data processing is reduced. The distributed node database resides in a blockchain: the distributed nodes are the nodes of the blockchain, and the distributed node database as a whole is deployed in the blockchain system.
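The merging step can be pictured with a minimal sketch. The patent does not specify a schema or a merge procedure, so everything below is an assumption made for illustration: the field names (`sample_id`, `sample_type`, `collected`, `freezing`), the site names, and the choice to pad missing columns with `None` and to record a `source` column for provenance.

```python
from typing import Dict, List

# Hypothetical exports from two sites; field names are illustrative only.
hospital_a = [
    {"sample_id": "A-001", "sample_type": "serum", "collected": "2018-03-01"},
    {"sample_id": "A-002", "sample_type": "tissue", "collected": "2018-03-04"},
]
institute_b = [
    {"sample_id": "B-001", "sample_type": "plasma", "freezing": "-80C"},
]


def merge_libraries(libraries: Dict[str, List[dict]]) -> List[dict]:
    """Merge per-site sample lists into one table with a uniform column set."""
    # Take the union of all column names so every merged row has the same shape.
    columns = sorted(
        {key for rows in libraries.values() for row in rows for key in row}
    )
    merged = []
    for source, rows in libraries.items():
        for row in rows:
            # Columns a site does not provide are filled with None.
            record = {column: row.get(column) for column in columns}
            # Remember which library the row came from.
            record["source"] = source
            merged.append(record)
    return merged


merged = merge_libraries({"hospital_a": hospital_a, "institute_b": institute_b})
for record in merged:
    print(record)
```

Padding with `None` keeps the gaps explicit, so a later cleaning step can treat them as missing data rather than silently dropping rows.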
As a further preferred embodiment, the method further comprises the following steps:
modeling the standardized data and performing feature engineering on it to obtain an analysis result;
and visualizing the analysis result.
In this embodiment, the standardized data is modeled based on the different database types and on the medical research purposes, with respect to diseases, of the accessing users; different targeted problems are formulated, and dedicated data variables are then captured and processed for those problems. In the preliminary scheme, because biological samples are tied to disease types, the most commonly used algorithms are classification algorithms such as naive Bayes, the AdaBoost iterative algorithm and the support vector machine. After a series of preliminary models has been built, K-fold cross-validation is run multiple times on the three models to find the one with the best accuracy. The invention performs deep learning feedback through modeling, which makes artificial intelligence possible in the life science field and greatly reduces labor cost and the error rate of manual judgment. Finally, the analysis results produced by statistical analysis are output visually according to user requirements, so that users can compare and review them and present the results in their research.
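The model selection described here, repeated K-fold cross-validation over several candidate classifiers, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the patented implementation: the data is synthetic, the Gaussian naive Bayes is a minimal hand-written version, and a nearest-centroid classifier stands in as a second candidate because AdaBoost and a support vector machine would normally come from a machine learning library. All function names are hypothetical.

```python
import math
import random
from statistics import mean, pvariance
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[float], int]
Predictor = Callable[[List[float]], int]


def group_by_class(train: List[Sample]) -> Dict[int, List[List[float]]]:
    groups: Dict[int, List[List[float]]] = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    return groups


def train_gaussian_nb(train: List[Sample]) -> Predictor:
    """Gaussian naive Bayes: class prior plus per-feature mean and variance."""
    stats = {}
    for label, rows in group_by_class(train).items():
        columns = list(zip(*rows))
        stats[label] = (
            len(rows) / len(train),
            [mean(c) for c in columns],
            [pvariance(c) + 1e-6 for c in columns],  # avoid zero variance
        )

    def predict(x: List[float]) -> int:
        def log_posterior(label: int) -> float:
            prior, means, variances = stats[label]
            return math.log(prior) + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances)
            )
        return max(stats, key=log_posterior)

    return predict


def train_nearest_centroid(train: List[Sample]) -> Predictor:
    """Stand-in candidate (not named in the source): closest class mean wins."""
    centroids = {
        label: [mean(c) for c in zip(*rows)]
        for label, rows in group_by_class(train).items()
    }

    def predict(x: List[float]) -> int:
        return min(
            centroids,
            key=lambda lb: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[lb])),
        )

    return predict


def cross_val_accuracy(train_fn, data: List[Sample], k: int = 5, repeats: int = 3) -> float:
    """Mean accuracy over `repeats` shuffled runs of K-fold cross-validation."""
    scores = []
    for seed in range(repeats):
        indices = list(range(len(data)))
        random.Random(seed).shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train = [data[i] for i in indices if i not in held]
            predict = train_fn(train)
            correct = sum(predict(data[i][0]) == data[i][1] for i in held_out)
            scores.append(correct / len(held_out))
    return mean(scores)


# Synthetic two-class data: two well-separated Gaussian clusters.
rng = random.Random(42)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(30)]
data += [([rng.gauss(5, 1), rng.gauss(5, 1)], 1) for _ in range(30)]

candidates = {"naive_bayes": train_gaussian_nb, "nearest_centroid": train_nearest_centroid}
scores = {name: cross_val_accuracy(fn, data) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Each candidate is scored on held-out folds only, so the comparison reflects accuracy on data the model did not see during training.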
As a preferred embodiment, standardizing the data in the merged database to obtain processed standardized data specifically comprises:
converting the data in the merged database into text data to obtain sample data, and importing the sample data into a back end for processing;
and performing data cleaning on the sample data to obtain standardized data.
Further as a preferred embodiment, the step of performing data cleaning processing on the sample data to obtain standardized data specifically includes:
carrying out unsupervised sample and pathological feature engineering processing on the sample data to obtain processed sample data;
and performing data cleaning processing in a corresponding mode according to the data type of the processed sample data and the cleaning type required to be performed.
In the embodiment of the invention, the sample data can be divided into fixed-value coded data and free-text data.
For fixed-value coded data, the cleaning types are data anomalies and missing data. The invention processes each type of variable with a corresponding algorithm.
When the cleaning type is missing data and the object is random numerical data such as age or height, the missing values are filled using the mean, the median, and the sum of the mean and a random standard deviation.
If the object is categorical data such as blood type or gender, the missing values are assigned and filled by category according to frequency of occurrence.
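The filling rules for missing data can be sketched as follows. The source text does not say how the three numeric fills are combined or how "random standard deviation" is drawn, so this sketch makes two assumptions: the mean, the median and the mean-plus-random-deviation fill are offered as alternative strategies, and the random term is the standard deviation scaled by a uniform factor in [-1, 1]. Filling categorical gaps with the most frequent category is likewise one reading of "by frequency of occurrence". Function names and sample values are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, median, pstdev
from typing import List, Optional


def fill_numeric(
    values: List[Optional[float]], strategy: str = "mean", seed: int = 0
) -> List[float]:
    """Fill None entries of a numeric column from the observed values."""
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), pstdev(observed)
    rng = random.Random(seed)

    def fill() -> float:
        if strategy == "mean":
            return mu
        if strategy == "median":
            return median(observed)
        if strategy == "mean_plus_random_std":
            # Assumption: mean plus the standard deviation times a factor in [-1, 1].
            return mu + rng.uniform(-1.0, 1.0) * sigma
        raise ValueError(f"unknown strategy: {strategy}")

    return [fill() if v is None else v for v in values]


def fill_categorical(values: List[Optional[str]]) -> List[str]:
    """Fill None entries with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


ages = [34, None, 51, 47, None, 29]
blood_types = ["A", "O", None, "O", "B"]

print(fill_numeric(ages, "mean"))
print(fill_numeric(ages, "median"))
print(fill_numeric(ages, "mean_plus_random_std"))
print(fill_categorical(blood_types))
```

The randomized fill avoids stacking every imputed value on the mean, which would otherwise shrink the variance of the column.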
When the cleaning type is data anomaly and the object is a unit anomaly in a numerical value, unit conversion is performed on the object using naive Bayes and a binary decision tree. If the object is outlier data, the outliers are removed using a kernel density estimation algorithm and principal component analysis.
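Outlier removal by kernel density estimation can be illustrated for a single numeric column. This is a sketch under assumptions the source does not state: a Gaussian kernel, a rule-of-thumb bandwidth, leave-one-out density scores, and a relative cutoff of 10% of the highest score. The principal component analysis step and the unit-conversion classifiers are not shown. The function names and the height values are hypothetical.

```python
import math
from statistics import pstdev
from typing import List


def leave_one_out_density(points: List[float], i: int, bandwidth: float) -> float:
    """Gaussian kernel density at points[i], estimated from all other points."""
    others = points[:i] + points[i + 1:]
    norm = 1.0 / (len(others) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((points[i] - p) / bandwidth) ** 2) for p in others
    )


def drop_low_density(points: List[float], rel_threshold: float = 0.1) -> List[float]:
    """Keep points whose density is at least rel_threshold times the maximum."""
    spread = pstdev(points)
    if spread == 0 or len(points) < 3:
        return list(points)  # nothing to separate
    # Rule-of-thumb bandwidth (assumption): 1.06 * sigma * n^(-1/5).
    bandwidth = 1.06 * spread * len(points) ** (-1 / 5)
    scores = [leave_one_out_density(points, i, bandwidth) for i in range(len(points))]
    cutoff = rel_threshold * max(scores)
    return [p for p, s in zip(points, scores) if s >= cutoff]


# Hypothetical heights in centimetres with one implausible entry.
heights_cm = [165, 170, 172, 168, 175, 169, 171, 400]
cleaned = drop_low_density(heights_cm)
print(cleaned)
```

Scoring each point against the others only, rather than against the full set, keeps an isolated value from propping up its own density.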
For free-text data, this embodiment first performs preliminary keyword capture on the text, creates a new variable column, and assigns a preliminary coded value to that column. The specific values depend on the code dictionary of the particular sample library.
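The free-text handling can be sketched as a keyword lookup that writes a code into a new column. The code dictionary, the column names (`note`, `diagnosis_code`) and the rule of taking the first keyword found are all assumptions for illustration; as noted above, a real dictionary would come from the specific sample library.

```python
import re
from typing import Dict, List, Optional

# Hypothetical code dictionary; keys must be lowercase keywords.
CODE_DICTIONARY: Dict[str, int] = {"hypertension": 1, "diabetes": 2, "asthma": 3}


def encode_free_text(
    rows: List[dict], text_field: str, new_field: str, codes: Dict[str, int]
) -> List[dict]:
    """Add new_field holding the code of the first keyword found in text_field."""
    pattern = re.compile("|".join(re.escape(k) for k in codes), re.IGNORECASE)
    encoded = []
    for row in rows:
        match = pattern.search(row.get(text_field) or "")
        new_row = dict(row)  # leave the input rows unchanged
        new_row[new_field] = codes[match.group(0).lower()] if match else None
        encoded.append(new_row)
    return encoded


records = [
    {"sample_id": "A-001", "note": "History of Hypertension, on medication"},
    {"sample_id": "A-002", "note": "No relevant history"},
    {"sample_id": "A-003", "note": None},
]
encoded = encode_free_text(records, "note", "diagnosis_code", CODE_DICTIONARY)
for row in encoded:
    print(row["sample_id"], row["diagnosis_code"])
```

Rows with no recognised keyword receive `None`, so they remain visible as candidates for manual review or a richer dictionary.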
Referring to fig. 2, the present invention provides a distributed node sharing system for biological sample library data, including:
the merging unit is used for synchronizing all the biological sample libraries to the cloud server and merging the biological sample libraries to obtain a merged database;
the standardization unit is used for standardizing the data in the merged database to obtain processed standardized data;
and the node synchronization unit is used for synchronizing the standardized data to the cloud distributed node database.
As a further preferred embodiment, the system further includes:
the analysis unit is used for modeling the standardized data and performing feature engineering on it to obtain an analysis result;
and the visualization unit is used for visualizing the analysis result.
As a further preferred embodiment, the standardization unit specifically includes:
the conversion unit is used for converting the data in the merged database into text data to obtain sample data and importing the sample data into a back end for processing;
and the cleaning unit is used for cleaning the data of the sample data to obtain standardized data.
As a preferred embodiment, the cleaning unit specifically includes:
the characteristic processing unit is used for carrying out unsupervised sample and pathological characteristic engineering processing on the sample data to obtain the processed sample data;
and the data cleaning unit is used for performing data cleaning in a corresponding manner according to the data type of the processed sample data and the type of cleaning required.
This embodiment of the invention uses innovative feature engineering algorithms, deep learning algorithms and other algorithms to standardize biological sample data from different sources and to strengthen its authenticity in the virtual cloud computing environment. The subsequent steps use the distributed deployment technology of the blockchain, so that the biological sample data and the patients' clinical data are impartial, secure and traceable, which greatly reduces data loss in cross-domain diagnosis and research.
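The traceability attributed to blockchain deployment rests on each record carrying the hash of its predecessor. The patent names no blockchain platform or block layout, so the following is only a minimal hash-chain sketch of that one property, with no networking, consensus or access control. The block fields and function names are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64


def block_hash(index: int, prev_hash: str, payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[dict], payload: dict) -> None:
    """Append a block that commits to the hash of the previous block."""
    index = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {
            "index": index,
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
    )


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain: List[dict] = []
append_block(chain, {"sample_id": "A-001", "sample_type": "serum"})
append_block(chain, {"sample_id": "B-001", "sample_type": "plasma"})
print(verify_chain(chain))
```

Because each hash covers the previous hash, editing any stored record changes its hash and breaks every later link, which is what makes after-the-fact changes detectable.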
In summary, the invention performs deep learning through modeling, which makes artificial intelligence possible in the life science field, greatly reduces labor cost and the error rate of manual judgment, and ensures data accuracy. Modeling can also be used to perform pathological feature engineering so that medical sample analysis results are generated automatically, which reduces the workload of medical researchers and greatly improves research quality and efficiency. The invention also effectively ensures that every piece of data and every node in the biological sample library sharing system is publicly verifiable, traceable and highly transparent.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A method for sharing distributed nodes of biological sample library data is characterized by comprising the following steps:
synchronizing each biological sample library to a cloud server, and merging to obtain a merged database;
standardizing the data in the merged database to obtain processed standardized data;
synchronizing the standardized data to a cloud distributed node database;
modeling the standardized data and performing feature engineering on it to obtain an analysis result;
visualizing the analysis result;
wherein the standardized data is modeled based on the different database types and the purposes of the accessing users; different targeted problems are formulated, and dedicated data variables are captured and processed for those problems; a classification algorithm is used during modeling, the classification algorithm comprising naive Bayes, the AdaBoost iterative algorithm or a support vector machine algorithm; and the feature engineering uses an unsupervised sample and pathological feature engineering algorithm.
2. The distributed node sharing method for biological sample library data as claimed in claim 1, wherein the step of standardizing the data in the merged database to obtain the processed standardized data specifically comprises:
converting the data in the merged database into text data to obtain sample data, and importing the sample data into a back end for processing;
performing data cleaning on the sample data to obtain standardized data;
the sample data comprises fixed-value coded data and free-text data;
the cleaning types for the fixed-value coded data are data anomalies and missing data;
when the cleaning type is missing data, if the object is random numerical data, the missing values are filled using the mean, the median, and the sum of the mean and a random standard deviation;
if the object is categorical data, the missing values are assigned and filled by category according to frequency of occurrence;
when the cleaning type is data anomaly, if the object is a unit anomaly in a numerical value, unit conversion is performed on the object using naive Bayes and a binary decision tree;
if the object is outlier data, the outliers are removed using a kernel density estimation algorithm and principal component analysis;
the free-text data is processed as follows: preliminary keyword capture is performed first, a new variable column is created, and a preliminary coded value is assigned to the new variable column.
3. The distributed node sharing method for biological sample library data as claimed in claim 2, wherein the step of performing data cleaning on the sample data to obtain standardized data specifically comprises:
performing feature engineering on the sample data to obtain processed sample data;
and performing data cleaning in a corresponding manner according to the data type of the processed sample data and the type of cleaning required.
4. A distributed node sharing system for biological sample library data, comprising:
the merging unit is used for synchronizing all the biological sample libraries to the cloud server and merging the biological sample libraries to obtain a merged database;
the standardization unit is used for standardizing the data in the merged database to obtain processed standardized data;
the node synchronization unit is used for synchronizing the standardized data to the cloud distributed node database;
the analysis unit is used for modeling the standardized data and performing feature engineering on it to obtain an analysis result;
the visualization unit is used for visualizing the analysis result;
wherein the standardized data is modeled based on the different database types and the purposes of the accessing users; different targeted problems are formulated, and dedicated data variables are captured and processed for those problems; a classification algorithm is used during modeling, the classification algorithm comprising naive Bayes, the AdaBoost iterative algorithm or a support vector machine algorithm; and the feature engineering uses an unsupervised sample and pathological feature engineering algorithm.
5. The distributed node sharing system for biological sample library data as claimed in claim 4, wherein the standardization unit specifically comprises:
the conversion unit is used for converting the data in the merged database into text data to obtain sample data and importing the sample data into a back end for processing;
the cleaning unit is used for performing data cleaning on the sample data to obtain standardized data;
the sample data comprises fixed-value coded data and free-text data;
the cleaning types for the fixed-value coded data are data anomalies and missing data;
when the cleaning type is missing data, if the object is random numerical data, the missing values are filled using the mean, the median, and the sum of the mean and a random standard deviation;
if the object is categorical data, the missing values are assigned and filled by category according to frequency of occurrence;
when the cleaning type is data anomaly, if the object is a unit anomaly in a numerical value, unit conversion is performed on the object using naive Bayes and a binary decision tree;
if the object is outlier data, the outliers are removed using a kernel density estimation algorithm and principal component analysis;
the free-text data is processed as follows: preliminary keyword capture is performed first, a new variable column is created, and a preliminary coded value is assigned to the new variable column.
6. The system of claim 5, wherein the cleaning unit specifically comprises:
the feature processing unit is used for performing unsupervised sample and pathological feature engineering on the sample data to obtain processed sample data;
and the data cleaning unit is used for performing data cleaning in a corresponding manner according to the data type of the processed sample data and the type of cleaning required.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210840621.5A CN115221152A (en) 2018-11-29 2018-11-29 Distributed node sharing method and system for biological sample database data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811447402.0A CN109635026A (en) 2018-11-29 2018-11-29 A kind of biological sample bank data distributing nodes sharing method, system and device
CN202210840621.5A CN115221152A (en) 2018-11-29 2018-11-29 Distributed node sharing method and system for biological sample database data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811447402.0A Division CN109635026A (en) 2018-11-29 2018-11-29 A kind of biological sample bank data distributing nodes sharing method, system and device

Publications (1)

Publication Number Publication Date
CN115221152A true CN115221152A (en) 2022-10-21

Family

ID=66069944

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811447402.0A Pending CN109635026A (en) 2018-11-29 2018-11-29 A kind of biological sample bank data distributing nodes sharing method, system and device
CN202210840621.5A Pending CN115221152A (en) 2018-11-29 2018-11-29 Distributed node sharing method and system for biological sample database data

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811447402.0A Pending CN109635026A (en) 2018-11-29 2018-11-29 A kind of biological sample bank data distributing nodes sharing method, system and device

Country Status (1)

Country Link
CN (2) CN109635026A (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090150181A1 (en) * 2007-12-07 2009-06-11 Roche Diagnostics Operations, Inc. Method and system for personal medical data database merging
CN103150304B (en) * 2011-12-06 2016-11-23 郑红云 Cloud Database Systems
CN104809590B (en) * 2015-05-05 2016-10-12 赣州市明豪信息技术有限公司 A kind of intelligent cloud synchronizes medical information sharing system
CN105956015A (en) * 2016-04-22 2016-09-21 四川中软科技有限公司 Service platform integration method based on big data
CN106777930A (en) * 2016-11-30 2017-05-31 南京大学 Central network platform based on electrocardiogram unified standardization
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method

Also Published As

Publication number Publication date
CN109635026A (en) 2019-04-16

Similar Documents

Publication Publication Date Title
CN107731269B (en) Disease coding method and system based on original diagnosis data and medical record file data
CN107705839B (en) Disease automatic coding method and system
Eswari et al. Predictive methodology for diabetic data analysis in big data
Milovic et al. Prediction and decision making in health care using data mining
CN108831556B (en) Method for predicting heparin dosage in continuous renal replacement therapy process
Boukenze et al. Predictive analytics in healthcare system using data mining techniques
Karthiga et al. Early prediction of heart disease using decision tree algorithm
CN109785927A (en) Clinical document structuring processing method based on internet integration medical platform
Ghadge et al. Intelligent heart attack prediction system using big data
CN108461110B (en) Medical information processing method, device and equipment
US20170147753A1 (en) Method for searching for similar case of multi-dimensional health data and apparatus for the same
CN112349369A (en) Medical image big data intelligent analysis method, system and storage medium
Cismondi et al. Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data
Jatav An algorithm for predictive data mining approach in medical diagnosis
CN115497631A (en) Clinical scientific research big data analysis system
Alaria et al. Design Simulation and Assessment of Prediction of Mortality in Intensive Care Unit Using Intelligent Algorithms
Swetha et al. Leveraging Scalable Classifier Mining for Improved Heart Disease Diagnosis
Sampath et al. Diabetic data analysis in healthcare using Hadoop architecture over big data
CN116343980A (en) Intelligent medical review follow-up data processing method and system
CN115472257A (en) Method and device for recruiting users, electronic equipment and storage medium
Mahammad et al. Machine Learning Approach to Predict Asthma Prevalence with Decision Trees
Abdulkadium et al. Application of Data Mining and Knowledge Discovery in Medical Databases
CN115221152A (en) Distributed node sharing method and system for biological sample database data
CN110010231A (en) A kind of data processing system and computer readable storage medium
CN111986815A (en) Project combination mining method based on co-occurrence relation and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination