CN109830303A - Clinical data mining analysis and aid decision-making method based on internet integration medical platform

Publication number
CN109830303A
Authority
CN
China
Legal status
Pending
Application number
CN201910101985.XA
Other languages
Chinese (zh)
Inventor
高建强
赵戈
徐龙章
Current Assignee
Shanghai Triman Information & Technology Co Ltd
Original Assignee
Shanghai Triman Information & Technology Co Ltd

Abstract

The present invention discloses a clinical data mining analysis and auxiliary decision-making method based on an Internet-based integrated medical platform, and relates to the technical field of Internet medical platforms. The method comprises data mining analysis and auxiliary decision-making. Data mining analysis comprises a multidimensional analysis algorithm module, a data mining algorithm module, and a deep learning algorithm module. Remote auxiliary decision-making consists of four parts: a prediction module based on index parameters, a prediction module based on examination report text, a model training module, and a structuring module. The invention selects hyperthyroidism, diabetes, thyroid nodules, and breast tumors as the research objects for data collection and analysis, relies on the integrated platform to collect and integrate clinical medical data, and provides data mining analysis and auxiliary decision-making services for the clinical data of these diseases, giving systematic support to clinicians in clinical diagnosis and to researchers in disease research.

Description

Clinical data mining analysis and auxiliary decision-making method based on an Internet-based integrated medical platform
Technical field
The present invention relates to the technical field of Internet medical platforms, and more specifically to a clinical data mining analysis and auxiliary decision-making method based on an Internet-based integrated medical platform.
Background art
Big data, as an important resource, has penetrated every industry and sector to varying degrees. Applying it in depth not only helps the business activities of individual organizations but also helps drive the development of the national economy. "Internet+" is both the result and the hallmark of the deep integration of industry and information technology in various countries, and an important lever for further promoting information consumption. "Internet+" means "Internet + each traditional industry", but this is not a simple addition of the two. Rather, it uses information and communication technology and Internet platforms to integrate the Internet deeply with traditional industries and create a new development ecosystem. In the future the Internet, like electricity, will become a productivity tool that substantially improves efficiency in every industry. Combining the mobile Internet, cloud computing, big data, and the Internet of Things with modern manufacturing promotes the healthy development of e-commerce, the industrial Internet, and Internet finance. Examples include traditional marketplaces + Internet, traditional department stores + Internet, traditional banks + Internet, traditional matchmaking + Internet, and traditional transportation + Internet. "Internet+" is being applied across the tertiary sector, giving rise to new business forms such as Internet healthcare, Internet finance, Internet transportation, and Internet education.
The medical industry is an important part of national economic and social development. Under these new conditions, the rapid development of medical informatization has benefited from emerging IT technologies such as big data, cloud computing, and the Internet of Things, which have caused an explosion of medical data and promoted the formation of medical big data. The Internet does not overturn traditional healthcare. Instead, it uses the Internet and the mobile Internet as a means of triaging patients and extending the service radius, in order to address the insufficient supply and uneven distribution of medical resources. Internet healthcare improves communication among patients, medical service institutions, and doctors, breaks through the traditional on-site service model, and alleviates the scarcity of medical resources. However, information sharing has not been established between hospitals, or between a hospital's internal systems and systems outside it. The resulting information silos significantly restrict the effective use of medical data, so user health data that could have been put to good use has nowhere to be applied.
Summary of the invention
(1) Technical problem to be solved
The object of the present invention is to provide a clinical data mining analysis and auxiliary decision-making method based on an Internet-based integrated medical platform. It selects hyperthyroidism, diabetes, breast tumors, and thyroid tumors as the research objects for data collection and analysis, relies on the integrated platform to collect and integrate clinical medical data, and provides data mining analysis and auxiliary decision-making services for the clinical data of these diseases, giving systematic support to clinicians in clinical diagnosis and to researchers in disease research.
(2) Technical solution
The clinical data mining analysis and auxiliary decision-making method based on an Internet-based integrated medical platform comprises data mining analysis and auxiliary decision-making. Data mining analysis comprises a multidimensional analysis algorithm module, a data mining algorithm module, and a deep learning algorithm module. The multidimensional analysis algorithm module first needs to establish a data cube: several data subsets are selected from the data warehouse, then organized and aggregated into a multidimensional structure defined by multiple dimensions and measures. The data mining algorithm module provides unified registration, use, and deregistration management for machine learning algorithms including classification, clustering, association rules, and regression analysis; these are used for mining analysis of specific data sets, enabling in-depth clinical analysis, early warning, and prediction. The deep learning prediction algorithm module integrates algorithms such as the convolutional neural network (CNN), the recurrent neural network (RNN), and the long short-term memory (LSTM) recurrent neural network model. Remote auxiliary decision-making consists of a prediction module based on index parameters, a prediction module based on examination report text, a model training module, and a structuring module.
According to an embodiment of the present invention, the multidimensional analysis algorithm module performs roll-up, drill-down, slice, dice, and pivot (rotation) operations on the data cube organized in multidimensional form, so as to dissect the data and let analysts and decision makers observe the data in the database from multiple angles and multiple sides, and thereby gain a deep understanding of the information and meaning contained in the data.
According to an embodiment of the present invention, the multidimensional analysis algorithm module involves dimensions and measures. A dimension is the angle from which the user observes the data. Dimension information is held in dimension tables, while the detailed values or fact data that store the measures are held in fact tables; a dimension table contains the attributes that describe the fact records in a fact table. A measure is a numerical value with practical meaning, and measures are stored in the fact table.
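The cube operations named above can be illustrated with a minimal sketch. It assumes a tiny in-memory fact table whose dimensions (`disease`, `sex`, `year`) and measure (`visits`) are hypothetical and chosen only for illustration; the patent does not specify the schema or the storage engine.

```python
from collections import defaultdict

# Hypothetical fact table: each row carries dimension values plus one measure.
FACTS = [
    {"disease": "hyperthyroidism", "sex": "F", "year": 2017, "visits": 3},
    {"disease": "hyperthyroidism", "sex": "M", "year": 2017, "visits": 1},
    {"disease": "hyperthyroidism", "sex": "F", "year": 2018, "visits": 2},
    {"disease": "diabetes", "sex": "F", "year": 2018, "visits": 4},
    {"disease": "diabetes", "sex": "M", "year": 2018, "visits": 5},
]


def roll_up(facts, dims, measure="visits"):
    """Sum the measure grouped by `dims`, aggregating away all other dimensions."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    return [row for row in facts if row[dim] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose dimension values fall inside the allowed sets."""
    return [r for r in facts if all(r[d] in vals for d, vals in allowed.items())]


by_disease = roll_up(FACTS, ["disease"])               # coarse view
by_disease_year = roll_up(FACTS, ["disease", "year"])  # drill-down adds a dimension
by_sex_2018 = roll_up(slice_cube(FACTS, "year", 2018), ["sex"])
female_2018 = dice(FACTS, sex={"F"}, year={2018})
```

Drill-down is expressed here simply as a roll-up over a longer dimension list; a pivot would only reorder the dimensions in the grouping key.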
According to an embodiment of the present invention, model training with the data mining algorithms is divided into five main steps: problem analysis, data preprocessing, data modeling, result evaluation, and knowledge application:
(1) Problem analysis: through in-depth discussion with doctors and experts, clarify the goal of the data analysis, define the analysis requirements, and set the expected results of the analysis;
(2) Data preprocessing: determine the data sources for the analysis and build a reasonable data warehouse for data query and analysis; carry out the necessary preprocessing of the data using databases such as SQL Server and MySQL, high-level languages such as Java and Python, and statistical tools such as R and SPSS;
(3) Data modeling: taking the processed data as input features and the target attribute as the output feature, select a suitable machine learning model, or make suitable modifications to an existing machine learning model; repeat parameter tuning and model optimization during training to obtain the best-performing trained model;
(4) Result evaluation: after the model is built, it must be evaluated scientifically for reliability, to ensure that the experimental results are convincing. Specifically, methods such as the ROC curve and the confusion matrix can be used, with indicators such as accuracy, recall, and F1 score measuring the quality of the model;
(5) Knowledge application: use the finished model to predict unknown cases, or to seek connections between unknown and known cases, thereby helping people better understand the unknown.
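The indicators named in step (4) can all be derived from the confusion matrix. The following is a minimal sketch for a binary task; the label encoding (1 for the positive class) and the sample labels are assumptions made for illustration, not data from the patent.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = evaluate(y_true, y_pred)
```

In a screening setting, recall on the positive class is usually the indicator to watch first, because a missed malignant case costs more than a false alarm.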
According to an embodiment of the present invention, the data mining algorithm module integrates the random forest, support vector machine, neural network, regression analysis, correlation analysis, Apriori association analysis, and K-means clustering algorithms;
(1) The random forest algorithm works as follows: first, using the bagging technique, subsets of the same size are drawn from the sample set with replacement; this is repeated K times to produce K sample sets. Each sample set is then used to generate its own decision tree, with an optimal decision tree built for each sample set. Finally, the classification predictions of these decision trees are put to a vote, and the result with the most votes becomes the classification prediction of the random forest model;
(2) The support vector machine algorithm finds an optimal separating hyperplane such that, while classification accuracy is maximized, the distance from the samples on both sides to the hyperplane is as large as possible;
(3) The neural network algorithm is structurally divided into an input layer, a hidden layer, and an output layer. The input layer receives the raw data supplied from outside and passes it to the hidden layer. The hidden layer may comprise one or more layers; it is responsible for internal information processing and transformation, and passes the transformed information to the output layer. After final processing and transformation in the output layer, the final result is output;
(4) Regression analysis is divided into univariate and multivariate regression analysis according to the number of variables involved; into simple and multiple regression analysis according to the number of independent variables; and into linear and nonlinear regression analysis according to the type of relationship between the independent and dependent variables;
(5) The correlation analysis algorithm examines whether some dependence exists between data objects and, for data objects that do have a dependence, explores the direction and degree of the correlation;
(6) The Apriori association analysis algorithm proceeds in two steps: first, it finds all frequent itemsets in the transaction database by iteration; second, it uses the frequent itemsets to construct rules that satisfy the user's minimum confidence;
(7) The K-means algorithm uses the Manhattan distance or the Euclidean distance as its similarity measure; for a given initial cluster center vector v, it seeks the optimal partition, such that the evaluation index J is minimized.
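As an illustration of item (7), the following is a minimal K-means sketch started from a given initial center vector. The patent does not define the evaluation index J; this sketch assumes the common choice of the sum of squared Euclidean distances from each point to its assigned center. The sample points and initial centers are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points, centers, distance=euclidean, max_iter=100):
    """Iterate assignment and update steps from the given initial centers.

    Returns (labels, centers, J), where J is assumed to be the sum of
    squared Euclidean distances between points and their assigned centers.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    j = sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, j


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centers, j = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Passing `distance=manhattan` changes only the assignment step. The mean update minimizes squared Euclidean error, so a Manhattan variant would more properly update centers with the coordinate-wise median.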
According to an embodiment of the present invention, the prediction module based on index parameters queries the test indicators and examination report text for the patient's relevant disease, using information such as the patient's outpatient serial number or medical insurance card number. Structured indicator data can be input directly; unstructured examination reports are structured by the structuring submodule to obtain a data format that the model can recognize;
The prediction module based on examination report text queries the patient's examination report text, using information such as the patient's outpatient serial number or medical insurance card number. For unstructured examination report text, prediction is performed directly with a deep learning algorithm;
The model training module merges the relevant clinical examination reports and test indicator data from the medical information systems of multiple databases, integrates them into a unified data table, and performs model training;
The structuring module structures the text of ultrasound reports, extracts the ultrasound features of the various samples and the value of each indicator, and forms a description template for each kind of sample.
(3) Beneficial effects
With the technical solution of the present invention, the clinical data mining analysis and auxiliary decision-making method based on an Internet-based integrated medical platform feeds unstructured clinical documents into a clinical document structuring engine, which processes them by means of a clinical medical corpus, rules, full-text retrieval, and machine learning. The resulting structured data is output to a distributed storage engine and processed by artificial intelligence algorithms for analysis and display on the platform. The invention structures the unstructured text data in the clinical data and stores it in a distributed Hadoop cluster, providing distributed data storage and distributed computing, and it adapts the programming implementation of the software applications to the characteristics of a distributed environment.
Brief description of the drawings
In the present invention, identical reference numerals always denote identical features, in which:
Fig. 1 is the overall architecture diagram of the Internet-based integrated medical platform.
Fig. 2 is the layered architecture diagram of the hyperthyroidism clinical medical data analysis system.
Fig. 3 is the overall architecture diagram of diabetes clinical data analysis and mining.
Fig. 4 is the overall architecture diagram of thyroid nodule clinical data analysis and mining.
Fig. 5 is the flow chart of the thyroid disease classification method based on random forest.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
The Internet-based integrated medical platform combines medical big data with artificial intelligence technology to build an integrated big data medical service platform based on "Internet + healthcare". It provides all individuals and institutions that take part in healthcare activities with a new model of online healthcare services, such as data sharing, business operation, and collaborative services. It optimizes information communication, promotes the exchange of information between doctors and patients, and helps hospitals improve their own service and management. The overall architecture of the platform is shown in Fig. 1. From bottom to top, the platform has five layers: the platform data foundation layer, the data analysis layer, the medical information resource layer, the deep data application layer, and the user layer. The Internet-based integrated medical service platform comprises three parts: the back-end management end, the doctor end, and the patient end.
Back-end management end of the integrated medical service platform:
Back-end management provides functions such as data exchange and integration with the hospital's HIS, PACS, LIS, and RIS, and backup of medical data from the medical information systems. It consists mainly of data extraction and integration, medical data backup storage, the special population database, and query of the anonymous public medical record database.
(1) Data extraction and integration: completes the extraction, transformation, and loading of data from systems such as HIS, RIS, LIS, and PACS. The clinical data generated by the different clinical information systems is integrated and summarized in a unified way, unifying patient identifiers and patient clinical information across those systems so that the clinical data can be stored uniformly.
A) The HIS data extraction module incrementally extracts clinical data such as registrations, visits, diagnoses, physician orders, admissions, and charges from the HIS outpatient and emergency system and the HIS inpatient system;
B) The RIS data extraction module incrementally synchronizes examination data such as examination reports and body-part details from the RIS;
C) The LIS data extraction module incrementally synchronizes laboratory data such as laboratory reports, test indicators, and bacterial culture and drug susceptibility results from the LIS;
D) The PACS module accesses data that follows the DICOM 3.0 protocol standard from imaging equipment such as DR and CT.
E) The ETL subsystem performs operations such as desensitization, cleaning, and conversion of the clinical data.
Data desensitization: the patient's sensitive personal data, such as the identity card number, medical card number, and name, is given special handling to remove the sensitive components.
Data cleaning: incomplete data is discarded. Data with format errors, such as the date of birth, is repaired using other related data; data that cannot be repaired is marked.
Data conversion: enumerated values stored as numbers or characters in the source system are converted into text with the corresponding meaning.
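The three ETL operations can be sketched as small record-level functions. Everything specific here is an assumption made for illustration: the field names, the gender code table, the salted-hash pseudonym, and the choice to repair a malformed date of birth from digits 7 to 14 of an 18-digit resident identity number. The patent only states that sensitive components are removed and that malformed data is repaired from other related data.

```python
import hashlib
from datetime import datetime

GENDER_CODES = {"1": "male", "2": "female"}  # hypothetical source-system enumeration
SALT = "demo-salt"                           # hypothetical; keep secret in practice


def _valid_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def clean(record):
    """Discard incomplete records; repair or flag a malformed date of birth."""
    if not record.get("medical_card") or not record.get("visit_date"):
        return None
    out = dict(record)
    out["birth_date_flag"] = "ok"
    if not _valid_date(out.get("birth_date") or ""):
        digits = (out.get("id_card") or "")[6:14]  # YYYYMMDD inside the ID number
        repaired = f"{digits[:4]}-{digits[4:6]}-{digits[6:8]}"
        if _valid_date(repaired):
            out["birth_date"], out["birth_date_flag"] = repaired, "repaired"
        else:
            out["birth_date_flag"] = "unrepairable"
    return out


def desensitize(record):
    """Drop direct identifiers and keep only a salted-hash pseudonym."""
    out = {k: v for k, v in record.items()
           if k not in ("name", "id_card", "medical_card")}
    digest = hashlib.sha256((SALT + record["medical_card"]).encode("utf-8"))
    out["patient_key"] = digest.hexdigest()[:16]
    return out


def convert(record):
    """Replace enumerated codes with their text meaning."""
    out = dict(record)
    out["gender"] = GENDER_CODES.get(str(out.get("gender")), "unknown")
    return out


def run_etl(records):
    for record in records:
        cleaned = clean(record)
        if cleaned is not None:
            yield convert(desensitize(cleaned))


# Synthetic sample records, not real patient data.
RECORDS = [
    {"name": "Test Patient", "id_card": "000000199003070000", "medical_card": "MC001",
     "visit_date": "2019-01-05", "birth_date": "1990/03/07", "gender": "2"},
    {"name": "No Card", "id_card": "", "medical_card": "",
     "visit_date": "2019-01-06", "birth_date": "1980-01-01", "gender": "1"},
]
rows = list(run_etl(RECORDS))
```

In this sketch cleaning runs before desensitization, because the repair step still needs the identity number; a pipeline that repairs from other fields could desensitize first.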
(2) Medical data backup storage: the medical data backup center is the foundation of clinical big data storage and supplies the raw data for clinical big data processing and analysis. The system uses a distributed Hadoop cloud storage architecture that gives different medical institutions linearly scalable distributed storage. It supports the archiving, management, and sharing of stored data, as well as information exchange and sharing among all types of medical institutions, thereby achieving diversified storage and access on the cloud computing platform. The medical data backup center consists of modules for bulk migration, incremental import, and viewing of medical data.
A) Bulk migration of medical data: using the Hadoop distributed architecture, performs full backups of the medical data in the medical information systems by time period.
B) Incremental import of medical data: imports incremental medical data from the medical information systems in time-series incremental mode.
C) Viewing of medical data: supports querying the imported medical data by disease type, patient gender, age group, department, examination report type, examination time, and so on.
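The difference between bulk migration and time-series incremental import can be sketched with a timestamp watermark. This is an assumption about how "incremental" might be realized, not the patent's implementation: the `updated_at` and `record_id` fields are hypothetical, and an in-memory dictionary stands in for the Hadoop-based backup store.

```python
from datetime import datetime


def incremental_import(source_rows, target, watermark):
    """Copy rows changed after `watermark` into `target`.

    Returns (new_watermark, imported_count). Passing datetime.min as the
    watermark makes the first run behave like a full bulk migration.
    """
    new_rows = sorted(
        (row for row in source_rows if row["updated_at"] > watermark),
        key=lambda row: row["updated_at"],
    )
    for row in new_rows:
        target[row["record_id"]] = row  # upsert keyed by record id
    new_watermark = new_rows[-1]["updated_at"] if new_rows else watermark
    return new_watermark, len(new_rows)


# Hypothetical source rows from a medical information system.
source = [
    {"record_id": "r1", "updated_at": datetime(2019, 1, 1, 8, 0)},
    {"record_id": "r2", "updated_at": datetime(2019, 1, 2, 9, 30)},
    {"record_id": "r3", "updated_at": datetime(2019, 1, 3, 10, 0)},
]
backup = {}

# First run: everything is newer than datetime.min, so all rows are copied.
watermark, first_count = incremental_import(source, backup, datetime.min)

# Later run: only the row added since the stored watermark is copied.
source.append({"record_id": "r4", "updated_at": datetime(2019, 1, 4, 11, 0)})
watermark, second_count = incremental_import(source, backup, watermark)
```

Persisting the watermark between runs is what keeps each incremental import proportional to the amount of new data rather than to the size of the source system.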
(3) Special population database: based on the patient clinical data in the medical information systems, special population databases are established for diseases such as hyperthyroidism, diabetes, thyroid nodules, breast tumors, and thyroid tumors. They can be queried by disease type, patient gender, age, examining doctor, examination indicator, examination time, and so on.
(4) Anonymous public medical record database: allows users to view the cases of patients with diabetes, thyroid disease, and other conditions, together with the doctors' diagnosis and treatment suggestions, as a reference for patients with similar conditions who are seeking care. To protect patient privacy, the database is built in anonymized form. It can be queried by disease type, condition description, the doctor's detailed suggestions, examining doctor, and the times of the question and the reply.
(5) Model library: so that models built with artificial intelligence algorithms can be used for classification and predictive analysis in disease diagnosis, the model library manages the artificial intelligence models, with functions including model import, model training, and model viewing.
(6) System administration: the integrated platform mainly serves different roles, such as information center staff, medical institution administrators, doctors, and patients. These users need to be managed systematically: through user management and role permission management, the various operations and data access rights are strictly authorized and controlled.
Doctor end of the integrated medical service platform:
The doctor end of the integrated medical service platform mainly gives the medical staff and researchers of medical institutions a platform for medical research and for decision support in medical diagnosis. It establishes a communication channel between doctors and patients and lets doctors view patients' consultations and their evaluations of the diagnosis. It consists mainly of special population analysis, remote auxiliary decision-making, patient consultation viewing, and patient evaluation viewing.
(1) Special population analysis: analyzes and mines the clinical data in the hospital information system for special populations, such as patients with hyperthyroidism and diabetes, in order to discover the patterns of onset and the underlying mechanisms, giving clinicians and researchers systematic support for disease research.
A) Hyperthyroidism clinical data analysis and mining: the hyperthyroidism clinical data comprises clinical data tables such as the patient basic information table, the patient visit record table, the patient medication table, the patient indicator test table, and the patient diagnosis table, with about 2,000,000 records in total. Data mining analysis is performed on the clinical data for hyperthyroidism, mainly on the themes of patient basic information, test indicator data, prescribed medication, complications, and recurrence.
B) Diabetes clinical data analysis and mining: the diabetes clinical data comprises clinical data tables such as the patient basic information table, the patient visit record table, the patient medication table, the patient indicator test table, and the patient diagnosis table, with about 1,000,000 records in total. Data mining analysis is performed on the clinical data for diabetes, mainly on the themes of patient basic information, test indicator data, prescribed medication, and diagnosis.
(2) Remote auxiliary decision-making: several diseases from the endocrinology, cardiovascular, and oncology specialties, namely thyroid disease, coronary heart disease, and tumors, are selected as the research objects for data collection and analysis. Relying on the integrated platform to collect and integrate clinical medical data, the system implements a medical diagnosis decision support system for thyroid nodules, coronary heart disease, breast tumors, and so on, giving systematic support to clinicians in clinical diagnosis and to researchers in disease research. It consists of four parts: the prediction module based on index parameters, the prediction module based on examination report text, the model training module, and the structuring module.
A) Prediction module based on index parameters: using information such as the patient's outpatient serial number or medical insurance card number, the test indicators and examination report text for the patient's relevant disease can be queried. Structured indicator data can be input directly; unstructured examination reports are structured by the structuring submodule to obtain a data format that the model can recognize.
B) Prediction module based on examination report text: using information such as the patient's outpatient serial number or medical insurance card number, the patient's examination report text can be queried. For unstructured examination report text, prediction is performed directly with a deep learning algorithm.
C) Model training module: a foundational module of the system, invisible to users. It merges data such as the clinical examination reports and test indicators related to thyroid nodules, coronary heart disease, and breast disease from the medical information systems of multiple databases, integrates them into a unified data table, and performs model training.
D) Structuring module: structures the text of ultrasound reports. It extracts the ultrasound features of the various samples, including indicators such as tumor size, boundary, echo distribution, and echo intensity, together with the value of each indicator, and forms a description template for each kind of sample. Based on the template, the ultrasound text content is structured.
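A minimal sketch of template-based structuring is shown below. It assumes an English-language report and a hypothetical template in which each indicator is tied to one regular expression; the patent does not disclose its actual templates, vocabulary, or extraction technique, and real reports would need a much richer term list.

```python
import re

# Hypothetical description template: indicator name -> pattern that captures its value.
TEMPLATE = {
    "size_mm": re.compile(r"(\d+(?:\.\d+)?)\s*[x×*]\s*(\d+(?:\.\d+)?)\s*mm", re.I),
    "boundary": re.compile(
        r"\b(well-defined|ill-defined|clear|unclear)\s+(?:boundary|margin)", re.I
    ),
    "echo_intensity": re.compile(r"\b(hypoechoic|isoechoic|hyperechoic|anechoic)\b", re.I),
    "echo_distribution": re.compile(r"\b(homogeneous|heterogeneous)\b", re.I),
}


def structure_report(text):
    """Turn free-text ultrasound findings into one structured row.

    Indicators the template cannot find are set to None rather than guessed.
    """
    row = {}
    for indicator, pattern in TEMPLATE.items():
        match = pattern.search(text)
        if match is None:
            row[indicator] = None
        elif indicator == "size_mm":
            row[indicator] = (float(match.group(1)), float(match.group(2)))
        else:
            row[indicator] = match.group(1).lower()
    return row


# Synthetic report text, written for this example.
report = (
    "A hypoechoic nodule measuring 12 x 8 mm is seen in the right lobe, "
    "with an ill-defined margin and heterogeneous internal echo."
)
structured = structure_report(report)
```

The resulting row has the same shape as structured indicator data, so it could be passed to an index-parameter model in the same way as laboratory values.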
(3) Patient consultation viewing: lets doctors view the content of patients' consultations about the diagnosis of their condition, filtered by disease type, condition description, examination time, and so on.
(4) Patient evaluation viewing: lets doctors view the content of patient evaluations, filtered by doctor name, patient name, evaluation content, evaluation time, and so on.
Integrated medical services platform patient end:
The patient end of the integrated medical service platform is the interface through which patients log in to the platform. It mainly provides patients with online remote medical services, chiefly online consultation, post-visit evaluation, and query of the patient's electronic health record. Through online consultation, patients obtain a preliminary diagnosis of their condition produced by artificial intelligence technology; they can evaluate a doctor's medical skill and service attitude online; and through the patient electronic health record they can view data such as diagnoses, laboratory tests, examinations and imaging, physician orders, medical history, pathology, and charges. It comprises three modules: the patient consultation service, the post-visit evaluation service, and the patient electronic health record.
(1) Patient consultation service: provides patients with an online consultation service. The patient supplies data such as a description of the original symptoms, the text reports of imaging examinations, and test indicator values. The system uses OCR recognition tools and structuring techniques to obtain a data format that the model can recognize and, through analysis of the test indicator features, uses models built with artificial intelligence algorithms to perform classification prediction on the diagnosis of unknown samples. Finally, the predicted result for the patient's condition is presented to the patient, including the thyroid nodule type, the thyroid surgery type, and the benign or malignant nature of thyroid tumors and breast tumors, so as to guide the patient in seeking care and maintaining health. The system integrates artificial intelligence algorithms including the convolutional neural network (CNN), the recurrent neural network (RNN), the long short-term memory (LSTM) recurrent neural network model, random forest, support vector machine, neural network, decision tree, and K-means, and builds auxiliary diagnosis prediction models for patients with thyroid nodules and breast tumors.
(2) Post-visit evaluation service: lets patients evaluate the doctor's diagnosis and treatment afterwards. Online patient evaluation of doctors is an effective channel for doctor-patient communication; it greatly helps medical institutions and doctors improve service quality and gradually ease doctor-patient conflicts. Doctors can improve on the basis of patients' evaluations and needs, and medical institutions can give appropriate rewards and penalties according to patients' overall evaluation of a doctor. In practice, however, a single doctor may receive hundreds or thousands of reviews, and a medical institution may have hundreds or even thousands of doctors, which produces a massive amount of patient evaluation text; processing and analyzing it manually would take a great deal of effort. The system implements an artificial-intelligence-based medical care evaluation system that performs sentiment analysis on patient evaluations by machine, automatically identifies positive and negative evaluations, and computes their proportions. Doctors can quickly filter out the negative evaluations and make improvements based on their content; medical institutions can statistically analyze the overall evaluation situation across the mass of evaluation information by department, by doctor, and so on.
(3) Patients ' Electronic health account:
Relying on the data integration module, the clinical data generated by the different clinical information systems, such as HIS, PACS, LIS, and RIS, is integrated and summarized in a unified way to establish a unified view of the personal electronic health record, covering the patient's basic information, diagnoses, laboratory tests, examinations and imaging, physician orders, medical history, pathology, charges, and other data. Patients can conveniently consult it at any time, and it provides application support for the use of clinical big data in diagnosis, treatment, and research.
A) Patient basic information dimension: mainly displays the patient's basic information, such as name, gender, date of birth, identity document number, and contact details;
B) Diagnosis dimension: displays the patient's past diagnosis records;
C) Laboratory test dimension: displays the patient's past laboratory test records in table form;
D) Examination and imaging dimension: displays the patient's ultrasound examination records, images, and so on;
E) Physician order dimension: records the doctor's various orders for the patient;
F) Medical history dimension: the patient's electronic medical records;
G) Pathology dimension: records the patient's pathology;
H) Nursing dimension: displays the patient's nursing records, such as pulse, body temperature, blood pressure, and respiration, in graphical form;
I) Physical examination dimension: displays the patient's physical examination records;
J) Charges dimension: displays statistics on the details of the patient's various charges.
The clinical data mining analysis and aided decision-making method based on an internet-integrated medical platform comprises data mining analysis and aided decision-making. Data mining analysis comprises a multidimensional analysis algorithm module, a data mining algorithm module and a deep learning algorithm module. The multidimensional analysis algorithm module first requires a multidimensional data set to be built; because it has many dimensions, a multidimensional data set is usually figuratively called a data cube (Cube). A multidimensional data set is a collection of data, usually several data subsets selected from a data warehouse and then organized and aggregated into a multidimensional structure defined by multiple dimensions and measures. The data mining algorithm module provides unified registration, use and deregistration management for machine learning algorithms including classification, clustering, association rules and regression analysis, and is used for mining analysis of specific data sets to achieve in-depth clinical analysis, early warning and prediction. The deep learning prediction algorithm module integrates algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN) and long short-term memory (LSTM) recurrent neural network models. Remote aided decision-making consists of four parts: a prediction module based on indicator parameters, a prediction module based on examination report text, a model training module and a structuring module.
I. Multidimensional analysis algorithm module
Multidimensional data analysis first requires a multidimensional data set to be built. Because it has many dimensions, a multidimensional data set is usually figuratively called a data cube (Cube). A multidimensional data set is a collection of data, usually several data subsets selected from a data warehouse and then organized and aggregated into a multidimensional structure defined by multiple dimensions and measures.
(1) Dimension: the angle from which a user observes the data. For example, a hospital usually cares about how the number of visits changes over time; this looks at the number of visits from the angle of time, so time is a dimension. Tables that contain dimension information are dimension tables; tables that hold the detailed values of the measures, i.e. the fact data, are fact tables. A dimension table contains the attributes that describe the fact records in the fact table. A dimension usually has several levels of different granularity, i.e. degrees of fineness of the viewing angle; for example, the time dimension may have year, quarter and month levels.
(2) Measure: a numerical value with practical meaning, such as the number of visits or the amount of a drug used. Measures are stored in the fact table. The fact table is the core of the multidimensional data set being analyzed and holds the data that end users want to see when browsing the data set.
Multidimensional data analysis can apply a variety of analysis operations to a data cube organized in multidimensional form, such as roll-up, drill-down, slice, dice and rotation (pivot), in order to dissect the data. This lets analysts and decision makers observe the data in the database from multiple angles and multiple sides, and so gain an in-depth understanding of the information and meaning contained in the data.
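The operations named above can be illustrated with a minimal Python sketch. The fact table, its dimension names (`year`, `quarter`, `dept`, `gender`) and the `visits` measure are hypothetical values chosen only for illustration; they are not taken from the platform.

```python
from collections import defaultdict

# Hypothetical fact table: one row per combination of dimensions, with a visit-count measure.
FACTS = [
    {"year": 2018, "quarter": "Q1", "dept": "Endocrinology", "gender": "F", "visits": 120},
    {"year": 2018, "quarter": "Q1", "dept": "Endocrinology", "gender": "M", "visits": 80},
    {"year": 2018, "quarter": "Q2", "dept": "Endocrinology", "gender": "F", "visits": 150},
    {"year": 2018, "quarter": "Q2", "dept": "Oncology", "gender": "F", "visits": 60},
    {"year": 2019, "quarter": "Q1", "dept": "Oncology", "gender": "M", "visits": 90},
]

def aggregate(rows, dims, measure="visits"):
    """Sum the measure grouped by the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

by_quarter = aggregate(FACTS, ["year", "quarter"])  # finer granularity (drill-down view)
by_year = aggregate(FACTS, ["year"])                # roll-up from quarter to year
endo_2018 = slice_(slice_(FACTS, "dept", "Endocrinology"), "year", 2018)
female_q1_q2 = dice(FACTS, gender={"F"}, quarter={"Q1", "Q2"})
pivoted = aggregate(FACTS, ["dept", "year"])        # rotation: view by department, then year
```

Rolling up simply re-aggregates the same measure over a coarser set of dimensions, which is why a single `aggregate` helper covers roll-up, drill-down and rotation in this sketch.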
II. Data mining algorithm module
The data mining algorithm module provides unified registration, use and deregistration management for machine learning algorithms including classification, clustering, association rules and regression analysis. It is used for mining analysis of specific data sets to achieve in-depth clinical analysis, early warning and prediction.
Model training with data mining algorithms is mainly divided into five steps: problem analysis, data preprocessing, data modeling, result evaluation and knowledge application:
1, Problem analysis. Through thorough discussion with doctors and experts, clarify the goal of the data analysis, define the data analysis requirements, and set the expected results of the data analysis.
2, Data preprocessing. Determine the data sources for the analysis and build a reasonable data warehouse for data query and analysis. Use databases such as SQL Server and MySQL, high-level languages such as Java and Python, and statistical tools such as R and SPSS to carry out the necessary preprocessing of the data, which mainly includes data integration, data cleansing and data transformation.
3, Data modeling. Take the processed data as input features and the target attribute as the output feature, select a suitable machine learning model or make appropriate modifications to an existing one, and repeatedly adjust parameters and optimize the model during training to obtain the best-performing trained model.
4, Result evaluation. After the model has been built, its reliability needs to be evaluated scientifically to ensure that the experimental results are convincing. Specifically, the quality of the model can be measured with indicators such as accuracy, recall and F1 score, using methods such as the ROC curve and the confusion matrix.
5, Knowledge application. Once a model has been shown to be reliable, efficient and practical, the last step is to apply it. All the preceding steps are preparation for application, and the application of knowledge is where the core value of machine learning lies. The trained model is used to predict unknown things, or to find connections between unknown and known things, and so helps people understand the unknown better.
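The result evaluation step (step 4) can be sketched as follows for a binary classifier. The labels in `y_true` and `y_pred` are invented for illustration, and precision is included alongside the accuracy, recall and F1 indicators because F1 is computed from it.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented labels: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = evaluate(y_true, y_pred)
```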
The data mining algorithm module integrates classification algorithms such as random forest, support vector machine, neural network and decision tree, as well as the K-means clustering algorithm, logistic regression, linear regression and association analysis.
(1) The basic classification idea of random forest is to use multiple decision trees trained with the bagging technique as base classifiers. When an unknown sample is input, each base classifier gives a prediction, and the prediction of the random forest is obtained by voting. In detail: first, the bagging technique draws a subset of the same size from the sample set with replacement, repeated K times, producing K sample sets. Each sample set is then used to generate its own decision tree, the best decision tree for that sample set. Finally, the classification predictions of these decision trees are put to a vote, and the result with the most votes is the classification prediction of the random forest model.
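The bagging and voting procedure can be sketched in a few lines of Python. As a simplifying assumption, each base classifier here is a depth-1 decision tree (a stump) rather than a full decision tree, and the two-feature data set `DATA` is invented for illustration.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a depth-1 tree: choose the (feature, threshold) split with the fewest errors."""
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            thr = x[f]
            left = [y for xi, y in samples if xi[f] <= thr]
            right = [y for xi, y in samples if xi[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    _, f, thr, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= thr else r_lab

def train_forest(samples, k=15, seed=0):
    """Draw K bootstrap sample sets (with replacement) and train one tree on each."""
    rng = random.Random(seed)
    forest = []
    for _ in range(k):
        bootstrap = [rng.choice(samples) for _ in samples]
        forest.append(train_stump(bootstrap))
    return forest

def predict(forest, x):
    """Majority vote over the predictions of all base classifiers."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Invented two-feature samples: (features, class label).
DATA = [((1.0, 5.0), 0), ((1.5, 4.0), 0), ((2.0, 6.0), 0), ((2.5, 5.5), 0),
        ((6.0, 5.0), 1), ((6.5, 4.5), 1), ((7.0, 6.0), 1), ((7.5, 5.0), 1)]
forest = train_forest(DATA)
```

Calling `predict(forest, (0.5, 5.0))` or `predict(forest, (8.0, 5.0))` then returns the class chosen by the majority of the 15 stumps.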
(2) The basic idea of the support vector machine is to find an optimal separating hyperplane such that, while classification accuracy is guaranteed, the distance from the samples on both sides to the hyperplane is maximized. On the basis of structural risk minimization it improves generalization ability while seeking to minimize both the empirical risk and the confidence interval, so a good classification result can be obtained even when the sample size is small.
(3) The basic unit of a neural network is called a neuron. It imitates the way nerve cells in the human brain learn: neurons are arranged in a certain topology to simulate the mechanism by which nerve cells in the brain connect to each other and transmit information, so that the network can learn autonomously. Structurally, a neural network can be divided into an input layer, hidden layers and an output layer. The input layer receives the raw data from outside and passes it to the hidden layer; the hidden layer may contain one or more layers, is responsible for internal information processing and conversion, and passes the converted information to the output layer; after final processing and conversion in the output layer, the final result is output. In practice, however, this one-way transmission model cannot guarantee the accuracy of the result, and the parameters and weights between layers must be adjusted repeatedly through a large number of repeated experiments to obtain a good trained model.
(4) Regression analysis is a statistical analysis method for determining the quantitative relationship of interdependence between two or more variables. By the number of variables involved, it is divided into univariate and multivariate regression analysis; by the number of independent variables, into simple and multiple regression analysis; by the type of relationship between the independent and dependent variables, into linear and nonlinear regression analysis. If a regression analysis includes only one independent variable and one dependent variable, and the relationship between them can be approximated by a straight line, it is called simple linear regression analysis. If it includes two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression analysis.
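For the simple linear case, the straight line can be fitted by ordinary least squares. The sketch below uses the standard closed-form estimates; the data points are invented and lie exactly on y = 2x + 1 so that the result is easy to check.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear(xs, ys)
```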
(5) Correlation analysis mainly examines whether some dependence exists between data objects and, for data objects that do have a dependence, explores the direction and degree of their correlation. Correlation is a non-deterministic relationship. For example, let X and Y denote the changes of two indicators (such as T3 and T4) of a hyperthyroidism patient, or a person's height and weight. X and Y are clearly related, but not so exactly that one can be determined precisely from the other; this is a correlation.
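One common way to quantify the direction and degree of such a relationship is the Pearson correlation coefficient; choosing it here is an assumption, since the text does not name a specific coefficient. The `t3` and `t4` values are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient: sign gives direction, magnitude gives strength."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented indicator readings for illustration only.
t3 = [1.8, 2.4, 3.1, 3.9, 4.6]
t4 = [90.0, 118.0, 149.0, 170.0, 215.0]
r = pearson(t3, t4)
```

A value of `r` close to 1 indicates a strong positive correlation, a value close to -1 a strong negative one, and a value near 0 little linear relationship.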
(6) The basic idea of the Apriori association analysis algorithm: in the first step, all frequent itemsets in the transaction database, i.e. itemsets whose support is greater than or equal to the minimum support set by the user, are retrieved by iteration; in the second step, rules that satisfy the user's minimum confidence are constructed from the frequent itemsets. Specifically: first find the frequent 1-itemsets, denoted L1; then use L1 to generate the candidate set C2, evaluate the items in C2, and mine L2, the frequent 2-itemsets; continue iterating in this way until no more frequent k-itemsets can be found. Mining each level Lk requires one full scan of the database.
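The two steps can be sketched as follows. Support is expressed as an absolute transaction count, and the transactions are invented labels used only to exercise the code.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # One scan of the database per level to count candidate support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}  # L_k
        frequent.update(level)
        # Join L_k with itself to form (k+1)-candidates, pruning any candidate
        # that has an infrequent k-subset.
        k += 1
        current = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                current.add(union)
    return frequent

def rules(frequent, min_confidence):
    """Build (antecedent, consequent, confidence) rules from the frequent itemsets."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf))
    return out

# Invented transactions for illustration.
transactions = [
    {"hyperthyroidism", "T3 high", "T4 high"},
    {"hyperthyroidism", "T3 high"},
    {"hyperthyroidism", "T3 high", "T4 high"},
    {"T4 high", "goiter"},
    {"hyperthyroidism", "T4 high"},
]
frequent = apriori(transactions, min_support=3)
strong_rules = rules(frequent, min_confidence=0.9)
```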
(7) The K-means algorithm uses the Manhattan distance or the Euclidean distance as its similarity measure. It seeks the optimal classification corresponding to a given initial cluster center vector V, such that the evaluation index J is minimized. The value of K in the K-means clustering algorithm is set in advance; it can be customized from business experience or obtained by algorithmic testing. This value represents the number of initial cluster centers and has a large influence on the final clustering result. In each iteration, the algorithm reassigns each remaining object in the data set to the nearest cluster according to its distance from each cluster center; once all data objects have been examined, one iteration is complete.
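A minimal sketch of this iteration with the Euclidean distance is given below. The points and the two initial centers are invented, and the evaluation index J is taken to be the sum of squared distances to the assigned center, which is the usual choice but is an assumption here.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Iterate assignment and center update until the centers stop changing."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def objective(centers, clusters):
    """Evaluation index J: sum of squared distances from each point to its center."""
    return sum(math.dist(p, c) ** 2 for c, cl in zip(centers, clusters) for p in cl)

# Invented 2-D points forming two obvious groups; K = 2 initial centers.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, centers=[(1.0, 1.0), (8.0, 8.0)])
J = objective(centers, clusters)
```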
(8) Time-series association rule mining implements the representation of time at multiple granularities, such as year, month and day, and uses piecewise linear representation together with vector shape clustering to achieve feature segmentation and symbolic conversion of the time series. In addition, time-series mining often mines a subsequence of the time series; for the problem of determining the sliding window in time-series similarity, the sliding window is introduced as part of applying the time-series similarity dimensionality reduction technique.
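The sliding-window and symbolic-conversion ideas can be sketched as follows. This is a deliberately simplified stand-in: each window is mapped to a symbol from its end-to-end slope, with no piecewise linear fitting or shape clustering, and the series, window width and tolerance `tol` are invented.

```python
def sliding_windows(series, width, step=1):
    """Return every subsequence of the given width, moving `step` points at a time."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

def symbolize(window, tol=0.1):
    """Map a window to 'U' (up), 'D' (down) or 'F' (flat) from its end-to-end slope."""
    slope = (window[-1] - window[0]) / (len(window) - 1)
    if slope > tol:
        return "U"
    if slope < -tol:
        return "D"
    return "F"

# Invented indicator series.
series = [1.0, 1.5, 2.5, 2.5, 2.4, 1.6, 1.0]
windows = sliding_windows(series, width=3)
symbols = "".join(symbolize(w) for w in windows)
```

The resulting symbol string is a lower-dimensional representation of the series on which similarity search or rule mining can then operate.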
III. Deep learning prediction algorithm module
The deep learning prediction algorithm module integrates algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN) and long short-term memory (LSTM) recurrent neural network models.
(1) Convolutional neural network (CNN)
In a convolutional neural network, a trainable filter fx is convolved with the input (at layer C1 this is the input image; for later convolutional layers it is the convolutional feature map of the previous layer), the result is passed through an activation function (usually the Sigmoid function), and a bias bx is added to obtain the convolutional layer Cx. The specific operation is given by the following formula, where Mj denotes the input feature maps:
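The formula itself is not reproduced in this text. A commonly used form of the convolutional layer, given here as an assumption rather than as the original formula, is

$$x_j^{l} = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big)$$

where $x_j^{l}$ is the j-th feature map of layer $l$, $k_{ij}^{l}$ is the trainable filter, $b_j^{l}$ is the bias, $f$ is the activation function and $M_j$ is the set of input feature maps. Note that in this form the bias is added before the activation is applied. A minimal Python sketch of this form follows; the image and kernel values are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation most CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def conv_layer(input_maps, kernels, bias):
    """One output map: sum the convolutions over the input maps, add bias, apply sigmoid."""
    partial = [conv2d_valid(m, k) for m, k in zip(input_maps, kernels)]
    rows, cols = len(partial[0]), len(partial[0][0])
    return [[sigmoid(sum(p[r][c] for p in partial) + bias) for c in range(cols)]
            for r in range(rows)]

# Invented 3x3 input and 2x2 filter.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
feature_map = conv_layer([image], [kernel], bias=0.0)
```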
(2) Recurrent neural network (RNN)
RNNs are called recurrent neural networks because the current output of a sequence also depends on the previous outputs. Specifically, the network remembers earlier information and applies it to the calculation of the current output: the nodes within the hidden layer are no longer unconnected but connected, and the input to the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, RNNs can process sequence data of any length.
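The recurrence can be sketched with a scalar Elman-style network, in which the hidden state at each step is computed from the current input and the previous hidden state. The tanh activation and the weight values `w_xh`, `w_hh` and `b_h` are assumptions chosen for illustration.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, b_h, h0=0.0):
    """Scalar RNN: h_t = tanh(w_xh * x_t + w_hh * h_{t-1} + b_h)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_xh * x + w_hh * h + b_h)  # previous state feeds the current step
        states.append(h)
    return states

# A single impulse followed by zeros: later states are non-zero only because of memory.
states = rnn_forward([1.0, 0.0, 0.0], w_xh=1.0, w_hh=0.5, b_h=0.0)
```

Because the same step is applied at every position, the function accepts input sequences of any length.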
(3) Long short-term memory (LSTM) recurrent neural network model
LSTM is an extension of the basic RNN. It differs from the RNN mainly in that it adds to the algorithm a "processor" that judges whether information is useful; the structure that acts as this processor is called a cell. Three gates are placed in a cell, called the input gate, the forget gate and the output gate. When a piece of information enters the LSTM network, it is judged against rules to decide whether it is useful. Only information that passes the algorithm's check is kept; information that does not is forgotten through the forget gate. Put simply, this is a one-in, two-out working principle, yet under repeated operation it can solve a major long-standing problem in neural networks.
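A single step of a scalar LSTM cell can be sketched as follows, using the standard gate equations. The parameter dictionary `params` and its values are invented; the forget and output biases are set high so that the gates are almost fully open in the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step. `p` maps a gate name to (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)
    i = gate("input", sigmoid)        # how much new information to admit
    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Invented parameters: forget and output gates nearly open.
params = {"input": (1.0, 0.0, 0.0), "forget": (0.0, 0.0, 10.0),
          "output": (0.0, 0.0, 10.0), "candidate": (1.0, 0.0, 0.0)}
h, c = lstm_step(0.0, 0.0, 1.0, params)

# Same step with the forget gate nearly closed: the old cell state is discarded.
closed = dict(params, forget=(0.0, 0.0, -10.0))
h_closed, c_closed = lstm_step(0.0, 0.0, 1.0, closed)
```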
Remote aided decision-making: several diseases from the endocrine, cardiovascular and oncology disciplines, namely thyroid disease, coronary heart disease and tumors, are selected as the research objects for data collection and analysis. Relying on the unified platform to collect and integrate clinical medical data, a medical diagnosis aided decision-making system for thyroid nodules, coronary heart disease, breast tumors and so on is implemented, providing system support for clinicians' clinical diagnosis and researchers' disease research. It mainly consists of four parts: a prediction module based on indicator parameters, a prediction module based on examination report text, a model training module and a structuring module.
A) Prediction module based on indicator parameters: using information such as the patient's outpatient serial number or medical insurance card number, the test indicators and examination report text for the patient's relevant disease can be queried. Structured indicator data can be input directly; unstructured examination reports are structured with the structuring submodule to obtain a data format the model can recognize.
B) Prediction module based on examination report text: using information such as the patient's outpatient serial number or medical insurance card number, the patient's examination report text can be queried. Unstructured examination report text is predicted directly with a deep learning algorithm.
C) Model training module: a basic module of the system, invisible to the user. Data such as the clinical examination reports and test indicators related to thyroid nodules, coronary heart disease and breast disease from the medical information systems in multiple databases are merged, integrated into a unified data table, and used for model training.
D) Structuring module: structures the text data of ultrasound reports. It extracts the ultrasound features of the various samples, including indicators such as tumor size, boundary, echo distribution and echo intensity together with the value of each indicator, and forms a description template from the samples. Based on this template, the ultrasound text content is structured.
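Template-based structuring of report text can be sketched with regular expressions. Everything specific in this sketch is hypothetical: the field names, the English wording of the report and the patterns in `TEMPLATE` are invented for illustration and do not reproduce the description template of the system.

```python
import re

# Hypothetical description template: field name -> regular expression with capture groups.
TEMPLATE = {
    "size_mm": r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*mm",
    "boundary": r"boundary\s+(clear|unclear)",
    "echo_intensity": r"(hypoechoic|isoechoic|hyperechoic)",
    "echo_distribution": r"echo\s+(homogeneous|heterogeneous)",
}

def structure_report(text):
    """Extract one value per template field; fields not found are left as None."""
    record = {}
    for field, pattern in TEMPLATE.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match is None:
            record[field] = None
        elif field == "size_mm":
            record[field] = (float(match.group(1)), float(match.group(2)))
        else:
            record[field] = match.group(1).lower()
    return record

# Invented report sentence.
report = "Right lobe: hypoechoic nodule 12.5 x 8 mm, boundary unclear, echo heterogeneous."
record = structure_report(report)
```

The resulting dictionary is a fixed-schema record that can be fed to the indicator-based prediction module in the same way as structured test indicators.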
The present invention selects several diseases from the endocrine, cardiovascular and oncology disciplines, namely hyperthyroidism, diabetes, thyroid nodules and breast tumors, as research objects, and implements clinical data analysis and mining for hyperthyroidism, diabetes, thyroid nodules and breast tumors.
Embodiment 1: hyperthyroidism clinical data analysis and mining
Hyperthyroidism clinical data analysis and mining performs multidimensional analysis, association analysis, cluster analysis and so on, on the clinical data of hyperthyroidism. The hyperthyroidism clinical data are real clinical medical data from the multi-center clinical big data platform. The overall framework of hyperthyroidism clinical data analysis and mining is shown in Fig. 2 and is mainly divided into four levels: the data source layer, the data layer, the analysis layer and the application layer. The data are taken from the raw data set through preprocessing such as ETL, the prepared data are loaded into HANA, and analysis tools are then used in combination with the relevant analysis algorithms to achieve the relevant analysis purposes.
Data source: this level corresponds to the raw data and is the main database of structured data. It contains the basic information table, diagnosis record table, examination report table, test indicator table, diagnosis table, prescription detail table and so on. This layer is the operational basis of all the levels above it and must be kept working normally.
Data preparation layer: this level is responsible for cleaning and integrating the data from the data source and constructing data sets that fit the multidimensional analysis model; at the same time it extracts the data sets to be mined for later association rule mining. Whether the results of the subsequent data analysis and mining are accurate depends entirely on whether the data cleansing at this level is thorough and whether the constructed data sets meet the requirements.
Data layer: this level realizes the data analysis model and the mining model. The prepared data are stored in this layer, waiting for the data analysis and mining layer to operate on them.
Analysis layer: the analysis layer is the core level of the whole system and comprises two major modules, multidimensional analysis and association rule mining. The multidimensional analysis module designs a suitable star schema according to the analysis topic in order to complete the analysis of that topic; the association rule mining module performs specific mining on the prepared association data sets, which are determined mainly by the mining topic.
Application layer: processing by the analysis layer produces experimental results, and presenting those results clearly is the work of this level. The WebI component of SAP and the visual analysis tools of HANA are used to display the analysis results and the mining results respectively.
The levels of the system are divided into three major modules. The first is the raw data preprocessing module, which cleans the clinical data tables to obtain clinical medical data that meet the specification, and then completes the construction of the data cube and the association data sets. The second is the data analysis module, which completes three subtasks according to the defined multidimensional analysis model: basic feature analysis, medication analysis and indicator analysis. The third is the data mining module, which mainly applies the association rule mining algorithm to the association data sets to obtain the result set of association rules. Finally, the visualization technologies of WebI and HANA are used to display the results.
Embodiment 2: diabetes clinical data analysis and mining
Diabetes clinical data analysis and mining performs multidimensional analysis, correlation analysis, time-series mining of diagnostic events and so on, on the clinical data of diabetes. The diabetes clinical data are real clinical medical data from the integrated system platform.
Fig. 3 is the hierarchical diagram of the system, drawn according to the functions of the diabetes clinical data analysis application system; there are three main functional modules. The first is the data preprocessing module: because different analysis requirements call for data of different structures, the clinical diabetes database is processed to obtain the data needed for multidimensional analysis and time-series analysis. The second is the diabetes multidimensional analysis module: multidimensional analysis is one of the core modules of the analysis processing layer, and multidimensional analysis of the diabetes data allows the characteristics of the diabetic patient population (such as age, gender and region) to be observed according to diagnosis, indicators and medication. The third is the time-series mining module: time-series mining is the other core module of the analysis processing layer and, given that diabetes is often accompanied by multiple complications, it performs time-series mining of the patterns in which the complications of clinical diabetes occur.
From top to bottom, the hierarchical diagram of the diabetes clinical data analysis application system consists of the data source layer, the ETL layer, the analysis processing layer and the application layer. The basic functions of each level are briefly introduced below.
1. Data source layer: the main database that stores clinical structured data. It contains the basic information table, outpatient registration information table, clinical diagnosis table, prescription detail table, clinical test indicator table and so on. This layer is the basis for the normal operation of the system and holds the raw data for the subsequent analysis work.
2. Data preparation layer: in data analysis, different analysis methods often require different data structures. The analysis processing layer here has two main analysis directions: multidimensional analysis and time-series analysis of diagnostic events. The data preparation layer is therefore divided into three processing parts: data cleansing, building the multidimensional data cube, and building the diagnostic event sequences.
3. Analysis processing layer: the analysis processing layer is the core level of the diabetes clinical data analysis application system. It has two main functional modules, the multidimensional analysis module and the diagnostic event time-series analysis module, which correspond to online analytical processing and data mining respectively. The multidimensional analysis module designs a star schema according to the analysis topic and builds a data mart in order to analyze the basic characteristics of the diabetic patient population. The time-series analysis module treats each medical diagnosis a patient receives at the hospital as one diagnostic event of that patient; such diagnostic events are ordered in time, so each patient is a sequence. Frequent pattern mining is finally performed on the whole sequence set to obtain the frequent patterns that satisfy the minimum support.
4. Application presentation layer: building on the analysis processing layer, applications are explored here in two respects, multidimensional analysis and time-series analysis. The multidimensional analysis module mainly covers medication analysis, indicator analysis and diagnostic analysis of diabetes and its complications; the time-series analysis module mainly analyzes the experimental results with respect to the settings of minimum support and confidence. The system presents the analysis results as charts, reports and so on.
According to the hierarchical diagram, the system consists of three main modules. The first is the data preprocessing module of the data preparation layer, which mainly performs three operations: data cleansing, building the multidimensional data cube, and building the diagnostic event sequences. The second is the multidimensional analysis module of the analysis processing layer, which organizes multidimensional data according to the analysis requirements and mainly comprises three parts: medication analysis, indicator analysis and clinical diagnosis analysis. The third is the time-series mining module of the analysis processing layer, for which the NFPS frequent pattern discovery algorithm is proposed. Finally, the analysis results are visualized, and the results of the multidimensional analysis are presented in intuitive forms such as charts and data reports.
The main functions of the system follow two lines, multidimensional analysis and time-series analysis, and the results are visualized at the end. The system first preprocesses the acquired data, with different processing according to the analysis requirements; the data then enter multidimensional analysis and diagnostic event time-series analysis respectively. Finally, the results of the multidimensional analysis are charted, the time-series mining algorithm is evaluated in comparative experiments, and the reliability of the result set is analyzed.
While analyzing clinical diabetes diagnoses, it was found that diabetes is frequently accompanied by many complications and that different complications appear at different stages of the disease, which prompts the question of whether some association exists between diabetic complications. Because traditional frequent pattern discovery has no concept of time order, frequent pattern discovery with time-series analysis is carried out, according to the characteristics of the clinical diabetes diagnosis data set, in order to find sequential associations between events, and the NFPS algorithm is proposed. The steps of the algorithm are as follows:
Input: the diagnostic event sequence set D and the time window constraint G;
1) Run the Effect-Sequence algorithm to obtain the sequence set D′ that satisfies the time window constraint
2) Traverse the data set and compute the frequent 1-itemsets
3) Join the K-itemsets to obtain the candidate (K+1)-itemsets
4) Prune according to the minimum support // if a K-itemset is frequent, then all of its (K-1)-itemsets must also be frequent
5) Repeat steps 3) and 4) recursively until no itemset satisfies the minimum support
6) Obtain the fuzzy frequent itemset S′ of the diagnostic event sequences
As the steps above show, the algorithm consists mainly of three parts. The first is the validation of the sequence set, for which the Effect-Sequence algorithm is executed. The second is the self-join of the itemsets to obtain the candidate frequent items. The last is pruning, which is performed in every recursive step; pruning is based on the fact that if a K-itemset is frequent, then all the (K-1)-itemsets that make it up must also be frequent.
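The three parts can be sketched as follows. This is not the NFPS algorithm itself, whose details are not given here: the function `split_by_window` is an assumed stand-in for Effect-Sequence (it cuts a patient's sequence wherever the gap between consecutive diagnoses exceeds G), patterns are matched as ordered but not necessarily contiguous subsequences, and the patient data and diagnosis labels are invented.

```python
def split_by_window(events, max_gap):
    """Assumed stand-in for Effect-Sequence: cut the sequence where the gap exceeds max_gap."""
    events = sorted(events)
    runs, run = [], [events[0]]
    for prev, cur in zip(events, events[1:]):
        if cur[0] - prev[0] > max_gap:
            runs.append(run)
            run = []
        run.append(cur)
    runs.append(run)
    return [[diagnosis for _, diagnosis in r] for r in runs]

def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def frequent_sequences(sequences, min_support):
    """Level-wise mining: join K-patterns into (K+1)-candidates, prune, count support."""
    def support(p):
        return sum(contains(s, p) for s in sequences)
    items = sorted({d for s in sequences for d in s})
    level = {(i,): support((i,)) for i in items}
    level = {p: n for p, n in level.items() if n >= min_support}
    result = dict(level)
    while level:
        # Join: a and b combine when a without its first item equals b without its last.
        candidates = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        # Prune: every pattern obtained by deleting one item must itself be frequent.
        candidates = {c for c in candidates
                      if all(c[:i] + c[i + 1:] in level for i in range(len(c)))}
        level = {c: support(c) for c in candidates}
        level = {p: n for p, n in level.items() if n >= min_support}
        result.update(level)
    return result

# Invented patients: lists of (day of visit, diagnosis label).
patients = [
    [(0, "DM"), (200, "Retinopathy"), (400, "Nephropathy")],
    [(0, "DM"), (100, "Retinopathy")],
    [(0, "DM"), (900, "Nephropathy")],
]
G = 365  # time window constraint in days (assumed unit)
D_prime = [seq for events in patients for seq in split_by_window(events, G)]
S_prime = frequent_sequences(D_prime, min_support=2)
```

In this toy data the third patient's two diagnoses are more than G days apart, so they fall into separate sequences and do not support the pattern ("DM", "Nephropathy").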
Embodiment 3: thyroid nodule clinical data analysis and mining
Thyroid nodule clinical data analysis and mining performs multidimensional analysis, disease classification, benign/malignant discrimination and so on, on the clinical data of thyroid nodules. The thyroid nodule clinical data are real clinical medical data from the clinical data platform.
Fig. 4 is the overall architecture diagram of thyroid nodule clinical data analysis and mining. As the figure shows, it consists mainly of three parts: an input module, a training module, and a prediction and display module. Data preprocessing is a basic part of the system and is invisible to the user. The user provides the raw data; structuring tools or techniques are used to obtain a data format the model can recognize; a machine learning algorithm is then used to build models, and the best model is selected to predict the diagnostic class of unknown samples; finally the results are displayed for the user to consult.
Data integration mainly merges the data from multiple files or multiple databases, and chiefly involves handling problems such as data selection, data conflicts and data inconsistency. During data integration, the definition of fields, the choice of data types and so on need to be considered. The data on thyroid disease are scattered across different database tables, and the data integration process brings them together effectively. For example, the patient's basic information such as gender and age is extracted from the patient basic information table; the indicator names, indicator values and so on are extracted from the test indicator table; the diagnosis names and so on are extracted from the diagnosis table. After these extractions, the data are integrated into a unified data table.
In practice, it was found that after related data are extracted from multiple database tables and integrated into a single table, the processed data contain many missing values and need further processing. Using the visit serial number, a patient's test records are extracted and consolidated into a single record. The T3 and T4 indicators of most patients have missing values, which must be handled so as not to affect the subsequent analysis results. The methods mainly adopted here are deleting columns with missing data and replacing missing values using regression analysis. Attributes in which most values are missing must be deleted; for samples with a small number of missing values, the missing values can be filled in.
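Consolidating the scattered tables into one record per visit can be sketched as follows. The table layouts, field names and values (`visit_id`, `name`, `value` and so on) are hypothetical; the real tables are not described in that detail.

```python
# Hypothetical source tables keyed by the visit serial number.
patients = [
    {"visit_id": "V1", "gender": "F", "age": 42},
    {"visit_id": "V2", "gender": "M", "age": 57},
]
indicators = [
    {"visit_id": "V1", "name": "T3", "value": 2.1},
    {"visit_id": "V1", "name": "T4", "value": 110.0},
    {"visit_id": "V2", "name": "T3", "value": 3.4},
]
diagnoses = [
    {"visit_id": "V1", "diagnosis": "thyroid nodule"},
    {"visit_id": "V2", "diagnosis": "hyperthyroidism"},
]

def integrate(patients, indicators, diagnoses, indicator_names):
    """Merge the three tables into one row per visit; absent values become None."""
    rows = {
        p["visit_id"]: {**p, **{n: None for n in indicator_names}, "diagnosis": None}
        for p in patients
    }
    for item in indicators:
        if item["visit_id"] in rows and item["name"] in indicator_names:
            rows[item["visit_id"]][item["name"]] = item["value"]
    for d in diagnoses:
        if d["visit_id"] in rows:
            rows[d["visit_id"]]["diagnosis"] = d["diagnosis"]
    return list(rows.values())

table = integrate(patients, indicators, diagnoses, ["T3", "T4"])
```

The second visit has no T4 record, so its `T4` field is `None` after integration, which is exactly the kind of missing value that the subsequent cleansing has to handle.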
Medical systems generate large volumes of data through complex processes, so duplicated, missing and even erroneous data are inevitable. To reduce the interference of this noise and improve the accuracy of subsequent classification, valid data can be obtained with a supervised data cleansing algorithm.
Data cleansing mainly includes removing noise data and irrelevant data from the raw data set, handling missing values, cleaning dirty data, and completing some data type conversions. In the data cleansing process for these thyroid medical data, the integrated data are analyzed and processed under the guidance of medical experts: noise data and duplicate records are removed and missing data are filled in. For missing values, the following methods can generally be adopted depending on the situation:
(1) missing values are deleted: whether including key message according to each record to determine reservation or deletes the note Record.If the attribute value of a record missing is too many, or not includes key message, such as the card number or medical stream of patient When water number lacks, the record is deleted herein.
(2) constant value method of substitution: the attribute value of all missings is filled with the same constant such as NULL, and this method is very simple It is single, it is mainly used for the processing of the fields such as the marital status in patient's Basic Information Table.
(3) when field is numeric type data, all missings mean value method of substitution: can be filled with the average value of the attribute Value.Such as in patient's Basic Information Table, the age of a small number of patients without the date of birth takes the average value of all patient ages.
(4) estimated value method of substitution: the predicted value of the attribute missing values is obtained with the methods of regression analysis, is filled with it scarce Mistake value.Such as the missing values of T3 and T4 are calculated by regression analysis in test rating table.It is used during executing data cleansing Which type of processing method will be determined according to the concrete condition of data set and related request.Regression analysis pair is utilized herein Missing values are handled.
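Methods (3) and (4) above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the source does not say which attributes serve as predictors in the regression, so predicting a missing T4 from FT4 with a single-variable least-squares line is an assumption, and all numeric values and function names are hypothetical.

```python
def mean_impute(values):
    """Method (3): replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def regression_impute(xs, ys):
    """Method (4): fill None entries of ys from a line fitted on complete pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]


# Hypothetical ages with one missing entry.
ages = mean_impute([30, None, 50])

# Hypothetical lab values: the third patient has FT4 but no T4 (assumed predictor).
ft4 = [10.0, 12.0, 14.0, 16.0]
t4 = [80.0, 96.0, None, 128.0]
t4_filled = regression_impute(ft4, t4)
```

Mean substitution ignores relationships between attributes, whereas the regression variant uses a correlated attribute, which is why the choice depends on the data set at hand.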
Data transformation mainly reduces the number of effective variables or finds invariants of the data, and includes operations such as normalization, reduction, switching, rotation, and projection. Data transformation can greatly improve computational efficiency and raise the starting point for knowledge discovery. For example, for the gender attribute, male can be encoded as 0 and female as 1. The diagnostic results in the diagnosis table fall into three cases (normal, abnormally high, and abnormally low), which can be replaced by the attribute values 1, 2, and 3 respectively.
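The encoding described above amounts to a lookup table per categorical field. The sketch below uses the codes given in the text; the field names, the English category labels, and the min-max form of normalization are assumptions made for illustration.

```python
# Codes taken from the text: male -> 0, female -> 1; normal / abnormally
# high / abnormally low -> 1 / 2 / 3. The string labels are hypothetical.
GENDER_CODES = {"male": 0, "female": 1}
RESULT_CODES = {"normal": 1, "abnormally high": 2, "abnormally low": 3}


def encode(record):
    """Return a copy of the record with categorical fields replaced by codes."""
    encoded = dict(record)
    encoded["gender"] = GENDER_CODES[record["gender"]]
    encoded["result"] = RESULT_CODES[record["result"]]
    return encoded


def min_max_normalize(values):
    """Scale numeric values linearly into [0, 1] (one possible normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


encoded_record = encode({"gender": "female", "result": "abnormally low", "age": 52})
scaled_ages = min_max_normalize([20, 40, 60])
```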
Classification model for thyroid disease
For the clinical data of thyroid disease, a random-forest-based method for classifying thyroid disease types is proposed. The method first applies principal component analysis to the data set to select features and reduce the data dimension, and then uses the random forest algorithm to perform classification. The model uses data mining to analyze a data set of multiple indices measured in patients' serum in order to classify thyroid disease. It consists of three stages; the method flow is shown in Fig. 5.
First stage: data preprocessing.
First, a classification model must be built from a training set. In actual examinations, the five thyroid function indices are not mandatory test items, so the hospital's medical database contains many missing values and data preprocessing is required. The extraction targets are patients diagnosed with thyroid disease in the diagnosis table. Data from multiple tables, including the patient basic-information table, the examination result index table, and the diagnosis table, are merged through multi-table join operations, and the relevant data are obtained through a series of ETL operations such as selection, transformation, and integration, yielding an integrated thyroid data set that contains patients' basic information and test indices. By visit serial number, each patient's data are organized into records with attributes including gender, age, serum total triiodothyronine (T3), thyroxine (T4), serum free triiodothyronine (FT3), free thyroxine (FT4), and thyroid-stimulating hormone (TSH).
Second stage: Feature Dimension Reduction.
Feature selection, also called attribute selection, means selecting from a known feature set, according to some criterion, a feature subset that helps distinguish data of different classes. Feature selection can eliminate irrelevant features, reduce the number of features, improve model accuracy, and reduce running time.
To retain the main information of the original data, principal component analysis is used here for dimensionality reduction. The thyroid feature set F is randomly divided into k subsets F_ij, where F_ij denotes the j-th feature subset used to train classifier D_i. For each subset, 75% of the samples are drawn to form a new subset, with the aim of increasing the diversity of the base classifiers. Feature selection is then performed on the new thyroid feature subsets to reduce the attribute dimension.
Principal component analysis is a statistical dimensionality reduction method that reduces the dimension of data sets containing many correlated attributes. To extract the main information from multidimensional data, the original attribute space of the thyroid data set is transformed into a new space whose attributes are uncorrelated. Principal components are linear combinations of the original variables: the combination with the largest variance in the original thyroid data set is defined as the first principal component, and the combination with the largest variance in what remains is the second principal component. Because the cumulative percentage of variance is simple and performs adequately, it is used to determine the number of principal components. The cumulative variance threshold is usually set between 70% and 90%, and principal components are selected until the cumulative percentage exceeds the specified threshold.
Let the attributes of the thyroid data be x_1, x_2, ..., x_p. Then
$$C_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{ip}x_p, \quad i = 1, 2, \ldots$$
where x_1, x_2, ..., x_p are random variables and the a_ij are called principal component coefficients. The C_i with the largest variance Var(C_i) is called the first principal component. The second, third, fourth, and further principal components are defined similarly; there are at most p of them.
Let m denote the total number of principal components and p the number of most important principal components, that is, the number of principal components with the highest variance in the m-dimensional thyroid data. Since p ≤ m, the original thyroid data set is reduced in dimension.
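The cumulative-variance rule for choosing p can be sketched as follows. The sketch assumes the variances of the m principal components have already been computed (for example as eigenvalues of the covariance matrix); the function name, the example variances, and the 85% threshold are hypothetical, and "reaches the threshold" is implemented as greater than or equal.

```python
def choose_num_components(variances, threshold=0.85):
    """Return the smallest p whose top-p variances reach the cumulative threshold."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for p, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return p
    return len(ordered)


# Hypothetical variances of m = 5 principal components.
component_variances = [4.0, 3.0, 2.0, 0.5, 0.5]
p = choose_num_components(component_variances, threshold=0.85)
```

With these made-up variances the first three components account for 90% of the total, so three components are kept and two are discarded.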
Third stage: classification prediction.
The choice of base classifier strongly affects the accuracy of an ensemble classification algorithm. To improve the accuracy of thyroid disease classification, the performance of common base classifier algorithms such as Naive Bayes, SMO, and C4.5 was compared on the thyroid data set.
Because thyroid ultrasound indices contain a large amount of text, a method for structuring pathology examination reports based on dependency parsing is used. The process is as follows. First, the varied descriptions of the same index that frequently occur in pathology reports are preprocessed: word vectors are obtained with a neural network model, cosine similarity is computed on them to find synonyms, and the wording of the pathology report is standardized. At the same time, sentences are cut into short clauses and a word-information labeling method is introduced to simplify sentence structure and lower the height of the dependency tree, which makes grammatical relations clearer and improves the accuracy of the structured result. Next, dependency parsing produces the dependency tree of each short clause, and indices and their corresponding values are extracted using the resulting semantic and part-of-speech features, converting unstructured text into a structured template in key-value form. Finally, the labeled information is restored and noisy data are corrected. By function, the overall process is divided into three modules: a preprocessing module, a structuring module, and a post-processing module.
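The synonym-finding step in the preprocessing module relies on cosine similarity between word vectors. The sketch below shows only that comparison; the three-dimensional vectors, the example terms, and the 0.9 cutoff are hypothetical, and in practice the vectors would come from a trained neural network model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_synonyms(term, vectors, min_similarity=0.9):
    """Return the other terms whose vectors are close to the given term's vector."""
    target = vectors[term]
    return sorted(
        other
        for other, vec in vectors.items()
        if other != term and cosine_similarity(target, vec) >= min_similarity
    )


# Hypothetical word vectors for three report terms.
word_vectors = {
    "hypoechoic": [0.9, 0.1, 0.0],
    "low-echo": [0.8, 0.2, 0.0],
    "calcification": [0.0, 0.1, 0.9],
}
synonyms = find_synonyms("hypoechoic", word_vectors)
```

Terms grouped this way can then be mapped to a single canonical expression before dependency parsing.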
Embodiment 4: Breast tumor clinical data analysis and mining
Breast tumor clinical data analysis and mining performs association analysis, cluster analysis, classification analysis, and similar analyses on clinical data of breast tumor disease. The breast tumor clinical data are real clinical medical data from the clinical big data platform.
The breast tumor data mart mainly comprises the following five base tables: the breast patient examination summary table, the breast X-ray report table, the breast ultrasound report table, the breast puncture result table, and the breast pathology report table.
Breast ultrasound diagnostic results are divided into different grades, and the patient's final pathological diagnosis is either benign or malignant.
A new knowledge graph inference method based on TransR-DNN is proposed to build a benign/malignant breast tumor discrimination model with high prediction accuracy. Based on the completed breast tumor knowledge graph, a new knowledge-graph-based inference and prediction algorithm is proposed to predict whether a breast tumor is benign or malignant. The analysis first establishes that the breast tumor clinical fact knowledge graph is large, contains a rich semantic space, and has many many-to-many relations. Building on the translation-based TransR model, a new TransR-DNN learning algorithm is proposed, which predicts the links and entities of triples to obtain a more accurate benign/malignant discrimination model. Finally, the new model is evaluated by accuracy, recall, and F1 score, and comparative experiments on time complexity and time consumption show that the new model performs better.
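For orientation, the scoring function of the standard TransR model, on which the method above builds, can be sketched as follows. This shows only the generally known TransR score, in which head and tail entities are projected into the relation space and compared after translation by the relation vector; the DNN part of TransR-DNN is not described in enough detail here to sketch, and the embeddings, the relation, and the projection matrix below are hypothetical.

```python
def project(matrix, vector):
    """Multiply a d x k projection matrix by a k-dimensional entity vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def transr_score(head, relation, tail, projection):
    """Squared distance ||M_r h + r - M_r t||^2; lower means more plausible."""
    h_r = project(projection, head)
    t_r = project(projection, tail)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))


# Hypothetical embeddings: entities in 3 dimensions, relation space in 2.
nodule = [1, 0, 1]
benign = [0, 1, 1]
malignant = [1, 0, 1]
has_pathology = [-1, 1]          # relation vector r
projection = [[1, 0, 0],         # relation-specific matrix M_r
              [0, 1, 0]]

score_benign = transr_score(nodule, has_pathology, benign, projection)
score_malignant = transr_score(nodule, has_pathology, malignant, projection)
```

In link prediction, candidate tail entities are ranked by this score, and the lowest-scoring candidate is taken as the predicted entity.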
In conclusion using technical solution of the present invention, the clinical data based on internet integration medical platform is dug Pick analysis and aid decision-making method, unstructured clinical document are input to clinical document structuring processing engine, are cured by clinic The means processing such as corpus, rule, full-text search and machine learning is learned, obtains structural data and be output to distributed storage drawing It holds up, is handled by intelligent algorithm, for Platform Analysis, shown;The present invention is by text non-structured in clinical data Notebook data carries out structuring processing, stores into distributed Hadoop cluster, realizes Distributed Storage mode and distribution Calculation processing, and the programming in software application is realized and is transformed and is adapted to for distributed nature.

Claims (6)

1. A clinical data mining analysis and aided decision-making method based on an internet integrated medical platform, characterized by comprising data mining analysis and aided decision-making, wherein the data mining analysis comprises a multidimensional analysis algorithm module, a data mining algorithm module, and a deep learning algorithm module; the multidimensional analysis algorithm module first establishes a data cube, the cube being formed by selecting several data subsets from a data warehouse and organizing and aggregating them into a multidimensional structure defined by multiple dimensions and measures; the data mining algorithm module provides unified registration, use, and deregistration management of machine learning algorithms including classification, clustering, association rules, and regression analysis, for mining analysis of specific data sets, realizing in-depth clinical analysis, early warning, and prediction; the deep learning prediction algorithm module integrates algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory recurrent neural network models (LSTM); the aided decision-making consists of a prediction module based on index parameters, a prediction module based on examination report text, a model training module, and a structuring module.
2. The clinical data mining analysis and aided decision-making method based on an internet integrated medical platform according to claim 1, characterized in that the multidimensional analysis algorithm module performs analysis operations such as roll-up, drill-down, slice, dice, and rotation (pivot) on the data cube organized in multidimensional form, so as to analyze the data in depth and enable analysts and decision makers to observe the data in the database from multiple angles and aspects, thereby gaining a deep understanding of the information and meaning contained in the data.
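The roll-up, drill-down, and slice operations can be illustrated on a tiny fact table. This is a sketch under stated assumptions: the dimensions (`year`, `department`, `disease`), the measure (`visits`), and all values are hypothetical, and a real cube would be built by an OLAP engine rather than by in-memory loops.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure.
facts = [
    {"year": 2018, "department": "endocrinology", "disease": "diabetes", "visits": 120},
    {"year": 2018, "department": "endocrinology", "disease": "hyperthyroidism", "visits": 80},
    {"year": 2018, "department": "breast surgery", "disease": "breast tumor", "visits": 50},
    {"year": 2019, "department": "endocrinology", "disease": "diabetes", "visits": 150},
]


def slice_cube(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[dimension] == value]


def aggregate(rows, dimensions, measure):
    """Sum the measure over all rows sharing the same values of `dimensions`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(totals)


# Roll-up: aggregate away department and disease, keeping only year.
by_year = aggregate(facts, ["year"], "visits")
# Drill-down: add the department dimension back for finer detail.
by_year_department = aggregate(facts, ["year", "department"], "visits")
# Slice: restrict the cube to a single year.
only_2019 = slice_cube(facts, "year", 2019)
```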
3. The clinical data mining analysis and aided decision-making method based on an internet integrated medical platform according to claim 2, characterized in that the multidimensional analysis algorithm module comprises dimensions and measures; a dimension is the angle from which a user observes the data; a table containing dimension information is a dimension table, and a table holding the detailed values of the measures, that is, the fact data, is a fact table; a dimension table contains the characteristics that describe the fact records in the fact table; a measure is a numerical value with practical meaning, and measures are stored in the fact table.
4. The clinical data mining analysis and aided decision-making method based on an internet integrated medical platform according to claim 3, characterized in that model training using data mining algorithms is mainly divided into five steps: problem analysis, data preprocessing, data modeling, result evaluation, and knowledge application:
(1) problem analysis: through in-depth discussion with doctors and experts, the goal of the data analysis is clarified, its requirements are defined, and its expected results are set;
(2) data preprocessing: the data sources for the analysis are determined, a reasonable data warehouse is built for querying and analyzing the data, and the necessary preprocessing is performed using databases such as SQL Server and MySQL, high-level languages such as Java and Python, and statistical tools such as R and SPSS;
(3) data modeling: the processed data are used as input features and the target attribute as the output feature; a suitable machine learning model is selected, or an existing machine learning model is suitably modified, and parameters are adjusted and the model optimized repeatedly during training to obtain the best-performing trained model;
(4) result evaluation: after the model is built, its reliability must be assessed scientifically to ensure that the experimental results are convincing; specifically, methods such as ROC curves and confusion matrices, and metrics such as accuracy, recall, and F1 value, can be used to measure the quality of the model;
(5) knowledge application: the trained model is used to predict unknown things or to find connections between unknown and known things, helping people better understand the unknown.
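The evaluation metrics named in step (4) follow directly from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the label vectors are hypothetical, with 1 standing for the positive class (for example, malignant).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
```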
5. The clinical data mining analysis and aided decision-making method based on an internet integrated medical platform according to claim 4, characterized in that the data mining algorithm module integrates random forest, support vector machine, neural network, regression analysis, correlation analysis, Apriori association analysis, and K-means clustering algorithms;
(1) the random forest algorithm proceeds as follows: first, using the bagging technique, subsets of the same size are drawn from the sample set with replacement, and this is repeated K times to produce K sample sets; then one decision tree is generated from each sample set, namely the optimal tree for that set; finally, the class predictions of these decision trees are put to a vote, and the class receiving the most votes is taken as the prediction of the random forest model;
(2) the support vector machine algorithm finds an optimal separating hyperplane such that, while maximizing classification accuracy, the distance from the samples on both sides to the hyperplane is maximal;
(3) the neural network algorithm is structurally divided into an input layer, hidden layers, and an output layer; the input layer receives the raw data from outside and passes them to the hidden layer; the hidden layer, which may consist of one or more layers, is responsible for internal information processing and transformation and passes the transformed information to the output layer; the output layer performs the final processing and transformation and outputs the final result;
(4) regression analysis is divided, by the number of variables involved, into univariate and multivariate regression analysis; by the number of independent variables, into simple and multiple regression analysis; and, by the type of relationship between the independent and dependent variables, into linear and nonlinear regression analysis;
(5) the correlation analysis algorithm examines whether some dependence exists between data objects and, for data objects that do show dependence, investigates the direction and degree of their correlation;
(6) the Apriori association analysis algorithm: in the first step, all frequent itemsets in the transaction database are retrieved by iteration; in the second step, rules satisfying the user's minimum confidence are constructed from the frequent itemsets;
(7) the K-means algorithm uses Manhattan distance or Euclidean distance as the similarity measure; it seeks the optimal classification corresponding to a given initial cluster-center vector v, such that the evaluation index J is minimized.
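Item (7) can be sketched with the usual iterative K-means procedure started from given initial centers. This is a minimal illustration under assumptions: it uses Euclidean distance only, takes the evaluation index J to be the sum of squared distances from each point to its assigned center (the usual choice, not confirmed by the text), and the points and initial centers are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def k_means(points, initial_centers, max_iter=100):
    """Alternate assignment and center update until the centers stop moving."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    # J: sum of squared distances to the assigned centers (assumed definition).
    j = sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))
    return labels, centers, j


# Hypothetical two-dimensional samples and initial cluster centers.
points = [[1, 1], [1, 2], [8, 8], [9, 8]]
labels, centers, j = k_means(points, initial_centers=[[0, 0], [10, 10]])
```

Because the result depends on the initial centers, the procedure finds the best partition reachable from that starting point rather than a guaranteed global minimum of J.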
6. The clinical data mining analysis and aided decision-making method based on an internet integrated medical platform according to claim 1, characterized in that the prediction module based on index parameters: according to information such as the patient's outpatient serial number or medical insurance card number, queries the patient's disease-related test indices and examination report text; structured index data can be input directly; non-structured examination reports are structured by the structuring submodule to obtain a data format that the model can recognize;
the prediction module based on examination report text: according to information such as the patient's outpatient serial number or medical insurance card number, queries the patient's examination report text; for non-structured examination report text, prediction is performed directly using a deep learning algorithm;
the model training module: merges the relevant clinical examination reports and test index data from multiple databases of the medical information systems, integrates them into a unified data table, and performs model training;
the structuring module: performs structuring of ultrasound report text data, extracts the ultrasound features of each sample and the value of each index, and forms a description template for each sample.
CN201910101985.XA 2019-02-01 2019-02-01 Clinical data mining analysis and aid decision-making method based on internet integration medical platform Pending CN109830303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910101985.XA CN109830303A (en) 2019-02-01 2019-02-01 Clinical data mining analysis and aid decision-making method based on internet integration medical platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910101985.XA CN109830303A (en) 2019-02-01 2019-02-01 Clinical data mining analysis and aid decision-making method based on internet integration medical platform

Publications (1)

Publication Number Publication Date
CN109830303A true CN109830303A (en) 2019-05-31

Family

ID=66863183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910101985.XA Pending CN109830303A (en) 2019-02-01 2019-02-01 Clinical data mining analysis and aid decision-making method based on internet integration medical platform

Country Status (1)

Country Link
CN (1) CN109830303A (en)

Cited By (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110277172A (en) * 2019-06-27 2019-09-24 齐鲁工业大学 A kind of clinical application behavior analysis system and its working method based on efficient negative sequence mining mode
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform
CN110477954A (en) * 2019-07-08 2019-11-22 无锡海斯凯尔医学技术有限公司 Detection device based on elastogram
CN110544528A (en) * 2019-08-29 2019-12-06 中南大学 advanced learning-based upper and lower ophthalmic remote diagnosis platform and construction method thereof
CN110569372A (en) * 2019-09-20 2019-12-13 四川大学 construction method of heart disease big data knowledge graph system
CN110570921A (en) * 2019-08-20 2019-12-13 广东省第二中医院(广东省中医药工程技术研究院) Clinical information processing system based on single disease category
CN110675952A (en) * 2019-09-19 2020-01-10 上海腾程医学科技信息有限公司 Checking decision method and device, terminal equipment and computer readable storage medium
CN110737731A (en) * 2019-10-25 2020-01-31 徐州工程学院 accumulation fund user data refinement analysis system and method based on decision tree
CN110767317A (en) * 2019-08-30 2020-02-07 贵州力创科技发展有限公司 Cloud computing platform and method based on data mining and big data analysis
CN110875095A (en) * 2019-09-27 2020-03-10 长沙瀚云信息科技有限公司 Standardized clinical big data center system
CN111081016A (en) * 2019-12-18 2020-04-28 北京航空航天大学 Urban traffic abnormity identification method based on complex network theory
CN111341456A (en) * 2020-02-21 2020-06-26 中南大学湘雅医院 Method and device for generating diabetic foot knowledge map and readable storage medium
CN111382151A (en) * 2020-03-23 2020-07-07 新疆大学 CT medical image cleaning method based on data desensitization
CN111428930A (en) * 2020-03-24 2020-07-17 中电药明数据科技(成都)有限公司 GBDT-based medicine patient using number prediction method and system
CN111696666A (en) * 2020-06-10 2020-09-22 杭州联众医疗科技股份有限公司 Intelligent chronic disease management system based on time coding
CN111696669A (en) * 2020-06-15 2020-09-22 山东搜搜中医信息科技有限公司 Quantitative diagnosis of clinical multidimensional data of traditional Chinese medicine
CN111696665A (en) * 2020-06-10 2020-09-22 杭州联众医疗科技股份有限公司 Auxiliary decision making system based on time coding
CN111696675A (en) * 2020-05-22 2020-09-22 平安国际智慧城市科技股份有限公司 User data classification method and device based on Internet of things data and computer equipment
CN111710427A (en) * 2020-06-17 2020-09-25 广州市金域转化医学研究院有限公司 Cervical precancerous early lesion stage diagnosis model and establishment method
CN111724898A (en) * 2020-06-15 2020-09-29 中国医学科学院医学信息研究所 Intelligent skin disease monitoring and early warning system based on big data technology
CN111739639A (en) * 2020-06-19 2020-10-02 杭州联众医疗科技股份有限公司 Perioperative complication data prediction system based on time coding
CN111768846A (en) * 2020-05-27 2020-10-13 医利捷(上海)信息科技有限公司 Clinical data management method
CN111863267A (en) * 2020-07-08 2020-10-30 首都医科大学附属北京天坛医院 Data information acquisition method, data analysis device and storage medium
CN111899828A (en) * 2020-07-31 2020-11-06 青岛百洋智能科技股份有限公司 Knowledge graph driven breast cancer diagnosis and treatment scheme recommendation system
CN111914026A (en) * 2020-07-31 2020-11-10 南京朗赢信息技术有限公司 General data exchange sharing service platform
CN111949801A (en) * 2020-07-27 2020-11-17 西北工业大学 Knowledge graph fusion method of doctor experience knowledge and ultrasonic image information
CN111951976A (en) * 2020-08-21 2020-11-17 上海交通大学医学院附属第九人民医院 Value judgment method, system, terminal and medium based on medical data margin
CN111984987A (en) * 2020-09-01 2020-11-24 上海梅斯医药科技有限公司 Method, device, system and medium for desensitization and reduction of electronic medical record
WO2020233254A1 (en) * 2019-07-12 2020-11-26 之江实验室 Medical data analysis system integrating structured image data
CN112037925A (en) * 2020-07-29 2020-12-04 郑州大学第一附属医院 LSTM algorithm-based early warning method for newly-released major infectious diseases
CN112100286A (en) * 2020-08-14 2020-12-18 华南理工大学 Computer-aided decision-making method, device and system based on multi-dimensional data and server
CN112150209A (en) * 2020-06-19 2020-12-29 南京理工大学 Construction method of CNN-LSTM time sequence prediction model based on clustering center
CN112185585A (en) * 2020-11-03 2021-01-05 浙江大学滨海产业技术研究院 Diabetes early warning method based on metabonomics
CN112200374A (en) * 2020-10-15 2021-01-08 平安国际智慧城市科技股份有限公司 Medical data processing method, device, electronic equipment and medium
CN112286985A (en) * 2020-10-13 2021-01-29 江苏云脑数据科技有限公司 Clinical research statistical analysis system based on cloud computing
CN112365976A (en) * 2020-11-14 2021-02-12 南昌大学第二附属医院 Compound disease clinical path construction method and system based on transfer learning
CN112380763A (en) * 2020-11-03 2021-02-19 浙大城市学院 System and method for analyzing reliability of in-pile component based on data mining
CN112446862A (en) * 2020-11-25 2021-03-05 北京医准智能科技有限公司 Dynamic breast ultrasound video full-focus real-time detection and segmentation device and system based on artificial intelligence and image processing method
TWI723868B (en) * 2019-06-26 2021-04-01 義守大學 Method for applying a label made after sampling to neural network training model
CN112635074A (en) * 2020-12-21 2021-04-09 云南省疾病预防控制中心 AIDS prevention and treatment decision method based on multi-data analysis model
CN112786126A (en) * 2020-12-31 2021-05-11 天津开心生活科技有限公司 Time sequence analysis method and device of clinical test data, electronic equipment and medium
CN112927810A (en) * 2021-03-23 2021-06-08 崔剑虹 Smart medical response method based on big data and smart medical cloud computing system
CN112988783A (en) * 2021-03-12 2021-06-18 李涛 Public opinion occurrence time sequence analysis method based on multidimensional data model
CN113053479A (en) * 2019-12-27 2021-06-29 天津幸福生命科技有限公司 Medical data processing method, device, medium and electronic equipment
CN113096817A (en) * 2021-04-13 2021-07-09 北京大学 Method, apparatus, computer device and storage medium for disease prediction
CN113160999A (en) * 2021-04-25 2021-07-23 厦门拜特信息科技有限公司 Data structured analysis system and data processing method for medical decision
CN113177040A (en) * 2021-04-29 2021-07-27 东北大学 Full-process big data cleaning and analyzing method for aluminum/copper plate strip production
CN113314201A (en) * 2021-06-17 2021-08-27 南通市第一人民医院 Neurology clinical nursing potential safety hazard analysis method and system
WO2021175038A1 (en) * 2020-11-13 2021-09-10 之江实验室 Patient data visualization method and system for assisting decision-making in chronic disease
CN113380360A (en) * 2021-06-07 2021-09-10 厦门大学 Similar medical record retrieval method and system based on multi-mode medical record map
CN113436747A (en) * 2021-07-20 2021-09-24 四川省医学科学院·四川省人民医院 Medical data clinical auxiliary system and method based on medical data analysis model
CN113436745A (en) * 2021-06-30 2021-09-24 四川大学华西医院 Artificial intelligence auxiliary diagnosis method based on database analysis
CN113539471A (en) * 2021-03-26 2021-10-22 内蒙古卫数数据科技有限公司 Auxiliary diagnosis method and system for hyperplasia of mammary glands based on conventional inspection data
CN113539414A (en) * 2021-07-30 2021-10-22 中电药明数据科技(成都)有限公司 Method and system for predicting rationality of antibiotic medication
CN113611411A (en) * 2021-10-09 2021-11-05 浙江大学 Body examination aid decision-making system based on false negative sample identification
CN113609555A (en) * 2021-07-16 2021-11-05 黄河勘测规划设计研究院有限公司 Hydraulic metal structure design method based on big data technology
CN113628734A (en) * 2021-08-02 2021-11-09 浙江海心智惠科技有限公司 Design method of oncology electronic medical advice system with clinical decision intelligent recommendation function
CN113674867A (en) * 2021-07-27 2021-11-19 上海药慧信息技术有限公司 Clinical data mining method and device, electronic equipment and storage medium
CN113688169A (en) * 2021-08-11 2021-11-23 北京科技大学 Mine potential safety hazard identification and early warning system based on big data analysis
CN113744873A (en) * 2021-11-08 2021-12-03 浙江大学 Heating to-be-checked auxiliary differential diagnosis system based on task decomposition strategy
CN113823421A (en) * 2021-08-20 2021-12-21 武汉心络科技有限公司 Information providing method, device, equipment and storage medium
CN113889279A (en) * 2021-09-28 2022-01-04 北京华彬立成科技有限公司 Combination therapy information mining and inquiring method, device and electronic equipment
CN114068013A (en) * 2021-11-16 2022-02-18 高峰 Cerebral artery occlusion artificial intelligence assistant decision system
CN114121293A (en) * 2021-11-12 2022-03-01 北京华彬立成科技有限公司 Clinical trial information mining and inquiring method and device
CN114464314A (en) * 2022-02-08 2022-05-10 四川大学华西医院 Clinical body symptom classification diagnosis system
CN114496177A (en) * 2022-01-24 2022-05-13 佳木斯大学 Method and system for detecting clinical infection source of infectious department based on big data
CN114564755A (en) * 2022-03-03 2022-05-31 曾迎春 Cancer data management platform based on block chain technology
CN114678132A (en) * 2022-02-22 2022-06-28 北京颐圣智能科技有限公司 Self-learning medical wind control system and method based on clinical behavior feedback
CN114912804A (en) * 2022-05-17 2022-08-16 四川大学华西医院 Scientific research data related property control method and system
CN115083601A (en) * 2022-07-25 2022-09-20 四川省医学科学院·四川省人民医院 Type 2diabetes auxiliary decision making system based on machine learning
CN115083616A (en) * 2022-08-16 2022-09-20 之江实验室 Chronic nephropathy subtype mining system based on self-supervision graph clustering
CN115145993A (en) * 2022-07-05 2022-10-04 西南交通大学 Railway freight big data visualization display platform based on self-learning rule operation
CN115240800A (en) * 2022-09-26 2022-10-25 北京泽桥医疗科技股份有限公司 Medical data intelligent analysis execution method based on big data platform
WO2022228473A1 (en) * 2021-04-27 2022-11-03 联峰远程健康管理服务有限公司 Smart health management system for use in telemedicine service and method used in same
CN115617840A (en) * 2022-12-19 2023-01-17 江西曼荼罗软件有限公司 Medical data retrieval platform construction method, system, computer and storage medium
CN116386848A (en) * 2023-03-10 2023-07-04 王子骁 Multidimensional thyroid nodule accurate evaluation system and method based on AI technology
CN116825336A (en) * 2023-08-30 2023-09-29 山东志诚普惠健康科技有限公司 AI-based medical information intelligent management method and system
CN117153419A (en) * 2023-10-31 2023-12-01 湖北福鑫科创信息技术有限公司 Data integration tool for medical institutions
CN117542467A (en) * 2024-01-09 2024-02-09 四川互慧软件有限公司 Automatic construction method of disease-specific standard database based on patient data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599088A (en) * 2008-11-18 2009-12-09 北京美智医疗科技有限公司 Multi-dimensional data mining system and method for medical information systems
CN107680676A (en) * 2017-09-26 2018-02-09 电子科技大学 Gestational diabetes prediction method driven by electronic health record data
CN107908621A (en) * 2017-11-16 2018-04-13 东华大学 Breast tumor risk assessment system based on ultrasonic examination report text data
CN108615560A (en) * 2018-03-19 2018-10-02 安徽锐欧赛智能科技有限公司 Clinical medical data analysis method based on data mining
CN109243616A (en) * 2018-06-29 2019-01-18 东华大学 Joint relation extraction and structuring system for breast electronic health records based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
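"Information Systems Audit Research Report" Research Group (《信息系统审计研究报告》课题组): "Information Systems Audit Research Report", 30 November 2015, 中国时代经济出版社 *
Zhu Lifeng (朱立峰) et al.: "Construction and in-depth application of a multi-center clinical big data platform", 《大数据》 (Big Data Research) *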

Cited By (109)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI723868B (en) * 2019-06-26 2021-04-01 義守大學 Method for applying a label made after sampling to neural network training model
CN110277172A (en) * 2019-06-27 2019-09-24 齐鲁工业大学 Clinical application behavior analysis system and its working method based on efficient negative sequence mining mode
CN110477954A (en) * 2019-07-08 2019-11-22 无锡海斯凯尔医学技术有限公司 Detection device based on elastogram
CN110477954B (en) * 2019-07-08 2021-07-27 无锡海斯凯尔医学技术有限公司 Detection equipment based on elasticity formation of image
WO2020233254A1 (en) * 2019-07-12 2020-11-26 之江实验室 Medical data analysis system integrating structured image data
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 Medical big data cloud service analysis platform
CN110415831B (en) * 2019-07-18 2023-04-18 天宜(天津)信息科技有限公司 Medical big data cloud service analysis platform
CN110570921A (en) * 2019-08-20 2019-12-13 广东省第二中医院(广东省中医药工程技术研究院) Clinical information processing system based on single disease category
CN110544528B (en) * 2019-08-29 2022-06-07 中南大学 Advanced learning-based upper and lower ophthalmic remote diagnosis platform and construction method thereof
CN110544528A (en) * 2019-08-29 2019-12-06 中南大学 Advanced learning-based upper and lower ophthalmic remote diagnosis platform and construction method thereof
CN110767317A (en) * 2019-08-30 2020-02-07 贵州力创科技发展有限公司 Cloud computing platform and method based on data mining and big data analysis
CN110675952A (en) * 2019-09-19 2020-01-10 上海腾程医学科技信息有限公司 Checking decision method and device, terminal equipment and computer readable storage medium
CN110569372A (en) * 2019-09-20 2019-12-13 四川大学 Construction method of heart disease big data knowledge graph system
CN110875095A (en) * 2019-09-27 2020-03-10 长沙瀚云信息科技有限公司 Standardized clinical big data center system
CN110737731B (en) * 2019-10-25 2023-12-29 徐州工程学院 Decision tree-based public accumulation user data refinement analysis system and method
CN110737731A (en) * 2019-10-25 2020-01-31 徐州工程学院 Accumulation fund user data refinement analysis system and method based on decision tree
CN111081016A (en) * 2019-12-18 2020-04-28 北京航空航天大学 Urban traffic abnormity identification method based on complex network theory
CN111081016B (en) * 2019-12-18 2021-07-06 北京航空航天大学 Urban traffic abnormity identification method based on complex network theory
CN113053479A (en) * 2019-12-27 2021-06-29 天津幸福生命科技有限公司 Medical data processing method, device, medium and electronic equipment
CN111341456A (en) * 2020-02-21 2020-06-26 中南大学湘雅医院 Method and device for generating diabetic foot knowledge map and readable storage medium
CN111341456B (en) * 2020-02-21 2024-02-23 中南大学湘雅医院 Method and device for generating diabetic foot knowledge graph and readable storage medium
CN111382151A (en) * 2020-03-23 2020-07-07 新疆大学 CT medical image cleaning method based on data desensitization
CN111428930A (en) * 2020-03-24 2020-07-17 中电药明数据科技(成都)有限公司 GBDT-based medicine patient using number prediction method and system
CN111696675B (en) * 2020-05-22 2023-09-19 深圳赛安特技术服务有限公司 User data classification method and device based on Internet of things data and computer equipment
CN111696675A (en) * 2020-05-22 2020-09-22 平安国际智慧城市科技股份有限公司 User data classification method and device based on Internet of things data and computer equipment
CN111768846A (en) * 2020-05-27 2020-10-13 医利捷(上海)信息科技有限公司 Clinical data management method
CN111696665A (en) * 2020-06-10 2020-09-22 杭州联众医疗科技股份有限公司 Auxiliary decision making system based on time coding
CN111696666A (en) * 2020-06-10 2020-09-22 杭州联众医疗科技股份有限公司 Intelligent chronic disease management system based on time coding
CN111724898A (en) * 2020-06-15 2020-09-29 中国医学科学院医学信息研究所 Intelligent skin disease monitoring and early warning system based on big data technology
CN111696669A (en) * 2020-06-15 2020-09-22 山东搜搜中医信息科技有限公司 Quantitative diagnosis of clinical multidimensional data of traditional Chinese medicine
CN111710427A (en) * 2020-06-17 2020-09-25 广州市金域转化医学研究院有限公司 Cervical precancerous early lesion stage diagnosis model and establishment method
CN112150209A (en) * 2020-06-19 2020-12-29 南京理工大学 Construction method of CNN-LSTM time sequence prediction model based on clustering center
CN112150209B (en) * 2020-06-19 2022-10-18 南京理工大学 Construction method of CNN-LSTM time sequence prediction model based on clustering center
CN111739639A (en) * 2020-06-19 2020-10-02 杭州联众医疗科技股份有限公司 Perioperative complication data prediction system based on time coding
CN111863267B (en) * 2020-07-08 2024-01-26 首都医科大学附属北京天坛医院 Data information acquisition method, data analysis method, device and storage medium
CN111863267A (en) * 2020-07-08 2020-10-30 首都医科大学附属北京天坛医院 Data information acquisition method, data analysis device and storage medium
CN111949801A (en) * 2020-07-27 2020-11-17 西北工业大学 Knowledge graph fusion method of doctor experience knowledge and ultrasonic image information
CN111949801B (en) * 2020-07-27 2023-10-24 西北工业大学 Knowledge graph fusion method for doctor experience knowledge and ultrasonic image information
CN112037925B (en) * 2020-07-29 2023-06-23 郑州大学第一附属医院 LSTM algorithm-based early warning method for new major infectious diseases
CN112037925A (en) * 2020-07-29 2020-12-04 郑州大学第一附属医院 LSTM algorithm-based early warning method for newly-released major infectious diseases
CN111914026A (en) * 2020-07-31 2020-11-10 南京朗赢信息技术有限公司 General data exchange sharing service platform
CN111899828A (en) * 2020-07-31 2020-11-06 青岛百洋智能科技股份有限公司 Knowledge graph driven breast cancer diagnosis and treatment scheme recommendation system
CN112100286A (en) * 2020-08-14 2020-12-18 华南理工大学 Computer-aided decision-making method, device and system based on multi-dimensional data and server
CN111951976A (en) * 2020-08-21 2020-11-17 上海交通大学医学院附属第九人民医院 Value judgment method, system, terminal and medium based on medical data margin
CN111951976B (en) * 2020-08-21 2024-03-22 上海交通大学医学院附属第九人民医院 Value judging method, system, terminal and medium based on medical data allowance
CN111984987B (en) * 2020-09-01 2024-04-02 上海梅斯医药科技有限公司 Method, device, system and medium for desensitizing and restoring electronic medical records
CN111984987A (en) * 2020-09-01 2020-11-24 上海梅斯医药科技有限公司 Method, device, system and medium for desensitization and restoration of electronic medical record
CN112286985A (en) * 2020-10-13 2021-01-29 江苏云脑数据科技有限公司 Clinical research statistical analysis system based on cloud computing
CN112200374A (en) * 2020-10-15 2021-01-08 平安国际智慧城市科技股份有限公司 Medical data processing method, device, electronic equipment and medium
CN112185585A (en) * 2020-11-03 2021-01-05 浙江大学滨海产业技术研究院 Diabetes early warning method based on metabonomics
CN112380763A (en) * 2020-11-03 2021-02-19 浙大城市学院 System and method for analyzing reliability of in-pile component based on data mining
WO2021175038A1 (en) * 2020-11-13 2021-09-10 之江实验室 Patient data visualization method and system for assisting decision-making in chronic disease
US11521751B2 (en) * 2020-11-13 2022-12-06 Zhejiang Lab Patient data visualization method and system for assisting decision making in chronic diseases
CN112365976A (en) * 2020-11-14 2021-02-12 南昌大学第二附属医院 Compound disease clinical path construction method and system based on transfer learning
CN112365976B (en) * 2020-11-14 2023-08-11 南昌大学第二附属医院 Composite disease species clinical path construction method and system based on transfer learning
CN112446862A (en) * 2020-11-25 2021-03-05 北京医准智能科技有限公司 Dynamic breast ultrasound video full-focus real-time detection and segmentation device and system based on artificial intelligence and image processing method
CN112635074A (en) * 2020-12-21 2021-04-09 云南省疾病预防控制中心 AIDS prevention and treatment decision method based on multi-data analysis model
CN112786126B (en) * 2020-12-31 2023-11-03 天津开心生活科技有限公司 Time sequence analysis method and device for clinical test data, electronic equipment and medium
CN112786126A (en) * 2020-12-31 2021-05-11 天津开心生活科技有限公司 Time sequence analysis method and device of clinical test data, electronic equipment and medium
CN112988783A (en) * 2021-03-12 2021-06-18 李涛 Public opinion occurrence time sequence analysis method based on multidimensional data model
CN112927810A (en) * 2021-03-23 2021-06-08 崔剑虹 Smart medical response method based on big data and smart medical cloud computing system
CN113539471A (en) * 2021-03-26 2021-10-22 内蒙古卫数数据科技有限公司 Auxiliary diagnosis method and system for hyperplasia of mammary glands based on conventional inspection data
CN113096817A (en) * 2021-04-13 2021-07-09 北京大学 Method, apparatus, computer device and storage medium for disease prediction
CN113160999A (en) * 2021-04-25 2021-07-23 厦门拜特信息科技有限公司 Data structured analysis system and data processing method for medical decision
WO2022228473A1 (en) * 2021-04-27 2022-11-03 联峰远程健康管理服务有限公司 Smart health management system for use in telemedicine service and method used in same
CN113177040A (en) * 2021-04-29 2021-07-27 东北大学 Full-process big data cleaning and analyzing method for aluminum/copper plate strip production
CN113380360A (en) * 2021-06-07 2021-09-10 厦门大学 Similar medical record retrieval method and system based on multi-mode medical record map
CN113314201A (en) * 2021-06-17 2021-08-27 南通市第一人民医院 Neurology clinical nursing potential safety hazard analysis method and system
CN113314201B (en) * 2021-06-17 2022-05-13 南通市第一人民医院 Potential safety hazard analysis method and system for neurology clinical nursing
CN113436745A (en) * 2021-06-30 2021-09-24 四川大学华西医院 Artificial intelligence auxiliary diagnosis method based on database analysis
CN113609555A (en) * 2021-07-16 2021-11-05 黄河勘测规划设计研究院有限公司 Hydraulic metal structure design method based on big data technology
CN113609555B (en) * 2021-07-16 2023-10-20 黄河勘测规划设计研究院有限公司 Hydraulic metal structure design method based on big data technology
CN113436747A (en) * 2021-07-20 2021-09-24 四川省医学科学院·四川省人民医院 Medical data clinical auxiliary system and method based on medical data analysis model
CN113674867A (en) * 2021-07-27 2021-11-19 上海药慧信息技术有限公司 Clinical data mining method and device, electronic equipment and storage medium
CN113539414A (en) * 2021-07-30 2021-10-22 中电药明数据科技(成都)有限公司 Method and system for predicting rationality of antibiotic medication
CN113628734A (en) * 2021-08-02 2021-11-09 浙江海心智惠科技有限公司 Design method of oncology electronic medical advice system with clinical decision intelligent recommendation function
CN113688169A (en) * 2021-08-11 2021-11-23 北京科技大学 Mine potential safety hazard identification and early warning system based on big data analysis
CN113688169B (en) * 2021-08-11 2023-08-08 北京科技大学 Mine potential safety hazard identification and early warning system based on big data analysis
CN113823421B (en) * 2021-08-20 2024-02-13 武汉心络科技有限公司 Information providing method, apparatus, device and storage medium
CN113823421A (en) * 2021-08-20 2021-12-21 武汉心络科技有限公司 Information providing method, device, equipment and storage medium
CN113889279A (en) * 2021-09-28 2022-01-04 北京华彬立成科技有限公司 Combination therapy information mining and inquiring method, device and electronic equipment
CN113611411B (en) * 2021-10-09 2021-12-31 浙江大学 Body examination aid decision-making system based on false negative sample identification
WO2023056918A1 (en) * 2021-10-09 2023-04-13 浙江大学 False negative sample recognition-based physical examination assistant decision-making system
CN113611411A (en) * 2021-10-09 2021-11-05 浙江大学 Body examination aid decision-making system based on false negative sample identification
CN113744873B (en) * 2021-11-08 2022-02-11 浙江大学 Heating to-be-checked auxiliary differential diagnosis system based on task decomposition strategy
CN113744873A (en) * 2021-11-08 2021-12-03 浙江大学 Heating to-be-checked auxiliary differential diagnosis system based on task decomposition strategy
CN114121293A (en) * 2021-11-12 2022-03-01 北京华彬立成科技有限公司 Clinical trial information mining and inquiring method and device
CN114068013B (en) * 2021-11-16 2022-09-23 高峰 Cerebral artery occlusion artificial intelligence assistant decision system
CN114068013A (en) * 2021-11-16 2022-02-18 高峰 Cerebral artery occlusion artificial intelligence assistant decision system
CN114496177A (en) * 2022-01-24 2022-05-13 佳木斯大学 Method and system for detecting clinical infection source of infectious department based on big data
CN114464314A (en) * 2022-02-08 2022-05-10 四川大学华西医院 Clinical body symptom classification diagnosis system
CN114678132A (en) * 2022-02-22 2022-06-28 北京颐圣智能科技有限公司 Self-learning medical wind control system and method based on clinical behavior feedback
CN114678132B (en) * 2022-02-22 2023-07-18 北京颐圣智能科技有限公司 Self-learning medical wind control system and method based on clinical behavior feedback
CN114564755A (en) * 2022-03-03 2022-05-31 曾迎春 Cancer data management platform based on block chain technology
CN114912804A (en) * 2022-05-17 2022-08-16 四川大学华西医院 Scientific research data related property control method and system
CN115145993A (en) * 2022-07-05 2022-10-04 西南交通大学 Railway freight big data visualization display platform based on self-learning rule operation
CN115083601A (en) * 2022-07-25 2022-09-20 四川省医学科学院·四川省人民医院 Type 2 diabetes auxiliary decision making system based on machine learning
CN115083616A (en) * 2022-08-16 2022-09-20 之江实验室 Chronic nephropathy subtype mining system based on self-supervision graph clustering
JP7404581B1 (en) 2022-08-16 2023-12-25 之江実験室 Chronic nephropathy subtype mining system based on self-supervised graph clustering
CN115083616B (en) * 2022-08-16 2022-11-08 之江实验室 Chronic nephropathy subtype mining system based on self-supervision graph clustering
CN115240800A (en) * 2022-09-26 2022-10-25 北京泽桥医疗科技股份有限公司 Medical data intelligent analysis execution method based on big data platform
CN115617840B (en) * 2022-12-19 2023-03-10 江西曼荼罗软件有限公司 Medical data retrieval platform construction method, system, computer and storage medium
CN115617840A (en) * 2022-12-19 2023-01-17 江西曼荼罗软件有限公司 Medical data retrieval platform construction method, system, computer and storage medium
CN116386848A (en) * 2023-03-10 2023-07-04 王子骁 Multidimensional thyroid nodule accurate evaluation system and method based on AI technology
CN116825336A (en) * 2023-08-30 2023-09-29 山东志诚普惠健康科技有限公司 AI-based medical information intelligent management method and system
CN117153419A (en) * 2023-10-31 2023-12-01 湖北福鑫科创信息技术有限公司 Data integration tool for medical institutions
CN117153419B (en) * 2023-10-31 2024-01-26 湖北福鑫科创信息技术有限公司 Data integration tool for medical institutions
CN117542467A (en) * 2024-01-09 2024-02-09 四川互慧软件有限公司 Automatic construction method of disease-specific standard database based on patient data
CN117542467B (en) * 2024-01-09 2024-04-12 四川互慧软件有限公司 Automatic construction method of disease-specific standard database based on patient data

Similar Documents

Publication Publication Date Title
CN109830303A (en) Clinical data mining analysis and aid decision-making method based on internet integration medical platform
Ambekar et al. Disease risk prediction by using convolutional neural network
US10824607B2 (en) Topological data analysis of data from a fact table and related dimension tables
Milovic et al. Prediction and decision making in health care using data mining
ȚĂRANU Data mining in healthcare: decision making and precision
US20180025093A1 (en) Query capabilities of topological data analysis graphs
CN109785927A (en) Clinical document structuring processing method based on internet integration medical platform
CN109841282A (en) Chinese medicine health control cloud system and its building method based on cloud computing
Dey et al. Study and analysis of data mining algorithms for healthcare decision support system
Djatna et al. An intuitionistic fuzzy diagnosis analytics for stroke disease
CN108962394B (en) Medical data decision support method and system
Chauhan et al. A robust model for big healthcare data analytics
CN116864139A (en) Disease risk assessment method, device, computer equipment and readable storage medium
Chou et al. Extracting drug utilization knowledge using self-organizing map and rough set theory
Datta et al. Development of predictive model of diabetic using supervised machine learning classification algorithm of ensemble voting
Ahmad Mining health data for breast cancer diagnosis using machine learning
Li et al. Study of E-business applications based on big data analysis in modern hospital health management
Retal et al. Machine learning for diabetes prediction: a systematic review and a conceptual framework for early prediction
Kamble et al. Smart Health Prediction System Using Data Mining
Jaber et al. Early prediction of diabetic using data mining
Gu et al. Which is more reliable, expert experience or information itself? weight scheme of complex cases for health management decision making
Jia et al. Dkdr: An approach of knowledge graph and deep reinforcement learning for disease diagnosis
Darabi et al. A Case-Based-Reasoning System for Feature Selection and Diagnosing Asthma
Saadi et al. Integration of fuzzy clustering into the case base reasoning for the prediction of response to immunotherapy treatment
Shah et al. A Review on Big Data Practices in Healthcare

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2019-05-31