CN110634565A - Regression analysis method for medical big data


Info

Publication number
CN110634565A
Authority
CN
China
Prior art keywords
big data
model
regression
data
medical big
Legal status
Granted
Application number
CN201910878524.3A
Other languages
Chinese (zh)
Other versions
CN110634565B (en)
Inventor
梅明亮
Current Assignee
Shenzhen Weike Technology Co ltd
Original Assignee
Anhui Wei Aumann Robot Co Ltd
Priority date
2019-09-18
Filing date
2019-09-18
Publication date
2019-12-31
Application filed by Anhui Wei Aumann Robot Co Ltd
Priority to CN201910878524.3A
Publication of CN110634565A
Application granted
Publication of CN110634565B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a regression analysis method for medical big data, comprising the following steps: 1) preprocessing the medical big data, determining the variable to be predicted and the input variables according to the target task, and establishing a training set; 2) initializing the regression model; 3) learning a support vector regression model at step g; 4) calculating the model weight and the sample weights; 5) integrating the models and outputting the ensemble. The invention solves the problems of overly complex models and large bias in the prior art. Its advantages are that model complexity can be significantly reduced and the generalization performance of the regression greatly improved.

Description

Regression analysis method for medical big data
Technical Field
The invention relates to the field of data processing, in particular to a regression analysis method for medical big data.
Background
Currently, global medical and health data amounts to hundreds of exabytes (EB) and is growing at an accelerating rate. Big data is changing medical research and practice, from the rapid identification and establishment of large-scale research cohorts to AI-assisted clinical decision support systems. Viewed by technology category, the rapid development of the medical big data industry mainly benefits from technical progress and growing commercialization in three areas: medical informatization, gene sequencing, and smart health devices. First, the level of medical informatization keeps improving, and systems such as HIS, CIS and PACS are widely used. CHIMA statistics show that 70-80% of hospitals in China have implemented hospital information management systems, concentrated in tertiary medical institutions; the resulting accumulation of large amounts of medical data provides a foundation for building algorithms. Second, second-generation gene sequencing has rapidly reduced sequencing costs to about US$1,000, with throughput far higher than first-generation sequencing; its growing applications accelerate the accumulation of biological data and bring value to clinical practice and to basic research and development. Third, smart health-management hardware such as smart bands, watches and body-fat scales is spreading rapidly; these devices track patients' vital signs continuously and in real time and yield useful data, which further drives the development of medical big data.
In addition, technologies such as data fusion, data visualization, image recognition and processing, machine learning and artificial intelligence continue to improve, providing the underlying technical support for the development of medical big data.
Regression analysis is a statistical method for determining the quantitative interdependence between two or more variables, and it is very widely applied. By the number of variables involved, regression analysis is divided into univariate and multivariate regression analysis; by the number of independent variables, into simple and multiple regression analysis; and by the type of relationship between the independent and dependent variables, into linear and nonlinear regression analysis. If a regression analysis includes only one independent variable and one dependent variable, and their relationship can be approximated by a straight line, it is called univariate linear regression analysis. If it includes two or more independent variables and the dependent variable is linearly related to them, it is called multiple linear regression analysis. Related existing work is as follows. Patent CN201710612240.0 proposes a method for analyzing multimodal big data in medical institutions, mainly the multimodal data of patients in hospital databases. The method jointly considers information from multiple modalities, avoids the limitations that the transmission network imposes on conventional data analysis, and ensures real-time feedback of user information. Its multidimensional partial least squares model, combined with a convolutional neural network, reduces information loss, yields a stable prediction model, and gives hospitals a more detailed and accurate analysis report.
Patent CN201811570429.9 discloses a method for medical big data feature extraction and intelligent analysis and prediction, comprising the following steps: data cleaning, data vectorization, case mining and feature mining, deep neural network model training, prediction of disease diagnosis, treatment and cure rates, and model analysis and verification. Patent CN201910030377.4 discloses a big-data-based system for accurate analysis of medical insurance business data, comprising a data source module, a data analysis module and an analysis result output module; by collecting statistics on and analyzing users' living habits and medical records, it improves the accuracy of risk prediction, greatly assists medical insurance staff, and raises the sales success rate. These patents either present only a system framework or directly apply the most recent regression algorithms.
Disclosure of Invention
The invention provides a regression analysis method for medical big data that solves the problems of overly complex models and large bias in the prior art. It specifically comprises the following steps:
step 1: preprocess the medical big data, determine the variable to be predicted and the input variables according to the target task, and establish a training sample set $\{(x_i, y_i) \mid i = 1, 2, \dots, m\}$, where $m$ is the total number of training samples, $x_i \in \mathbb{R}^d$ is a $d$-dimensional vector, $y_i \in \mathbb{R}$ is a scalar, $(x_i, y_i)$ is the $i$-th training sample, and $\mathbb{R}$ denotes the set of real numbers;
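Step 1 leaves the preprocessing itself unspecified. The following is a minimal sketch, assuming tabular records stored as Python dictionaries, that keeps only records with an observed target and fills missing inputs with column means. Mean imputation is our assumption, not the patent's; the names `TrainingSet` and `build_training_set` and the fields `age`, `bmi` and `sbp` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class TrainingSet:
    X: List[List[float]]  # m input vectors x_i, each of dimension d
    y: List[float]        # m scalar targets y_i


def build_training_set(records: Sequence[Dict[str, Optional[float]]],
                       input_fields: Sequence[str],
                       target_field: str) -> TrainingSet:
    """Hypothetical step-1 preprocessing: select inputs and target, fill gaps."""
    # Only records with an observed target can serve as training samples.
    labelled = [r for r in records if r.get(target_field) is not None]
    # Mean of each input over its observed values, used to impute missing inputs.
    means: Dict[str, float] = {}
    for f in input_fields:
        vals = [float(r[f]) for r in labelled if r.get(f) is not None]
        means[f] = sum(vals) / len(vals) if vals else 0.0
    X = [[float(r[f]) if r.get(f) is not None else means[f] for f in input_fields]
         for r in labelled]
    y = [float(r[target_field]) for r in labelled]
    return TrainingSet(X=X, y=y)


# Toy records with hypothetical fields; the third has no target and is dropped.
records = [
    {"age": 50, "bmi": 24.0, "sbp": 130},
    {"age": 61, "bmi": None, "sbp": 142},
    {"age": 45, "bmi": 27.0, "sbp": None},
]
ts = build_training_set(records, ["age", "bmi"], "sbp")
```

Here the missing `bmi` of the second record is filled with the mean over the labelled records (24.0), giving $m = 2$ samples with $d = 2$.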
step 2: initialize the regression model: let $w_i^g$ be the weight of the $i$-th sample at step $g$, $G$ the maximum number of training rounds, and $\delta$ the maximum allowable error; set $g = 1$ and $w_i^1 = 1/m$ for $i = 1, 2, \dots, m$, where $m$ is the total number of samples;
step 3: learn the support vector regression model $f_g(x)$ of step $g$ and compute its error rate $r_g$ = [formula image not reproduced], where [formula image not reproduced]. If $r_g > 50\%$, set [formula image not reproduced] and jump to step 6;
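The formula for $r_g$ is not reproduced in this text. Threshold-based boosting schemes for regression, such as AdaBoost.RT, count a sample as mispredicted when its error exceeds a threshold; the sketch below assumes $r_g = \sum_{i:\,|f_g(x_i) - y_i| > \delta} w_i^g$, using $\delta$ from step 2. This definition is an assumption, not the patent's confirmed formula, and `weighted_error_rate` is a hypothetical name.

```python
from typing import Callable, Sequence


def weighted_error_rate(f: Callable[[Sequence[float]], float],
                        X: Sequence[Sequence[float]], y: Sequence[float],
                        w: Sequence[float], delta: float) -> float:
    """Assumed r_g: total weight of samples with |f(x_i) - y_i| > delta."""
    return sum(wi for xi, yi, wi in zip(X, y, w) if abs(f(xi) - yi) > delta)


# Usage with uniform step-2 weights w_i^1 = 1/m and a toy model f(x) = 2x.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.5, 6.0, 9.0]
w = [1.0 / len(y)] * len(y)
r = weighted_error_rate(lambda x: 2.0 * x[0], X, y, w, delta=0.6)
stop = r > 0.5  # step 3: jump to step 6 when r_g exceeds 50%
```

Only the last sample (absolute error 1.0) exceeds $\delta = 0.6$, so $r_g = 0.25$ and boosting continues.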
step 4: set the model weight [formula image not reproduced] and the sample weights [formula image not reproduced], where [formula image not reproduced];
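The model-weight and sample-weight formulas are likewise not reproduced. The 50% stopping rule in step 3 is consistent with AdaBoost-style weights, since $\alpha_g = \tfrac{1}{2}\ln\frac{1 - r_g}{r_g}$ is positive only when $r_g < 0.5$. The sketch below assumes that $\alpha_g$ and the update $w_i^{g+1} \propto w_i^g e^{\pm\alpha_g}$ (larger for samples whose error exceeds $\delta$, smaller otherwise, then renormalized). Both formulas and the name `update_weights` are assumptions borrowed from discrete AdaBoost, not the patent's stated method.

```python
import math
from typing import List, Sequence, Tuple


def update_weights(r: float, abs_errors: Sequence[float], w: Sequence[float],
                   delta: float, eps: float = 1e-12) -> Tuple[float, List[float]]:
    """Assumed step 4: AdaBoost-style model weight and sample-weight update."""
    r = min(max(r, eps), 1.0 - eps)          # guard against r_g = 0 or 1
    alpha = 0.5 * math.log((1.0 - r) / r)    # assumed model weight alpha_g
    # Up-weight samples outside the allowed error, down-weight the rest.
    raw = [wi * math.exp(alpha if e > delta else -alpha)
           for wi, e in zip(w, abs_errors)]
    z = sum(raw)                             # normalizing constant
    return alpha, [v / z for v in raw]


# Four samples with uniform weights; only the last error exceeds delta = 0.6.
alpha, new_w = update_weights(0.25, [0.0, 0.5, 0.0, 1.0], [0.25] * 4, delta=0.6)
```

With $r_g = 0.25$ this gives $\alpha_g = \tfrac{1}{2}\ln 3$, and after renormalization the mispredicted sample carries half of the total weight (0.5) while the others drop to 1/6 each.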
step 5: if $g < G$, increase $g$ by 1 and jump to step 2; otherwise, set [formula image not reproduced] and jump to step 6;
step 6: obtain the integrated regression model $F(x)$ = [formula image not reproduced], and use $F(x)$ to predict unlabeled data, where $x$ is a sample.
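Steps 2 through 6 form a boosting loop. The following end-to-end sketch runs that loop under several stated assumptions: $r_g$ is the total weight of samples with absolute error above $\delta$; $\alpha_g = \tfrac{1}{2}\ln\frac{1 - r_g}{r_g}$ with an AdaBoost-style weight update; a model with $r_g > 50\%$ is discarded; and $F(x) = \sum_g \alpha_g f_g(x) / \sum_g \alpha_g$. On our reading, step 5 resumes fitting without resetting the weights, because re-running the step 2 initialization would discard the updated weights. The base learner is passed in as a callable with a hypothetical `(X, y, w) -> model` signature, so any weighted support vector regression could be plugged in. The names `boost` and `ensemble_predict` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
# Hypothetical learner signature: (X, y, sample_weights) -> fitted model f_g.
Learner = Callable[[Sequence[Sequence[float]], Sequence[float], Sequence[float]], Model]


def boost(X: Sequence[Sequence[float]], y: Sequence[float], learner: Learner,
          G: int, delta: float, eps: float = 1e-12) -> List[Tuple[float, Model]]:
    """Steps 2-5 under the assumptions above; returns (alpha_g, f_g) pairs."""
    m = len(y)
    w = [1.0 / m] * m                        # step 2: w_i^1 = 1/m
    models: List[Tuple[float, Model]] = []
    for _ in range(G):                       # at most G rounds, g = 1..G
        f = learner(X, y, w)                 # step 3: fit f_g on weighted samples
        errs = [abs(f(xi) - yi) for xi, yi in zip(X, y)]
        r = sum(wi for wi, e in zip(w, errs) if e > delta)  # assumed r_g
        if r > 0.5:                          # step 3: stop, f_g discarded (assumed)
            break
        r = max(r, eps)                      # avoid log(1/0) if f_g fits every sample
        alpha = 0.5 * math.log((1.0 - r) / r)    # step 4: assumed model weight
        models.append((alpha, f))
        raw = [wi * math.exp(alpha if e > delta else -alpha)
               for wi, e in zip(w, errs)]
        z = sum(raw)
        w = [v / z for v in raw]             # step 4: assumed sample-weight update
    return models                            # step 5: loop ends after G rounds


def ensemble_predict(models: Sequence[Tuple[float, Model]], x: Sequence[float]) -> float:
    """Step 6 under an assumed form: alpha-weighted average of the base models."""
    total = sum(a for a, _ in models)
    if not models or total <= 0.0:
        raise ValueError("no base model with positive weight was accepted")
    return sum(a * f(x) for a, f in models) / total


# Toy usage with a fixed stand-in learner f(x) = 2x that ignores the weights.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.5, 6.0, 9.0]
models = boost(X, y, lambda X_, y_, w_: (lambda x: 2.0 * x[0]), G=1, delta=0.6)
pred = ensemble_predict(models, [5.0])
```

Keeping the learner pluggable separates the ensemble logic from the base regressor. It also makes the early-stop branch easy to exercise: a learner whose errors all exceed $\delta$ yields $r_g = 1$ and an empty ensemble.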
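Here, the support vector regression model involved in step 3 has the form $f(x) = W^T \phi(x) + b$, where $W$ is the normal vector, $b$ is the displacement (bias) term, and $\phi(x)$ is the mapping associated with the kernel function, which maps $x$ into another space. The optimal $W$ and $b$ are obtained by optimizing the following function:
$$\min_{W,\,b,\,\xi,\,\hat{\xi}} \; P = \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{m}\left(\xi_i + \hat{\xi}_i\right)$$
$$\text{s.t.}\quad f(x_i) - y_i \le \epsilon + \xi_i,\quad y_i - f(x_i) \le \epsilon + \hat{\xi}_i,\quad \xi_i \ge 0,\ \hat{\xi}_i \ge 0,\quad i = 1, 2, \dots, m,$$
where $C > 0$ is the penalty coefficient, $\xi_i$ and $\hat{\xi}_i$ are slack variables, $\epsilon > 0$ is the maximum allowed deviation, the superscript $T$ denotes transposition, and $P$ is the objective function.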
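At the optimum, each pair of slacks equals the $\epsilon$-insensitive loss, $\xi_i + \hat{\xi}_i = \max(0, |y_i - f(x_i)| - \epsilon)$. The constrained problem can therefore be solved in the unconstrained form $P = \tfrac{1}{2}\|W\|^2 + C\sum_i \max(0, |y_i - f(x_i)| - \epsilon)$. The following is a minimal sketch assuming a linear kernel, $\phi(x) = x$, and full-batch subgradient descent with a diminishing step size. It returns the best iterate, since subgradient steps do not decrease $P$ monotonically. Production solvers such as LIBSVM instead work on the dual problem with a kernel. The function names and the `lr`/`epochs` parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def svr_objective(W: Sequence[float], b: float, X: Sequence[Sequence[float]],
                  y: Sequence[float], C: float, epsilon: float) -> float:
    """P = 0.5*||W||^2 + C * sum of epsilon-insensitive losses."""
    reg = 0.5 * sum(wj * wj for wj in W)
    loss = 0.0
    for xi, yi in zip(X, y):
        resid = sum(wj * xj for wj, xj in zip(W, xi)) + b - yi
        loss += max(0.0, abs(resid) - epsilon)
    return reg + C * loss


def fit_linear_svr(X: Sequence[Sequence[float]], y: Sequence[float],
                   C: float = 1.0, epsilon: float = 0.1,
                   lr: float = 0.05, epochs: int = 1000) -> Tuple[List[float], float]:
    """Subgradient descent on P with phi(x) = x (linear kernel, assumed)."""
    d = len(X[0])
    W, b = [0.0] * d, 0.0
    best = (svr_objective(W, b, X, y, C, epsilon), W[:], b)
    for t in range(1, epochs + 1):
        gW, gb = W[:], 0.0                   # gradient of 0.5*||W||^2 is W
        for xi, yi in zip(X, y):
            resid = sum(wj * xj for wj, xj in zip(W, xi)) + b - yi
            if resid > epsilon:              # above the tube: slack xi_i active
                s = 1.0
            elif resid < -epsilon:           # below the tube: slack hat-xi_i active
                s = -1.0
            else:
                continue                     # inside the tube: zero loss
            for j in range(d):
                gW[j] += C * s * xi[j]
            gb += C * s
        step = lr / t ** 0.5                 # diminishing step size
        W = [wj - step * gj for wj, gj in zip(W, gW)]
        b -= step * gb
        p = svr_objective(W, b, X, y, C, epsilon)
        if p < best[0]:                      # keep the best iterate seen so far
            best = (p, W[:], b)
    return best[1], best[2]


# Noise-free toy data on the line y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
W, b = fit_linear_svr(X, y, C=1.0, epsilon=0.1)
```

Because of the $\tfrac{1}{2}\|W\|^2$ term and the $\epsilon$-tube, the fitted slope settles slightly below 2, the flattest line that keeps every sample within $\epsilon$ of the prediction.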
Compared with the prior art, the invention has the following advantages: the complexity of the model can be significantly reduced, and the generalization performance of the regression can be greatly improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
A regression analysis method for medical big data, as shown in fig. 1, specifically comprising the following steps:
step 1: preprocess the medical big data, determine the variable to be predicted and the input variables according to the target task, and establish a training sample set $\{(x_i, y_i) \mid i = 1, 2, \dots, m\}$, where $m$ is the total number of training samples, $x_i \in \mathbb{R}^d$ is a $d$-dimensional vector, $y_i \in \mathbb{R}$ is a scalar, $(x_i, y_i)$ is the $i$-th training sample, and $\mathbb{R}$ denotes the set of real numbers;
step 2: initialize the regression model: let $w_i^g$ be the weight of the $i$-th sample at step $g$, $G$ the maximum number of training rounds, and $\delta$ the maximum allowable error; set $g = 1$ and $w_i^1 = 1/m$ for $i = 1, 2, \dots, m$, where $m$ is the total number of samples;
step 3: learn the support vector regression model $f_g(x)$ of step $g$ and compute its error rate $r_g$ = [formula image not reproduced], where [formula image not reproduced]. If $r_g > 50\%$, set [formula image not reproduced] and jump to step 6;
step 4: set the model weight [formula image not reproduced] and the sample weights [formula image not reproduced], where [formula image not reproduced];
step 5: if $g < G$, increase $g$ by 1 and jump to step 2; otherwise, set [formula image not reproduced] and jump to step 6;
step 6: obtain the integrated regression model $F(x)$ = [formula image not reproduced], and use $F(x)$ to predict unlabeled data, where $x$ is a sample.
Preferably, the support vector regression model involved in step 3 has the form $f(x) = W^T \phi(x) + b$, where $W$ is the normal vector, $b$ is the displacement (bias) term, and $\phi(x)$ is the mapping associated with the kernel function, which maps $x$ into another space. The optimal $W$ and $b$ are obtained by optimizing the following function:
$$\min_{W,\,b,\,\xi,\,\hat{\xi}} \; P = \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{m}\left(\xi_i + \hat{\xi}_i\right)$$
$$\text{s.t.}\quad f(x_i) - y_i \le \epsilon + \xi_i,\quad y_i - f(x_i) \le \epsilon + \hat{\xi}_i,\quad \xi_i \ge 0,\ \hat{\xi}_i \ge 0,\quad i = 1, 2, \dots, m,$$
where $C > 0$ is the penalty coefficient, $\xi_i$ and $\hat{\xi}_i$ are slack variables, $\epsilon > 0$ is the maximum allowed deviation, the superscript $T$ denotes transposition, and $P$ is the objective function.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (2)

1. A regression analysis method for medical big data, characterized by comprising the following steps:
step 1: preprocess the medical big data, determine the variable to be predicted and the input variables according to the target task, and establish a training sample set $\{(x_i, y_i) \mid i = 1, 2, \dots, m\}$, where $m$ is the total number of training samples, $x_i \in \mathbb{R}^d$ is a $d$-dimensional vector, $y_i \in \mathbb{R}$ is a scalar, $(x_i, y_i)$ is the $i$-th training sample, and $\mathbb{R}$ denotes the set of real numbers;
step 2: initialize the regression model: let $w_i^g$ be the weight of the $i$-th sample at step $g$, $G$ the maximum number of training rounds, and $\delta$ the maximum allowable error; set $g = 1$ and $w_i^1 = 1/m$ for $i = 1, 2, \dots, m$, where $m$ is the total number of samples;
step 3: learn the support vector regression model $f_g(x)$ of step $g$ and compute its error rate $r_g$ = [formula image not reproduced], where [formula image not reproduced]. If $r_g > 50\%$, set [formula image not reproduced] and jump to step 6;
step 4: set the model weight [formula image not reproduced] and the sample weights [formula image not reproduced], where [formula image not reproduced];
step 5: if $g < G$, increase $g$ by 1 and jump to step 2; otherwise, set [formula image not reproduced] and jump to step 6;
step 6: obtain the integrated regression model $F(x)$ = [formula image not reproduced], and use $F(x)$ to predict unlabeled data, where $x$ is a sample.
2. The regression analysis method for medical big data according to claim 1, wherein the support vector regression model involved in step 3 has the form $f(x) = W^T \phi(x) + b$, where $W$ is the normal vector, $b$ is the displacement (bias) term, and $\phi(x)$ is the mapping associated with the kernel function, which maps $x$ into another space, and the optimal $W$ and $b$ are obtained by optimizing the following function:
$$\min_{W,\,b,\,\xi,\,\hat{\xi}} \; P = \frac{1}{2}\|W\|^2 + C\sum_{i=1}^{m}\left(\xi_i + \hat{\xi}_i\right)$$
$$\text{s.t.}\quad f(x_i) - y_i \le \epsilon + \xi_i,\quad y_i - f(x_i) \le \epsilon + \hat{\xi}_i,\quad \xi_i \ge 0,\ \hat{\xi}_i \ge 0,\quad i = 1, 2, \dots, m,$$
where $C > 0$ is the penalty coefficient, $\xi_i$ and $\hat{\xi}_i$ are slack variables, $\epsilon > 0$ is the maximum allowed deviation, the superscript $T$ denotes transposition, and $P$ is the objective function.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910878524.3A CN110634565B (en) 2019-09-18 2019-09-18 Regression analysis method for medical big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910878524.3A CN110634565B (en) 2019-09-18 2019-09-18 Regression analysis method for medical big data

Publications (2)

Publication Number Publication Date
CN110634565A true CN110634565A (en) 2019-12-31
CN110634565B CN110634565B (en) 2021-04-06

Family

ID=68971070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910878524.3A Active CN110634565B (en) 2019-09-18 2019-09-18 Regression analysis method for medical big data

Country Status (1)

Country Link
CN (1) CN110634565B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111770189A (en) * 2020-07-04 2020-10-13 广州智物互联科技有限公司 Networking type medical big data grading transmission method and system

Citations (7)

Publication number Priority date Publication date Assignee Title
US20100094784A1 (en) * 2008-10-13 2010-04-15 Microsoft Corporation Generalized kernel learning in support vector regression
CN103218675A (en) * 2013-05-06 2013-07-24 国家电网公司 Short-term load prediction method based on clustering and sliding window
CN104657574A (en) * 2014-06-13 2015-05-27 苏州大学 Building method and device for medical diagnosis models
CN105160441A (en) * 2015-10-16 2015-12-16 江南大学 Real-time power load forecasting method based on integrated network of incremental transfinite vector regression machine
CN106874693A (en) * 2017-03-15 2017-06-20 国信优易数据有限公司 A kind of medical big data analysis process system and method
CN107590569A (en) * 2017-09-25 2018-01-16 山东浪潮云服务信息科技有限公司 A kind of data predication method and device
CN108764541A (en) * 2018-05-16 2018-11-06 天津大学 A kind of wind energy prediction technique of combination space-time characteristic and Error processing

Non-Patent Citations (2)

Title
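LI QING ET AL.: "Decremental Learning based on Sample-Weighted Support Vector Regression", 《2012 24TH CHINESE CONTROL AND DECISION CONFERENCE》 *
WANG HONGPENG (王洪鹏) ET AL.: "Research on a Vehicle Car-Following Model Based on Online Support Vector Regression", 《Electronic Technology (电子技术)》 *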

Also Published As

Publication number Publication date
CN110634565B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
ȚĂRANU Data mining in healthcare: decision making and precision
Chattopadhyay et al. A Case‐Based Reasoning system for complex medical diagnosis
US20150235143A1 (en) Transfer Learning For Predictive Model Development
Boukenze et al. Predictive analytics in healthcare system using data mining techniques
Liang et al. Big Data science and its applications in health and medical research: Challenges and opportunities
CN107833605A (en) A kind of coding method, device, server and the system of hospital's medical record information
CN110739076A (en) medical artificial intelligence public training platform
Jiang et al. A hybrid intelligent model for acute hypotensive episode prediction with large-scale data
CN111696661A (en) Patient clustering model construction method, patient clustering method and related equipment
CN110634565B (en) Regression analysis method for medical big data
CN110543594A (en) knowledge base-based personalized evidence-based correction recommendation method for prisoners
CN111445969A (en) Sales prediction method and system capable of flexibly adapting to noise
Iapăscurtă A less traditional approach to biomedical signal processing for sepsis prediction
Sinha et al. Automated detection of coronary artery disease using machine learning algorithm
Yazid et al. Clinical pathway variance prediction using artificial neural network for acute decompensated heart failure clinical pathway
Lin et al. Utilizing a Two-Stage Taguchi Method and Artificial Neural Network for the Precise Forecasting of Cardiovascular Disease Risk
Momo et al. Length of stay prediction for hospital management using domain adaptation
CN111048192B (en) Obstetric and research management method for medical couplet based on mobile terminal
Indumathi et al. Introduction to Machine Learning for Data Analytics
Pavlovskii et al. Hybrid predictive modelling for finding optimal multipurpose multicomponent therapy
CN116721730B (en) Whole-course patient management system based on digital therapy
Bokhari et al. Applying supervised and unsupervised learning techniques on dental patients’ records
US11081217B2 (en) Systems and methods for optimal health assessment and optimal preventive program development in population health management
Hariri Automatic Classification of Parkinson's disease patients vs Healthy controls using a vision-based finger-tapping test
Portela et al. Data Mining for Real-Time Intelligent Decision Support System in Intensive Care Medicine.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210316

Address after: 518109 room 1501, building e, phase II, Xinghe world, Minle community, Minzhi street, Longhua District, Shenzhen City, Guangdong Province

Applicant after: SHENZHEN WAKE UP TECHNOLOGY Co.,Ltd.

Address before: 230601 Jinxing Commercial City Phase II, 339 Shizhu Road, Hefei Economic and Technological Development Zone, Anhui Province, 2005

Applicant before: ANHUI WEIAOMAN ROBOT Co.,Ltd.

GR01 Patent grant
CP03 Change of name, title or address

Address after: 4208, Tower A, Hongrongyuan North Station Center, Minzhi Street North Station Community, Longhua District, Shenzhen City, Guangdong Province, 518000

Patentee after: Shenzhen Weike Technology Co.,Ltd.

Country or region after: China

Address before: 518109 room 1501, building e, phase II, Xinghe world, Minle community, Minzhi street, Longhua District, Shenzhen City, Guangdong Province

Patentee before: SHENZHEN WAKE UP TECHNOLOGY CO.,LTD.

Country or region before: China