CN107301323A - Method for constructing a classification model related to psoriasis - Google Patents
- Publication number
- CN107301323A
- Authority
- CN
- China
- Prior art keywords
- psoriasis
- data
- disaggregated model
- susceptibility loci
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Biotechnology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Genetics & Genomics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Software Systems (AREA)
- Investigating Or Analysing Biological Materials (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of medical testing, and in particular to a method for constructing a classification model related to psoriasis, comprising the following steps: (1) selecting psoriasis susceptibility loci; (2) converting the different types of susceptibility loci into input data; (3) classifying the data with an AdaBoost-SVM model. Existing techniques do not classify and predict psoriasis data; they stop at judging whether a locus is present in order to infer disease status. The present invention classifies with an effective machine-learning classifier, the SVM, and ensembles SVMs in the AdaBoost framework, improving the accuracy of the classifier. The model can combine SNP, amino-acid and HLA-type data for classification, taking the information of every dimension into account and improving the accuracy of the classification results.
Description
Technical field
The present invention relates to the field of medical testing, and in particular to a method for constructing a classification model related to psoriasis.
Background
Psoriasis is a common complex disease. Its occurrence has been reported to be associated with genetic factors, especially the human leukocyte antigen (HLA) region, but the loci most strongly associated with it are not well understood.
With advances in sequencing technology and genome research, a report published last year in Nature Genetics described high-depth sequencing and precise variant detection of the MHC region in a Chinese population, and its genome-wide association analysis located several psoriasis susceptibility loci. However, classification and prediction models based on susceptibility loci in the HLA region are still lacking, so there is an urgent need for classification and prediction tools that use HLA-region susceptibility loci to classify data.
Psoriasis is most significantly associated with HLA, yet current techniques make no targeted use of the HLA region. Recent breakthroughs in precise variant detection in the HLA region have accurately located psoriasis-related susceptibility loci within HLA. The present invention encodes these susceptibility loci and then classifies them with the machine-learning model AdaBoost, so that the susceptibility-locus information discovered in the HLA region can be integrated and used. Analyzing the data comprehensively with a machine-learning model improves classification accuracy and provides a basis for psoriasis prevention and screening.
Summary of the invention
The purpose of the present invention is to address the above deficiencies of the prior art. Building on the psoriasis-related biomarkers found through full coverage of the MHC region, and on the independent susceptibility loci in the HLA region, the invention uses SVM-AdaBoost to build a psoriasis classification model. It thus provides a method for constructing a classification model related to psoriasis and a basis for psoriasis prevention and screening.
The present invention is achieved by the following technical solutions:
1 Data processing and conversion
The variants of each sample are encoded. Variant information is obtained from high-throughput sequencing data and includes HLA types (C*06:02, C*07:04, DPB1*05:01) as well as single-nucleotide polymorphism (SNP) sites and amino acids (snp31443520, B:Y33Y, B:Y91C, B:Y140S, snp32472030).
Each sample is then converted, according to the susceptibility loci, into the input data required by the invention. HLA types are scored by edit distance, while SNPs and amino acids are scored 0/1. Specifically: 1. for each susceptible HLA type, compute the edit distance between the sample's type and the susceptible type and score it; 2. for each SNP site, record 1 if the variant is present and 0 if it is absent; 3. for each amino-acid variant, record 1 if it is present and 0 if it is absent.
After scoring, the data are split at random into a training set and a test set, with no overlap between the two. When the number of samples is small, 5-fold (or 10-fold) cross-validation can be used instead: the data are divided into 5 (or 10) parts, and each part in turn serves as the test set while the remaining parts form the training set.
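A minimal sketch of the random k-fold split, assuming samples are identified by their index. `k_fold_splits` is a hypothetical helper, and the fixed seed only makes the shuffle reproducible.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices); every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random, reproducible order
    folds = [indices[i::k] for i in range(k)]     # k disjoint, near-equal parts
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test
```

A single hold-out split amounts to shuffling once and cutting the index list at the chosen training-set size.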
2 Classifying the data with the AdaBoost-SVM model
The present invention uses the AdaBoost method to ensemble support vector machine (SVM) classifiers, integrating the information from all susceptibility loci and improving classification accuracy.
2.1 Structure of the classification model
2.1.1 Base classifier model: SVM
The support vector machine (SVM) is a classical machine-learning classifier and a supervised learning method. The invention first uses a Gaussian kernel function (formula 1) to project the data into a high-dimensional space:
$$K(x,y)=\exp\left(-\frac{\lVert x-y\rVert^{2}}{2\sigma^{2}}\right)\qquad\text{(formula 1)}$$
where x is any point in the space, y is the selected center, σ is the width parameter, and K(x, y) is a function of the distance between x and y in the space: it equals 1 when x = y and decays toward 0 as the two points move apart.
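Formula 1 translates directly into code. The sketch below is Python 3 with a hypothetical function name and evaluates the kernel for two equal-length numeric vectors.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Formula 1: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))   # squared Euclidean distance
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

A larger σ makes the kernel decay more slowly, so distant points still look similar. For reference, scikit-learn's `SVC(kernel="rbf")` writes the same kernel as exp(-gamma·||x − y||²), so a width σ corresponds to gamma = 1/(2σ²).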
An SVM model then constructs a separating hyperplane in the high-dimensional space. The hyperplane is determined mainly by the few points closest to it (point A in Fig. 1 is one of these nearest points), and these nearest points are called support vectors. The hyperplane is placed where the margin, the distance from the support vectors to the hyperplane, is largest, so that it separates the data as widely as possible. The invention uses an SVM model implemented in Python 2 (reference site https://www.manning.com/books/machine-learning-in-action).
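Once an SVM has been trained, only the support vectors (the samples with nonzero Lagrange multipliers a_i) enter its decision function f(x) = Σ a_i·y_i·K(x_i, x) + b, and the sign of f(x) says which side of the separating hyperplane x falls on. The sketch below shows that classification step only. It assumes labels in {−1, +1}; the multipliers and bias would come from a training procedure such as SMO, which is not shown, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, multipliers, b, sigma):
    """f(x) = sum_i a_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(a * yi * gaussian_kernel(sv, x, sigma)
               for sv, yi, a in zip(support_vectors, labels, multipliers)) + b


def svm_predict(x, support_vectors, labels, multipliers, b, sigma):
    """Which side of the separating hyperplane x falls on: +1 or -1."""
    return 1 if svm_decision(x, support_vectors, labels, multipliers, b, sigma) >= 0 else -1
```

With toy, hand-picked values (support vectors [0.0] labelled −1 and [4.0] labelled +1, both multipliers 1, b = 0, σ = 1), the point [0.5] is classified −1 and [3.5] is classified +1. These numbers are not from a trained model.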
2.1.2 Ensemble algorithm for the classification model: AdaBoost
AdaBoost is an ensemble method that improves classifier performance by learning from errors. It trains on the samples repeatedly, corrects the classifier after each round according to its error rate, and finally combines the classifiers into an integrated result. Specifically, every sample is first assigned the same weight. An SVM is then trained on the training data and its error rate ε is computed (formula 2):
$$\varepsilon=\frac{\text{number of misclassified samples}}{\text{total number of samples}}\qquad\text{(formula 2)}$$
When the samples carry unequal weights, the counts are replaced by weights: ε is the total weight of the misclassified samples divided by the total weight of all samples.
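Formula 2 can be sketched together with the weighted form that AdaBoost uses once samples carry unequal weights. `error_rate` is a hypothetical helper written in Python 3.

```python
def error_rate(predictions, labels, weights=None):
    """Formula 2: share of misclassified samples (weighted share if weights are given)."""
    if weights is None:
        weights = [1.0] * len(labels)     # unweighted: every sample counts once
    wrong = sum(w for p, y, w in zip(predictions, labels, weights) if p != y)
    return wrong / sum(weights)
```

With equal weights the result is the plain misclassification rate. With weights, a misclassified sample holding weight 0.25 out of a total of 1.0 gives ε = 0.25, however many samples there are.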
The Gaussian kernel parameter σ is then adjusted and an SVM is trained again on the same data set. In this second round of training, the weight of every sample is readjusted (the weights form a vector with one entry per sample). Correctly classified samples receive a lower weight in the next round, and misclassified samples receive a higher one. In the final combination, classifiers that classify more accurately carry a larger share of the weight than those that make more errors. Concretely, a weight α is computed for each classifier from its error rate:
$$\alpha=\frac{1}{2}\ln\left(\frac{1-\varepsilon}{\varepsilon}\right)$$
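Once α has been calculated, the sample weights are updated.
If the sample was classified correctly:
$$D_i^{(t+1)}=\frac{D_i^{(t)}\,e^{-\alpha}}{\mathrm{Sum}(D)}$$
If the sample was misclassified:
$$D_i^{(t+1)}=\frac{D_i^{(t)}\,e^{\alpha}}{\mathrm{Sum}(D)}$$
Here α is the weight of the base classifier in the final classifier and ε is that classifier's error rate. The superscript (t) denotes the iteration, with t the current round and t+1 the next. D_i is the weight of the i-th training sample, and Sum(D) is the sum of all updated weights, which renormalizes them to a total of 1.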
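The classifier weight and the sample-weight update can be written out as follows. This is a minimal Python 3 sketch of the standard discrete AdaBoost rules given above, and the function names are hypothetical. Clamping ε away from 0 and 1 is an added assumption. It keeps a perfect (or perfectly wrong) classifier from causing a division by zero or log(0).

```python
import math


def classifier_weight(eps):
    """alpha = 0.5 * ln((1 - eps) / eps); eps is clamped (assumption) to stay finite."""
    eps = min(max(eps, 1e-16), 1 - 1e-16)
    return 0.5 * math.log((1.0 - eps) / eps)


def update_weights(weights, predictions, labels, alpha):
    """Multiply by exp(-alpha) if correct, exp(+alpha) if wrong, then renormalise (Sum(D))."""
    new = [w * math.exp(-alpha if p == y else alpha)
           for w, p, y in zip(weights, predictions, labels)]
    total = sum(new)
    return [w / total for w in new]
```

For example, with four equally weighted samples and one error (ε = 0.25), α = ½ ln 3. After the update, the single misclassified sample holds half of the total weight and each correct sample holds one sixth, so the next classifier concentrates on the error.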
After the weights D have been computed, the next iteration begins. Training and reweighting are repeated until the training error rate reaches 0 or the number of weak classifiers reaches a specified value. The invention uses an AdaBoost ensemble framework implemented in Python 2 (reference site https://www.manning.com/books/machine-learning-in-action).
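Putting the pieces together, the training loop can be sketched as below in Python 3. Two assumptions should be stated plainly. First, a dependency-free weighted SVM solver would not fit in a short example, so `train_weak` is a hypothetical stand-in: a sample-weighted Gaussian-kernel vote, which is not an SVM and only fills the base-classifier slot in the loop. Second, the σ schedule uses linear spacing from 30 down to 3 over 9 classifiers. This matches the endpoints and count of the embodiment but not necessarily its exact values. All function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def train_weak(X, y, D, sigma):
    """HYPOTHETICAL stand-in for a weighted SVM: a D-weighted Gaussian-kernel vote.

    It is NOT an SVM; it only occupies the base-classifier slot so the loop runs.
    """
    def predict(x):
        s = sum(d * yi * gaussian_kernel(xi, x, sigma) for xi, yi, d in zip(X, y, D))
        return 1 if s >= 0 else -1
    return predict


def strong_score(ensemble, x):
    """Weighted vote of all base classifiers (real-valued, usable for ROC/AUC)."""
    return sum(alpha * h(x) for alpha, h in ensemble)


def strong_predict(ensemble, x):
    return 1 if strong_score(ensemble, x) >= 0 else -1


def adaboost(X, y, sigmas):
    """Train one base classifier per sigma; stop early at zero training error."""
    n = len(X)
    D = [1.0 / n] * n                                            # equal initial weights
    ensemble = []
    for sigma in sigmas:
        h = train_weak(X, y, D, sigma)
        preds = [h(x) for x in X]
        eps = sum(d for d, p, t in zip(D, preds, y) if p != t)   # weighted error rate
        eps = min(max(eps, 1e-16), 1 - 1e-16)                    # avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # Down-weight correct samples, up-weight errors, then renormalise.
        D = [d * math.exp(-alpha if p == t else alpha) for d, p, t in zip(D, preds, y)]
        total = sum(D)
        D = [d / total for d in D]
        if all(strong_predict(ensemble, x) == t for x, t in zip(X, y)):
            break
    return ensemble


# ASSUMPTION: linear schedule from 30 down to 3 over 9 classifiers.
SIGMAS = [30 - i * 27 / 8 for i in range(9)]
```

In a real implementation, `train_weak` would be replaced by an SVM fitted with sample weights, for example scikit-learn's `SVC(kernel="rbf", gamma=1 / (2 * sigma**2)).fit(X, y, sample_weight=D)`.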
3 Classifying and evaluating the data
After the training and test sets have been built, they are fed into the constructed AdaBoost-SVM model for classification, and the model's output is compared with each sample's actual disease status. The results are evaluated by calculating the accuracy and plotting the ROC curve.
The ROC curve is a method for selecting the optimal signal model. The area under the ROC curve (AUC) is usually calculated to judge the quality of the classification model, as detailed in Table 1.
Table 1
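The AUC can be computed without plotting the curve. It equals the probability that a randomly chosen diseased sample receives a higher score than a randomly chosen healthy one, with ties counting one half (the Mann-Whitney statistic). Below is a minimal quadratic-time sketch in Python 3. It assumes label 1 marks diseased samples and that the scores are real-valued classifier outputs, such as the weighted vote of an ensemble before taking its sign. `auc` is a hypothetical helper.

```python
def auc(scores, labels):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A perfect ranking gives 1.0 and a constant score gives 0.5. `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.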
The beneficial effects of the present invention are as follows:
Existing techniques do not classify and predict psoriasis data; they stop at judging whether a locus is present in order to infer disease status. The present invention classifies with an effective machine-learning classifier, the SVM, and ensembles SVMs in the AdaBoost framework, improving the accuracy of the classifier. The model can combine SNP, amino-acid and HLA-type data for classification, taking the information of every dimension into account and improving the accuracy of the classification results.
Brief description of the drawings
Fig. 1 is a schematic diagram of an SVM model constructing a separating hyperplane in a high-dimensional space;
Fig. 2 is the ROC curve of training set classification results of the present invention;
Fig. 3 is the ROC curve of test set classification results of the present invention.
Embodiment
For a better understanding of the present invention, it is further described below with reference to an embodiment and the accompanying drawings. The following embodiment only illustrates the invention and does not limit it.
Embodiment 1
A total of 5168 samples under 30 years of age were selected for the psoriasis study. An AdaBoost-SVM model written in Python 2 was used to build a model on the susceptibility loci and to classify the samples.
1 Data processing and conversion
In this embodiment, the samples' variant information (ped and map files) is first obtained by variant detection. The HLA-region variant information is then extracted according to the susceptibility loci (Table 2). The HLA types (loci 1, 2 and 7) are scored by edit distance (see Table 3 for the scoring matrix), while the amino-acid and SNP sites (loci 3, 4, 5, 6 and 8) are scored by presence or absence: 1 if present, 0 if absent.
Table 2 Susceptibility loci
Table 3 Edit-distance scoring matrix
This produces the data list. Since the data set contains 5168 samples, 2000 were selected as the training set in this case and the remaining 3168 samples were used as the test set.
2 Applying the model
The processed data are fed into the AdaBoost-SVM model built by the invention. In this case 9 SVM classifiers are used, with σ decreasing step by step from 30 to 3.
3 Results
As shown in Figs. 2 and 3, the classification error rate in this case is 23.9%, the training-set AUC (area under the ROC curve) is 0.833 and the test-set AUC is 0.868. This indicates that the invention performs well in this embodiment.
The embodiment described above only describes a preferred implementation of the invention and does not limit its scope. Without departing from the design spirit of the invention, all modifications and improvements that those of ordinary skill in the art make to the technical solution of the invention shall fall within the scope of protection defined by the claims of the invention.
Claims (7)
1. A method for constructing a classification model related to psoriasis, characterized by comprising the following steps:
(1) selecting psoriasis susceptibility loci;
(2) converting the different types of susceptibility loci into input data;
(3) classifying the data with an AdaBoost-SVM model.
2. The method for constructing a classification model related to psoriasis according to claim 1, characterized in that the psoriasis susceptibility loci in step (1) include at least one of HLA types, SNP sites and amino acids.
3. The method for constructing a classification model related to psoriasis according to claim 2, characterized in that the HLA-type susceptibility loci include at least one of C*06:02, C*07:04 and DPB1*05:01, and the SNP-site and amino-acid susceptibility loci include at least one of snp31443520, B:Y33Y, B:Y91C, B:Y140S and snp32472030.
4. The method for constructing a classification model related to psoriasis according to claim 1, characterized in that the conversion method in step (2) is as follows: if a susceptibility locus is a region, it is scored according to its similarity; if a susceptibility locus is a single site, it is scored according to its presence or absence.
5. The method for constructing a classification model related to psoriasis according to claim 1, characterized in that the classification in step (3) comprises the following steps:
(31) projecting the data into a high-dimensional space using a Gaussian kernel function, and then constructing a separating hyperplane in the high-dimensional space with an SVM model;
(32) assigning an equal weight to every sample, training an SVM on the training data and calculating the classifier's error rate to train weak classifiers, and then combining the weak classifiers obtained in each round into a strong classifier;
(33) classifying and evaluating the data.
6. The method for constructing a classification model related to psoriasis according to claim 5, characterized in that the formula of the Gaussian kernel function in step (31) is:
$$K(x,y)=\exp\left(-\frac{\lVert x-y\rVert^{2}}{2\sigma^{2}}\right),$$
where x is any point in the space, y is the selected center, σ is the width parameter, and K(x, y) is a function of the distance between x and y in the space.
7. The method for constructing a classification model related to psoriasis according to claim 5, characterized in that the evaluation method in step (33) is calculating the area under the ROC curve.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710692864.8A CN107301323B (en) | 2017-08-14 | 2017-08-14 | Method for constructing classification model related to psoriasis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710692864.8A CN107301323B (en) | 2017-08-14 | 2017-08-14 | Method for constructing classification model related to psoriasis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107301323A (en) | 2017-10-27 |
CN107301323B (en) | 2020-11-03 |
Family
ID=60131823
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710692864.8A Active CN107301323B (en) | 2017-08-14 | 2017-08-14 | Method for constructing classification model related to psoriasis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301323B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030077617A1 (en) * | 2001-10-24 | 2003-04-24 | Myungho Kim | Method for diagnosis of a disease by using multiple SNP (single nucleotide polymorphism) variations and clinical data |
US20130225662A1 (en) * | 2008-11-17 | 2013-08-29 | Veracyte, Inc. | Methods and compositions of molecular profiling for disease diagnostics |
WO2016183348A1 (en) * | 2015-05-12 | 2016-11-17 | The Johns Hopkins University | Methods, systems and devices comprising support vector machine for regulatory sequence features |
CN106202936A (en) * | 2016-07-13 | 2016-12-07 | 为朔医学数据科技(北京)有限公司 | A kind of disease risks Forecasting Methodology and system |
CN106778065A (en) * | 2016-12-30 | 2017-05-31 | 同济大学 | A kind of Forecasting Methodology based on multivariate data prediction DNA mutation influence interactions between protein |
-
2017
- 2017-08-14 CN CN201710692864.8A (CN107301323B), active
Non-Patent Citations (4)
Title |
---|
VIMAL K. SHRIVASTAVA ET AL: "A novel and robust Bayesian approach for segmentation of psoriasis lesions and its risk stratification", 《COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE》 * |
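刘杰: "Identification of genetic polymorphism loci associated with lung cancer and construction of a prediction model", 《China Master's Theses Full-text Database, Medicine and Health Sciences》 * |
王文俊: "Fine mapping of the HLA region in psoriasis in the Han Chinese population", 《China Doctoral Dissertations Full-text Database, Medicine and Health Sciences》 * |
王晓丹: "An SVM classifier based on AdaBoost", 《Journal of Air Force Engineering University (Natural Science Edition)》 * |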
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108052796A (en) * | 2017-12-26 | 2018-05-18 | 云南大学 | Global human mtDNA development tree classification querying methods based on integrated study |
CN108052796B (en) * | 2017-12-26 | 2021-07-13 | 云南大学 | Global human mtDNA development tree classification query method based on ensemble learning |
CN108961207A (en) * | 2018-05-02 | 2018-12-07 | 上海大学 | Lymph node Malignant and benign lesions aided diagnosis method based on multi-modal ultrasound image |
CN108961207B (en) * | 2018-05-02 | 2022-11-04 | 上海大学 | Auxiliary diagnosis method for benign and malignant lymph node lesion based on multi-modal ultrasound images |
CN114371135A (en) * | 2021-10-25 | 2022-04-19 | 孙良丹 | Evaluation system for evaluating psoriasis and application |
CN114371135B (en) * | 2021-10-25 | 2024-01-30 | 孙良丹 | Evaluation system for evaluating psoriasis and application |
Also Published As
Publication number | Publication date |
---|---|
CN107301323B (en) | 2020-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105426842B (en) | Multiclass hand motion recognition method based on support vector machines and surface electromyogram signal | |
CN104408478B (en) | A kind of hyperspectral image classification method based on the sparse differentiation feature learning of layering | |
CN107301323A (en) | A kind of construction method of the disaggregated model related to psoriasis | |
CN106778853A (en) | Unbalanced data sorting technique based on weight cluster and sub- sampling | |
CN105975992A (en) | Unbalanced data classification method based on adaptive upsampling | |
CN104008375B (en) | The integrated face identification method of feature based fusion | |
CN104462409B (en) | Across language affection resources data identification method based on AdaBoost | |
CN103793694B (en) | Human face recognition method based on multiple-feature space sparse classifiers | |
CN103400144A (en) | Active learning method based on K-neighbor for support vector machine (SVM) | |
CN105069774A (en) | Object segmentation method based on multiple-instance learning and graph cuts optimization | |
CN111369045A (en) | Method for predicting short-term photovoltaic power generation power | |
CN107767387A (en) | Profile testing method based on the global modulation of changeable reception field yardstick | |
CN103426004B (en) | Model recognizing method based on error correcting output codes | |
CN107943830A (en) | A kind of data classification method suitable for higher-dimension large data sets | |
CN106251362A (en) | A kind of sliding window method for tracking target based on fast correlation neighborhood characteristics point and system | |
Aldhlan et al. | Novel mechanism to improve hadith classifier performance | |
CN103631753A (en) | Progressively-decreased subspace ensemble learning algorithm | |
Ozkok et al. | Convolutional neural network analysis of recurrence plots for high resolution melting classification | |
CN116821698A (en) | Wheat scab spore detection method based on semi-supervised learning | |
CN103810482A (en) | Multi-information fusion classification and identification method | |
CN106951728B (en) | Tumor key gene identification method based on particle swarm optimization and scoring criterion | |
CN111815609A (en) | Pathological image classification method and system based on context awareness and multi-model fusion | |
CN102930291A (en) | Automatic K adjacent local search heredity clustering method for graphic image | |
CN107451538A (en) | Human face data separability feature extracting method based on weighting maximum margin criterion | |
CN103246897B (en) | A kind of Weak Classifier inner structure method of adjustment based on AdaBoost |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |