CN107301323B - A method for constructing a classification model related to psoriasis - Google Patents


Info

Publication number
CN107301323B
Authority
CN
China
Prior art keywords
psoriasis
data
classification
svm
susceptibility
Prior art date
Legal status
Active
Application number
CN201710692864.8A
Other languages
Chinese (zh)
Other versions
CN107301323A (en)
Inventor
孙良丹
张涛
甄琪
王文俊
钱文君
莫晓东
吴静
郑晓冬
李报
Current Assignee
BGI Shenzhen Co Ltd
First Affiliated Hospital of Anhui Medical University
Original Assignee
BGI Shenzhen Co Ltd
First Affiliated Hospital of Anhui Medical University
Priority date
2017-08-14
Filing date
2017-08-14
Publication date
2020-11-03
2017-08-14 Application filed by BGI Shenzhen Co Ltd and First Affiliated Hospital of Anhui Medical University
2017-08-14 Priority to CN201710692864.8A
2017-10-27 Publication of CN107301323A
2020-11-03 Application granted
2020-11-03 Publication of CN107301323B
Active legal-status: Current
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of medical testing, and in particular to a method for constructing a classification model related to psoriasis, comprising the following steps: (1) selecting psoriasis susceptibility loci; (2) converting the loci into input data according to their type; (3) classifying the data with an AdaBoost-SVM model. Existing techniques do not classify or predict psoriasis from such data; they only check whether individual loci are present in order to infer disease status. The invention classifies with the support vector machine (SVM), an effective machine-learning classifier, and combines several SVMs within the AdaBoost framework to improve the classifier's accuracy. The model integrates SNP, amino acid and HLA-type data for classification, taking every dimension of information into account, which improves the accuracy of the classification results.

Description

A method for constructing a classification model related to psoriasis

Technical field

The invention relates to the technical field of medical testing, and in particular to a method for constructing a classification model related to psoriasis.

Background

Psoriasis is a common complex disease. Its occurrence has been reported to be associated with genetic factors, especially the human leukocyte antigen (HLA) region, but the loci that are truly responsible remain unknown.

With advances in sequencing technology and genome research, high-depth sequencing and precise variant detection of the MHC region in a Chinese population were reported in Nature Genetics last year, and the accompanying association analysis mapped several psoriasis susceptibility loci. However, classification and prediction models based on susceptibility loci in the HLA region are still lacking, so tools that use HLA-region susceptibility loci to classify and predict samples are urgently needed.

Psoriasis is most strongly associated with HLA, yet current techniques make little targeted use of the HLA region. A recent breakthrough in precise variant detection across the HLA region has precisely located the psoriasis-related susceptibility loci on HLA. The present invention encodes these susceptibility loci and then classifies the samples with the machine-learning model AdaBoost, integrating the susceptibility-locus information found in the HLA region. Analyzing the data jointly with a machine-learning model improves classification accuracy and provides a basis for psoriasis prevention and screening.

Summary of the invention

The object of the present invention is to overcome these shortcomings of the prior art. Starting from psoriasis-related biomarkers found through full coverage of the MHC region, and specifically from the independently associated susceptibility loci in the HLA region, it builds a psoriasis classification model with SVM-AdaBoost, thereby providing a method for constructing a classification model related to psoriasis and a basis for psoriasis prevention and screening.

The present invention is achieved through the following technical solution:

1 Data processing and transformation

The variants of each sample are encoded. Variant information is obtained from high-throughput sequencing data and includes HLA types (C*06:02, C*07:04, DPB1*05:01), single nucleotide polymorphism (SNP) sites and amino acid variants (snp31443520, B:Y33Y, B:Y91C, B:Y140S, snp32472030).

Each sample is then converted, according to the susceptibility loci, into the input data required by the invention: HLA types are scored by edit distance, and SNPs and amino acid variants are scored 0/1. Specifically: ① for each susceptible HLA type, the edit distance between the individual's type and the susceptible type is calculated and used as the score; ② for each SNP site, the score is 1 if the mutation is present and 0 if it is absent; ③ for each amino acid mutation, the score is 1 if the mutation is present and 0 if it is absent.
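The scoring rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the actual distance-to-score mapping is given in Table 3, which is not reproduced here, so the sketch simply uses the plain character-level Levenshtein distance between allele names as the score, compares an individual's alleles only with risk types of the same gene, and treats an untyped gene as maximally distant. The names `edit_distance`, `encode_sample`, `RISK_TYPES` and `VARIANT_IDS` are hypothetical.

```python
from typing import List, Sequence, Set

# Susceptibility loci named in the text; the feature order here is an assumption.
RISK_TYPES = ["C*06:02", "C*07:04", "DPB1*05:01"]
VARIANT_IDS = ["snp31443520", "B:Y33Y", "B:Y91C", "B:Y140S", "snp32472030"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def encode_sample(alleles: Sequence[str], variants: Set[str]) -> List[int]:
    """Hypothetical encoder: HLA edit-distance scores followed by 0/1 variant flags."""
    features = []
    for risk in RISK_TYPES:
        gene = risk.split("*")[0]
        same_gene = [a for a in alleles if a.split("*")[0] == gene]
        # Assumed score: smallest raw edit distance over the individual's alleles of
        # that gene (0 = carries the risk type); len(risk) if the gene was not typed.
        features.append(min((edit_distance(a, risk) for a in same_gene), default=len(risk)))
    for vid in VARIANT_IDS:
        features.append(1 if vid in variants else 0)  # 1 = present, 0 = absent
    return features


print(encode_sample(["C*06:02", "C*03:04", "DPB1*04:01", "DPB1*05:01"],
                    {"B:Y91C", "snp32472030"}))
# -> [0, 1, 0, 0, 0, 1, 0, 1]
```

The result is one row of the input matrix: three HLA distance scores followed by five presence flags. Replacing the raw distance with a lookup into the real scoring matrix would only change the body of the first loop.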

After scoring, the data are randomly split into a training set and a test set; the two sets must not overlap. When samples are few, 5-fold (or 10-fold) cross-validation can be used instead: the data are divided into 5 (or 10) parts, each part serves once as the test set, and the remaining parts form the training set.
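A minimal sketch of both splitting schemes, assuming samples are referenced by integer index. `holdout_split` and `k_fold_splits` are hypothetical helper names, and the fixed seed only makes the illustration reproducible; the 2,000 / 5,168 figures in the usage line come from the embodiment.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]


def holdout_split(n: int, n_train: int, seed: int = 0) -> Split:
    """Randomly assign n_train sample indices to training and the rest to testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Split]:
    """k-fold cross-validation: each of k disjoint parts is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint, nearly equal-sized parts
    splits = []
    for f in range(k):
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, sorted(folds[f])))
    return splits


train, test = holdout_split(5168, 2000)
print(len(train), len(test))  # 2000 3168
```

Shuffling once and slicing guarantees that no index lands in both sets, which is the non-overlap requirement stated above.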

2 Classifying the data with the AdaBoost-SVM model

The invention uses the AdaBoost method to ensemble support vector machine (SVM) classifiers, making use of the information from all susceptibility loci to improve classification accuracy.

2.1 Construction of the classification model

2.1.1 Base classifier: SVM

The support vector machine (SVM) is a classic supervised machine-learning classifier. The invention first uses the Gaussian kernel function (Equation 1) to map the data into a high-dimensional space.

$$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right) \qquad (1)$$

where x is any point in the space, y is the selected center, σ is the width parameter, and K(x, y) is the kernel value determined by the distance between x and y: it equals 1 when x = y and decays toward 0 as the two points move apart.
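Equation 1 can be evaluated directly. The sketch below assumes the common form K(x, y) = exp(−‖x − y‖² / (2σ²)); some implementations divide by σ² instead of 2σ², which only rescales σ. `gaussian_kernel` is a hypothetical name.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Gaussian (RBF) kernel, assumed form exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


print(gaussian_kernel([0, 1, 1], [0, 1, 1], 3.0))  # identical points -> 1.0
print(gaussian_kernel([0, 1, 1], [1, 0, 1], 3.0))  # same pair, wide kernel: close to 1
print(gaussian_kernel([0, 1, 1], [1, 0, 1], 0.5))  # same pair, narrow kernel: close to 0
```

Larger σ values make distant samples look more similar and give smoother decision boundaries; smaller values make the kernel more local.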

The SVM then constructs a separating plane in the high-dimensional space. The plane is determined mainly by the few points closest to it (point A in Fig. 1 is one of them); these nearest points are called support vectors, and the plane is chosen so that its margin, the distance from the support vectors to the plane, is maximized, i.e. the plane separates the data as widely as possible. The present invention uses a Python 2 SVM implementation (see https://www.manning.com/books/machine-learning-in-action).
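To make the separating-plane idea concrete, here is a small kernel SVM in pure Python. It is not the SMO-based code the text cites: as a plainly named substitution it uses the kernelized Pegasos algorithm (stochastic sub-gradient descent on the hinge loss, with no bias term), which is shorter to write correctly. Training points that end with a nonzero coefficient are the ones that determine the plane, playing the role of the support vectors. The optional `weights` argument, which makes heavily weighted samples get drawn more often, is an assumption about how an SVM might honor AdaBoost sample weights; the class name and the parameters `lam`, `iters` and `seed` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def rbf(x: Sequence[float], z: Sequence[float], sigma: float) -> float:
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


class KernelPegasosSVM:
    """Kernel SVM trained with Pegasos; labels must be +1 / -1."""

    def __init__(self, sigma: float, lam: float = 0.1, iters: int = 1000, seed: int = 0):
        self.sigma, self.lam, self.iters, self.seed = sigma, lam, iters, seed

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int],
            weights: Optional[Sequence[float]] = None) -> "KernelPegasosSVM":
        m = len(X)
        weights = list(weights) if weights is not None else [1.0] * m
        rng = random.Random(self.seed)
        self.X, self.y = [list(row) for row in X], list(y)
        self.alpha: List[int] = [0] * m
        gram = [[rbf(a, b, self.sigma) for b in self.X] for a in self.X]
        for t in range(1, self.iters + 1):
            i = rng.choices(range(m), weights=weights)[0]  # weighted draw of one sample
            score = sum(self.alpha[j] * self.y[j] * gram[j][i] for j in range(m))
            # Pegasos step: y_i * score / (lam * t) < 1 means a margin violation
            if self.y[i] * score < self.lam * t:
                self.alpha[i] += 1
        # Points with nonzero coefficients determine the separating plane
        self.support = [j for j in range(m) if self.alpha[j] > 0]
        return self

    def decision(self, x: Sequence[float]) -> float:
        # The 1 / (lam * T) scale factor is dropped: it does not change the sign
        return sum(self.alpha[j] * self.y[j] * rbf(self.X[j], x, self.sigma)
                   for j in self.support)

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision(x) >= 0 else -1


# XOR-style toy data: not linearly separable, but separable after the kernel mapping
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [-1, -1, 1, 1]
model = KernelPegasosSVM(sigma=0.5).fit(X, y)
print([model.predict(x) for x in X], model.support)
```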

2.1.2 Ensemble algorithm for the classification model: AdaBoost

AdaBoost is an ensemble method that improves classifier performance by learning from errors: classifiers are trained repeatedly on all samples, corrected round after round according to the error rate, and finally combined into an ensemble result. Specifically, all samples are first given equal weights; an SVM is then trained on the training set and its error rate ε (Equation 2) is calculated.

Error rate ε = number of misclassified samples / total number of samples (Equation 2); when the samples carry weights D_i, this becomes the total weight of the misclassified samples divided by the total weight.

The width parameter σ of the Gaussian kernel is then adjusted and the SVM is trained again on the same dataset. In this second round the weight of every sample is readjusted (the weights form a vector with one entry per sample): correctly classified samples receive lower weight in the next round and misclassified samples receive higher weight. In the final combination, a classifier with a lower error rate receives a larger share of the vote; concretely, the weight α of each classifier is calculated from its error rate.

$$\alpha = \frac{1}{2}\ln\left(\frac{1 - \varepsilon}{\varepsilon}\right)$$

Once α has been calculated, the sample weights can be updated.

Correctly classified samples:

$$D_i^{(t+1)} = \frac{D_i^{(t)}\, e^{-\alpha}}{Z^{(t)}}$$

Misclassified samples:

$$D_i^{(t+1)} = \frac{D_i^{(t)}\, e^{\alpha}}{Z^{(t)}}$$

where Z^{(t)} is the sum of the rescaled weights, so that the updated weights again sum to 1.

α is the weight of the base classifier in the final classifier and ε is that classifier's error rate; the superscript (t) indexes the iteration, with t denoting the current round and t+1 the next; D_i is the weight of the i-th training sample.
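The error rate, classifier weight and sample-weight update can be written out directly. This sketch assumes the standard AdaBoost forms (weighted error, α = ½ ln((1 − ε)/ε), multiplicative update followed by normalization), labels of +1/−1, and clamping of ε away from 0 and 1 so the logarithm stays finite; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_error(D: Sequence[float], y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Equation 2 with sample weights: share of the total weight on misclassified samples."""
    return sum(d for d, t, p in zip(D, y_true, y_pred) if t != p) / sum(D)


def classifier_alpha(eps: float, floor: float = 1e-10) -> float:
    """alpha = 0.5 * ln((1 - eps) / eps); eps is clamped so the log stays finite."""
    eps = min(max(eps, floor), 1.0 - floor)
    return 0.5 * math.log((1.0 - eps) / eps)


def update_weights(D: Sequence[float], y_true: Sequence[int], y_pred: Sequence[int],
                   alpha: float) -> List[float]:
    """Scale correct samples by e^-alpha and errors by e^alpha, then renormalise to sum 1."""
    scaled = [d * math.exp(-alpha if t == p else alpha) for d, t, p in zip(D, y_true, y_pred)]
    z = sum(scaled)
    return [w / z for w in scaled]


D = [0.25] * 4
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]                     # one mistake, on the last sample
eps = weighted_error(D, y_true, y_pred)    # 0.25
alpha = classifier_alpha(eps)              # 0.5 * ln(3), about 0.549
print(update_weights(D, y_true, y_pred, alpha))  # last sample now holds half the total weight
```

A useful sanity check of this update is that, after each round, the misclassified samples always end up holding exactly half of the total weight, so the next classifier cannot do well by ignoring them.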

After the weights D have been calculated, the next iteration begins. Training and weight adjustment are repeated until the training error rate reaches 0 or the number of weak classifiers reaches a specified value. The present invention uses a Python 2 AdaBoost ensemble framework (see https://www.manning.com/books/machine-learning-in-action).
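Putting the pieces together, the loop below is one possible end-to-end sketch of AdaBoost over Gaussian-kernel SVMs: one weak SVM per σ value, sample weights updated after each round, and early stopping once the ensemble's training error reaches 0 or the σ list is exhausted. The weak learner is a compact weighted kernel Pegasos SVM, substituted for the SMO-based code the text cites, and it honors sample weights through weighted sampling (an assumption). `train_svm`, `adaboost_svm`, `ensemble_predict` and their parameters are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Model = List[Tuple[float, Callable[[Point], int]]]


def rbf(x: Point, z: Point, sigma: float) -> float:
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def train_svm(X, y, D, sigma, lam=0.1, iters=1000, seed=0) -> Callable[[Point], int]:
    """Weighted kernel Pegasos SVM (labels +1/-1); returns a predict function."""
    m, rng, alpha = len(X), random.Random(seed), [0] * len(X)
    gram = [[rbf(a, b, sigma) for b in X] for a in X]
    for t in range(1, iters + 1):
        i = rng.choices(range(m), weights=D)[0]  # heavily weighted samples drawn more often
        if y[i] * sum(alpha[j] * y[j] * gram[j][i] for j in range(m)) < lam * t:
            alpha[i] += 1                        # margin violation: grow this coefficient
    return lambda x: 1 if sum(alpha[j] * y[j] * rbf(X[j], x, sigma)
                              for j in range(m) if alpha[j]) >= 0 else -1


def ensemble_predict(model: Model, x: Point) -> int:
    """Sign of the alpha-weighted vote of all weak SVMs."""
    return 1 if sum(a * h(x) for a, h in model) >= 0 else -1


def adaboost_svm(X, y, sigmas: Sequence[float]) -> Model:
    m = len(X)
    D = [1.0 / m] * m                              # equal initial sample weights
    model: Model = []
    for rnd, sigma in enumerate(sigmas):           # one weak SVM per sigma value
        h = train_svm(X, y, D, sigma, seed=rnd)
        pred = [h(x) for x in X]
        eps = sum(d for d, t, p in zip(D, y, pred) if t != p) / sum(D)
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)    # keep log() finite at eps = 0 or 1
        a = 0.5 * math.log((1.0 - eps) / eps)
        model.append((a, h))
        # e^-alpha for correct samples, e^alpha for errors, then renormalise
        D = [d * math.exp(-a if t == p else a) for d, t, p in zip(D, y, pred)]
        total = sum(D)
        D = [d / total for d in D]
        if all(ensemble_predict(model, x) == t for x, t in zip(X, y)):
            break                                  # ensemble training error is 0
    return model


X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [-1, -1, 1, 1]
model = adaboost_svm(X, y, sigmas=[0.5, 0.4, 0.3])
print(len(model), [ensemble_predict(model, x) for x in X])
```

The toy σ values suit the tiny unit-scale example; on real encoded data one would pass whatever σ schedule the application calls for, such as the nine decreasing values of the embodiment.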

3 Classification and evaluation of the data

After the training and test sets have been constructed, they are fed into the AdaBoost-SVM model for classification. The model's predictions are compared with each sample's actual disease status, and the results are evaluated by calculating the accuracy and plotting the ROC curve.

The ROC (receiver operating characteristic) curve is a method for selecting the best model. The area under the ROC curve (AUC) is commonly calculated to judge how good a classification model is; see Table 1.
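Accuracy, the ROC curve and its AUC can be computed from true labels and classifier scores as sketched below. This assumes binary labels (1 = case, 0 = control, both present) and a real-valued score per sample, for example the ensemble's weighted vote; the function names and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points as the threshold sweeps from high to low scores."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:  # tied scores move together
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # toy scores
print(accuracy(labels, [1 if s >= 0.5 else 0 for s in scores]))  # 0.5
print(auc(roc_points(labels, scores)))                           # 0.75
```

The example shows why AUC complements accuracy: at a 0.5 threshold only half the samples are right, yet three of the four case/control pairs are ranked correctly, giving an AUC of 0.75.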

Table 1

(table image not reproduced)

The beneficial effects of the present invention are as follows:

At present there is no suitable technique for classifying and predicting psoriasis from such data; existing approaches only check whether individual loci are present in order to infer disease status. The invention classifies with the SVM, an effective machine-learning classifier, and combines SVMs within the AdaBoost framework to improve the classifier's accuracy. The model integrates SNP, amino acid and HLA-type data for classification, taking every dimension of information into account, which improves the accuracy of the classification results.

Description of drawings

Fig. 1 is a schematic diagram of constructing a separating plane with the SVM model in a high-dimensional space;

Fig. 2 shows the ROC curve of the classification results on the training set;

Fig. 3 shows the ROC curve of the classification results on the test set.

Detailed description of the embodiments

To aid understanding, the invention is further described below with reference to the embodiments and the accompanying drawings. The following embodiments only illustrate the invention and do not limit it.

Example 1

A total of 5,168 samples from individuals under 30 years of age were selected from the psoriasis study. An AdaBoost-SVM model written in Python 2 was used to build a classifier based on the susceptibility loci.

1 Data processing and transformation

In this example, variant detection was first performed to obtain each sample's variant information as PED and MAP files. HLA-region variant information was then extracted for the susceptibility loci listed in Table 2. HLA types (loci 1, 2 and 7) were scored by edit distance (see Table 3 for the scoring matrix), and amino acid and SNP sites (loci 3, 4, 5, 6 and 8) were scored by presence or absence: 1 if present and 0 if absent.
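A sketch of deriving 0/1 presence flags from PLINK-style PED/MAP text, assuming the standard layout (MAP: chromosome, variant ID, genetic distance, base-pair position; PED: six sample columns followed by two allele columns per variant, in MAP order) and assuming that "present" means carrying at least one copy of a designated risk allele. The risk alleles, positions, the `rs_other` ID and the sample rows below are hypothetical illustration data, not values from Table 2; `read_map` and `presence_flags` are hypothetical names.

```python
from typing import Dict, List


def read_map(map_text: str) -> List[str]:
    """Variant IDs (second column of each .map line), in file order."""
    return [line.split()[1] for line in map_text.splitlines() if line.strip()]


def presence_flags(ped_line: str, variant_ids: List[str],
                   risk_alleles: Dict[str, str]) -> Dict[str, int]:
    """1 if the sample carries the (assumed) risk allele at a selected locus, else 0."""
    fields = ped_line.split()
    genotypes = fields[6:]  # skip family, individual, father, mother, sex, phenotype
    flags = {}
    for k, vid in enumerate(variant_ids):
        if vid in risk_alleles:
            a1, a2 = genotypes[2 * k], genotypes[2 * k + 1]  # "0" marks a missing call
            flags[vid] = int(risk_alleles[vid] in (a1, a2))
    return flags


# Hypothetical data: the two SNP IDs come from the text; positions and alleles are placeholders
map_text = "6 snp31443520 0 31443520\n6 rs_other 0 31500000\n6 snp32472030 0 32472030\n"
risk = {"snp31443520": "A", "snp32472030": "G"}
ids = read_map(map_text)
print(presence_flags("FAM1 S1 0 0 1 2 A C T T G G", ids, risk))
# -> {'snp31443520': 1, 'snp32472030': 1}
print(presence_flags("FAM2 S2 0 0 2 1 C C T T 0 0", ids, risk))
# -> {'snp31443520': 0, 'snp32472030': 0}
```

Loci that are not selected (here `rs_other`) are skipped but still consume their two genotype columns, which keeps the PED columns aligned with the MAP order.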

Table 2 Susceptibility loci

(table image not reproduced)

Table 3 Edit distance scoring matrix

(table image not reproduced)

This yields the data table. Because the dataset contains 5,168 samples, 2,000 were selected as the training set and the remaining 3,168 as the test set.

2 Applying the model

The processed data were fed into the AdaBoost-SVM model constructed by the invention. Nine SVM classifiers were used in this example, with σ decreasing successively from 30 to 3.
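The text gives only the endpoints and the number of classifiers, not the spacing of σ. Assuming evenly spaced values, the nine σ values could be generated as below; `sigma_schedule` is a hypothetical helper, and geometric spacing would be an equally valid reading of the text.

```python
from typing import List


def sigma_schedule(start: float = 30.0, stop: float = 3.0, n: int = 9) -> List[float]:
    """n evenly spaced sigma values from start down to stop (inclusive)."""
    step = (start - stop) / (n - 1)
    return [start - k * step for k in range(n)]


print(sigma_schedule())
# -> [30.0, 26.625, 23.25, 19.875, 16.5, 13.125, 9.75, 6.375, 3.0]
```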

3 Results

As shown in Figs. 2 and 3, the classification error rate in this example is 23.9%, the training-set AUC (area under the ROC curve) is 0.833 and the test-set AUC is 0.868, indicating that the invention performs well in this embodiment.

The embodiments described above are only preferred embodiments of the present invention and do not limit its scope. Any modification or improvement that a person of ordinary skill in the art makes to the technical solution of the invention without departing from its design spirit shall fall within the protection scope defined by the claims of the present invention.

Claims (3)

1. A method for constructing a classification model related to psoriasis, characterized by comprising the following steps: (1) selecting psoriasis susceptibility loci; (2) converting the loci into input data according to their type; (3) classifying the data with an AdaBoost-SVM model; wherein the psoriasis susceptibility loci of step (1) comprise at least one of HLA types, SNP sites and amino acids; the HLA-type susceptibility loci comprise at least one of C*06:02, C*07:04 and DPB1*05:01; the SNP and amino acid susceptibility loci comprise at least one of snp31443520, B:Y33Y, B:Y91C, B:Y140S and snp32472030; the conversion of step (2) scores HLA types by edit distance and SNPs and amino acids by 0/1, specifically: ① for a susceptible HLA type, the edit distance between each individual's type and the susceptible type is calculated and scored; ② for a SNP site, 1 is recorded if the mutation is present and 0 if it is absent; ③ for an amino acid mutation, 1 is recorded if the mutation is present and 0 if it is absent; and the classification of step (3) comprises the following steps: (31) projecting the data into a high-dimensional space with a Gaussian kernel function and then constructing a separating plane in the high-dimensional space with an SVM model; (32) assigning equal weights to the samples, training the SVM on the training data set and calculating the classifier's error rate to train weak classifiers, and then combining the weak classifiers obtained in each round into a strong classifier; (33) classifying and evaluating the data.
2. The method for constructing a classification model related to psoriasis according to claim 1, characterized in that the formula of the Gaussian kernel function of step (31) is:
$$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right)$$
where x is any point in the space, y is the selected center, σ is the width parameter, and K(x, y) is the kernel value determined by the distance from x to y.
3. The method for constructing a classification model related to psoriasis according to claim 1, characterized in that the evaluation method of step (33) is calculating the area under the ROC curve.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710692864.8A CN107301323B (en) 2017-08-14 2017-08-14 A method for constructing a classification model related to psoriasis

Publications (2)

Publication Number Publication Date
CN107301323A (en) 2017-10-27
CN107301323B (en) 2020-11-03

Family

ID=60131823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710692864.8A Active CN107301323B (en) 2017-08-14 2017-08-14 A method for constructing a classification model related to psoriasis

Country Status (1)

Country Link
CN (1) CN107301323B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052796B (en) * 2017-12-26 2021-07-13 云南大学 Global human mtDNA development tree classification query method based on ensemble learning
CN108961207B (en) * 2018-05-02 2022-11-04 上海大学 Auxiliary diagnosis method for benign and malignant lymph node lesions based on multimodal ultrasound images
CN114371135B (en) * 2021-10-25 2024-01-30 孙良丹 Evaluation system for evaluating psoriasis and application

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2016183348A1 (en) * 2015-05-12 2016-11-17 The Johns Hopkins University Methods, systems and devices comprising support vector machine for regulatory sequence features
CN106778065A (en) * 2016-12-30 2017-05-31 同济大学 A kind of Forecasting Methodology based on multivariate data prediction DNA mutation influence interactions between protein

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
KR20030032395A (en) * 2001-10-24 2003-04-26 김명호 Method for Analyzing Correlation between Multiple SNP and Disease
US8541170B2 (en) * 2008-11-17 2013-09-24 Veracyte, Inc. Methods and compositions of molecular profiling for disease diagnostics
CN106202936A (en) * 2016-07-13 2016-12-07 为朔医学数据科技(北京)有限公司 A kind of disease risks Forecasting Methodology and system

Also Published As

Publication number Publication date
CN107301323A (en) 2017-10-27

Similar Documents

Publication Publication Date Title
CN109273096B (en) A machine learning-based drug risk classification assessment method
WO2020199345A1 (en) Semi-supervised and heterogeneous software defect prediction algorithm employing github
US11837329B2 (en) Method for classifying multi-granularity breast cancer genes based on double self-adaptive neighborhood radius
CN105069470A (en) Classification model training method and device
CN106778065B (en) A kind of prediction technique influencing interactions between protein based on multivariate data prediction DNA mutation
CN105938523B (en) The Gene Selection Method of feature based identification and independence
CN103955628B (en) The protein vitamin binding site estimation method being merged based on subspace
CN104866863B (en) A kind of biomarker screening technique
CN107301323B (en) A method for constructing a classification model related to psoriasis
CN108038352B (en) Method for mining whole genome key genes by combining differential analysis and association rules
CN103761426B (en) A kind of method and system quickly identifying feature combination in high dimensional data
CN106156805A (en) A kind of classifier training method of sample label missing data
CN110853756A (en) Esophagus cancer risk prediction method based on SOM neural network and SVM
CN105740914A (en) Vehicle license plate identification method and system based on neighboring multi-classifier combination
US20220277811A1 (en) Detecting False Positive Variant Calls In Next-Generation Sequencing
CN104318241A (en) Local density spectral clustering similarity measurement algorithm based on Self-tuning
CN105825078A (en) Small sample gene expression data classification method based on gene big data
CN102346817B (en) Prediction method for establishing allergen of allergen-family featured peptides by means of SVM (Support Vector Machine)
CN117393042A (en) Analysis method for predicting pathogenicity of missense mutation
CN107480441B (en) Modeling method and system for children septic shock prognosis prediction
CN109326329A (en) An ensemble learning-based method for predicting the action sites of zinc-binding proteins in a non-equilibrium model
CN106951728B (en) Tumor key gene identification method based on particle swarm optimization and scoring criterion
CN103093239B (en) A kind of merged point to neighborhood information build drawing method
CN111863135B (en) False positive structural variation filtering method, storage medium and computing device
CN110211632A (en) A kind of nucleotide unit point mutation detection method neural network based

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant