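WO2022011855A1 - False-positive structural variation filtering method, storage medium and computing device - Google Patents
False-positive structural variation filtering method, storage medium and computing device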
- Publication number
- WO2022011855A1 (PCT/CN2020/120315)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- purity
- data
- structural variation
- feature
- samples
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Definitions
- the invention belongs to the technical field of data science, and in particular relates to a false positive structural variation filtering method, a storage medium and a computing device considering diluted sequencing signals.
- Genomic structural variations (SVs) are changes in genome structure; they are a complex class of chromosomal variation that can be directly carcinogenic. Tumors arise from the accumulation of genomic variation in cells.
- Next-generation sequencing (NGS)
- the identification of genetic structural variation is obtained by comparing and analyzing individual gene sequencing results with reference sequences.
- the existing structural variation detection methods and software can accurately detect different types of structural variation and determine the size, location and other information of the variation. Accurate identification of structural variants can not only accelerate human research on genetic mechanisms, but also play a very important role in revealing complex disease mechanisms.
- The first type of method uses features as benchmarks to filter false positives: structural variants that do not pass a feature threshold are filtered out as false positives. If the feature threshold is not set properly, misjudgments are likely, and with such one-size-fits-all benchmarks it is difficult to find a threshold that perfectly separates false positives without also deleting low-frequency variants by mistake. On low-purity samples, the accuracy is very low;
- Machine learning filtering methods use samples of a fixed purity as the training set. These methods treat false-positive filtering as a classification problem and use different features as classification criteria. Although the filtering effect is good, the classification feature baseline obtained by training is only suitable for that fixed purity. On low-purity samples that differ from the training samples, the baseline is no longer accurate and the classification accuracy drops significantly, producing a very high false-positive rate.
- The technical problem to be solved by the present invention, in view of the above deficiencies in the prior art, is to provide a false-positive structural variation filtering method, storage medium and device that account for diluted sequencing signals. Structural variation detection is affected by tumor purity and clonal structure; when the sequencing signal is diluted, a large number of false positives are produced, and the invention uses a transfer learning strategy to filter them out.
- the present invention adopts the following technical solutions to realize:
- a false-positive structural variant filtering method that considers diluted sequencing signals, including the following steps:
- The transformation matrices obtained after feature dimension reduction contain 23 column vectors. Each column vector is used as a feature, giving a new set of all structural variation features, Θ'.
- The transformation matrix W is used as the feature dataset, with the original label set Y_p as the corresponding label set; each candidate structural variation is represented by a row vector x′ of 23 features with the original label y, and a classification model is trained based on the extremely randomized trees model to predict true- and false-positive structural variants;
- In the predicted label set Y'_p, true-positive structural variants are labeled 1 and false-positive structural variants are labeled 0. Structural variants labeled 0 are filtered out, and those classified as true positives are the final output, completing the false-positive structural variation filtering.
- step S2 is specifically:
- step S3 is specifically:
- Transfer component analysis uses the maximum mean discrepancy to measure the distance between the distributions of the two domains
- In step S301, the target domain dataset D_t is specifically:
- n_2 is the number of samples in the target domain, X_p and Y_p are the feature space and labels of the target domain, p is the sample purity of the target domain, and P is the purity space (the set of all sample purities);
- The source domain dataset D_s is specifically:
- n_1 is the number of samples in the source domain
- p_j is the sample purity of the source domain
- In step S302, the maximum mean discrepancy distance DISTANCE(D_s, D_t) is calculated as follows:
- x_i is the data in the source domain
- x_j is the data in the target domain
- n_1 is the number of samples in the source domain
- n_2 is the number of samples in the target domain
- step S303 is specifically:
- The maximum mean discrepancy distance matrix L is calculated; each element l_ij is computed as:
- The centering matrix H is:
- x_i is the data in the source domain
- x_j is the data in the target domain
- n_1 is the number of samples in the source domain
- n_2 is the number of samples in the target domain
- K_{s,s} and K_{t,t} are the Gram matrices defined on the source domain and target domain data in the embedding space, respectively
- K_{s,t} is the Gram matrix defined on the cross-domain data
- K_{t,s} = K_{s,t}^T.
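For reference, the standard transfer component analysis formulation that is consistent with the symbols defined above (x_i, x_j, n_1, n_2, K, L, H) can be written as follows. The feature map φ, the trace identity and the exact normalisation are assumptions taken from the usual textbook form of TCA, not text recovered from this document.

```latex
% Maximum mean discrepancy between source and target domains
\mathrm{DISTANCE}(D_s, D_t)
  = \left\| \frac{1}{n_1}\sum_{i=1}^{n_1}\phi(x_i)
          - \frac{1}{n_2}\sum_{j=1}^{n_2}\phi(x_j) \right\|_{\mathcal{H}}^{2}
  = \operatorname{tr}(KL)

% Kernel (Gram) matrix over source and target samples
K = \begin{bmatrix} K_{s,s} & K_{s,t} \\ K_{t,s} & K_{t,t} \end{bmatrix},
\qquad K_{t,s} = K_{s,t}^{\top}

% Elements of the MMD matrix L
l_{ij} =
\begin{cases}
  \dfrac{1}{n_1^{2}}     & x_i, x_j \in D_s \\[6pt]
  \dfrac{1}{n_2^{2}}     & x_i, x_j \in D_t \\[6pt]
  -\dfrac{1}{n_1 n_2}    & \text{otherwise}
\end{cases}

% Centering matrix
H = I_{n_1+n_2} - \frac{1}{n_1+n_2}\,\mathbf{1}\mathbf{1}^{\top}
```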
- step S4 is specifically:
- For the test set of each purity, the training sets are those of the multiple purities other than itself
- The model trained on each training set classifies the test set into true and false structural variants, giving the predicted label sets for all purity samples, m-1 label sets in total.
- In step S5, the final predicted label set Y'_p is:
- n is the number of samples with different purity.
- Another technical solution of the present invention is a computer-readable storage medium storing one or more programs, the one or more programs including instructions that, when executed by a computing device, cause the computing device to execute any of the methods described.
- Another technical solution of the present invention is a filtering device, comprising:
- One or more processors, a memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the described methods.
- the present invention has the following beneficial effects:
- The present invention is a false-positive filtering method for structural variation detection that accounts for diluted sequencing signals and is based on a transfer learning strategy: data are first transferred under that strategy and then classified with a machine learning model. This addresses the false positives that existing methods produce, owing to feature selection, tumor purity and clonal structure, on samples with diluted sequencing signals.
- The method does not require an accurate value of the sample purity, can be applied to samples of different purities, and shows good performance.
- The feature data of different sample purities are used as the source domain and the target domain respectively, and data transfer is performed using transfer component analysis (TCA). The optimal parameters of the method are obtained through multiple experiments, and finally the feature transformation matrices of the two domains are obtained;
- The source domain feature transformation matrices of the different sample purities are each input into an extreme decision tree (Extra Tree, ET) for training, the optimal model parameters are obtained through grid search, and finally multiple trained extreme decision tree models are obtained.
- The target domain feature transformation matrix of the fixed sample purity is input into each extreme decision tree model as the test set, and majority voting over the predictions of all models determines the final predicted label;
- Structural variants with false-positive labels are filtered out, and the true-positive results are output.
- The present invention extracts the initial features from the structural variation detection result file and combines transfer component analysis with the extreme decision tree model, so that the same model adapts well to structural variation detection samples with different degrees of sequencing signal dilution, and the filtering is more accurate and stable.
- Fig. 1 is the flow chart of the present invention
- Figure 2 is a graph of the comparison results of a small number of samples in the simulation data set, where (a) is the accuracy, (b) is the recall rate, (c) is the F1 value, and (d) is the precision;
- Figure 3 shows the comparison result of the wrongly labeled samples in the simulation data set, in which (a) is the accuracy, (b) is the recall rate, (c) is the F1 value, and (d) is the precision;
- Figure 4 is a comparison chart of the experimental results in the real data set.
- When a layer/element is referred to as being "on" another layer/element, it can be directly on the other layer/element, or intervening layers/elements may be present between them.
- If a layer/element is "on" another layer/element in one orientation, then when the orientation is reversed, the layer/element can be "under" the other layer/element.
- The transfer learning strategy can classify samples across the whole purity range regardless of the purity of the model's training samples, removing false positives and improving the accuracy of low-frequency mutation detection.
- Transfer learning involves extracting meaningful latent representations from a pre-trained model for a new, similar goal. It is able to "transfer" knowledge from one domain (called the source) to another domain (called the target). In this way, the knowledge of the false positive filtering machine learning model of a certain sample purity can be used to reconstruct other sample purity models.
- The invention provides FPTLfilter (Filtering False Positive structural variants based on Transfer Learning), a false-positive structural variation filtering method that accounts for diluted sequencing signals. The input data are the feature data of the structural variation candidate set extracted from the result file of an existing structural variation detection tool, and the output data are the structural variant set after false-positive structural variants have been filtered out.
- the present invention is based on the general consensus of the following academic circles:
- Tumor purity and clonal structure will cause the signal of structural variation to be detected to be diluted, the data information will change, the classification baseline obtained by training on fixed samples is no longer applicable, and lower sample purity can lead to false positive variant identification.
- a method for filtering false positive structural variants considering diluted sequencing signals of the present invention includes the following steps:
- Existing structural variation detection tools are run on data of different sample purities to detect structural variations. To ensure that the detected candidate structural variation sets are large enough, a large number of false-positive samples are deliberately admitted, which also provides the classification model with training and test sets that have balanced sample labels.
- Set the filter condition threshold in the detection tool to the lowest level to obtain candidate sets of structural variants with different purities.
- the result file generated after the paired-end sequencing data generated by the second-generation sequencing technology is aligned with the reference genome sequence contains the alignment information of each read data, such as alignment position, alignment quality, sequence fragment and other information. This information is also included in the VCF (Variant Call Format) file of the structural variation detection result. If a certain information can reflect a certain attribute of the structural variation from some aspects, this information can be extracted as an effective feature for classification. Extracting features from the result file includes the following steps:
- n is the number of instances.
- Each feature dataset has a corresponding label set giving the class, where 1 represents the true-positive structural variant class and 0 represents the false-positive structural variant class. The label dataset of structural variant samples with purity p is denoted Y_p, specifically:
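The extraction of one feature row and one 0/1 label per candidate can be sketched as below. This is only an illustration under stated assumptions: the INFO keys in `FEATURE_KEYS` are hypothetical stand-ins (the 26 features are not listed here), and labelling by lookup in a known truth set `truth_positions` is an assumption about how labels are obtained.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical INFO keys used as features; the real 26 features are not listed in the text.
FEATURE_KEYS = ["SVLEN", "PE", "SR", "MAPQ"]


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ("K1=V1;K2=V2;FLAG") into a dict."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        out[key] = value  # flags without '=' map to an empty string
    return out


def record_to_features(line: str) -> List[float]:
    """Turn one tab-separated VCF data line into a numeric feature row."""
    cols = line.rstrip("\n").split("\t")
    qual = float(cols[5]) if cols[5] != "." else 0.0
    info = parse_info(cols[7])
    row = [qual]
    for key in FEATURE_KEYS:
        try:
            row.append(float(info.get(key, "0")))
        except ValueError:
            row.append(0.0)  # missing or non-numeric value
    return row


def build_dataset(
    lines: Iterable[str], truth_positions: Set[Tuple[str, int]]
) -> Tuple[List[List[float]], List[int]]:
    """Return (X_p, Y_p): one feature row and one 0/1 label per candidate."""
    X, y = [], []
    for line in lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        cols = line.split("\t")
        X.append(record_to_features(line))
        # Assumption: a candidate is a true positive (1) if its position is in the truth set.
        y.append(1 if (cols[0], int(cols[1])) in truth_positions else 0)
    return X, y
```

Each row of `X` plays the role of one instance in X_p, and `y` plays the role of Y_p.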
- The present invention uses a transfer model based on the transfer learning method transfer component analysis to perform data transfer on the structural variation feature datasets of different purities, so as to shorten the distance between the data distributions of different purities. This specifically includes the following steps:
- The structural variation feature set with a fixed purity p in the purity space is used as the target domain dataset D_t, specifically:
- n_2 is the number of samples in the target domain
- p is the sample purity of the target domain
- P is the purity space (the set of all sample purities).
- n_1 is the number of samples in the source domain
- p_j is the sample purity of the source domain
- Transfer component analysis uses the maximum mean discrepancy (MMD) to measure the distance between the distributions of the two domains;
- The maximum mean discrepancy distance DISTANCE(D_s, D_t) is calculated as follows:
- x_i is the data in the source domain
- x_j is the data in the target domain
- n_1 is the number of samples in the source domain
- n_2 is the number of samples in the target domain
- K_{s,s} and K_{t,t} are the Gram matrices defined on the source domain and target domain data in the embedding space, respectively
- K_{s,t} is the Gram matrix defined on the cross-domain data
- K_{t,s} = K_{s,t}^T.
- Z′_i is the set of all purity vectors for each new feature
- The transformation matrix W is used as the feature dataset, with the original label set Y_p as the corresponding label set; each candidate structural variation is represented by a row vector x′ of 23 features, and its label is the original label y.
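The data transfer step can be illustrated with a minimal NumPy sketch of transfer component analysis. It assumes a linear kernel and the standard TCA eigenproblem (K L K + μI)^{-1} K H K, which is where the centering matrix H enters; the function name `tca_transform` and the default `mu` are illustrative choices, not the patent's implementation.

```python
import numpy as np


def tca_transform(Xs, Xt, n_components=23, mu=1.0):
    """Project source and target features into a shared low-dimensional space.

    Xs: (n1, d) source-domain features (one purity p_j)
    Xt: (n2, d) target-domain features (fixed purity p)
    Returns (Zs, Zt) with n_components columns each.
    """
    n1, n2 = len(Xs), len(Xt)
    n = n1 + n2
    X = np.vstack([Xs, Xt])

    # Assumption: linear kernel; K holds the blocks K_ss, K_st, K_ts, K_tt.
    K = X @ X.T

    # MMD matrix L: 1/n1^2 (source pairs), 1/n2^2 (target pairs), -1/(n1*n2) otherwise.
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    L = np.outer(e, e)

    # Centering matrix H = I - (1/n) * 1 1^T.
    H = np.eye(n) - np.full((n, n), 1.0 / n)

    # Leading eigenvectors of (K L K + mu I)^{-1} K H K form the transformation matrix W.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(A)
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real

    Z = K @ W  # transformed features for all n1 + n2 samples
    return Z[:n1], Z[n1:]
```

With 26 input features and `n_components=23`, each candidate ends up as a row of 23 transformed features, matching the dimensions described above.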
- The present invention trains a classification model based on the extremely randomized trees model to predict true- and false-positive structural variants, including the following steps:
- For the test set of each purity, the training sets are those of the multiple purities other than itself
- The model trained on each training set classifies the test set into true and false structural variants, giving the predicted label sets for all purity samples, m-1 label sets in total.
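Training one model per source purity and collecting the m-1 predicted label sets might look like the following sketch. It assumes scikit-learn's `ExtraTreesClassifier` as the extremely randomized trees model; the helper names, the dictionary layout keyed by purity, and `n_estimators=100` are illustrative assumptions (the grid search mentioned in the text is omitted for brevity).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def train_per_purity_models(source_sets, random_state=0):
    """Train one extra-trees classifier per source purity.

    source_sets: dict mapping purity -> (Z_source, y_source), where Z_source
    is the transformed feature matrix and y_source the original 0/1 labels.
    """
    models = {}
    for purity, (Z, y) in source_sets.items():
        clf = ExtraTreesClassifier(n_estimators=100, random_state=random_state)
        clf.fit(Z, y)
        models[purity] = clf
    return models


def predict_all(models, target_views):
    """Predict the target set with every model.

    target_views: dict mapping purity -> Z_target, the target features
    transformed together with that source purity.
    Returns an array of shape (m-1, n_target), one row per source purity.
    """
    return np.vstack([models[p].predict(target_views[p]) for p in sorted(models)])
```

Each row of the returned array is one of the m-1 predicted label sets for the target purity.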
- Each purity's predicted label set in the collection is valid data, so no single one of them can be used as the final classification result.
- Majority voting is applied to the predicted labels of the m-1 purities; the label with the most votes across all predicted label sets is taken as the result, and this result forms the final predicted label set for the classification of true- and false-positive structural variants, as follows:
- n is the number of samples with different purity.
- In the predicted label set Y'_p, true-positive structural variants are labeled 1 and false-positive structural variants are labeled 0. Structural variants labeled 0 are filtered out, and those classified as true positives are the final output.
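The voting and filtering steps can be sketched in plain Python as follows. The function names are illustrative, and the tie rule (a tie counts as label 0) is an assumption; with the six purities used in the experiments there are five voting models, so ties cannot occur.

```python
from typing import List, Sequence


def majority_vote(label_sets: Sequence[Sequence[int]]) -> List[int]:
    """Combine m-1 predicted 0/1 label sets into one final label per candidate."""
    n_models = len(label_sets)
    n_samples = len(label_sets[0])
    final = []
    for i in range(n_samples):
        ones = sum(labels[i] for labels in label_sets)
        # Assumption: a tie (only possible with an even number of models) yields 0.
        final.append(1 if 2 * ones > n_models else 0)
    return final


def filter_false_positives(candidates: Sequence, final_labels: Sequence[int]) -> List:
    """Keep only candidates whose final label is 1 (true positive)."""
    return [c for c, label in zip(candidates, final_labels) if label == 1]
```

For example, five models voting `[1, 1, 1, 0, 1]` on a candidate keep it, while `[0, 0, 1, 0, 0]` removes it.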
- The necessity of transfer learning is tested first: the feature datasets before and after data transfer are each applied to the extreme decision tree classification model. The method is then evaluated in two further settings: when the number of samples is small, and when the label set contains wrong labels.
- the four metrics of accuracy, precision, recall and F1 value are used to measure the performance of the model.
- The structural variant candidate set samples have purities of 5%, 10%, 15%, 20%, 25% and 30%.
- the present invention innovatively uses transfer learning for data transfer of samples of different purity, and we first perform a transfer learning necessity test.
- Each purity structural variant candidate set is a balanced dataset containing 4000 samples, and the ratio of true positive and false positive class samples is 1:1.
- TCA denotes the classification result using the transformation matrix obtained by transfer component analysis
- BASE denotes the classification result on the extracted feature data. The true- and false-positive classification results are shown in Table 1.
- Table 1 Classification results of feature data before and after transfer component analysis
- In Figure 2, datasize100 (200, 300) denotes the number of samples per class in each of the three settings, the x-axis represents the sample purity, and the y-axis represents the metric value; in Figure 3, proportion10% (20%, 30%) denotes the label error rate in each of the three settings, the x-axis represents the sample purity, and the y-axis represents the metric value.
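The two stress settings, a small number of samples per class and a fixed proportion of wrong labels, can be reproduced in outline with the sketch below. How the original experiments drew their subsets and injected label errors is not stated, so uniform random sampling and uniform random label flipping are assumptions, as are the function names.

```python
import random
from typing import List, Sequence, Tuple


def subsample_balanced(
    X: Sequence, y: Sequence[int], per_class: int, seed: int = 0
) -> Tuple[List, List[int]]:
    """Draw per_class samples from each of the classes 0 and 1 (e.g. 100, 200, 300)."""
    rng = random.Random(seed)
    keep: List[int] = []
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        keep.extend(rng.sample(idx, per_class))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def flip_labels(y: Sequence[int], proportion: float, seed: int = 0) -> List[int]:
    """Flip a given proportion of 0/1 labels (e.g. 0.10, 0.20, 0.30)."""
    rng = random.Random(seed)
    n_flip = round(len(y) * proportion)
    noisy = list(y)
    for i in rng.sample(range(len(y)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy
```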
- FPTLfilter can accurately identify false-positive structural variants, adapt well at different purities, can significantly reduce false positives, and is very efficient and stable in low-purity samples.
- The present invention is a false-positive structural variation filtering method that accounts for diluted sequencing signals, and it solves the problem that existing algorithms cannot be applied well to samples with different degrees of sequencing signal dilution. Because transfer component analysis is used to perform data transfer across tumor samples of different purities, the present invention overcomes the gap between sample feature data distributions caused by dilution of the sequencing signal, so that it performs well at different sample purities.
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
Description
Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
Accuracy (%) | 73.00 | 81.00 | 81.00 | 87.00 | 84.00 | 90.00 | 86.00 | 99.00 |
Recall (%) | 88.00 | 90.00 | 78.00 | 90.00 | 96.00 | 94.00 | 90.00 | 98.00 |
F1 value (%) | 76.52 | 82.57 | 80.41 | 87.38 | 85.71 | 90.38 | 86.54 | 98.99 |
Precision (%) | 67.69 | 76.27 | 82.98 | 84.91 | 77.42 | 87.04 | 83.33 | 100.00 |
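The four metrics in the table above follow from a confusion matrix, as this short sketch shows. The counts used in the usage note (88 true positives, 42 false positives, 58 true negatives, 12 false negatives, i.e. 100 samples per class) are an inference that happens to reproduce the dataset 1 column; the underlying counts are not given in the text.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, precision, recall and F1 as percentages rounded to 2 places."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": round(100 * accuracy, 2),
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "f1": round(100 * f1, 2),
    }
```

Calling `classification_metrics(88, 42, 58, 12)` gives accuracy 73.0, precision 67.69, recall 88.0 and F1 76.52.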
Claims (10)
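- A false-positive structural variation filtering method, characterized by comprising the following steps: S1, running an existing structural variation detection tool on data of different sample purities to detect structural variations, with the filter condition thresholds in the detection tool set to the lowest level, to obtain a structural variation candidate set; S2, extracting features from the result file, taking information that reflects attributes of structural variations as effective classification features; S3, storing each feature vector as one row, as an instance representing its corresponding candidate structural variation, denoting the feature dataset of structural variation samples of purity p as X_p and the label dataset of structural variation samples of purity p as Y_p, combining the above features and labels and denoting all structural variation candidate sets in the purity space as Η, and using a transfer model based on the transfer learning method transfer component analysis to perform data transfer on the structural variation feature datasets of different purities, shortening the distance between the data distributions of different purities and thereby achieving feature data transfer across purities; S4, after transfer of the structural variation feature datasets of different purities, obtaining two dimension-reduced transformation matrices containing 23 column vectors, taking each column vector as a feature to obtain a new set of all structural variation features Θ', taking the transformation matrix W as the feature dataset with the original label set Y_p as the corresponding label set, each candidate structural variation being represented by a row vector x′ of 23 features with the original label y, and training a classification model based on the extremely randomized trees model to predict true- and false-positive structural variations; S5, applying majority voting to the predicted labels of the m-1 purities, the voting result being the label with the most votes across all predicted label sets, and taking this result as the final predicted label set Y'_p for the classification of true- and false-positive structural variations; S6, in the predicted label set Y'_p, true-positive structural variations being classified as 1 and false-positive structural variations as 0, filtering out structural variations labeled 0 and outputting those classified as true positives as the final result, completing the false-positive structural variation filtering.
- The method according to claim 1, characterized in that step S2 is specifically: S201, denoting the purity space, i.e. the set of all purities, as P, and extracting all read-related information from the structural variation detection result files of different purities; S202, for each candidate structural variation, extracting 26 features from all of the information, and denoting the set of all features as Θ.
- The method according to claim 1, characterized in that step S3 is specifically: S301, taking the structural variation feature set of fixed purity p in the purity space as the target domain dataset D_t, and the structural variation feature sets of the other purities p_j in the purity space as the source domain dataset D_s; S302, transfer component analysis using the maximum mean discrepancy to measure the distance between the distributions of the two domains; S303, solving the maximum mean discrepancy distance by borrowing the idea of support vector machine kernel functions; S304, computing the eigendecomposition of (KLK+μI)^{-1}KLK, and taking the first M eigenvectors to construct the feature data transformation matrix W from purity p_j to purity p.
- A computer-readable storage medium storing one or more programs, characterized in that the one or more programs comprise instructions which, when executed by a computing device, cause the computing device to perform any of the methods according to claims 1 to 8.
- A computing device, characterized by comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods according to claims 1 to 8.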
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010681632.4 | 2020-07-15 | ||
CN202010681632.4A CN111863135B (zh) | 2020-07-15 | 2020-07-15 | False-positive structural variation filtering method, storage medium and computing device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022011855A1 (zh) | 2022-01-20 |
Family
ID=72984289
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/120315 WO2022011855A1 (zh) | False-positive structural variation filtering method, storage medium and computing device | 2020-07-15 | 2020-10-12 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111863135B (zh) |
WO (1) | WO2022011855A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117096070A (zh) * | 2023-10-19 | 2023-11-21 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Domain-adaptation-based anomaly detection method for semiconductor manufacturing processes |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112927753A (zh) * | 2021-02-22 | 2021-06-08 | 中南大学 | Method for identifying hot-spot residues at protein-RNA complex interfaces based on transfer learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
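CN109658983A (zh) * | 2018-12-20 | 2019-04-19 | 深圳市海普洛斯生物科技有限公司 | Method and device for identifying and eliminating false positives in nucleic acid variation detection |
CN109903815A (zh) * | 2019-02-28 | 2019-06-18 | 北京化工大学 | Gene inversion variation detection method based on feature mining |
CN110084314A (zh) * | 2019-05-06 | 2019-08-02 | 西安交通大学 | False-positive gene mutation filtering method for targeted-capture gene sequencing data |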
US20200105373A1 (en) * | 2018-09-28 | 2020-04-02 | 10X Genomics, Inc. | Systems and methods for cellular analysis using nucleic acid sequencing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012034251A2 (zh) * | 2010-09-14 | 2012-03-22 | 深圳华大基因科技有限公司 | Method and system for detecting genomic structural variation |
AU2017100960A4 (en) * | 2017-07-13 | 2017-08-10 | Macau University Of Science And Technology | Method of identifying a gene associated with a disease or pathological condition of the disease |
CN109280702A (zh) * | 2017-07-21 | 2019-01-29 | 深圳华大基因研究院 | Method and system for determining chromosomal structural abnormalities in an individual |
CN111326212B (zh) * | 2020-02-18 | 2023-06-23 | 福建和瑞基因科技有限公司 | Method for detecting structural variation |
-
2020
- 2020-07-15 CN CN202010681632.4A patent/CN111863135B/zh active Active
- 2020-10-12 WO PCT/CN2020/120315 patent/WO2022011855A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200105373A1 (en) * | 2018-09-28 | 2020-04-02 | 10X Genomics, Inc. | Systems and methods for cellular analysis using nucleic acid sequencing |
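CN109658983A (zh) * | 2018-12-20 | 2019-04-19 | 深圳市海普洛斯生物科技有限公司 | Method and device for identifying and eliminating false positives in nucleic acid variation detection |
CN109903815A (zh) * | 2019-02-28 | 2019-06-18 | 北京化工大学 | Gene inversion variation detection method based on feature mining |
CN110084314A (zh) * | 2019-05-06 | 2019-08-02 | 西安交通大学 | False-positive gene mutation filtering method for targeted-capture gene sequencing data |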
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
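CN117096070A (zh) * | 2023-10-19 | 2023-11-21 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Domain-adaptation-based anomaly detection method for semiconductor manufacturing processes |
CN117096070B (zh) * | 2023-10-19 | 2024-01-05 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Domain-adaptation-based anomaly detection method for semiconductor manufacturing processes |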
Also Published As
Publication number | Publication date |
---|---|
CN111863135B (zh) | 2022-06-07 |
CN111863135A (zh) | 2020-10-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20944873 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20944873 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/08/2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20944873 Country of ref document: EP Kind code of ref document: A1 |