CN116580848A - Multi-head attention mechanism-based method for analyzing cancer multi-omics data - Google Patents
Multi-head attention mechanism-based method for analyzing cancer multi-omics data
- Publication number
- CN116580848A (application number CN202310538812.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- cancer
- head
- head attention
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G16H50/70 — ICT specially adapted for medical diagnosis, medical simulation or medical data mining, e.g. analysing previous cases of other patients
- G06F18/214 — Pattern recognition; generating training patterns, e.g. bagging or boosting
- G06F18/22 — Matching criteria, e.g. proximity measures
- G06F18/23 — Clustering techniques
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06N3/09 — Supervised learning
- Y02A90/10 — ICT supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a multi-head attention mechanism-based method for analyzing cancer multi-omics data, which comprises the following steps: S1, collecting and preprocessing cancer multi-omics data; S2, completing the classification task on the cancer multi-omics data with a supervised multi-head attention model; and S3, completing the clustering task on the cancer multi-omics data with a decoupled contrastive learning model based on a multi-head attention mechanism. The invention achieves good results on both the classification and clustering tasks, and can be combined with clinical information to analyze the pathogenesis of cancer.
Description
Technical Field
The invention relates to the technical field of artificial intelligence and bioinformatics, in particular to a method for analyzing cancer multi-omics data based on a multi-head attention mechanism.
Background
With the development of high-throughput sequencing technology, the era of precision medicine has arrived. Biomedical data have grown explosively and are being collected and consolidated in public databases. Large-scale efforts such as The Cancer Genome Atlas (TCGA) have accumulated genome, transcriptome, proteome and clinical data for more than 20 cancers from thousands of patients [1]. These rich data help researchers understand the heterogeneity of biological processes and phenotypes from different perspectives. However, high-throughput sequencing yields high-dimensional data with small sample sizes, substantial noise, and large differences between platforms. Extracting valuable information from high-throughput data therefore remains a significant challenge.
A single-omics study can, in principle, analyze its object accurately and efficiently. Single-omics analysis has become an important research tool in the life sciences and is widely used in genomics and proteomics. To gain insight into the interrelationships and regulatory mechanisms between molecules in organisms, multi-omics analysis integrates genomic, epigenetic, transcriptomic and proteomic data in an unbiased manner to analyze the mechanisms and phenotypes of living systems. Multi-omics data have become a research hot spot in many fields, such as cancer research, drug development, agriculture and environmental science, and software and tools for multi-omics analysis have appeared, such as the R packages limma, DESeq2 and edgeR, and software such as MetaboAnalyst and Proteome Discoverer. Researchers have also developed various multi-omics data processing methods, such as multiple kernel learning, Bayesian consensus clustering, and machine-learning-based dimensionality reduction.
Deep learning algorithms have recently been widely applied to multi-omics research. Researchers have proposed 16 representative deep learning methods to classify and cluster multi-omics data, including fully connected neural networks (FCNN), convolutional neural networks (CNN), graph convolutional networks (GCN), autoencoders (AE), capsule networks (CapsNet), and generative adversarial networks (GAN), among others. Some researchers have proposed an end-to-end multi-modal deep learning model (scMDC) that characterizes different data sources and jointly learns deeply embedded latent features for cluster analysis. Others have proposed a unified multi-omics multi-task deep learning framework (OmiEmbed) that supports dimensionality reduction, multi-omics integration, tumor type classification, phenotypic feature reconstruction and survival prediction. An extensible and interpretable multi-omics deep learning framework (DeepOmix) has been proposed for cancer survival analysis; it extracts the relationship between clinical survival time and multi-omics data within a deep learning framework to predict prognosis. A neural network method based on multiple-input multiple-output deep adversarial learning has been proposed to accurately model complex data and identify molecular subtypes of tumor samples using consensus clustering and a Gaussian mixture model. Some researchers have used the neighborhood component analysis (NCA) algorithm to select relevant features from multi-omics data retrieved from the TCGA and Genomics of Drug Sensitivity in Cancer (GDSC) databases and developed survival and prediction models. In addition, a number of deep learning and machine learning methods have been applied to the diagnosis and prognosis of tumor subtypes.
Disclosure of Invention
The invention aims to provide a multi-head attention mechanism-based method for analyzing cancer multi-omics data, which overcomes the defects of the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a method for analyzing cancer multi-omics data based on a multi-head attention mechanism, comprising the following steps:
s1, collecting and preprocessing cancer multi-omics data;
s2, completing the classification task on the cancer multi-omics data with a supervised multi-head attention model;
and s3, completing the clustering task on the cancer multi-omics data with a decoupled contrastive learning model based on a multi-head attention mechanism.
Further, the step S1 specifically includes:
s11, normalizing the cancer multi-omics data and unifying the dimensions of the different data types;
s12, combining and integrating data features from the different omics, and perturbing the order of the sample data to add noise to the samples and generate training data.
Further, the supervised multi-head attention model in step S2 is generated by the following steps:
s21, designing a multi-head attention encoder;
s22, creating a symmetric multi-head attention encoder based on the multi-head attention encoder;
s23, creating a supervised multi-head attention model based on the symmetric multi-head attention encoder.
Further, the step S21 includes:
s211, position-encoding the cancer multi-omics data to preserve the relationships between positions in the sequence;
s212, extracting features with a symmetric multi-head attention mechanism;
s213, performing head-separated computation on the multi-omics data features with the multi-head attention mechanism;
s214, applying multiple groups of self-attention to the original input sequence, then concatenating the outputs of each attention group and applying one linear transformation to obtain the final output.
Further, step S22 specifically comprises: feature sharing across the multi-omics data is achieved by sharing a weight matrix. Feature extraction is performed in a symmetric multi-head self-attention encoder, and the learned weight features share weights in the feature map. In back propagation, because the weight matrix is shared, the symmetric multi-head attention encoder updates the weight gradients with the same values. Two such multi-head attention encoders connected in parallel form the symmetric multi-head attention encoder.
Further, step S23 specifically comprises:
s231, extracting features of the multi-omics data with the symmetric multi-head attention encoder to generate feature matrices W1 and W2;
s232, multiplying the features of W1 and W2 element by element, as an element-wise feature fusion method, to obtain a fused feature vector;
s233, feeding the fused feature vector into a three-layer perceptron for normalization, projecting the feature vector into a new feature space to generate a new feature matrix Ŷ, and calculating the error between each single prediction sample and its label with the cross-entropy loss function:

L_CE = -Σ_i y_i log(ŷ_i)

s234, calculating the distance between the feature matrix Ŷ and the labels to obtain the total loss function L.
further, the step S3 specifically includes:
s31, projecting the projection in the step S233 to a new feature spaceMiddle->As a training positive sample, n-1 pairs are added +.>As a training negative sample n-1 pairs, the similarity of paired samples is measured by cosine distance:
in the formula, i, j E [1, N]For the purpose of calculationAnd->Error of each view in a database, creating a cross entropy loss functionThen the loss function between positive and negative samples is:
wherein k is [1,2], and τ is a temperature parameter in the model that controls softness;
s32, removing the dead pairs from denominators by adopting a decoupling comparison learning method to realize decoupling comparison learning, wherein the process is as follows:
s33, enhancing all data by calculatingObtaining cross entropy loss of decoupling comparison learning, and enabling a model to identify all positive samples in a data set, wherein the cross entropy loss comprises the following steps:
s34, feature matrixAnd->The cosine similarity is also used to calculate the error between a pair of samples, as follows:
in the formula, i, j E [1, M]For the purpose of calculationAnd->Error of each view in the list, create a cluster penalty function +.>The loss function between each pair of positive and negative samples can be expressed as:
s35, through learning of all positive and negative sample pairs, the total loss function is expressed as:
in the method, in the process of the invention,is the entropy of the subtype cluster allocation probability, outputting most of the label features after each loss calculation.
Further, the feature spaceClustering samples by using decoupling comparison loss function to realize output of clustering labels, wherein the feature space is +.>And->The characteristics use the clustering loss function to calculate and cluster the sample, in order to realize the output of the clustering characteristic, in the clustering task, the total loss function is:
L=L D +L C 。
compared with the prior art, the invention has the following advantages:
1. The supervised multi-head attention model (SMA) of the invention achieves 100% accurate subtype classification on the simulated single-cell and cancer multi-omics data sets.
2. The invention learns multi-omics data features through a decoupled contrastive learning model (DMACL) and clusters and identifies cancer subtypes; this unsupervised contrastive learning method performs subtype analysis by calculating the similarity between multi-omics samples. The DMACL model shows significant advantages over the 16 deep learning models.
3. The invention achieves good results on both the classification and clustering tasks, and can be combined with clinical information to analyze the pathogenesis of cancer.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a multi-headed self-attention encoder frame of the present invention.
FIG. 2 shows the performance of seven supervised methods on the cancer benchmark data sets used in the classification task.
FIG. 3 shows the C-index, Silhouette score and Davies-Bouldin score of 11 unsupervised methods on the single-cell multi-omics data. Based on cluster analysis of the single-cell data set, the three internal indices C-index, Silhouette score and Davies-Bouldin score (a, b, c) were calculated.
FIG. 4 shows the C-index of the 11 unsupervised methods on the cancer benchmark data sets used in the clustering task.
FIG. 5 shows the Silhouette score of the 11 unsupervised methods on the cancer benchmark data sets used in the clustering task.
FIG. 6 shows the Davies-Bouldin score of the 11 unsupervised methods on the cancer benchmark data sets used in the clustering task.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of the invention is thereby clearly defined.
Referring to FIGS. 1 and 2, the present embodiment discloses a method for analyzing cancer multi-omics data based on a multi-head attention mechanism, comprising the following steps:
Step S1, collecting and preprocessing the cancer multi-omics data.
Specifically, step S1 includes the steps of:
Step S11, normalizing the cancer multi-omics data and unifying the dimensions of the different data types to facilitate subsequent feature extraction.
Step S12, combining and integrating data features from the different omics; the purpose of integration is to improve data coverage, increase the information content of the data and improve its interpretability. The order of the sample data is then perturbed, noise is added to the samples, and the training data are generated.
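The preprocessing in steps S11 and S12 can be sketched as follows. This is a minimal illustration, not the patent's exact pipeline: the block names, the z-score normalization choice, and the noise level are all assumptions.

```python
import numpy as np

def preprocess_multiomics(omics_blocks, noise_std=0.01, seed=0):
    """Sketch of step S1: per-omics normalization, feature
    concatenation, and sample shuffling with additive noise."""
    rng = np.random.default_rng(seed)
    normed = []
    for X in omics_blocks:  # each X: (n_samples, n_features_k)
        mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
        normed.append((X - mu) / sd)          # unify scales across omics
    fused = np.concatenate(normed, axis=1)    # integrate features
    perm = rng.permutation(fused.shape[0])    # perturb sample order
    noisy = fused[perm] + rng.normal(0.0, noise_std, fused.shape)
    return noisy, perm

# Toy usage: two "omics" views for 4 samples
rna = np.arange(12, dtype=float).reshape(4, 3)
meth = np.ones((4, 2))
train, order = preprocess_multiomics([rna, meth])
print(train.shape)  # (4, 5)
```

Normalizing each omics block before concatenation keeps one high-variance platform from dominating the fused feature space.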
Step S2, completing the classification task on the cancer multi-omics data with the supervised multi-head attention model (SMA).
Specifically, step S2 includes the steps of:
step S21, designing a multi-head attention encoder.
Wherein, step S21 includes:
in step S211, the multi-omics data need to be fed into the encoder framework for feature extraction and dimensionality reduction, but the model does not process the multi-omics data according to their sequence order; the cancer multi-omics data therefore need to be position-encoded to preserve the relationships between positions in the sequence.
In step S212, through the linear transformation of the position encoding, the attention module can better capture the relationships between different positions in the sequence data, thereby improving model performance. This embodiment uses a fully connected layer to implement the linear transformation of the input:

y_pe = x_pe W + b

where x_pe is the encoding vector of each position, W is the feature weight matrix of the data, and b is the bias vector of each feature weight.
In the feature extraction part, this embodiment uses a symmetric multi-head attention mechanism. The tensor matrix obtained after concatenating the multi-omics data is W_mn. The multi-head attention mechanism performs head-separated computation on the features of W_mn: along the last dimension, W_mn is divided into a number of small feature vectors, each called a head, and the number of heads h of the multi-head attention mechanism is set to 80. For each head, its attention weights over the other heads are computed with a dot-product attention mechanism, and the self-attention output vector is:

Attention(Q, K, V) = softmax(QK^T / √d_k) V

where Q and K represent the feature matrices output by the same head, V represents the feature matrix obtained by another head, and d_k is the matrix dimension, used to scale down the feature matrix product. The multi-head attention mechanism applies several groups of self-attention to the original input sequence, then concatenates the outputs of each group and applies one linear transformation to obtain the final output. The calculation can be expressed as:
MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O
Step S213, performing head-separated computation on the multi-omics data features with the multi-head attention mechanism. In this embodiment, the features are divided along the last dimension into a number of small feature vectors, each called a head, and the number of heads h of the multi-head attention mechanism is set to 80. For each head, its attention weights over the other heads are computed with the dot-product attention mechanism. The multi-head attention mechanism builds the attention layer according to h. During the forward pass, the feature matrix is fed into the input layer of the feed-forward module. Each input-layer neuron corresponds to one column of the feature matrix, i.e. one feature. Each neuron weights and biases its inputs, computes its output through the activation function, and passes the output to the next layer of neurons; finally the output layer outputs the feature matrix.
In step S214, the multi-head attention mechanism applies multiple groups of self-attention to the original input sequence, then concatenates the outputs of each attention group and applies a linear transformation to obtain the final output.
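The head splitting, scaled dot-product attention, and final concatenation-plus-projection of steps S212–S214 can be sketched in NumPy as follows. Random matrices stand in for learned weights, and the shapes (and a head count of 4 rather than 80) are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, n_heads, rng=np.random.default_rng(0)):
    """Sketch of steps S212-S214: split features into heads, apply
    scaled dot-product attention per head, concatenate, project."""
    n, d = X.shape
    assert d % n_heads == 0
    dk = d // n_heads
    Wq, Wk, Wv, Wo = (rng.normal(0, 0.1, (d, d)) for _ in range(4))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):                            # head-separated computation
        s = slice(h * dk, (h + 1) * dk)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dk))  # attention weights
        heads.append(A @ V[:, s])
    return np.concatenate(heads, axis=1) @ Wo           # concat + linear transform

X = np.random.default_rng(1).normal(size=(6, 8))        # 6 samples, 8 fused features
out = multi_head_self_attention(X, n_heads=4)
print(out.shape)  # (6, 8)
```

Scaling by √d_k keeps the dot products from saturating the softmax as the per-head dimension grows.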
Step S22, a symmetrical multi-head attention encoder is created based on the multi-head attention encoder. The method comprises the following steps:
Feature sharing across multi-omics data can generally be achieved by sharing a weight matrix. Since feature extraction is performed in a symmetric multi-head self-attention encoder, the learned weight features are identical, so the weights can be shared in the feature map. Moreover, in back propagation, because the weight matrix is shared, the symmetric multi-head attention encoder updates the weight gradients with the same values.
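A minimal sketch of this weight sharing, under the assumption of a single linear embedding per branch: one weight matrix serves both parallel branches, so the two omics views are embedded with identical parameters and any gradient would update the shared weights for both at once.

```python
import numpy as np

class SymmetricEncoder:
    """Sketch of step S22: two parallel branches sharing one weight
    matrix. Dimensions are illustrative assumptions."""
    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.1, (d_in, d_out))  # single shared matrix

    def encode_pair(self, x1, x2):
        # Both branches use self.W -- the "symmetric" weight sharing
        return x1 @ self.W, x2 @ self.W

enc = SymmetricEncoder(d_in=8, d_out=4)
v1 = np.ones((3, 8)); v2 = 2 * np.ones((3, 8))
W1, W2 = enc.encode_pair(v1, v2)
print(np.allclose(W2, 2 * W1))  # True: identical weights applied to both views
```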
Step S23, a supervised multi-head attention model is created based on the symmetrical multi-head attention encoder. The method specifically comprises the following steps:
in step S231, interactions and relationships between different data types can be discovered, and key components and pathways in the biological system identified, through multi-omics data classification. Such integrative analysis can provide important clues and insights for studying the pathogenesis of complex diseases and searching for new therapeutic targets. The three data sets used in this experiment already contain labels for all samples, so this experiment uses the supervised multi-head attention (SMA) model to classify cancer types. The symmetric multi-head attention encoder is used to extract features of the multi-omics data and generate feature matrices W1 and W2.
Step S232, multiplying the features of the feature matrices W1 and W2 element by element, as an element-wise feature fusion method; this can highlight the distinctive features of each encoder and improve the performance and generalization ability of the model.
Step S233, the fused feature vector is fed into a three-layer perceptron (MLP) for normalization, projecting it into a new feature space and generating a new feature matrix. The error between a single predicted sample and its label is calculated with the cross-entropy loss function, as follows:
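The equation images from the original filing are not reproduced in this text. As a hedged reconstruction in standard notation (the symbols below are assumed, not copied from the source), the cross-entropy for a single sample, and the total loss averaged over N samples, would read:

```latex
% Hedged reconstruction: \hat{y}_c is the MLP's predicted probability for class c,
% y_c the one-hot label, C the number of classes, N the number of samples.
L_{CE} = -\sum_{c=1}^{C} y_c \log \hat{y}_c, \qquad
L = \frac{1}{N} \sum_{i=1}^{N} L_{CE}^{(i)}
```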
Step S234, after calculating the distance between the feature matrix and the labels, the total loss function L is obtained.
Step S3, a decoupled contrastive learning model based on a multi-head attention mechanism (DMACL) is used to complete the clustering task on the cancer multi-omics data.
Clustering of cancer subtypes aims to group similar cancer samples into the same subtype and minimize the differences within subtypes, in order to better understand the biological characteristics and molecular mechanisms of cancer and provide better diagnosis, treatment, and prognosis for patients. Unsupervised decoupled contrastive learning can greatly improve matching similarity; in the cancer-subtyping task, since no labels are available, both positive and negative samples are formed from pseudo-labels generated by data augmentation.
Step S31, in the new feature space obtained by the projection of step S233, each augmented pair of the same sample is taken as a training positive sample, and the remaining n-1 pairs are taken as training negative samples; the similarity of paired samples is measured by the cosine distance:
where i, j ∈ [1, N]. To calculate the error of each view, a cross-entropy loss function is created; the loss function between positive and negative samples is then:
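The formulas here are also image-based in the original and not reproduced. A hedged reconstruction from the standard contrastive-learning literature (notation z, s, τ assumed) of the cosine similarity and the InfoNCE-style loss referred to below:

```latex
% Hedged reconstruction, not copied from the source: cosine similarity between
% paired embeddings z_i, z_j, and the InfoNCE loss with temperature \tau over
% the 2N augmented samples.
s_{i,j} = \frac{z_i^{\top} z_j}{\lVert z_i \rVert \, \lVert z_j \rVert}, \qquad
\ell_i = -\log \frac{\exp(s_{i,j}/\tau)}{\sum_{k=1,\, k \neq i}^{2N} \exp(s_{i,k}/\tau)}
```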
where k ∈ [1, 2] and τ is the temperature parameter that controls softness in the model. In general, the negative-positive coupling (NPC) multiplier in the cross-entropy loss (InfoNCE) tends to affect model training in two ways. First, positive samples near the anchor are treated as important information, because they are the only positive samples available, while the gradient from the negative samples gradually shrinks. Second, when the negative samples are far away and carry little information, the model may erroneously reduce what it learns from the positive samples. This means the model over-emphasizes the negative samples rather than weighing the information of positive and negative samples in a balanced way, which can cause errors when processing positive samples and thereby reduce the model's accuracy.
Step S32, decoupled contrastive learning is realized by removing the positive pairs from the denominator, as follows:
Step S33, by calculating over all augmented data, the cross-entropy loss of decoupled contrastive learning is obtained, enabling the model to identify all positive samples in the data set:
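The equation images for steps S32–S33 are likewise absent. A minimal sketch of a decoupled contrastive loss in the sense described here (the positive pair is excluded from the InfoNCE denominator, decoupling the negative-positive coupling discussed above) follows; the notation, temperature value, and simplified negative set are assumptions, not the patent's exact formulation.

```python
import numpy as np

def dcl_loss(z1, z2, tau=0.1):
    """z1, z2: (n, d) embeddings of two augmented views of the same n samples."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize rows
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    n = z1.shape[0]
    sim = z1 @ z2.T / tau                 # cosine similarities between views
    pos = np.diag(sim)                    # positive pairs: same sample, both views
    # Decoupling: the positive term is excluded from the denominator,
    # leaving only the n-1 cross-view negatives per row.
    mask = ~np.eye(n, dtype=bool)
    neg = np.log(np.exp(sim)[mask].reshape(n, n - 1).sum(axis=1))
    return float(np.mean(-pos + neg))

rng = np.random.default_rng(3)
a = rng.normal(size=(8, 16))
loss_aligned = dcl_loss(a, a + 0.01 * rng.normal(size=(8, 16)))  # nearly identical views
loss_random = dcl_loss(a, rng.normal(size=(8, 16)))              # unrelated views
print(loss_aligned < loss_random)  # aligned views should give the lower loss
```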
The concept of "label as representation" is the most common in contrastive clustering. The basic idea is to encode the labels as feature vectors and feed them, together with the feature vectors of the data points, into a clustering model for training. By embedding the labels into the feature space, the clustering problem is converted into a contrastive learning problem: data points within the same cluster should be closer in feature space, while data points in different clusters should be farther apart. The cluster a data point belongs to can then be determined by comparing the similarities between them.
Step S34, the error between a pair of samples of the feature matrices is likewise calculated with cosine similarity, as follows:
where i, j ∈ [1, M]. To calculate the error of each view, a clustering loss function is created; the loss function between each pair of positive and negative samples can then be expressed as:
Step S35, through learning over all positive and negative sample pairs, the total loss function is expressed as:
where the entropy term is the entropy of the subtype cluster-assignment probabilities; most of the label features are output after each loss calculation.
These features are clustered with the decoupled contrastive loss function, realizing the output of cluster labels; the feature samples computed with the clustering loss function are clustered, realizing the output of cluster features. The clustering model trains and predicts end-to-end, so during training the decoupled contrastive loss and the clustering loss are optimized simultaneously; in the clustering task, the total loss function is finally:
L = L_D + L_C.
The invention is further illustrated by the following examples:
This example compares the classification performance of SMA with 6 common methods for classifying omics data: (1) lfNN model: the omics vectors are concatenated into one feature vector as model input; multiple neural networks perform feature extraction, with Softmax as the final output layer for classification. (2) efNN model: each omics vector is a separate model input; multiple neural networks perform feature extraction, their outputs are concatenated into one vector, and Softmax is the final output layer for classification. (3) lfCNN model: similar to lfNN, with convolution and pooling layers added. After the omics vectors are concatenated into one feature vector, it is fed into the convolution and pooling layers; the output features are flattened and fed into a fully connected network for the final prediction. (4) efCNN model: similar to efNN. Each omics vector is fed into convolution and pooling layers, and the output features are flattened, concatenated, and fed into a fully connected neural network for the final prediction. (5) moGCN model: it uses a GCN to learn features of the omics data and perform the classification task; to handle omics-specific classification, a multi-layer GCN is built for each omics data type. (6) moGAT model: replacing the GCN in the moGCN model with GAT yields the moGAT model. In the tests, the lfNN, efNN, lfCNN, efCNN, moGCN, moGAT, and SMA models are trained with the direct concatenation of the preprocessed multi-omics data as input; all models use the same preprocessed data. The classification results of all models on the simulated data set, under both equal-size and random-size cluster conditions, are given in Table 1.
Table 1: Performance of the 7 supervised methods when all clusters have the same size
Samples of 5 clusters, 10 clusters, and 15 clusters of random sizes were selected for the experiment. These seven supervised methods are essentially designed for sample classification, assigning samples to their true clusters (subtypes). In the classification task, to quantitatively evaluate the 7 supervised methods, we used a simple random cross-validation scheme to train and test the models. All models were measured with three evaluation metrics: Accuracy, F1 macro, and F1 weighted. From Table 2, the efNN, moGCN, moGAT, and SMA models achieved the best results in the multi-omics classification task. The efCNN model performed significantly worse than the other 6 models on the 15-random-size-cluster task, and the lfCNN model performed significantly worse than the other 6 models on the 10-random-size-cluster task. This may be because, after passing through multiple convolution and pooling layers, the models overfit the simulated data set, leading to misclassification. The lfNN model failed to achieve the best effect only on the 5-random-size-cluster task; this may be because it failed to learn the multi-omics features during feature extraction and misjudged samples.
In the classification task, this example investigates the performance of the lfNN, efNN, lfCNN, efCNN, moGCN, moGAT, and SMA models on single-cell data sets, similar to how these models were evaluated on the simulated data set. All models used a simple cross-validation scheme to classify samples of three cancer cell lines, and classification performance was measured by three metrics: Accuracy, F1 macro, and F1 weighted.
Table 2: Performance of the supervised methods on single-cell multi-omics data
As shown in Table 2, the lfNN, efNN, moGCN, moGAT, and SMA models all peaked on the three metrics Accuracy, F1 macro, and F1 weighted, showing that these models achieved optimal performance on the classification task. The lfCNN and efCNN models again performed worse than the other models in this test. One main reason may be that the number of convolution and pooling layers is relatively small, so the feature-extraction effect is not pronounced enough. Another reason may be that no regularization penalty was added to the model, or other causes such as an improper learning rate or batch size.
In the classification task, experiments were run on five data sets with true cancer subtypes, evaluating the lfNN, efNN, lfCNN, efCNN, moGCN, moGAT, and SMA models in the same way as on the simulated and single-cell data sets. These methods classify the true cancer-subtype samples. All model training and testing used a simple cross-validation scheme, and classification performance was measured by three metrics: Accuracy, F1 macro, and F1 weighted. For each cancer data set we selected three omics data types, with 59, 272, 206, 144, and 198 samples for BRCA, GBM, SARC, LUAD, and STAD, respectively. BRCA contains five cancer subtypes: Luminal A, Luminal B, Basal-like, Normal-like, and HER2-enriched. GBM includes 4 cancer subtypes: Proneural, Classical, Mesenchymal, and Neural. SARC includes six cancer subtypes: dedifferentiated liposarcoma, leiomyosarcoma, undifferentiated pleomorphic sarcoma, myxofibrosarcoma, malignant peripheral nerve sheath tumor, and synovial sarcoma. LUAD includes the bronchioid, squamoid, and magnoid cancer subtypes. STAD includes the Epstein-Barr virus, microsatellite instability, genomically stable, and chromosomal instability subtypes.
As shown in FIG. 3, the SMA model classifies the cancer subtypes in BRCA, GBM, SARC, and STAD with all three metrics Accuracy, F1 macro, and F1 weighted reaching 1, i.e., perfectly accurate classification. When classifying the LUAD subtypes, however, Accuracy, F1 macro, and F1 weighted reach 0.958, 0.93, and 0.91, respectively. Compared with the other models, the SMA model has the better classification effect in the classification task. This is possibly because the SMA model can attend to the positional information of the input sequence simultaneously and thereby capture global information. In addition, the SMA model has a deeper structure and more parameters, which helps it learn more features and improves classification accuracy. The SMA model can therefore be used as a standard method for classifying cancer multi-omics data.
The clustering performance of the DMACL model is compared with 10 common methods for clustering omics data: (1) lfAE: the multi-omics data are concatenated into one feature vector, and then an AE consisting of an encoder and a decoder performs feature clustering; the ReLU function is used as the activation for all encoder layers and the intermediate decoder layers, and tanh for the last decoder layer. (2) efAE: similar to the lfAE model, except that when processing the multi-omics data, the AE extracts features from each omics data type separately and simultaneously. (3) lfDAE: the lfDAE processes the vector features of each omics data type independently; partially corrupted data are constructed by adding noise to the input data and restored to the original input by encoding and decoding. (4) efDAE: the efDAE processes the vector features of the concatenated multi-omics data; the subsequent steps are the same as lfDAE. (5) lfVAE: similar to the lfAE model, the multi-omics data are concatenated into a one-dimensional feature vector, and then a VAE performs feature-cluster analysis (compared with an AE, the latent vectors of a VAE closely follow a unit Gaussian distribution). (6) efVAE: similar to the lfVAE model, but at the model input each omics data type is given its own VAE for feature-cluster analysis. (7) lfSVAE: compared with lfVAE, this model only replaces the VAE with an SVAE (a stacked VAE in which all hidden layers follow a unit Gaussian distribution); the rest is unchanged. (8) efSVAE: each hidden layer of the encoder is fully connected to two output layers, and the sampling step is identical to the VAE; in the evaluation, a multiplier similar to β-VAE is added to the loss function. (9) lfmmdVAE: similar to lfVAE; it trains on the omics data and finally clusters the features of the multi-omics integration. (10) efmmdVAE: one VAE per omics data type is likewise used for training; the other parts are identical to efVAE except for a different loss function.
In the clustering task, the experiment uses the models to extract features from the simulated multi-omics data, obtaining 5-, 10-, and 15-dimensional embeddings; the embedding dimension is set according to the number of clusters in the simulated multi-omics data. A k-means algorithm then clusters the dimension-reduced multi-omics data. Sample clusters are finally obtained to compare the performance of the eleven unsupervised methods.
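The embed-then-cluster step above can be sketched as a small k-means run; the implementation below is a self-contained NumPy version (the patent does not specify one), and the two well-separated random blobs stand in for 10-dimensional omics embeddings.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Minimal k-means: returns a cluster label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # assign each embedding to its nearest center
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        centers = np.array([x[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

rng = np.random.default_rng(4)
# two well-separated blobs standing in for 10-dimensional multi-omics embeddings
emb = np.vstack([rng.normal(0, 0.1, (20, 10)), rng.normal(5, 0.1, (20, 10))])
labels = kmeans(emb, k=2)
print(len(set(labels[:20].tolist())), len(set(labels[20:].tolist())))  # one cluster per blob
```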
In the clustering task on the simulated data set, this example first uses the C-index to measure the agreement between the clusters obtained from the fused multi-omics data and the real clusters. The lower the C-index, the smaller the distance between clustered samples and the better the model's clustering effect. As the experimental results summarized in Table 3 show, most of the clustering methods perform well. However, the DMACL model reaches C-index values of 0.002, 0.022, and 0.023 when the clusters have random sizes, and 0.005, 0.021, and 0.014 when the clusters have the same size, exceeding the other models on this metric. This may be because, when extracting the multi-omics data, the multi-head attention mechanism focuses more on the local information of the data and extracts more salient features. Moreover, the clustering effect of the DMACL model remains good as the number of clusters increases.
Table 3: C-index of the eleven unsupervised methods on the simulated data set.
The Silhouette score is obtained by computing the silhouette coefficient of each sample, measuring how well a sample is assigned to the correct cluster. From Table 4, we find that the efVAE model assigns samples to the correct clusters with high probability, while the DMACL model only reaches ranks 3, 5, and 7 when the clusters are the same size. Possible reasons for the weaker clustering of the DMACL model are poor sample quality in the simulated data, noise in the data set, outliers, and so on. The imbalance in the size distribution of the clusters in the data set may also lower the Silhouette score. In addition, the Silhouette score itself has certain limitations, such as inaccurate assessment of clustering quality when densities are non-uniform.
Table 4: Silhouette score of the 11 unsupervised methods on the simulated data set.
From Table 5, the efVAE model achieves a lower Davies-Bouldin score in clustering the simulated data. This may be because the VAE encodes the input data into latent vectors and then generates new data from them to learn the data distribution. The DMACL model only reaches ranks 3, 6, and 6 when the clusters have random sizes. Its weaker Davies-Bouldin ranking may be because the data features in the simulated data set are not salient enough, so the multi-head attention mechanism cannot extract valid features. The number of clusters is also one of the factors influencing the Davies-Bouldin score.
Table 5: Davies-Bouldin score of the 11 unsupervised methods on the simulated data set
The DMACL model is also evaluated on single-cell data. For the clustering task on the single-cell data set, all models fuse the multi-omics data to obtain a fused two-dimensional embedding, and a k-means algorithm then reduces dimensionality and clusters the multi-omics data. The performance of the eleven unsupervised methods is finally compared on the resulting clusters. The experiments use the C-index, Silhouette score, and Davies-Bouldin score to evaluate clustering. As shown in FIG. 3, the DMACL model obtains the lowest C-index and Davies-Bouldin score and a higher Silhouette score when clustering samples, making it the best model for single-cell data-set clustering. This is probably because single-cell data contain long-sequence information, and the DMACL model's multi-head attention mechanism handles long sequences, reducing vanishing and exploding gradients during model training. In short, the DMACL model better captures the characteristics of single-cell data, thereby improving clustering accuracy.
Cancer multi-omics data are characterized by high dimensionality, diversity, noise, and the like. For the clustering task, the eleven unsupervised models first fuse the cancer multi-omics data to obtain 10-dimensional embeddings, and a k-means algorithm then clusters the multi-omics data. Because the best number of clusters is unknown, cluster numbers from 1 to 7 were explored. Finally, the samples were cluster-analyzed with the unsupervised models. In evaluating the self-supervised clustering models, performance was measured with the C-index, Silhouette score, and Davies-Bouldin score. As shown in FIG. 4, across the clustering experiments of all models, the C-index of the DMACL model is concentrated mainly in the middle of the radar chart; by the radar chart's coordinates, the closer the data are to the center, the smaller the value. The C-index of the DMACL model therefore reflects an almost exact clustering of the samples. This is probably because the DMACL model has strong generalization ability, helping it capture more features and improving feature extraction and dimensionality reduction. The efmmdVAE, efVAE, and lfmmdVAE models also cluster well and can serve as reference models for cancer multi-omics data sets.
From FIG. 5, the Silhouette scores of the DMACL model take larger values on most of the cancer multi-omics data, distributed mainly on the outer ring of the radar chart. The DMACL model only reaches lower values on the 2-cluster tasks of SKCM and LUSC. When the structure of the cancer multi-omics data is complex and the number of data points is large, using 2 clusters may under-fit, i.e., fail to capture the essential features of the data set, leaving some confusable data points between the two clusters after segmentation.
The Davies-Bouldin score is also an important metric for analyzing the clustering effect of the DMACL model, so we use it to measure model performance as well. As shown in FIG. 6, the Davies-Bouldin score takes its smallest values with 2 and 3 clusters. With 4, 5, and 6 clusters, the DMACL model clusters LUSC and LIHC poorly. When the structure of the multi-omics data set itself is complex and the data points are few, using 4, 5, or 6 clusters may over-fit, i.e., the division introduces unnecessary subdivisions that do not reflect the essential features of the data set, resulting in poor clustering.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, various modifications or alterations may be made within the scope of the appended claims, and these are intended to fall within the scope of the invention as claimed.
Claims (9)
1. A method for analyzing cancer multi-omics data based on a multi-head attention mechanism, comprising the steps of:
S1, collecting and preprocessing cancer multi-omics data;
S2, completing the classification task on the cancer multi-omics data with a supervised multi-head attention model;
S3, completing the clustering task on the cancer multi-omics data with a decoupled contrastive learning model based on a multi-head attention mechanism.
2. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 1, wherein step S1 specifically comprises:
S11, normalizing the cancer multi-omics data and unifying the dimensions of the different data;
S12, combining and integrating data features from the different omics, and shuffling the order of the sample data to add noise to the samples and generate training data.
3. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 1, wherein the supervised multi-head attention model in step S2 is generated by:
S21, designing a multi-head attention encoder;
S22, creating a symmetric multi-head attention encoder based on the multi-head attention encoder;
S23, creating a supervised multi-head attention model based on the symmetric multi-head attention encoder.
4. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 3, wherein step S21 comprises:
S211, position-encoding the cancer multi-omics data, preserving the relationships among the positions in the sequence;
S212, extracting features with a symmetric multi-head attention mechanism;
S213, performing per-head calculation on the multi-omics data features with the multi-head attention mechanism;
S214, performing multiple groups of self-attention processing on the original input sequence, then concatenating the results of each attention head and applying one linear transformation to obtain the final output.
5. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 3, wherein step S22 is specifically: feature sharing of the multi-omics data is achieved by sharing a weight matrix; feature extraction is performed in a symmetric multi-head self-attention encoder, and the learned weight features share weights in the feature maps; in back propagation, because the weight matrix is shared, the symmetric multi-head attention encoder updates the weight gradients with the same values; and two independent multi-head attention encoders are connected in parallel to obtain the symmetric multi-head attention encoder.
6. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 3, wherein step S23 is specifically:
S231, extracting the features of the multi-omics data with the symmetric multi-head attention mechanism encoder to generate feature matrices W1 and W2;
S232, fusing the features of matrices W1 and W2 by element-wise multiplication to obtain the fused feature vector;
S233, feeding the fused feature vector into a three-layer perceptron for normalization, projecting it into a new feature space and generating a new feature matrix, and calculating the error between a single predicted sample and its label with a cross-entropy loss function;
S234, after calculating the distance between the feature matrix and the labels, obtaining the total loss function L.
7. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 6, wherein the formula for calculating the error between the single predicted sample and the label with the cross-entropy loss function in step S233 is:
and the calculation formula of the total loss function L in step S234 is:
8. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 6, wherein step S3 specifically comprises:
S31, in the new feature space obtained by the projection of step S233, taking each augmented pair of the same sample as a training positive sample and the remaining n-1 pairs as training negative samples, the similarity of paired samples being measured by the cosine distance:
where i, j ∈ [1, N]; to calculate the error of each view, a cross-entropy loss function is created, and the loss function between positive and negative samples is:
where k ∈ [1, 2], and τ is the temperature parameter that controls softness in the model;
S32, realizing decoupled contrastive learning by removing the positive pairs from the denominator, as follows:
S33, obtaining the cross-entropy loss of decoupled contrastive learning by calculating over all augmented data, enabling the model to identify all positive samples in the data set:
S34, likewise calculating the error between a pair of samples of the feature matrices with cosine similarity, as follows:
where i, j ∈ [1, M]; to calculate the error of each view, a clustering loss function is created, and the loss function between each pair of positive and negative samples can be expressed as:
S35, through learning over all positive and negative sample pairs, expressing the total loss function as:
where the entropy term is the entropy of the subtype cluster-assignment probabilities, with most of the label features output after each loss calculation.
9. The method for analyzing cancer multi-omics data based on the multi-head attention mechanism according to claim 8, wherein
the features in the feature space are clustered with the decoupled contrastive loss function to realize the output of cluster labels, and the feature samples computed with the clustering loss function are clustered to realize the output of cluster features; in the clustering task, the total loss function is:
L = L_D + L_C.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310538812.0A CN116580848A (en) | 2023-05-15 | 2023-05-15 | Multi-head attention mechanism-based method for analyzing multiple groups of chemical data of cancers |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116580848A true CN116580848A (en) | 2023-08-11 |
Family
ID=87539080
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310538812.0A Pending CN116580848A (en) | 2023-05-15 | 2023-05-15 | Multi-head attention mechanism-based method for analyzing multiple groups of chemical data of cancers |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116580848A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN117409968A (en) * | 2023-10-27 | 2024-01-16 | University of Electronic Science and Technology of China | Hierarchical attention-based cancer dynamic survival analysis method and system
CN117409968B (en) * | 2023-10-27 | 2024-05-03 | University of Electronic Science and Technology of China | Hierarchical attention-based cancer dynamic survival analysis method and system
CN117854599A (en) * | 2024-03-07 | 2024-04-09 | Peking University | Batch effect processing method, equipment and storage medium for multi-mode cell data
CN117854599B (en) * | 2024-03-07 | 2024-05-28 | Peking University | Batch effect processing method, equipment and storage medium for multi-mode cell data
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111667884B (en) | Convolutional neural network model for predicting protein interactions using protein primary sequences based on attention mechanism | |
CN111785329B (en) | Single-cell RNA sequencing clustering method based on countermeasure automatic encoder | |
CN111210871A (en) | Protein-protein interaction prediction method based on deep forest | |
CN106529207B (en) | A kind of prediction technique of the protein in conjunction with ribonucleic acid | |
CN111370073B (en) | Medicine interaction rule prediction method based on deep learning | |
Schimunek et al. | Context-enriched molecule representations improve few-shot drug discovery | |
CN111798935A (en) | Universal compound structure-property correlation prediction method based on neural network | |
CN116798652A (en) | Anticancer drug response prediction method based on multitasking learning | |
CN113257357B (en) | Protein residue contact map prediction method | |
CN112270950A (en) | Fusion network drug target relation prediction method based on network enhancement and graph regularization | |
CN112085245A (en) | Protein residue contact prediction method based on deep residual error neural network | |
CN115661498A (en) | Self-optimization single cell clustering method | |
Pan et al. | Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data | |
AL-Bermany et al. | Microarray gene expression data for detection alzheimer’s disease using k-means and deep learning | |
CN115083511A (en) | Peripheral gene regulation and control feature extraction method based on graph representation learning and attention | |
Cudic et al. | Prediction of sorghum bicolor genotype from in-situ images using autoencoder-identified SNPs | |
Bai et al. | Clustering single-cell rna sequencing data by deep learning algorithm | |
Vigil et al. | DNA Sequencing Using Machine Learning Algorithms | |
Mariño et al. | Two weighted c-medoids batch SOM algorithms for dissimilarity data | |
Elhassani et al. | Deep Learning concepts for genomics: an overview | |
Lopez | Charting Cellular States, One Cell at a Time: Computational, Inferential and Modeling Perspectives | |
Chowdhury | Cell Type Classification Via Deep Learning On Single-Cell Gene Expression Data | |
Dolgikh | Unsupervised Generative Learning with Handwritten Digits | |
Lv et al. | EasyFS: an Efficient Model-free Feature Selection Framework via Elastic Transformation of Features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||