CN117153260A - Spatial transcriptome data clustering method, device and medium based on contrast learning - Google Patents

Spatial transcriptome data clustering method, device and medium based on contrast learning Download PDF

Info

Publication number
CN117153260A
CN117153260A (application CN202311204657.5A)
Authority
CN
China
Prior art keywords
node
representation
node representation
loss
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311204657.5A
Other languages
Chinese (zh)
Other versions
CN117153260B (en)
Inventor
李君一
韩睿
王旭
王轩
刘博
王亚东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202311204657.5A
Publication of CN117153260A
Application granted
Publication of CN117153260B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135: Feature extraction based on approximation criteria, e.g. principal component analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a spatial transcriptome data clustering method, device, equipment and storage medium based on contrast learning, wherein the method comprises the following steps: obtaining a weighted feature matrix and an adjacency matrix based on the spatial transcriptome data and constructing an adjacency graph; inputting the adjacency graph into the two encoders of a twin network structure respectively to learn a first node representation and a second node representation; constructing a positive sample set for calculating the contrast loss based on the first node representation and the second node representation; calculating the clustering loss based on the soft cluster distribution and auxiliary distribution of the nodes; and guiding model training through the contrast loss and the clustering loss so as to obtain a clustering result. The node representations used to construct the positive sample set are obtained through contrast learning with the twin network structure, the contrast loss and the clustering loss are then calculated, and model training is guided by the contrast loss and the clustering loss between nodes, so that a data clustering method for spatial transcriptome data is obtained based on contrast learning, and the pertinence and accuracy of spatial transcriptome data clustering are improved.

Description

Spatial transcriptome data clustering method, device and medium based on contrast learning
Technical Field
The invention relates to the technical field of data processing, and in particular to a spatial transcriptome data clustering method, device and equipment based on contrast learning, and a storage medium.
Background
The function of complex tissue is fundamentally related to the spatial environment of its different cell types, and the relative location of transcriptional expression within a tissue is critical for understanding its biological function and for describing the network of biological interactions. Spatial transcriptomics not only provides the transcriptome of the object under study, but also locates each transcriptomic measurement at its spatial position within the tissue, thereby providing valuable insight for research and diagnosis.
Many current studies incorporate spatial information into cluster analysis and develop different clustering algorithms for spatial transcriptomes. These methods generally construct positive and negative sample pairs through data augmentation. Common augmentation schemes for graph structures and gene expression matrices, such as randomly adding or deleting edges and nodes, random masking, and randomly shuffling the gene expression matrix, inevitably destroy the biological meaning contained in the original structure and therefore cannot construct good positive samples. Meanwhile, constructing large numbers of negative samples greatly increases memory requirements and training time. Spatial transcriptomics has great development prospects, and the major platforms are continuously updating their technology. Differences between platform technologies lead to differences in the data: the minimum capture unit may be a cell or a spot, the number of captured genes differs, and histological images may be unavailable, so the ability of existing methods to accommodate data from each major platform still needs to be improved. Meanwhile, most contrast-learning-based methods first learn latent representations of the nodes and only then cluster with the learned representations, so the representation-learning process is not optimized for the clustering task.
Disclosure of Invention
The invention provides a spatial transcriptome data clustering method, device, equipment and storage medium based on contrast learning, aiming to improve the pertinence of spatial transcriptome data clustering.
In order to achieve the above object, the present invention provides a spatial transcriptome data clustering method based on contrast learning, the method comprising:
preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix;
respectively inputting the adjacency graph into a first encoder and a second encoder of a twin network structure, and learning corresponding first node representation and second node representation through the first encoder and the second encoder;
constructing a positive sample set based on the first node representation and the second node representation, and calculating a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set;
determining soft cluster distribution and auxiliary distribution of each node, and calculating cluster loss based on the soft cluster distribution and the auxiliary distribution;
and guiding model training through the contrast loss and the clustering loss, and obtaining a clustering result of the corresponding node based on the soft cluster distribution after model training is completed.
Optionally, preprocessing the gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix includes:
constructing a gene expression matrix based on gene expression level information of the spatial transcriptome, wherein the rows of the gene expression matrix are cells/capture points and the columns are gene expression levels;
preprocessing the gene expression level data in the gene expression matrix, and performing dimension reduction on the gene expression matrix to obtain a feature matrix;
determining an adjacency graph of the feature matrix based on spatial information between the cells/capture points.
Optionally, the determining the adjacency graph of the feature matrix based on the spatial information between the cells/capture points comprises:
taking the cells/capture points as nodes, and calculating the distance between nodes according to their spatial coordinates;
adding edges between each node and its several nearest neighbors to obtain an adjacency matrix;
selecting target edges based on a distance threshold, and weighting each target edge based on the distance between its nodes to obtain a weighted adjacency matrix;
an adjacency graph is obtained based on nodes in the feature matrix and edges of the weighted adjacency matrix.
Optionally, the constructing a positive sample set based on the first node representation and the second node representation, and calculating a contrast loss between a node representation predictor of the second node representation and the first node representation according to the positive sample set includes:
constructing a positive sample set of nodes based on the first node representation and the second node representation of each node;
the contrast loss is calculated based on the total number of nodes, the first node representation within the positive sample set, and the node representation predicted value of the second node representation.
Optionally, the constructing a positive sample set based on the first node representation, the second node representation, and the spatial location comprises:
determining cosine similarity between the target node and the other nodes based on the second node representation of the target node and the first node representation of the other nodes;
after cosine similarity between each target node and other nodes is obtained, determining a K-neighbor node set of each target node;
determining the intersection of the target node in the K-neighbor node set and the neighbor nodes in the adjacency graph as a local semantic positive sample;
determining similar nodes belonging to the same K-means cluster as the target node, and determining the intersection of the K-neighbor node set and the similar nodes as a global semantic positive sample;
And determining the union set of the local semantic positive samples and the global semantic positive samples as a positive sample set of target nodes.
Optionally, before the constructing a positive sample set based on the first node representation and the second node representation, and calculating a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set, the method further includes:
and inputting the second node representation into a predictor, and obtaining a node representation predicted value of the second node representation.
Optionally, the determining a soft cluster distribution and an auxiliary distribution of each node, and calculating a cluster loss based on the soft cluster distribution and the auxiliary distribution includes:
performing initial K-means clustering on the second node representation to obtain a plurality of clusters, and determining centroid characteristic representation of each cluster;
determining the soft cluster distribution of the corresponding node based on the centroid feature representation of the cluster where the node is located and the node representation predicted value of the second node representation;
determining corresponding auxiliary distribution based on the soft cluster distribution of the nodes;
the cluster loss is calculated based on the soft cluster distribution and the auxiliary distribution of all the nodes.
In order to achieve the above object, the present invention further provides a spatial transcriptome data clustering device based on contrast learning, including:
The construction module is used for preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix;
the learning module is used for inputting the adjacency graph into a first encoder and a second encoder of a twin network structure respectively, and learning corresponding first node representation and second node representation through the first encoder and the second encoder;
the contrast loss calculation module is used for constructing a positive sample set based on the first node representation and the second node representation, and calculating the contrast loss between the node representation predicted value of the second node representation and the first node representation according to the positive sample set;
the cluster loss calculation module is used for determining soft cluster distribution and auxiliary distribution of each node and calculating cluster loss based on the soft cluster distribution and the auxiliary distribution;
and the clustering module is used for guiding model training through the contrast loss and the clustering loss, and obtaining a clustering result of the corresponding node based on the soft clustering distribution after model training is completed.
To achieve the above object, the present invention also provides a spatial transcriptome data clustering device based on contrast learning, comprising a memory, a processor and a spatial transcriptome data clustering program based on contrast learning stored on the memory, which when executed by the processor, implements the steps of the method as described above.
To achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a contrast-learning-based spatial transcriptome data clustering program which, when executed by a processor, implements the steps of the method as described above.
Compared with the prior art, the spatial transcriptome data clustering method, device, equipment and storage medium based on contrast learning provided by the invention comprise the following steps: preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix; inputting the adjacency graph into a first encoder and a second encoder of a twin network structure respectively, and learning the corresponding first node representation and second node representation through the first encoder and the second encoder; constructing a positive sample set based on the first node representation and the second node representation, and calculating the contrast loss between the node representation predicted value of the second node representation and the first node representation according to the positive sample set; determining the soft cluster distribution and auxiliary distribution of each node, and calculating the clustering loss based on the soft cluster distribution and the auxiliary distribution; and guiding model training through the contrast loss and the clustering loss, and obtaining the clustering result of the corresponding node based on the soft cluster distribution after model training is completed. In this way, the node representations used to construct the positive sample set are obtained through contrast learning with the twin network structure, the contrast loss and the clustering loss are then calculated, and training is guided by the contrast loss and the clustering loss between nodes, so that a data clustering method for spatial transcriptome data is obtained based on contrast learning, and the pertinence and accuracy of spatial transcriptome data clustering are improved.
Drawings
FIG. 1 is a schematic hardware architecture of a spatial transcriptome data clustering apparatus based on contrast learning according to various embodiments of the present invention;
FIG. 2 is a flow chart of a first embodiment of a spatial transcriptome data clustering method based on contrast learning according to the present invention;
FIG. 3 is a schematic view of a scenario involved in a first embodiment of a contrast learning-based spatial transcriptome data clustering method of the present invention;
FIG. 4 is a schematic diagram of a refinement flow of a first embodiment of a spatial transcriptome data clustering method based on contrast learning according to the present invention;
FIG. 5 is a flow chart of a second embodiment of a spatial transcriptome data clustering method based on contrast learning according to the present invention;
FIG. 6 is a schematic diagram of functional modules of a first embodiment of a spatial transcriptome data clustering apparatus based on contrast learning according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention mainly relates to a spatial transcriptome data clustering device based on contrast learning, which refers to a device capable of establishing a network connection; the spatial transcriptome data clustering device based on contrast learning may be a server, a cloud platform, or the like.
Referring to fig. 1, fig. 1 is a schematic hardware structure of a spatial transcriptome data clustering apparatus based on contrast learning according to various embodiments of the present invention. In an embodiment of the present invention, the spatial transcriptome data clustering device based on contrast learning may include a processor 1001 (e.g., a Central Processing Unit, CPU), a communication bus 1002, an input port 1003, an output port 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components; the input port 1003 is used for data input; the output port 1004 is used for data output; and the memory 1005 may be a high-speed RAM memory or a stable (non-volatile) memory, such as a disk memory, and the memory 1005 may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration shown in fig. 1 is not limiting of the invention and may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.
With continued reference to FIG. 1, the memory 1005 of FIG. 1, which is a readable storage medium, may include an operating system, a network communication module, an application module, and a contrast learning-based spatial transcriptome data clustering routine. In fig. 1, the network communication module is mainly used for connecting with a server and performing data communication with the server; and the processor 1001 is configured to call a spatial transcriptome data clustering program based on contrast learning stored in the memory 1005, and perform the following operations:
Preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix;
respectively inputting the adjacency graph into a first encoder and a second encoder of a twin network structure, and learning corresponding first node representation and second node representation through the first encoder and the second encoder;
constructing a positive sample set based on the first node representation and the second node representation, and calculating a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set;
determining soft cluster distribution and auxiliary distribution of each node, and calculating cluster loss based on the soft cluster distribution and the auxiliary distribution;
and guiding model training through the contrast loss and the clustering loss, and obtaining a clustering result of the corresponding node based on the soft cluster distribution after model training is completed.
The spatial transcriptome data clustering device based on the contrast learning provides a first embodiment of the spatial transcriptome data clustering method based on the contrast learning. Referring to fig. 2, fig. 2 is a flowchart of a first embodiment of a spatial transcriptome data clustering method based on contrast learning according to the present invention.
The first embodiment of the present invention proposes a spatial transcriptome data clustering method based on contrast learning, as shown in fig. 2, fig. 2 is a schematic flow chart of the first embodiment of the spatial transcriptome data clustering method based on contrast learning, and the method includes:
s101, preprocessing a gene expression matrix constructed based on space transcriptome data to obtain a weighted feature matrix and an adjacent matrix, and constructing an adjacent graph based on the weighted feature matrix and the adjacent matrix;
specifically, referring to fig. 3, fig. 3 is a scene graph related to a first embodiment of a spatial transcriptome data clustering method based on contrast learning according to the present invention. As shown in FIG. 3a, it is first necessary to construct a Gene Expression matrix (Gene Expression) based on the information of the Gene Expression level of the space transcriptome (spatial transcriptiomics), and the behavior cells/capture points of the Gene Expression matrix are listed as the Gene Expression level. And preprocessing the data, and then performing PCA dimension reduction to obtain a feature matrix X (Feature Matrix X). And calculating the distance between the cells/capture points according to the space coordinates by taking the cells/capture points as nodes, and adding edges to k nearest neighbors of each cell/capture point to obtain an adjacency matrix. A weighted adjacency matrix a of adjacency matrices is further obtained. And the feature matrix X is used as the attribute of the node, and finally, an adjacency Graph (X, a) is obtained, and in this embodiment, the adjacency Graph is represented as Graph (X, a) = (V, E), where V represents the node and the node set V contains all cells/capture points, E represents the edge set, and the edge set E contains edges between all nodes.
Step S102, inputting the adjacency graph into a first encoder f_ξ and a second encoder f_θ of a twin network structure respectively, and learning the corresponding first node representation H_ξ and second node representation H_θ through the first encoder f_ξ and the second encoder f_θ;
The spatial transcriptome data clustering model based on contrast learning designed in this embodiment rests on a contrast learning method: the latent representations of the nodes are learned through a twin network structure, a positive sample set of the nodes is constructed to design the contrast loss, and a clustering loss is then introduced to construct a total loss function that guides training, so that both the feature learning and the spatial clustering result are optimized.
The framework of the spatial transcriptome data clustering model based on contrast learning is shown in fig. 3; the scene graph shown in fig. 3 is also the framework of this model. As shown in fig. 3a, after preprocessing the spatial transcriptome data, an adjacency graph Graph(X, A) is obtained, and the adjacency graph Graph(X, A) is input into a first encoder (Teacher Encoder) and a second encoder (Student Encoder) of a twin network structure, respectively. In this embodiment the first encoder is denoted f_ξ and the second encoder is denoted f_θ. That is, the edges in the weighted adjacency matrix A and the nodes in the feature matrix X are input into the twin network structure; the first encoder f_ξ learns the first node representation H_ξ, and the second encoder f_θ learns the second node representation H_θ.
A twin network consists of two networks with identical structure and different parameters, in this embodiment the first encoder f_ξ and the second encoder f_θ. The first encoder f_ξ and the second encoder f_θ are two two-layer GCNs (Graph Convolutional Networks) with the same structure and separately, randomly initialized parameters. In this embodiment, the parameters of the first encoder f_ξ are obtained by a momentum update of the parameters of the second encoder f_θ. Denoting the parameters of the second encoder f_θ as θ and the parameters of the first encoder f_ξ as ξ:
ξ ← τξ + (1-τ)θ
where τ is the momentum update coefficient. ξ is updated more slowly and smoothly, which effectively prevents the possible collapse phenomenon during learning in which the model fails to learn a meaningful representation.
The edges of the weighted adjacency matrix A obtained after preprocessing and the nodes of the feature matrix X are input into the two encoders; the first encoder f_ξ learns the first node representation H_ξ, and the second encoder f_θ learns the second node representation H_θ:
H_ξ = f_ξ(X, A)
H_θ = f_θ(X, A)
Thus, the first node representation H_ξ and the second node representation H_θ of each node are obtained through the twin network structure.
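A minimal PyTorch sketch of the twin encoders and the momentum update ξ ← τξ + (1-τ)θ is given below. The two-layer GCN is written in plain tensor form rather than with a graph library, and the layer widths and the value of τ are illustrative assumptions, not values fixed by the embodiment.

```python
import copy
import torch
import torch.nn as nn

def normalize_adj(adj):
    """Symmetric normalization A_hat = D^(-1/2) (A + I) D^(-1/2)."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class GCNEncoder(nn.Module):
    """Two-layer GCN: H = A_hat * ReLU(A_hat * X * W1) * W2."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    def forward(self, x, adj_norm):
        h = torch.relu(adj_norm @ self.w1(x))
        return adj_norm @ self.w2(h)

# Student encoder f_theta and teacher encoder f_xi share the same structure;
# the teacher starts as a copy and receives no gradients.
f_theta = GCNEncoder(in_dim=50, hid_dim=128, out_dim=64)   # widths are illustrative
f_xi = copy.deepcopy(f_theta)
for p in f_xi.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def momentum_update(f_xi, f_theta, tau=0.99):
    """xi <- tau * xi + (1 - tau) * theta, applied parameter-wise."""
    for p_xi, p_theta in zip(f_xi.parameters(), f_theta.parameters()):
        p_xi.data.mul_(tau).add_((1.0 - tau) * p_theta.data)
```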
Step S103, constructing a positive sample set based on the first node representation and the second node representation, and calculating a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set;
referring to fig. 4, fig. 4 is a schematic diagram of a refinement flow of a first embodiment of a spatial transcriptome data clustering method based on contrast learning according to the present invention, as shown in fig. 4, step S103 includes:
step S1031: constructing a positive sample set of nodes based on the first node representation and the second node representation of each node;
A row of the first node representation H_ξ and of the second node representation H_θ is taken as the target row, and the corresponding node is denoted v_i; that is, the first target node representation h_ξ,i (the i-th row of H_ξ) and the second target node representation h_θ,i (the i-th row of H_θ) are the different features of node v_i ∈ V learned by the different encoders. For any target node v_i, its positive sample set P_i is determined from h_ξ,i and h_θ,i. In actual operation, each row needs to be taken as the target row in turn to determine its corresponding positive sample set.
Specifically, the cosine similarity sim(v_i, v_j) between the target node and each other node is first determined based on the second node representation of the target node and the first node representations of the other nodes, where the target node may be any one of the nodes.
The selection strategy for positive samples is important and determines whether contrast learning can successfully learn a meaningful node representation. For a given target node v_i, the distance between the second node representation h_θ,i learned for v_i by the second encoder f_θ and the first node representation h_ξ,j learned for another node v_j by the first encoder f_ξ is calculated, i.e. the cosine similarity between h_θ,i and h_ξ,j. The cosine similarity between the target node and the other node is expressed as sim(v_i, v_j):
sim(v_i, v_j) = h_θ,i^T h_ξ,j / (||h_θ,i|| · ||h_ξ,j||)
where || · || denotes the norm (modulus) of a vector. In this way the cosine similarity between any two nodes can be calculated.
After the cosine similarity between each target node and the other nodes is obtained, the K-neighbor node set N_i of each target node is determined. This embodiment determines the K-neighbor node set by the well-known steps of the KNN (K-Nearest Neighbor) algorithm, which are not described in detail here. The nodes in the K-neighbor node set N_i are adjacent to v_i in the representation space, so N_i can be taken as a reasonable initial selection of the positive sample set of v_i.
Considering only the nearest neighbors in the representation space may ignore not only the original structural information in the adjacency graph but also the global semantic information of the graph. Therefore, this embodiment also designs a local semantic positive sample (Local Positive) and a global semantic positive sample (Global Positive) to capture positive samples of the target node v_i in the local and global semantic contexts, respectively.
In this embodiment, the intersection of the target node's K-neighbor node set N_i with its neighbor nodes in the adjacency graph is determined as the local semantic positive sample, denoted L_i:
L_i = N_i ∩ A_i
where A_i denotes the neighbor nodes of v_i in the adjacency graph G. On the basis of the K-neighbor node set N_i, the local semantic positive sample L_i takes the local semantic information between nodes into account.
The similar nodes that belong to the same K-means cluster as the target node are determined, and the intersection of the K-neighbor node set N_i with these similar nodes is determined as the global semantic positive sample, denoted G_i:
G_i = N_i ∩ C_i
where C_i refers to the nodes that lie in the same cluster as the target node v_i in the K-means clustering result. On the basis of the K-neighbor node set N_i, this takes the global semantic information between nodes into account.
Finally, the union of the local semantic positive sample and the global semantic positive sample is determined as the positive sample set P_i of the target node. With continued reference to FIG. 3, as shown in FIG. 3c, the union of the local semantic positive samples and the global semantic positive samples is the positive sample set P_i, i.e. P_i = L_i ∪ G_i.
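The construction of P_i = L_i ∪ G_i can be sketched as follows with numpy and scikit-learn. The choices of k for the K-neighbor set and of the number of K-means clusters are illustrative assumptions, and build_positive_sets is a hypothetical helper name rather than part of the patented method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def build_positive_sets(h_theta, h_xi, adj, k_sim=20, n_clusters=7):
    """Build P_i = L_i ∪ G_i for every node, following the definitions above.

    h_theta: (N, d) second node representations H_theta (rows are nodes).
    h_xi:    (N, d) first node representations H_xi.
    adj:     (N, N) adjacency matrix of the spatial graph.
    """
    n = h_theta.shape[0]
    # sim(v_i, v_j): cosine similarity between h_theta[i] and h_xi[j]
    sim = cosine_similarity(h_theta, h_xi)
    np.fill_diagonal(sim, -np.inf)                       # exclude the node itself

    # N_i: the k_sim most similar nodes in the representation space
    knn_sets = [set(np.argsort(-sim[i])[:k_sim]) for i in range(n)]

    # C_i comes from a K-means clustering of the representations
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(h_theta)

    positives = []
    for i in range(n):
        spatial_neighbors = set(np.nonzero(adj[i])[0])   # A_i from the adjacency graph
        same_cluster = set(np.nonzero(labels == labels[i])[0])
        local_pos = knn_sets[i] & spatial_neighbors      # L_i = N_i ∩ A_i
        global_pos = knn_sets[i] & same_cluster          # G_i = N_i ∩ C_i
        positives.append(local_pos | global_pos)         # P_i = L_i ∪ G_i
    return positives
```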
Step S1032: calculating the contrast loss L_con based on the total number of nodes, the first node representations in the positive sample set, and the node representation predicted value of the second node representation.
With continued reference to fig. 3, as shown in fig. 3b, this embodiment first inputs the second node representation H_θ into a predictor (Predictor q_θ) to obtain the node representation predicted value of the second node representation, which is denoted Z_θ in this embodiment.
The feature transformation of the predictor further enlarges the difference in features between the predicted value Z_θ of the second node representation H_θ and the first node representation H_ξ. On this basis, a contrast loss function is designed with the aim of reducing the distance between each node and the nodes in its positive sample set, i.e. bringing the second node predicted value z_θ,i as close as possible to the first node representations h_ξ,j of the other nodes in the positive sample set P_i. The contrast loss, denoted L_con, is formed from the negative cosine similarity terms z_θ,i^T h_ξ,j / (||z_θ,i|| · ||h_ξ,j||), accumulated over the nodes v_j in each positive sample set P_i and averaged over all N nodes.
Here N denotes the total number of nodes, i denotes the row in which a node is located, z_θ,i is the predicted value of the i-th row node representation of the second node representation H_θ, v_j denotes a node other than the target node v_i, h_ξ,j is the first node representation of that other node, T denotes the transpose of a vector, and || · || denotes the norm of a vector.
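Since the exact normalization of the contrast loss is not reproduced here, the following PyTorch sketch implements the quantity described above under one plausible convention: the negative cosine similarity between z_θ,i and the first node representations h_ξ,j of the positives, averaged first over each positive sample set and then over the nodes. The reduction choices are assumptions.

```python
import torch
import torch.nn.functional as F

def contrast_loss(z_theta, h_xi, positives):
    """Negative cosine similarity between z_theta[i] and h_xi[j] for j in P_i,
    averaged over each positive set and then over the nodes (reduction assumed)."""
    z = F.normalize(z_theta, dim=1)
    h = F.normalize(h_xi.detach(), dim=1)   # teacher output is treated as a fixed target
    losses = []
    for i, p_i in enumerate(positives):
        if not p_i:                         # nodes with an empty positive set are skipped
            continue
        idx = torch.tensor(sorted(p_i), dtype=torch.long)
        losses.append(-(z[i] @ h[idx].T).mean())
    if not losses:
        return z.new_zeros(())
    return torch.stack(losses).mean()
```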
Step S104, determining the soft cluster distribution and auxiliary distribution of each node, and calculating the cluster loss L_cluster based on the soft cluster distribution and the auxiliary distribution;
Because the graph clustering task is unsupervised, it cannot be known during training whether the learned node representations are well optimized. In order to make the generated node representations better serve the clustering task, the spatial transcriptome data clustering model based on contrast learning of this embodiment adds clustering into the training process and optimizes the training of the encoders through the clustering loss.
First, initial K-means clustering is performed on the second node representation to obtain a plurality of clusters, and the centroid feature representation of each cluster is determined; the centroid features are used as soft labels to supervise the learning process of the node representations.
For the learned second node representation H_θ, initial clustering is first performed with k-means; the algorithm produces a number of clusters and gives the centroid feature representation of each cluster, e.g. the centroid feature representation of cluster u is denoted μ_u. One way to solve an unsupervised learning task is to generate a "soft" label and then use this "soft" label to supervise the parameter-learning process.
Then, the soft cluster distribution of the corresponding node is determined based on the centroid feature representation of the cluster where the node is located and the node representation predicted value of the second node representation.
The soft cluster distribution, based on the t-distribution, measures the similarity between the node representation predicted value z_i and the cluster centroid μ_u. From the previous outputs, the feature representation of each node and the centroid feature representation of each cluster are now available, so the probability q_iu that node v_i belongs to cluster u can be obtained and viewed as the soft cluster distribution of each node.
The soft cluster distribution of node i is denoted q_iu:
q_iu = (1 + ||z_i - μ_u||^2)^(-1) / Σ_k (1 + ||z_i - μ_k||^2)^(-1)
where z_i denotes the node representation predicted value of node i, μ_u is the centroid feature representation of cluster u, μ_k is the centroid feature representation of cluster k, and the sum over k runs over all K clusters, K being the number of clusters.
The corresponding auxiliary distribution is then determined based on the soft cluster distribution of the nodes; the auxiliary distribution corresponding to the soft cluster distribution q_iu is denoted p_iu and is obtained by squaring q_iu, normalizing by the cluster frequency, and renormalizing over the clusters:
p_iu = (q_iu^2 / Σ_i q_iu) / Σ_k (q_ik^2 / Σ_i q_ik)
where the sum over k runs over all K clusters, K being the number of clusters.
The cluster loss is calculated based on the soft cluster distribution and the auxiliary distribution of all the nodes. Having defined the soft cluster distribution q_iu and the auxiliary distribution p_iu, this embodiment pulls the soft cluster distribution q_iu towards the auxiliary distribution p_iu through the KL divergence, so as to optimize both the cluster distribution and the node representations, which yields the cluster loss.
The cluster loss is denoted L_cluster and can be expressed as:
L_cluster = KL(P || Q) = Σ_i Σ_u p_iu log(p_iu / q_iu)
where KL(P||Q) denotes the KL divergence (Kullback-Leibler divergence) between the auxiliary distribution p_iu and the soft cluster distribution q_iu. As shown in fig. 3d, the KL divergence pulls the soft cluster distribution q_iu and the auxiliary distribution p_iu closer together.
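A compact PyTorch sketch of the soft cluster distribution, the auxiliary distribution and the KL-based cluster loss follows, using the t-distribution and squared-q definitions recovered above; the degrees-of-freedom parameter alpha and the reduction over nodes are assumptions of the sketch.

```python
import torch

def soft_cluster_distribution(z, centroids, alpha=1.0):
    """q_iu ∝ (1 + ||z_i - mu_u||^2 / alpha)^(-(alpha + 1)/2), row-normalized."""
    dist_sq = torch.cdist(z, centroids).pow(2)
    q = (1.0 + dist_sq / alpha).pow(-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def auxiliary_distribution(q):
    """p_iu ∝ q_iu^2 / f_u with f_u = sum_i q_iu; squaring emphasizes confident assignments."""
    weight = q.pow(2) / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

def cluster_loss(q, p, eps=1e-12):
    """KL(P || Q), summed over clusters and averaged over nodes (reduction assumed)."""
    return (p * ((p + eps).log() - (q + eps).log())).sum(dim=1).mean()
```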
Step S105, guiding model training through the contrast loss and the clustering loss, and obtaining the clustering result of the corresponding node based on the soft cluster distribution after model training is completed.
In the auxiliary distribution p_iu, squaring q_iu achieves an "emphasis" effect that highlights the high-probability, high-confidence assignments. During training, the auxiliary distribution p_iu in effect provides labels. Finally, the difference between the two probability distributions is fitted through the cluster loss formula, achieving unsupervised clustering. Meanwhile, this formula also serves as the cluster loss L_cluster that guides the whole training process.
In this embodiment, the second node representation H_θ and the contrast loss L_con are obtained through contrast learning, and the second node representation H_θ is further used by the unsupervised clustering module to obtain the clustering loss L_cluster. The sum of the clustering loss L_cluster and the contrast loss L_con is the total loss, denoted L_all:
L_all = L_con + L_cluster
After the training of the spatial transcriptome data clustering model based on contrast learning is completed, an estimate of the label of the target node v_i, namely the clustering result, is obtained from the soft cluster distribution q_iu and expressed as s_i:
s_i = argmax_u q_iu
i.e. the cluster index at which the soft cluster distribution q_iu attains its maximum is taken as the clustering result.
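The pieces can be tied together in one training iteration as sketched below. This reuses the helpers from the earlier sketches (GCNEncoder, momentum_update, contrast_loss, soft_cluster_distribution, auxiliary_distribution, cluster_loss); the predictor is assumed to be a small MLP, and the schedule for refreshing the positive sets and the K-means centroids is left outside the sketch because the embodiment does not fix it.

```python
import torch

def train_step(x, adj_norm, f_theta, f_xi, predictor, centroids,
               optimizer, positives, tau=0.99):
    """One optimization step over L_all = L_con + L_cluster."""
    h_xi = f_xi(x, adj_norm)                # teacher representation H_xi (no gradients)
    h_theta = f_theta(x, adj_norm)          # student representation H_theta
    z_theta = predictor(h_theta)            # predictor output Z_theta

    q = soft_cluster_distribution(z_theta, centroids)
    p = auxiliary_distribution(q).detach()  # auxiliary targets are held fixed

    loss = contrast_loss(z_theta, h_xi, positives) + cluster_loss(q, p)  # L_all

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    momentum_update(f_xi, f_theta, tau)     # xi <- tau * xi + (1 - tau) * theta

    labels = q.argmax(dim=1)                # s_i = argmax_u q_iu
    return loss.item(), labels
```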
According to the above scheme, the gene expression matrix constructed based on the spatial transcriptome data is preprocessed to obtain the weighted feature matrix and the adjacency matrix, and an adjacency graph is constructed based on the weighted feature matrix and the adjacency matrix; the adjacency graph is input into the first encoder and the second encoder of the twin network structure respectively, and the corresponding first node representation and second node representation are learned through the first encoder and the second encoder; a positive sample set is constructed based on the first node representation and the second node representation, and the contrast loss between the node representation predicted value of the second node representation and the first node representation is calculated according to the positive sample set; the soft cluster distribution and auxiliary distribution of each node are determined, and the clustering loss is calculated based on the soft cluster distribution and the auxiliary distribution; model training is guided through the contrast loss and the clustering loss, and the clustering result of the corresponding node is obtained based on the soft cluster distribution after model training is completed. In this way, the node representations used to construct the positive sample set are obtained through contrast learning with the twin network structure, the contrast loss and the clustering loss are then calculated, and training is guided by the contrast loss and the clustering loss between nodes, so that a data clustering method for spatial transcriptome data is obtained based on contrast learning, and the pertinence and accuracy of spatial transcriptome data clustering are improved.
As shown in fig. 5, a second embodiment of the present invention proposes a spatial transcriptome data clustering method based on contrast learning, based on the first embodiment shown in fig. 2, the step S101 includes:
step S1011, constructing a gene expression matrix based on gene expression quantity information of the space transcriptome, wherein behavior cells/capture points of the gene expression matrix are listed as gene expression quantity;
the information of the gene expression level required in this example was extracted from the spatial transcriptome, and mainly includes the cell/capture site and the gene expression level. Then, the cells/capture spots are used as rows and the gene expression amounts are used as columns to construct a gene expression matrix.
Step S1012, preprocessing gene expression quantity data in the gene expression matrix, and performing dimension reduction processing on the gene expression matrix to obtain a feature matrix X;
the original gene expression quantity data has a large amount of noise, and data preprocessing is needed. And deleting genes expressed in less than 3 cells according to the high-dimensional sparsity of the gene expression data, carrying out normalization and standardization treatment on the data, and finally reducing the dimension by using PCA to obtain a feature matrix X.
Step S1013, determining an adjacency graph of the feature matrix X based on the spatial information between the cells/capture points.
Self-supervised clustering is performed on the graph formed by the cells/capture points, so a graph that reflects the relationships between the cells/capture points, namely the adjacency graph, is constructed using the spatial information.
Specifically, the cells/capture points are taken as nodes, and the distance between nodes is calculated according to their spatial coordinates; this embodiment calculates the distance between any two nodes.
Edges are added between each node and its several nearest neighbors to obtain an adjacency matrix; target edges are then selected based on a distance threshold, and each target edge is weighted based on the distance between its nodes to obtain the weighted adjacency matrix A. Referring to fig. 3a, the closer target edges are determined based on the distance threshold (threshold).
The adjacency graph is obtained based on the nodes in the feature matrix X and the edges of the weighted adjacency matrix. Taking the feature matrix X as the attribute of the nodes, the adjacency graph G(X, A) = (V, E) is finally obtained, where V is the node set containing all cells/capture points and E is the edge set containing the edges between all nodes.
In this embodiment, through the above scheme, the weighted feature matrix and the weighted adjacency matrix are obtained by preprocessing the gene expression matrix constructed based on the spatial transcriptome data, and the adjacency graph is constructed based on the weighted feature matrix and the adjacency matrix, so that the spatial transcriptome data is processed into an adjacency graph suitable for contrast learning, which facilitates the implementation of the contrast learning steps.
Further, to achieve the above objective, the present invention further provides a spatial transcriptome data clustering device based on contrast learning, specifically, referring to fig. 6, fig. 6 is a schematic functional block diagram of a first embodiment of a spatial transcriptome data clustering device based on contrast learning according to the present invention, where the device includes:
the construction module 10 is used for preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix;
a learning module 20, configured to input the adjacency graph into a first encoder and a second encoder of a twin network structure, respectively, and learn corresponding first node representations and second node representations through the first encoder and the second encoder;
a contrast loss calculation module 30, configured to construct a positive sample set based on the first node representation and the second node representation, and calculate a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set;
a cluster loss calculation module 40, configured to determine a soft cluster distribution and an auxiliary distribution of each node, and calculate a cluster loss based on the soft cluster distribution and the auxiliary distribution;
And the clustering module 50 is used for guiding model training through the contrast loss and the clustering loss, and obtaining a clustering result of the corresponding node based on the soft clustering distribution after model training is completed.
Further, the building block 10 comprises:
a construction unit for constructing a gene expression matrix based on gene expression level information of the spatial transcriptome, wherein the rows of the gene expression matrix are cells/capture points and the columns are gene expression levels;
the preprocessing unit is used for preprocessing the gene expression quantity data in the gene expression matrix and performing dimension reduction processing on the gene expression matrix to obtain a feature matrix;
and an adjacency graph determining unit for determining an adjacency graph of the feature matrix based on the spatial information between the cells/capture points.
Further, the adjacency graph determination unit includes:
a calculating subunit, configured to calculate a distance between nodes according to the spatial coordinates by using the cell/capture point as a node;
an adding subunit, configured to add edges to a plurality of neighbors of each node that are closest to each node, to obtain an adjacency matrix;
the weighting subunit is used for selecting target edges based on the distance threshold value, and carrying out weighting processing on each target edge based on the distance between the nodes to obtain a weighted adjacent matrix;
An obtaining subunit is configured to obtain an adjacency graph based on the nodes in the feature matrix and the edges of the weighted adjacency matrix.
Further, the contrast loss calculation module 30 includes:
a positive sample set construction unit for constructing a positive sample set of nodes based on the first node representation and the second node representation of each node;
and the contrast loss calculation unit is used for calculating the contrast loss based on the total number of the nodes, the first node representation in the positive sample set and the node representation predicted value of the second node representation.
Further, the positive sample set constructing unit includes:
a cosine similarity determination subunit, configured to determine cosine similarity between the target node and the other nodes based on the second node representation of the target node and the first node representation of the other nodes;
the K-neighbor node set determining subunit is used for determining the K-neighbor node set of each target node after the cosine similarity between each target node and other nodes is obtained;
the local semantic positive sample set determining subunit is used for determining the intersection set of the target node in the K-neighbor node set and the neighbor nodes in the adjacent graph as a local semantic positive sample;
the global semantic positive sample set determining subunit is used for determining similar nodes belonging to the same K-means cluster with the target node, and determining the intersection set of the K-neighbor node set and the similar nodes as a global semantic positive sample;
And the positive sample set determining subunit is used for determining the union set of the local semantic positive samples and the global semantic positive samples as a positive sample set of a target node.
Further, the contrast loss calculation module 30 further includes:
and inputting the second node representation into a predictor, and obtaining a node representation predicted value of the second node representation.
Further, the cluster loss calculation module 40 further includes:
the clustering unit, which is used for carrying out initial K-means clustering on the second node representation, obtaining a plurality of clusters and determining the centroid feature representation of each cluster;
a soft cluster distribution determining unit, configured to determine the soft cluster distribution of the corresponding node based on the centroid feature representation of the cluster where the node is located and the node representation predicted value of the second node representation;
an auxiliary distribution determining unit, configured to determine a corresponding auxiliary distribution based on the soft cluster distribution of the node;
and the cluster loss calculation unit is used for calculating cluster loss based on the soft cluster distribution and the auxiliary distribution of all the nodes.
In addition, the invention also provides a computer readable storage medium, on which a spatial transcriptome data clustering program based on contrast learning is stored, and the steps of the spatial transcriptome data clustering method based on contrast learning described above are implemented when the spatial transcriptome data clustering program based on contrast learning is run by a processor, which are not described herein.
Compared with the prior art, the spatial transcriptome data clustering method, device, equipment and storage medium based on contrast learning provided by the invention comprise the following steps: preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix; inputting the adjacency graph into a first encoder and a second encoder of a twin network structure respectively, and learning the corresponding first node representation and second node representation through the first encoder and the second encoder; constructing a positive sample set based on the first node representation and the second node representation, and calculating the contrast loss between the node representation predicted value of the second node representation and the first node representation according to the positive sample set; determining the soft cluster distribution and auxiliary distribution of each node, and calculating the clustering loss based on the soft cluster distribution and the auxiliary distribution; and guiding model training through the contrast loss and the clustering loss, and obtaining the clustering result of the corresponding node based on the soft cluster distribution after model training is completed. In this way, the node representations used to construct the positive sample set are obtained through contrast learning with the twin network structure, the contrast loss and the clustering loss are then calculated, and training is guided by the contrast loss and the clustering loss between nodes, so that a data clustering method for spatial transcriptome data is obtained based on contrast learning, and the pertinence and accuracy of spatial transcriptome data clustering are improved.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or modifications in the structures or processes described in the specification and drawings, or the direct or indirect application of the present invention to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A spatial transcriptome data clustering method based on contrast learning, the method comprising:
preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix;
respectively inputting the adjacency graph into a first encoder and a second encoder of a twin network structure, and learning corresponding first node representation and second node representation through the first encoder and the second encoder;
constructing a positive sample set based on the first node representation and the second node representation, and calculating a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set;
determining soft cluster distribution and auxiliary distribution of each node, and calculating cluster loss based on the soft cluster distribution and the auxiliary distribution;
training the model under the guidance of the contrast loss and the clustering loss, and obtaining a clustering result of the corresponding node based on the soft cluster distribution after model training is completed.
2. The method of claim 1, wherein preprocessing the gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and adjacency matrix comprises:
constructing a gene expression matrix based on gene expression level information of the spatial transcriptome, wherein the rows of the gene expression matrix are cells/capture points and the columns are gene expression levels;
preprocessing the gene expression level data in the gene expression matrix, and performing dimension reduction on the gene expression matrix to obtain a feature matrix;
determining an adjacency graph of the feature matrix based on spatial information between the cells/capture points.
3. The method of claim 2, wherein the determining the adjacency graph of the feature matrix based on spatial information between the cells/capture points comprises:
taking the cells/capture points as nodes, and calculating the distance between nodes according to their spatial coordinates;
adding edges between each node and its several nearest neighbors to obtain an adjacency matrix;
selecting target edges based on a distance threshold, and weighting each target edge based on the distance between its nodes to obtain a weighted adjacency matrix;
obtaining an adjacency graph based on nodes in the feature matrix and edges of the weighted adjacency matrix.
4. The method of claim 1, wherein constructing a positive sample set based on the first node representation and the second node representation, and calculating a loss of contrast between a node representation predictor of the second node representation and the first node representation from the positive sample set comprises:
constructing a positive sample set of nodes based on the first node representation and the second node representation of each node;
the contrast loss is calculated based on the total number of nodes, the first node representation within the positive sample set, and the node representation predicted value of the second node representation.
5. The method of claim 4, wherein the constructing a positive sample set based on the first node representation, the second node representation, and spatial location comprises:
determining the cosine similarity between a target node and the other nodes based on the second node representation of the target node and the first node representations of the other nodes;
after the cosine similarity between each target node and the other nodes is obtained, determining a K-neighbor node set of each target node;
determining the intersection of the K-neighbor node set of the target node and the neighbor nodes of the target node in the adjacency graph as local semantic positive samples;
determining similar nodes belonging to the same K-means cluster as the target node, and determining the intersection of the K-neighbor node set and the similar nodes as global semantic positive samples;
determining the union of the local semantic positive samples and the global semantic positive samples as the positive sample set of the target node (illustrative sketch below).
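A sketch of the positive-sample-set construction of claim 5, assuming cosine similarity computed with scikit-learn, a K-neighbor set of size k, and K-means labels fitted on the second node representation; k and the cluster count are illustrative assumptions.

```python
# Sketch of claim 5: local semantic positives are K-neighbours (by cosine
# similarity) that are also spatial neighbours in the adjacency graph; global
# semantic positives are K-neighbours sharing the target's K-means cluster;
# the positive set is their union.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def build_positive_sets(h1, h2, adj, k=15, n_clusters=7):
    sim = cosine_similarity(h2, h1)                   # target (rows, h2) vs others (columns, h1)
    np.fill_diagonal(sim, -np.inf)                    # exclude the node itself
    knn = np.argsort(-sim, axis=1)[:, :k]             # K-neighbor node sets
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(h2)
    pos_sets = []
    for i in range(h1.shape[0]):
        knn_i = set(knn[i])
        graph_nb = set(np.flatnonzero(adj[i] > 0))    # neighbours in the adjacency graph
        same_cluster = set(np.flatnonzero(labels == labels[i])) - {i}
        local = knn_i & graph_nb                      # local semantic positives
        global_ = knn_i & same_cluster                # global semantic positives
        pos_sets.append(sorted(local | global_))      # union -> positive sample set
    return pos_sets
```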
6. The method of claim 1, wherein, prior to constructing a positive sample set based on the first node representation and the second node representation and calculating a contrast loss between a node representation predicted value of the second node representation and the first node representation according to the positive sample set, the method further comprises:
inputting the second node representation into a predictor to obtain the node representation predicted value of the second node representation (illustrative sketch below).
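A minimal predictor for claim 6, assumed here to be a two-layer MLP that maps the second node representation to its predicted value; the width and depth are not specified by the claim and are assumptions.

```python
# Hypothetical predictor (claim 6): maps the second node representation h2 to
# its node representation predicted value.
import torch.nn as nn

class Predictor(nn.Module):
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim),
        )

    def forward(self, h2):
        return self.net(h2)   # node representation predicted value of h2
```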
7. The method of claim 1, wherein the determining soft cluster distribution and auxiliary distribution for each node and calculating cluster loss based on the soft cluster distribution and auxiliary distribution comprises:
performing initial K-means clustering on the second node representation to obtain a plurality of clusters, and determining a centroid feature representation of each cluster;
determining the soft cluster distribution of the corresponding node based on the centroid feature representation of the cluster where the node is located and the node representation predicted value of the second node representation;
determining the corresponding auxiliary distribution based on the soft cluster distribution of the node;
calculating the cluster loss based on the soft cluster distributions and the auxiliary distributions of all the nodes (illustrative sketch below).
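A DEC-style sketch of the clustering objective in claim 7: a Student's t soft assignment against cluster centroids, a sharpened auxiliary distribution, and a KL-divergence cluster loss. The choice of the Student's t kernel and of this particular auxiliary distribution is an assumption.

```python
# Sketch of the soft cluster distribution Q, auxiliary distribution P, and
# cluster loss KL(P || Q) for claim 7.
import torch
import torch.nn.functional as F

def soft_assignment(z, centroids, alpha=1.0):
    """z: node representation predicted values (nodes x dim); returns Q (nodes x clusters)."""
    dist2 = torch.cdist(z, centroids) ** 2
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)   # Student's t kernel
    return q / q.sum(dim=1, keepdim=True)

def auxiliary_distribution(q):
    """Auxiliary distribution P that sharpens high-confidence assignments."""
    weight = q ** 2 / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)

def clustering_loss(q):
    p = auxiliary_distribution(q).detach()                 # target is treated as constant
    return F.kl_div(q.log(), p, reduction='batchmean')     # KL(P || Q)
```

Detaching the auxiliary distribution so that gradients flow only through the soft assignment is the usual design choice for this kind of self-training objective.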
8. A spatial transcriptome data clustering device based on contrast learning, comprising:
the construction module is used for preprocessing a gene expression matrix constructed based on the spatial transcriptome data to obtain a weighted feature matrix and an adjacency matrix, and constructing an adjacency graph based on the weighted feature matrix and the adjacency matrix;
the learning module is used for inputting the adjacency graph into a first encoder and a second encoder of a twin network structure respectively, and learning corresponding first node representation and second node representation through the first encoder and the second encoder;
the contrast loss calculation module is used for constructing a positive sample set based on the first node representation and the second node representation, and calculating the contrast loss between the node representation predicted value of the second node representation and the first node representation according to the positive sample set;
the cluster loss calculation module is used for determining the soft cluster distribution and auxiliary distribution of each node, and calculating the cluster loss based on the soft cluster distribution and the auxiliary distribution;
and the clustering module is used for guiding model training through the contrast loss and the clustering loss, and obtaining a clustering result of the corresponding node based on the soft clustering distribution after model training is completed (illustrative sketch below).
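An illustrative wiring of the modules of claim 8 around a twin graph encoder; the one-layer propagation rule, the independently parameterised encoders, and the learnable centroids are assumptions, not details taken from the patent.

```python
# Hypothetical module wiring for claim 8: twin encoders producing the first and
# second node representations, a predictor for the contrast loss module, and
# centroids used by the cluster loss module.
import torch
import torch.nn as nn

class GraphEncoder(nn.Module):
    """One-layer graph encoder: H = ReLU(A_hat @ X @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj_norm):
        return torch.relu(adj_norm @ self.lin(x))

class TwinClusteringModel(nn.Module):
    def __init__(self, in_dim, out_dim, n_clusters):
        super().__init__()
        self.encoder_1 = GraphEncoder(in_dim, out_dim)        # learning module, first view
        self.encoder_2 = GraphEncoder(in_dim, out_dim)        # learning module, second view
        self.predictor = nn.Sequential(                       # used by the contrast loss module
            nn.Linear(out_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))
        self.centroids = nn.Parameter(torch.randn(n_clusters, out_dim))  # used by the cluster loss module

    def forward(self, x, adj_norm):
        h1 = self.encoder_1(x, adj_norm)
        h2 = self.encoder_2(x, adj_norm)
        return h1, h2, self.predictor(h2)
```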
9. A spatial transcriptome data clustering device based on contrast learning, comprising a memory, a processor, and a contrast learning based spatial transcriptome data clustering program stored on the memory, wherein the program, when executed by the processor, performs the steps of the method according to any one of claims 1-7.
10. A computer readable storage medium, characterized in that a contrast learning based spatial transcriptome data clustering program is stored thereon, and the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
CN202311204657.5A 2023-09-18 2023-09-18 Spatial transcriptome data clustering method, device and medium based on contrast learning Active CN117153260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311204657.5A CN117153260B (en) 2023-09-18 2023-09-18 Spatial transcriptome data clustering method, device and medium based on contrast learning

Publications (2)

Publication Number Publication Date
CN117153260A true CN117153260A (en) 2023-12-01
CN117153260B CN117153260B (en) 2024-06-25

Family

ID=88908033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311204657.5A Active CN117153260B (en) 2023-09-18 2023-09-18 Spatial transcriptome data clustering method, device and medium based on contrast learning

Country Status (1)

Country Link
CN (1) CN117153260B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117763163A (en) * 2023-12-25 2024-03-26 北京智谱华章科技有限公司 Method, device, equipment and medium for fusion encoding of text structure information of graph
CN118016149A (en) * 2024-04-09 2024-05-10 太原理工大学 Spatial domain identification method for integrating space transcriptome multi-mode information

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160098519A1 (en) * 2014-06-11 2016-04-07 Jorge S. Zwir Systems and methods for scalable unsupervised multisource analysis
KR20210102039A (en) * 2020-02-11 2021-08-19 삼성전자주식회사 Electronic device and control method thereof
CN115310554A (en) * 2022-08-24 2022-11-08 江苏至信信用评估咨询有限公司 Item allocation strategy, system, storage medium and device based on deep clustering
CN116312782A (en) * 2023-05-18 2023-06-23 南京航空航天大学 Spatial transcriptome spot region clustering method fusing image gene data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
侯海薇: "Research Progress on Deep Clustering Based on Unsupervised Representation Learning", Pattern Recognition and Artificial Intelligence, 15 November 2022 (2022-11-15), pages 999-1014 *

Also Published As

Publication number Publication date
CN117153260B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
CN117153260B (en) Spatial transcriptome data clustering method, device and medium based on contrast learning
Gu et al. Stack-captioning: Coarse-to-fine learning for image captioning
CN111461226A (en) Countermeasure sample generation method, device, terminal and readable storage medium
CN108665065B (en) Method, device and equipment for processing task data and storage medium
CN107169983B (en) Multi-threshold image segmentation method based on cross variation artificial fish swarm algorithm
CN114927162A (en) Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN103679715B (en) A kind of handset image feature extracting method based on Non-negative Matrix Factorization
CN114202123A (en) Service data prediction method and device, electronic equipment and storage medium
Liu et al. EACP: An effective automatic channel pruning for neural networks
CN110555530B (en) Distributed large-scale gene regulation and control network construction method
Xia et al. TCC-net: A two-stage training method with contradictory loss and co-teaching based on meta-learning for learning with noisy labels
CN111598093B (en) Method, device, equipment and medium for generating structured information of characters in picture
WO2021059527A1 (en) Learning device, learning method, and recording medium
CN109859063B (en) Community discovery method and device, storage medium and terminal equipment
CN114638823B (en) Full-slice image classification method and device based on attention mechanism sequence model
KR102000832B1 (en) miRNA and mRNA ASSOCIATION ANALYSIS METHOD AND GENERATING APPARATUS FOR miRNA and mRNA ASSOCIATION NETWORK
CN115148292A (en) Artificial intelligence-based DNA (deoxyribonucleic acid) motif prediction method, device, equipment and medium
Tuba et al. Modified seeker optimization algorithm for image segmentation by multilevel thresholding
CN115587616A (en) Network model training method and device, storage medium and computer equipment
CN114613433A (en) Method for analyzing pseudo-time trajectory of single-cell transcriptome data and computer system
CN114268625B (en) Feature selection method, device, equipment and storage medium
CN113435519A (en) Sample data enhancement method, device, equipment and medium based on antagonistic interpolation
Li et al. A BYY scale-incremental EM algorithm for Gaussian mixture learning
Li et al. CoAxNN: Optimizing on-device deep learning with conditional approximate neural networks
US9183503B2 (en) Sparse higher-order Markov random field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant