WO2023195564A1 - Spatial transcriptome information analysis apparatus and analysis method using same

Info

Publication number
WO2023195564A1
Authority
WO
WIPO (PCT)
Prior art keywords
spatial
data
transcriptome
information analysis
information
Prior art date
Application number
PCT/KR2022/005223
Other languages
French (fr)
Korean (ko)
Inventor
서미경
이대승
최홍윤
Original Assignee
주식회사 포트래이
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 포트래이
Publication of WO2023195564A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 - ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 - Unsupervised data analysis
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 - ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Definitions

  • The present invention relates to a spatial transcriptome information analysis device and an analysis method using the same.
  • More specifically, the present invention relates to an analysis device that uses reconstructed data, in which spatial transcriptome information is reconstructed so that empty spaces without transcript information in a tissue image are interpolated, and to an analysis method using the same.
  • Spatial transcriptome data refers to the sum of data containing spatial location information and transcriptome information (gene expression information). Spatial transcriptome data is data consisting of hundreds to tens of thousands of spots, and the spot refers to a very small part of the tissue. In other words, spatial transcriptome data is data composed of tissue location information and expression information of genes in that tissue.
  • Spatial transcriptome data requires analyzing the expression information of tens of thousands of genes across up to tens of thousands of small spatial regions (spots), with the location information of each spot added on top, so an appropriate analysis method is required.
  • The purpose of the present invention, in recognition of the problems and needs described above, is to provide a spatial transcriptome information analysis device, and a spatial transcriptome information analysis method using the same, that infers information in the empty space between spots lacking spatial transcriptome information, so that gene sets showing similar expression patterns in space can be selected or gene expression patterns between different tissues can be easily compared.
  • To achieve the object described above, the present invention discloses a spatial transcriptome information analysis device 100 including: an information receiving unit 110 that receives spatial transcriptome data consisting of location information of a plurality of spots (P 1 , ..., P N ) spaced apart on a tissue image (TI) and transcript information (R 1 , ..., R N ) corresponding to each of the plurality of spots (P 1 , ..., P N ); a data reconstruction unit that calculates reconstructed data in which the spatial transcriptome data is reconstructed so that the empty spaces between the plurality of spots (P 1 , ..., P N ), which lack the transcript information (R 1 , ..., R N ), are interpolated; and a transcriptome information analysis unit 130 that analyzes gene expression patterns based on the reconstructed data.
  • the transcript information (R 1 , ..., R N ) may include information about the expression level of each of the plurality of transcripts (A 1 , ..., A M ).
  • the gene expression pattern may be a gene expression pattern of the same tissue as the tissue image (TI) or a gene expression pattern of a different tissue.
  • The reconstructed data can include transcript distribution information reconstructed by assuming that, for each of the plurality of transcripts (A 1 , ..., A M ), the expression level is distributed according to a continuous probability distribution centered on the central coordinates (C 1 , ..., C N ) of the plurality of spots (P 1 , ..., P N ).
  • The continuous probability distribution may be a normal distribution centered on the central coordinates (C 1 , ..., C N ) with a preset variance.
  • The spatial transcriptome information analysis device 100 may additionally include an image generator 140 that generates, from the transcript distribution information, two-dimensional images (T 1 , ..., T K ) visualizing the spatial distribution of the plurality of transcripts (A 1 , ..., A M ).
  • The transcriptome information analysis unit 130 may include a feature extraction unit 132 that extracts characteristic values of the reconstructed data, and a clustering unit 134 that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the characteristic values.
  • the feature extraction unit 132 may extract the feature values by reducing the reconstructed data to low-dimensional data.
  • the feature extraction unit 132 may include an artificial neural network model that compresses the reconstructed data into low-dimensional data.
  • the artificial neural network model may use the reconstruction data as learning data.
  • the characteristic value may be a latent vector value expressed as the low-dimensional data.
  • the clustering unit 134 may perform clustering using an unsupervised learning-based clustering algorithm.
  • the clustering unit 134 can derive a gene set (G) associated with the cluster (CLT).
  • the clustering unit 134 may finally select genes to be included in the gene set (G) based on at least one of the silhouette value and the correlation coefficient of the cluster (CLT).
  • the image generator 140 may generate the two-dimensional images (T 1 , ..., T K ) for each of the different tissue images (TI).
  • The spatial transcriptome information analysis device 100 may additionally include a spatial normalization unit 150 that performs spatial normalization on the two-dimensional images (T 1 , ..., T K ) to generate spatial normalized images (S 1 , ..., S K ).
  • The transcriptome information analysis unit 130 can compare the spatial normalized images (S 1 , ..., S K ) for the different tissue images (TI) to compare and analyze the gene expression patterns between the different tissue images (TI).
  • The present invention also discloses a spatial transcriptome information analysis system 1000 that includes the spatial transcriptome information analysis device 100 and a user terminal 300 connected to it through a network.
  • the present invention discloses a spatial transcriptome information analysis method using the spatial transcriptome information analysis device 100.
  • the present invention discloses a computer-executable spatial transcriptome information analysis program for performing a spatial transcriptome information analysis method.
  • The present invention also discloses an image generation device 200 that generates two-dimensional images (T 1 , ..., T K ) of transcript distribution using reconstructed data, in which spatial transcriptome data composed of location information of a plurality of spots (P 1 , ..., P N ) spaced apart on a tissue image (TI) and transcript information (R 1 , ..., R N ) corresponding to each of the plurality of spots (P 1 , ..., P N ) is reconstructed.
  • The reconstructed data can include transcript distribution information reconstructed by assuming that the expression level of each of the plurality of transcripts (A 1 , ..., A M ) is distributed along a continuous probability distribution centered on the central coordinates (C 1 , ..., C N ) of the plurality of spots (P 1 , ..., P N ).
  • the present invention discloses a genetic screening method for extracting genes with similar spatial distribution using the two-dimensional images (T 1 , ..., T K ) generated by the two-dimensional image generating device 200.
  • The present invention also discloses a comparative analysis method of gene expression between tissues, which compares and analyzes gene expression patterns between tissue images (TI) of different tissues using the two-dimensional images (T 1 , ..., T K ) generated by the two-dimensional image generating device 200 for each of those tissue images (TI).
  • The spatial transcriptome information analysis device and the analysis method using the same infer information in the empty space between spots without transcript information, which makes it easier to select gene sets showing similar expression patterns in space or to compare gene expression patterns between different tissues, and thereby supports better biological and functional understanding and new insights.
  • The present invention can generate a two-dimensional image of transcript distribution information (gene expression pattern) by inferring gene expression values in the empty space without transcript information, based on gene expression and spatial information in tissues. It can also find genes that are spatially distributed in a similar way and select genes that show expression patterns similar to those of desired genes or characteristics.
  • The present invention can obtain genetic information spatially associated with a specific desired target substance or molecule, or transfer transcript distribution information (gene expression pattern) imaged as a two-dimensional image to a different space so that different tissues can be compared.
  • The present invention can be used to compare changes in transcript distribution information (gene expression pattern) caused by diseases, drugs, and the like between different tissues, and can be applied in pathophysiology research and new drug development.
  • Figure 1 is a conceptual diagram showing a spatial transcriptome information analysis system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the spatial transcriptome information analysis device of FIG. 1.
  • FIG. 3 is a flow chart showing a spatial transcriptome information analysis method performed in the spatial transcriptome information analysis system of FIG. 1.
  • Figure 4 is a conceptual diagram showing spots constituting spatial transcriptome data.
  • Figure 5 is a diagram showing a visualization image in which spatial transcriptome data is visualized in a conventional manner.
  • FIG. 6 is a diagram illustrating the process of clustering reconstructed data in the spatial transcriptome information analysis device of FIG. 2.
  • Figure 7 is a diagram explaining the principle of reconstructing spatial transcriptome data into reconstruction data.
  • Figure 8 is a diagram showing a visualization image visualizing reconstructed data.
  • Figure 9 is a diagram showing genes clustered based on the similarity of characteristic values of reconstructed data.
  • Figure 10 is a diagram illustrating the spatial expression pattern of the clustered gene set.
  • Figure 11 is a graph showing correlation evaluation between genes in the clustered gene set compared to the simulated gene set.
  • Figure 12 is a graph showing the evaluation of the discrimination power of the clustered gene set compared to the simulated gene set.
  • Figures 13a and 13b are diagrams showing two-dimensional images of the gene set (G) matching the fiber tract of anatomical tissue and the corresponding spatial region.
  • Figure 14 is a diagram showing two-dimensional images of genes extracted related to the molecular and pathological characteristics and functions of tissues.
  • Figure 15 is a diagram illustrating clusters with similar gene expression patterns in tissue space.
  • Figure 16 is a diagram showing a two-dimensional image generated for the main gene(s) representing the characteristics of the cluster in Figure 15.
  • Figures 17a and 17b are diagrams showing normalized images obtained by performing spatial normalization on two-dimensional images of five different tissues.
  • Figure 18a is a normalized image showing the gene expression patterns of five tissues exposed to different heme concentrations, and Figure 18b is a diagram showing spatially similar gene sets selected by analyzing the correlation with the pixel values of the normalized image of Figure 18a.
  • The spatial transcriptome information analysis system 1000 may be a system that uses spatial transcriptome information to generate two-dimensional images of transcript distribution, to analyze the gene expression pattern of a tissue, or to comparatively analyze gene expression patterns between different tissues.
  • As shown in FIG. 1, the spatial transcriptome information analysis system 1000 may include a user terminal 300 and an image generating device 200 that is connected to the user terminal 300 through a network and generates a two-dimensional image of transcript distribution using spatial transcriptome information.
  • The user terminal 300 corresponds to a computing device connected through a network to the image generating device 200, which will be described later. It can be implemented as, for example, a desktop, laptop, tablet PC, or smartphone, and may include a network interface for network connection with the image generating device 200 and a user input/output interface for user input/output.
  • the user terminal 300 may correspond to a mobile terminal and may be connected to the image generating device 200 through cellular communication or Wi-Fi communication.
  • The user terminal 300 may correspond to a desktop and may be connected to the image generating device 200 through the Internet.
  • the image generating device 200 is connected to the user terminal 300 through a network and can receive requests or commands from the user terminal 300 or transmit requests or commands to the user terminal 300.
  • The image generating device 200 can take various configurations as a server that generates two-dimensional images (T 1 , ..., T K ) of transcript distribution using reconstructed data, in which spatial transcriptome data composed of location information of a plurality of spots (P 1 , ..., P N ) spaced apart on a tissue image (TI) and transcript information (R 1 , ..., R N ) corresponding to each of the plurality of spots (P 1 , ..., P N ) is reconstructed.
  • The spatial transcriptome data may be the total data consisting of location information of a plurality of spots (P 1 , ..., P N , where N is a natural number equal to the total number of spots) spaced apart on a tissue image (TI) and transcript information (R 1 , ..., R N ) corresponding to each of the spots.
  • The spots (P 1 , ..., P N ) refer to small areas on the tissue image (TI), and each spot (P 1 , ..., P N ) can correspond to its own transcript information (R 1 , ..., R N ) as gene expression information.
  • Spatial transcriptome data = {(Pn, Rn) | n = 1, ..., N}
  • the transcript information may include information about the expression level of each of a plurality of transcripts (A 1 , ..., A M , M is the total number of transcripts).
  • information about the expression level of each transcript may be information about the expression level of each gene.
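The data layout described above can be sketched in a few lines of Python. This is only an illustrative container, not a structure defined in the source: the names `Spot`, `center`, `expression`, and `expression_vector` are hypothetical, and the numeric values are toy data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Spot:
    """One spot Pn: its center coordinate Cn plus transcript information Rn."""
    center: Tuple[float, float]      # (x, y) position on the tissue image
    expression: Dict[str, float]     # transcript name -> expression level


def expression_vector(spots: List[Spot], transcript: str) -> List[float]:
    """Expression level of one transcript across all N spots (0.0 if absent)."""
    return [s.expression.get(transcript, 0.0) for s in spots]


# Toy spatial transcriptome data {(Pn, Rn)} with N = 3 spots.
spots = [
    Spot(center=(10.0, 10.0), expression={"ACTA2": 5.0, "DES": 1.0}),
    Spot(center=(30.0, 10.0), expression={"ACTA2": 0.0, "DES": 4.0}),
    Spot(center=(20.0, 27.0), expression={"ACTA2": 2.0}),
]
print(expression_vector(spots, "DES"))  # [1.0, 4.0, 0.0]
```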
  • the plurality of spots (P 1 , ..., P N ) are spaced apart from each other, and between the spots (P 1 , ..., P N ) is an empty space (V, area).
  • Because the transcript information (R 1 , ..., R N ) in the empty space (V) between the spots (P 1 , ..., P N ) is unknown, biological understanding and visual interpretation using spatial transcriptome data are limited.
  • Figure 5 is an image created with conventional technology that visualizes spatial transcriptome data by drawing a circle (polygons such as hexagons are also possible) around the midpoint of each spot (P 1 , ..., P N ) and varying the color or density according to the transcript expression level (gene expression level).
  • The reconstructed data is data in which the spatial transcriptome data is reconstructed, and may be data reconstructed so that the empty space (V) between the plurality of spots (P 1 , ..., P N ), which lacks transcript information (R 1 , ..., R N ), is interpolated.
  • the principle of reconstructing the spatial transcriptome data into reconstruction data is to infer the transcriptome information of the empty space (V) between the plurality of spots (P 1 , ..., P N ).
  • The reconstruction data may be data reconstructed by assuming that the expression level of each of the plurality of transcripts (A 1 , ..., A M ) is distributed according to a continuous probability distribution centered on the central coordinates (C 1 , ..., C N ) of the plurality of spots (P 1 , ..., P N ).
  • the reconstructed data may include transcript distribution information, which may mean the expression level (gene expression level) of each transcript (A 1 , ..., A M ).
  • Assuming that the expression levels of the plurality of transcripts (A 1 , ..., A M ) are distributed along a continuous probability distribution centered on the central coordinates (C 1 , ..., C N ) of the plurality of spots (P 1 , ..., P N ), transcript distribution information for each transcript (A 1 , ..., A M ) can be obtained as reconstruction data by summing the contributions of all spots (P 1 , ..., P N ).
  • The continuous probability distribution may be a normal distribution centered on the central coordinates (C 1 , ..., C N ) with a preset variance, but is not limited thereto.
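One possible way to write this assumption down is given below. It is a formalization introduced here for illustration, not a formula stated in the source: r_{n,m} denotes the expression level of transcript A_m at spot P_n, σ² the preset dispersion (variance), x a pixel position, and I_m(x) the reconstructed value; an isotropic two-dimensional normal distribution is assumed.

```latex
I_m(x) \;=\; \sum_{n=1}^{N} r_{n,m}\,
\frac{1}{2\pi\sigma^{2}}
\exp\!\left(-\frac{\lVert x - C_n \rVert^{2}}{2\sigma^{2}}\right),
\qquad m = 1, \dots, M
```

Each spot contributes most at its own center C_n and progressively less with distance, so positions in the empty space between spots receive a weighted blend of the neighboring spots.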
  • Figure 7 is a schematic diagram showing the principle of reconstructing spatial transcriptome data into reconstruction data, and the reconstruction data may be data continuously distributed from an image perspective.
  • Assume that the transcript expression level (gene expression level) from a specific spot (Pn) stochastically follows a spatially continuous probability distribution (e.g., a normal distribution), that is, that the expression level obtained from the spot (Pn) decreases as the distance from its central coordinate (Cn) increases. By adding these contributions over all spots (P 1 , ..., P N ), an image can be obtained in which the spatial transcriptome data composed of coordinates is reconstructed into a dense two-dimensional matrix.
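The summation over spots can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `reconstruct`, the isotropic Gaussian, the pixel-unit coordinates, and the toy values are all introduced here for illustration.

```python
import numpy as np


def reconstruct(centers, expression, shape, sigma):
    """Sum one isotropic Gaussian per spot into a dense (H, W) matrix.

    centers:    (N, 2) spot center coordinates (x, y) in pixels
    expression: (N,) expression levels of ONE transcript at the N spots
    shape:      (H, W) size of the output matrix
    sigma:      preset standard deviation of the Gaussian, in pixels (assumed)
    """
    centers = np.asarray(centers, dtype=float)
    expression = np.asarray(expression, dtype=float)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape, dtype=float)
    for (cx, cy), level in zip(centers, expression):
        sq_dist = (xs - cx) ** 2 + (ys - cy) ** 2
        image += level * np.exp(-sq_dist / (2.0 * sigma ** 2))
    return image / (2.0 * np.pi * sigma ** 2)


# Three toy spots; the second one has no expression of this transcript.
centers = [(8, 8), (24, 8), (16, 22)]
levels = [5.0, 0.0, 2.0]
img = reconstruct(centers, levels, shape=(32, 32), sigma=4.0)
```

Repeating the call once per transcript would yield one dense matrix per gene. The choice of `sigma` controls how far each spot's expression spreads into the empty space.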
  • The image generating device 200 can generate two-dimensional images (T 1 , ..., T K , where K is the number of two-dimensional images) that visualize the reconstructed data by varying the color or density according to the transcript expression level (gene expression level).
  • One two-dimensional image can be created per transcript (gene), and each of more than 20,000 genes can be displayed as a two-dimensional image.
  • the two-dimensional images (T 1 , ..., T K ) can be generated for each transcript (A 1 , ..., A M ).
  • M two-dimensional images (T 1 , ..., T M ) corresponding to M transcripts (A 1 , ..., A M ) may be generated.
  • It is also possible for one two-dimensional image to include transcript distribution information for several transcripts (A 1 , ..., A M ).
  • Figure 8 is an example of a two-dimensional image generated by the image generating device 200, in which the reconstructed data is visualized by varying the color or density at each location according to the transcript expression level (gene expression level).
  • Figure 8 is a two-dimensional image (T 1 , ..., T K ) visualizing the reconstructed data obtained from the spatial transcriptome data of Figure 5, and shows the transcript expression level (gene expression level) expressed pixel by pixel as a two-dimensional matrix and as an image.
  • That is, the image of Figure 5 can be reconstructed into a two-dimensional image as shown in Figure 8.
  • the two-dimensional images (T 1 , ..., T K ) generated through the image generating device 200 have continuous transcript distribution information, they can be effectively used for biological understanding and visual interpretation of tissues.
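Varying the density according to expression level can be illustrated by rescaling a reconstructed matrix to 8-bit gray levels. This is a sketch only: the function name `to_grayscale` and the linear scaling to the matrix maximum are assumptions, and the source does not specify a particular color or density mapping.

```python
import numpy as np


def to_grayscale(matrix):
    """Linearly rescale a non-negative expression matrix to uint8 in [0, 255]."""
    matrix = np.asarray(matrix, dtype=float)
    peak = matrix.max()
    if peak <= 0:                      # no expression anywhere: all-black image
        return np.zeros(matrix.shape, dtype=np.uint8)
    return np.round(255.0 * matrix / peak).astype(np.uint8)


demo = np.array([[0.0, 0.5], [1.0, 2.0]])   # toy reconstructed matrix
pixels = to_grayscale(demo)
```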
  • The image generating device 200 can receive reconstruction data from an external database (DB) or the user terminal 300 to generate the two-dimensional images (T 1 , ..., T K ), or can receive spatial transcriptome data and reconstruct it into reconstruction data itself.
  • generating a two-dimensional image by reconstructing spatial transcriptome data into reconstruction data means changing the data structure of the spatial transcriptome data to the level of a multidimensional image.
  • a method can be proposed that enables clustering of transcript distribution information (gene expression information) and spatial comparison of different tissues, and it can provide a basic technology that can solve existing unresolved problems.
  • The present invention can be useful not only for companies that have spatial transcriptome data production and analysis technology, but also for companies that develop new drugs using the derived candidate substances (markers).
  • the two-dimensional images (T 1 , ..., T K ) generated by the two-dimensional image generating device 200 can be used for genetic screening to extract genes with similar spatial distribution.
  • the present invention can lead to a method that can cluster genes with similar images, that is, similar spatial gene expression, for tens of thousands of two-dimensional images.
  • The genetic screening method is a method of extracting genes with similar spatial distribution using the two-dimensional images (T 1 , ..., T K ) generated by the two-dimensional image generating device 200.
  • the genetic screening method includes a feature value extraction step of extracting feature values of two-dimensional images (T 1 , ..., T K ), and generating a cluster (CLT) by clustering based on the similarity of the feature values. It may include a clustering step and a gene extraction step to derive a gene set (G) associated with the cluster (CLT).
  • the characteristic values show the image characteristics of two-dimensional images (T 1 , ..., T K ), and may be data reduced from the reconstructed data to low-dimensional data.
  • The characteristic values can be extracted using dimension reduction algorithms (PCA, LDA, etc.) or an artificial neural network model (ANN).
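As an illustration of the dimension-reduction option, the following sketch extracts characteristic values with PCA computed through a singular value decomposition. The function name `pca_features`, the image sizes, and the random toy data are assumptions made here; the source names PCA only as one possible algorithm.

```python
import numpy as np


def pca_features(images, n_components):
    """Project flattened images onto their top principal components.

    images:  (K, H, W) stack of two-dimensional images, one per transcript
    returns: (K, n_components) array of characteristic values
    """
    flat = np.asarray(images, dtype=float).reshape(len(images), -1)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
stack = rng.random((6, 8, 8))        # six toy 8x8 "gene images"
features = pca_features(stack, n_components=2)
```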
  • The artificial neural network model (ANN) can be a model that is trained in an unsupervised manner using the reconstruction data as learning data and outputs characteristic values for the two-dimensional images (T 1 , ..., T K ).
  • The artificial neural network model may include a first neural network (ANNa) that compresses the reconstructed data into low-dimensional data, and a second neural network (ANNb) that restores the compressed low-dimensional data to the original dimension and outputs the reconstructed data.
  • the characteristic value may be a latent vector value expressed as the low-dimensional data.
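The compress-then-restore arrangement can be sketched as a tiny autoencoder. To keep the example short it uses purely linear layers trained by plain gradient descent in NumPy, which is a deliberate simplification and not the network architecture of the source; the function name, learning rate, step count, and toy data are all assumptions.

```python
import numpy as np


def train_autoencoder(data, latent_dim, steps=500, lr=0.05, seed=0):
    """Train a linear encoder/decoder pair by gradient descent.

    data:    (K, D) matrix, one flattened image per row
    returns: encoder weights (D, latent_dim), decoder weights (latent_dim, D),
             and the reconstruction loss recorded at every step
    """
    rng = np.random.default_rng(seed)
    k, d = data.shape
    enc = 0.1 * rng.standard_normal((d, latent_dim))   # plays the role of ANNa
    dec = 0.1 * rng.standard_normal((latent_dim, d))   # plays the role of ANNb
    losses = []
    for _ in range(steps):
        latent = data @ enc                 # compress to low-dimensional data
        error = latent @ dec - data         # restore and compare with the input
        losses.append(float(np.sum(error ** 2) / k))
        grad_dec = 2.0 / k * latent.T @ error
        grad_enc = 2.0 / k * data.T @ error @ dec.T
        enc -= lr * grad_enc
        dec -= lr * grad_dec
    return enc, dec, losses


rng = np.random.default_rng(1)
images = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 16))
images /= np.linalg.norm(images, 2) / np.sqrt(len(images))  # keeps steps stable
enc, dec, losses = train_autoencoder(images, latent_dim=3)
latent_vectors = images @ enc               # characteristic values, shape (40, 3)
```

The rows of `latent_vectors` are what the text calls latent vector values. A practical model would more likely use nonlinear, possibly convolutional, layers from a deep learning framework.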
  • the clustering step is a step of performing clustering using an unsupervised learning-based clustering algorithm, and various clustering algorithms can be used.
  • The clustering algorithm can be any of a variety of unsupervised learning-based algorithms such as K-means clustering, ISODATA, Mean shift, Gaussian Mixture Model, DBSCAN, and Self-organizing Map.
  • When the clustering algorithm is K-means clustering, various techniques such as the elbow technique, the silhouette technique, and a loss function can be used to determine the optimal number of clusters, and the determination is not limited to a specific method.
  • At least one cluster that clusters the two-dimensional images (T 1 , ..., T K ) can be generated.
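For the K-means case, the clustering step can be sketched as plain Lloyd iterations over the characteristic values. The function name `kmeans`, the fixed iteration count, the random initialization, and the toy latent vectors are assumptions for illustration; any of the listed algorithms could take its place.

```python
import numpy as np


def kmeans(features, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[start].copy()
    for _ in range(n_iter):
        # Assign every feature vector to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = features[labels == c]
            if len(members):           # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids


# Toy latent vectors for six gene images forming two obvious groups.
latent = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centroids = kmeans(latent, n_clusters=2)
```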
  • The gene extraction step derives a gene set (G) associated with a cluster (CLT), where the gene set (G) refers to the set of genes (transcripts (A 1 , ..., A M )) whose two-dimensional images (T 1 , ..., T K ) belong to the same cluster (CLT).
  • Genes (transcripts (A 1 , ..., A M )) belonging to the gene set (G) derived from the same cluster (CLT) have similar spatial distribution patterns and can be understood as having anatomical, pathological, or functional similarities.
  • the gene extraction step may further include a step of final selection of genes to be included in the gene set (G) based on the evaluation index for the cluster (CLT).
  • The evaluation index is a validity index for the cluster (CLT), that is, an index for quantifying the quality of clustering. Various indicators such as silhouette values and correlation coefficients can be used as a means to evaluate the degree to which the data within a cluster (CLT) are aggregated, the degree of separation between clusters (CLT), and the connectivity within a cluster (CLT).
  • The evaluation index is an optimization tool for deriving genes (transcripts) with more similar spatial distributions. For example, the silhouette value can be calculated so that only genes (transcripts) with positive values are kept in the cluster (CLT).
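The silhouette-based filter can be sketched as follows, using the standard definition s = (b - a) / max(a, b) with Euclidean distances. The function name, the one-dimensional toy features, and the hand-assigned labels are illustrative assumptions; the sketch also assumes at least two clusters.

```python
import numpy as np


def silhouette_values(features, labels):
    """Silhouette value s = (b - a) / max(a, b) for every sample.

    Assumes at least two clusters are present in `labels`.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    scores = np.zeros(len(features))
    for i in range(len(features)):
        same = labels == labels[i]
        if same.sum() < 2:                 # singleton cluster: score stays 0
            continue
        a = dists[i, same].sum() / (same.sum() - 1)   # mean intra-cluster distance
        b = min(dists[i, labels == c].mean()          # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores


# Sample 3 sits closer to cluster 1 although it is labelled as cluster 0.
features = np.array([[0.0], [0.2], [0.4], [3.0], [5.0], [5.2]])
labels = np.array([0, 0, 0, 0, 1, 1])
scores = silhouette_values(features, labels)
kept = np.flatnonzero(scores > 0)          # indices of genes kept in their cluster
```

In this toy case the misplaced sample receives a negative silhouette value and is dropped, while the other five are kept.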
  • The gene extraction step may further include measuring the correlation between gene pairs within a cluster (CLT) by calculating a correlation coefficient in order to utilize the gene expression level (transcript expression level).
  • The correlation coefficient is calculated as the Spearman correlation coefficient, transcripts (genes) that satisfy a correlation coefficient r > 0.1 and a p-value < 0.001 are selected, and finally a gene set (G) for each cluster can be derived.
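The Spearman coefficient itself is the Pearson coefficient of the ranks, which can be sketched in pure Python as below. The helper names and the toy expression vectors are assumptions; the p-value part of the selection rule is deliberately left out here and would normally come from a statistics library.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient = Pearson coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 9.0, 16.0, 30.0]   # monotone in gene_a
gene_c = [5.0, 4.0, 3.0, 2.0, 1.0]     # reversed order
print(spearman(gene_a, gene_b), spearman(gene_a, gene_c))  # 1.0 -1.0
```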
  • the statistical significant difference used in the optimization step of the gene set (G) can be based on a commonly used statistical cutoff.
  • A statistically significant difference may correspond to a p-value less than or equal to 0.05, 0.01, 0.005, or 0.001.
  • Figure 9 shows a visualization (tSNE) in which the characteristic values (latent vector values) of the two-dimensional images (T 1 , ..., T K ) for each gene (transcript) are reduced to two dimensions and grouped into several clusters (CLT) with similar characteristic values.
  • Figure 10 shows images visualizing the spatial distribution patterns of genes (e.g., ACTA2, DES, IGHA2, MYH11) that belong to the same cluster (CLT) in Figure 9 and have spatial distribution patterns similar to the representative image of the cluster (CLT).
  • Figure 11 is a graph showing correlation evaluation between genes in the gene set (G) within the cluster (CLT) compared to the simulated gene set.
  • The simulation starts from the 2,000 genes with the greatest deviation between spots, extracted from spatial transcriptome data of normal lung tissue with segmentation annotation.
  • For each of the seven clusters (CLT) derived according to the present invention, a random gene set containing the same number of genes as that cluster was created.
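The construction of the simulated gene sets can be sketched as below. The source does not say how deviation is measured, so variance across spots is assumed here; the function name, parameter names, and random toy matrix are likewise hypothetical.

```python
import numpy as np


def random_gene_sets(expression, cluster_sizes, n_top=2000, seed=0):
    """Draw one random gene set per cluster from the most variable genes.

    expression:    (N_spots, M_genes) expression matrix
    cluster_sizes: number of genes in each derived cluster
    returns:       list of gene index arrays, one random set per cluster
    """
    expression = np.asarray(expression, dtype=float)
    rng = np.random.default_rng(seed)
    n_top = min(n_top, expression.shape[1])
    # Indices of the n_top genes with the largest variance across spots.
    pool = np.argsort(expression.var(axis=0))[::-1][:n_top]
    return [rng.choice(pool, size=size, replace=False) for size in cluster_sizes]


rng = np.random.default_rng(1)
demo = rng.random((50, 30))                       # 50 toy spots, 30 toy genes
sets = random_gene_sets(demo, cluster_sizes=[3, 5, 4], n_top=10)
```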
  • the gene set (G) derived according to the present invention showed higher correlation of genes (transcriptomes) within the cluster (CLT). This shows that the gene set (G) derived according to the present invention has spatially similar expression patterns.
  • Figure 12 shows the results of evaluating the discrimination power of the gene set (G) within the cluster (CLT) compared to the simulated gene set of Figure 11.
  • For Figure 12, each gene set was treated as one signature, the signature score of each gene set was calculated for all spots, and the mean square (MS) and F ratio were calculated using an analysis of variance (ANOVA) test against the already known segmentation information.
  • The signature score of the gene set (G) derived according to the present invention distinguishes the segmented areas better than the signature score of the simulated gene set. Therefore, the gene set (G) derived according to the present invention consists of genes (transcripts) with similar spatial distribution patterns and is highly concentrated in biological structure or function.
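The discrimination-power evaluation can be sketched as a one-way ANOVA over per-spot signature scores. The source does not define the signature score, so the mean expression of the genes in the set is assumed here; the function names and the toy data are also hypothetical.

```python
import numpy as np


def signature_scores(expression, gene_idx):
    """Per-spot signature score, here simply the mean expression of the set."""
    return np.asarray(expression, dtype=float)[:, gene_idx].mean(axis=1)


def anova_f(scores, regions):
    """One-way ANOVA: returns (MS between regions, MS within regions, F ratio)."""
    scores = np.asarray(scores, dtype=float)
    regions = np.asarray(regions)
    groups = [scores[regions == r] for r in np.unique(regions)]
    grand_mean = scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(scores) - len(groups))
    return ms_between, ms_within, ms_between / ms_within


# Four toy spots in two annotated regions; genes 0 and 1 track the regions,
# gene 2 does not.
expression = np.array([[1.0, 2.0, 9.0],
                       [2.0, 2.0, 8.0],
                       [6.0, 7.0, 9.0],
                       [7.0, 7.0, 8.0]])
regions = np.array(["A", "A", "B", "B"])
ms_b, ms_w, f_ratio = anova_f(signature_scores(expression, [0, 1]), regions)
_, _, f_other = anova_f(signature_scores(expression, [2]), regions)
```

A larger F ratio means the signature score separates the annotated regions more clearly, which is the sense in which a derived gene set would be compared with a random one.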
  • Figures 13a and 13b show a case in which the spatial distribution pattern of each gene (transcript) was imaged as a two-dimensional image using the present invention, and genes with similar distribution patterns were then extracted to select a gene set (G) related to anatomical and functional characteristics.
  • Figure 13a shows the white matter region, containing the fiber tracts of the mouse brain, extracted by clustering based on the transcriptome data of the spots.
  • Figure 13b shows the result of applying the genetic screening method for deriving genes with similar spatial distribution patterns according to the present invention to the mouse brain spatial transcriptome data of Figure 13a, namely a gene set (G) corresponding to one cluster (CLT).
  • The gene set (G) derived through the present invention matched a specific spatial region of the mouse brain, and its genes correspond to genes related to the myelination function expressed in the fiber tracts, an anatomical region of the mouse brain.
  • the gene set (G) with similar spatial distribution derived using the present invention is enriched in the gene set (G) related to anatomical structure and functional characteristics.
  • Figure 14 shows an example of selecting a gene set (G) related to pathological and functional characteristics by using the present invention to create a two-dimensional image of the spatial distribution pattern for each gene (transcriptome) and then extracting genes with similar distribution patterns. It was done.
  • Figure 14 uses spatial transcriptome data of the mouse brain (Buzzi et al., 2021), which is public data, and this spatial transcriptome data shows that after exposing the mouse brain to various concentrations of heme, heme Gene sets that can explain molecular pathological characteristics such as exposure have been disclosed.
  • a gene set (G) with a similar spatial distribution pattern was extracted using the genetic screening method according to the present invention, and the heme exposure suggested by Buzzi was extracted from the gene set (G) of one cluster (CLT). It was confirmed that 15 genes out of the top 20 signature genes were extracted. Therefore, it can be seen that the gene set (G) with similar spatial distribution derived using the present invention can be used as a gene set (G) with molecular pathological and functional characteristics.
  • the two-dimensional images (T 1 , ..., T K ) generated by the two-dimensional image generating device 200 are used to compare and analyze gene expression patterns between tissue images (TI) of different tissues. It can be used for comparative analysis of gene expression.
  • the comparative analysis method of gene expression between tissues performed using the two-dimensional images (T 1 , ..., T K ) generated by the two-dimensional image generating device (200) is 2 generated by the two-dimensional image generating device (200). This is a method of comparing and analyzing gene expression patterns between tissue images (TI) of different tissues using dimensional images (T 1 , ..., T K ).
  • the comparative analysis method of gene expression between tissues performs spatial normalization on the two-dimensional images (T 1 , ..., T K ) generated for each of the different tissue images (TI) to produce a spatial normalized image (A spatial normalization step of generating S 1 , ..., S K ), and comparing the spatial normalization images (S1, ..., SK) to different tissue images (TI). It may include a comparative analysis step of comparing and analyzing the gene expression patterns between pixels for each pixel.
  • the two-dimensional image (T 1 , ..., T K ) is an image visualizing the spatial distribution pattern using reconstruction data, and is an image visualizing the spatial distribution pattern for a specific gene (transcript) or a spatial distribution pattern for a specific gene (transcript). It may be an image that visualizes the distribution pattern that is the sum of the expression levels of genes belonging to clusters with similar distribution patterns.
  • genes belonging to clusters with similar spatial distribution patterns are clustered after the spots (P 1 , ..., P N ) constituting the spatial transcriptome data are clustered, and then the characteristics of the clustered spots (P 1 , ..., P N ) are determined. It may be a gene selected as a major gene or a gene with a similar spatial distribution pattern extracted by the genetic screening method described above.
  • Figure 15 shows clusters with similar spatial gene expression patterns to be distinguished
  • Figure 16 shows two-dimensional images (T 1 , ..., T K ) generated for the main gene(s) showing the characteristics of the cluster in Figure 15. It shows two-dimensional images (T 1 , ..., T K ) generated for each of four different tissues.
  • Figure 16 is two-dimensional images (T 1 , ..., T K ) of four different tissues, mutual spatial comparison is difficult, but the present invention provides mutual comparison between two-dimensional images (T 1 , ..., T K ). Comparison between different organizations can be made possible by normalizing to make it possible.
  • Spatial normalization for the two-dimensional images is not limited to a specific method, and as an example, the symmetric image normalization method (SyN) can be applied.
  • Figure 17a shows normalized images (S 1 , ..., S K ) generated by normalizing the two-dimensional images (T 1 , ..., T K ) of five different tissues. It can be seen that the five tissues can be compared and analyzed pixel by pixel.
  • Figure 17b also shows normalized images (S 1 , ..., S K ) obtained by normalizing the two-dimensional images (T 1 , ..., T K ) of the five different tissues of Figure 17a, which enables comparative analysis of expression patterns between tissues for a single gene or a gene set.
  • Figure 18a uses spatial transcriptome data (Buzzi et al., 2021) from five different mouse brains exposed to different heme concentrations.
  • Figure 18a is also a diagram showing that spatial transcriptome data in different spaces can be compared through spatial normalization.
  • the method for comparative analysis of gene expression between tissues can compare and analyze spatial gene expression patterns of different tissues by comparing normalized images (S 1 , ..., S K ) of different tissues for each pixel.
  • the above-described genetic screening method and the comparative analysis method of gene expression between tissues may be performed in a separate computing device or may be performed in the two-dimensional image generating device 200 described above.
  • the above-described two-dimensional image generation method, genetic screening method, and inter-tissue gene expression comparative analysis method can be implemented as a program executable on a computer.
  • The spatial transcriptome information analysis system 1000 may include a user terminal 300 and a spatial transcriptome information analysis device 100 connected to the user terminal 300 through a network.
  • The spatial transcriptome information analysis device 100 is an analysis device for performing at least one of the above-described two-dimensional image generation method, genetic screening method, and inter-tissue gene expression comparative analysis method, and can provide a system capable of integrated analysis of spatial transcriptome information.
  • The spatial transcriptome information analysis device 100 may include an information receiving unit 110 that receives spatial transcriptome data consisting of location information of a plurality of spots (P 1 , ..., P N ) spaced apart on a tissue image (TI) and transcript information (R 1 , ..., R N ) corresponding to each of the plurality of spots (P 1 , ..., P N ), a data reconstruction unit 120 that calculates reconstruction data by reconstructing the spatial transcriptome data so that the empty space between the plurality of spots (P 1 , ..., P N ), which has no transcript information (R 1 , ..., R N ), is interpolated, and a transcriptome information analysis unit 130 that analyzes gene expression patterns based on the reconstruction data.
  • The information receiving unit 110 receives spatial transcriptome data and can be configured in various ways. It may receive the spatial transcriptome data from the user terminal 300 or from a separate database (DB, 400). The information receiving unit 110 may also receive commands or requests from the user terminal 300 in addition to spatial transcriptome data.
  • The data reconstruction unit 120 calculates reconstruction data by reconstructing the spatial transcriptome data so that the empty spaces between the plurality of spots (P 1 , ..., P N ), which have no transcript information (R 1 , ..., R N ), are interpolated, and can be configured in various ways.
  • the transcriptome information analysis unit 130 can be configured in various configurations to analyze gene expression patterns based on reconstruction data.
  • the transcriptome information analysis unit 130 can use the reconstruction data to analyze gene expression patterns within tissues or to compare and analyze gene expression patterns between different tissues.
  • the gene expression pattern here may be a gene expression pattern of the same tissue or a gene expression pattern of a different tissue.
  • The transcriptome information analysis unit 130 may include a feature extraction unit 132 that extracts feature values of the reconstructed data, and a clustering unit 134 that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the feature values.
  • the feature extraction unit 132 may extract the feature value by reducing the reconstructed data to low-dimensional data.
  • the feature extraction unit 132 may extract the feature value using a dimension reduction algorithm that reduces the reconstructed data to low-dimensional data.
  • the dimensionality reduction algorithm is not limited to a specific algorithm, and examples may include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), etc.
  • PCA Principal Component Analysis
  • LDA Linear Discriminant Analysis
  • the feature extraction unit 132 is a feature extractor that extracts feature values of the reconstructed data, and may include an artificial neural network model (ANN) that compresses the reconstructed data into low-dimensional data.
  • ANN artificial neural network model
  • the characteristic value may be a latent vector value expressed as the low-dimensional data.
  • the clustering unit 134 can perform clustering using an unsupervised learning-based clustering algorithm and derive a gene set (G) associated with the cluster (CLT).
  • the clustering unit 134 may finally select genes to be included in the gene set (G) based on at least one of the silhouette value and correlation coefficient of the cluster (CLT).
  • The spatial transcriptome information analysis device 100 may additionally include an image generator 140 that generates two-dimensional images (T 1 , ..., T K ) visualizing the spatial distribution of the plurality of transcripts (A 1 , ..., A M ) from the transcript distribution information.
  • The image generator 140 is configured to visualize the spatial distribution of the plurality of transcripts (A 1 , ..., A M ). Gene expression patterns can be analyzed from the reconstructed data alone, so the image generator 140 can be omitted when a visualization image is not needed.
  • The image generator 140 may be configured identically or similarly to the two-dimensional image generating device 200 described in detail above, so detailed description is omitted to the extent of overlap.
  • When comparative analysis of gene expression patterns between different tissues is required, the image generator 140 can generate the two-dimensional images (T 1 , ..., T K ) for each of the different tissue images (TI).
  • The spatial transcriptome information analysis device 100 may additionally include a spatial normalization unit 150 that performs spatial normalization on the two-dimensional images (T 1 , ..., T K ) to generate spatial normalized images (S 1 , ..., S K ).
  • The spatial normalization unit 150 performs spatial normalization on the two-dimensional images (T 1 , ..., T K ) to generate spatial normalized images (S 1 , ..., S K ) and can be configured in various ways. Spatial normalization of the two-dimensional images (T 1 , ..., T K ) is not limited to a specific method; as an example, the symmetric image normalization method (SyN) can be applied.
  • The transcriptome information analysis unit 130 can compare the spatial normalized images (S 1 , ..., S K ) of the different tissue images (TI) with one another to compare and analyze, pixel by pixel, the gene expression patterns between the different tissue images (TI).
  • The two-dimensional image spatial normalization method performed by the spatial normalization unit 150 and the gene expression pattern comparative analysis method performed by the transcriptome information analysis unit 130 were described in detail above for the inter-tissue gene expression comparative analysis method, so detailed description is omitted to the extent of overlap.
  • the spatial transcriptome information analysis device 100 may further include an information transmission unit 160 for transmitting the gene expression pattern analysis results to a database (DB, 400) or a user terminal 300.
  • DB, 400 database
  • 300 user terminal
  • The above-mentioned spatial transcriptome information analysis device 100 is a device for providing a method of analyzing spatial transcriptome data by reconstructing it into reconstructed data, and can provide a means for integrated analysis of spatial transcriptome information using the reconstructed data.
  • The spatial transcriptome information analysis method using the spatial transcriptome information analysis device 100 may include at least one of a two-dimensional image generation method using reconstructed data, a genetic screening method, and a comparative analysis method of gene expression between tissues.
  • The spatial transcriptome analysis method, as a two-dimensional image generation method, may include a receiving step (S301) of receiving spatial transcriptome data, a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstruction data, and a two-dimensional image generation step (S302) of generating a two-dimensional image visualizing transcript distribution information (gene distribution information) using the reconstruction data.
  • The spatial transcriptome analysis method, as a genetic screening method, may include a receiving step (S301) of receiving spatial transcriptome data, a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstruction data, and a gene extraction step (S304) of extracting genes with similar spatial distribution using the reconstruction data.
  • The genetic screening method may additionally include a two-dimensional image generation step (S302) of generating two-dimensional images (T 1 , ..., T K ) visualizing the transcript distribution information (gene distribution information) using the reconstruction data.
  • The spatial transcriptome analysis method according to the present invention, as a comparative analysis method of gene expression between tissues, may include a receiving step (S301) of receiving spatial transcriptome data for each different tissue image (TI), a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstruction data, a two-dimensional image generation step (S302) of generating the two-dimensional images (T 1 , ..., T K ) for each of the different tissue images (TI), a spatial normalization step of performing spatial normalization on the two-dimensional images (T 1 , ..., T K ) to generate spatial normalized images (S 1 , ..., S K ), and a comparative analysis step of comparing the spatial normalized images (S 1 , ..., S K ) to analyze the gene expression patterns between the different tissue images (TI).
  • the spatial transcriptome information analysis method performed using the spatial transcriptome information analysis device 100 described above can be implemented through a computer-executable spatial transcriptome information analysis program.
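The evaluation described for Figure 12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the source does not define the signature score, so the sketch assumes it is the mean expression of the genes in the set at each spot, and the names `signature_scores` and `anova_f_ratio` are hypothetical. The F ratio is the standard one-way ANOVA ratio of the between-region mean square to the within-region mean square.

```python
def signature_scores(expression_by_gene, gene_set):
    """Assumed score: mean expression of the genes in gene_set at each spot."""
    n_spots = len(expression_by_gene[gene_set[0]])
    return [
        sum(expression_by_gene[gene][i] for gene in gene_set) / len(gene_set)
        for i in range(n_spots)
    ]


def anova_f_ratio(scores, regions):
    """One-way ANOVA of spot scores grouped by known region label.

    Returns (mean square between regions, mean square within regions, F ratio).
    """
    groups = {}
    for score, region in zip(scores, regions):
        groups.setdefault(region, []).append(score)

    n_total = len(scores)
    grand_mean = sum(scores) / n_total
    means = {region: sum(vals) / len(vals) for region, vals in groups.items()}

    ss_between = sum(
        len(vals) * (means[region] - grand_mean) ** 2
        for region, vals in groups.items()
    )
    ss_within = sum(
        (value - means[region]) ** 2
        for region, vals in groups.items()
        for value in vals
    )
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return ms_between, ms_within, ms_between / ms_within


# Toy data: two genes measured at nine spots that fall in three known regions.
expression_by_gene = {
    "gene_a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "gene_b": [3, 4, 5, 6, 7, 8, 9, 10, 11],
}
regions = ["x"] * 3 + ["y"] * 3 + ["z"] * 3
scores = signature_scores(expression_by_gene, ["gene_a", "gene_b"])
ms_between, ms_within, f_ratio = anova_f_ratio(scores, regions)
```

A larger F ratio means the signature score separates the known regions more clearly, which is the sense in which the derived gene set is compared with the simulated gene set.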

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a spatial transcriptome information analysis apparatus that uses reconstituted data, in which spatial transcriptome data is reconstituted so that empty spaces lacking transcriptome information on a tissue image (TI) are interpolated, and to a spatial transcriptome information analysis method using same. The present invention discloses a spatial transcriptome information analysis apparatus (100) comprising: an information receptor (110) for receiving spatial transcriptome data consisting of location information for a plurality of spots (P1, …, PN) spaced from each other on a tissue image (TI) and transcriptome information (R1, …, RN) respectively corresponding to the plurality of spots (P1, …, PN); a data reconstituting unit (120) for generating reconstituted data in which the spatial transcriptome data is reconstituted so that the empty space lacking the transcriptome information (R1, …, RN) between the plurality of spots (P1, …, PN) is interpolated; and a spatial transcriptome information analyzer (130) for analyzing a gene expression pattern on the basis of the reconstituted data.

Description

Spatial transcriptome information analysis device and analysis method using the same
The present invention relates to a spatial transcriptome information analysis device and an analysis method using the same, and more particularly to an analysis device that uses reconstructed data, in which spatial transcriptome information is reconstructed so that empty spaces without transcript information on a tissue image are interpolated, and an analysis method using the same.
Spatial transcriptome data refers to the whole of the data containing spatial location information and transcript information (gene expression information). It consists of hundreds to tens of thousands of spots, each of which represents a very small part of the tissue. In other words, spatial transcriptome data is composed of location information within the tissue and expression information of the genes at those locations.
Spatial transcriptome data requires analyzing expression information for tens of thousands of genes across thousands to tens of thousands of small spatial regions (spots), with the location information of the spots added on top, so an appropriate analysis method is required.
In addition, the gaps between spots that carry transcript information are not filled, so gene expression in the empty spaces without transcript information cannot be determined, which limits biological understanding and visual interpretation.
Furthermore, when comparing gene expression information between different tissues or multiple tissue samples located in different spaces, the position and shape of the samples are not identical, which makes comparative analysis difficult for data such as drug treatments at multiple concentrations, developmental time courses, and differing conditions.
Therefore, there is a great need for spatial transcriptome data analysis technology that can select gene groups showing spatially similar patterns or compare gene expression patterns between different tissues.
An object of the present invention, in recognition of the above problems and needs, is to provide a spatial transcriptome information analysis device, and a spatial transcriptome information analysis method using the same, that infer information in the empty space between spots, where no transcript information exists, so that gene sets showing similar spatial expression patterns can be selected and gene expression patterns can be readily compared between different tissues.
The present invention was created to achieve the above object, and discloses a spatial transcriptome information analysis device (100) comprising: an information receiving unit (110) that receives spatial transcriptome data consisting of location information of a plurality of spots (P1, …, PN) spaced apart on a tissue image (TI) and transcript information (R1, …, RN) corresponding to each of the plurality of spots (P1, …, PN); a data reconstruction unit (120) that calculates reconstructed data by reconstructing the spatial transcriptome data so that the empty space between the plurality of spots (P1, …, PN), which has no transcript information (R1, …, RN), is interpolated; and a transcriptome information analysis unit (130) that analyzes gene expression patterns based on the reconstructed data.
The transcript information (R1, …, RN) may include information about the expression level of each of a plurality of transcripts (A1, …, AM).
The gene expression pattern may be a gene expression pattern of the same tissue as the tissue image (TI) or a gene expression pattern of a different tissue.
The reconstructed data may include transcript distribution information reconstructed on the assumption that, for each of the plurality of transcripts (A1, …, AM), the expression level is distributed according to a continuous probability distribution centered on the central coordinates (C1, …, CN) of the plurality of spots (P1, …, PN).
The continuous probability distribution may be a normal distribution centered on the central coordinates (C1, …, CN) with a preset variance.
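The reconstruction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes an isotropic two-dimensional normal density on a regular pixel grid, and the function name `reconstruct_image` and the parameter `sigma` (the square root of the preset variance, in pixels) are hypothetical. It uses only the Python standard library.

```python
import math


def reconstruct_image(centers, expression, height, width, sigma):
    """Spread each spot's expression level over a pixel grid.

    centers:    list of (row, col) spot centre coordinates in pixels.
    expression: expression level of one transcript at each spot.
    sigma:      assumed standard deviation of the normal distribution, in pixels.
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    image = [[0.0] * width for _ in range(height)]
    for (center_row, center_col), level in zip(centers, expression):
        if level == 0:
            continue  # a spot with no expression contributes nothing
        for r in range(height):
            for c in range(width):
                sq_dist = (r - center_row) ** 2 + (c - center_col) ** 2
                # 2D normal density centred on the spot, scaled by its expression level
                image[r][c] += level * norm * math.exp(-sq_dist / (2.0 * sigma ** 2))
    return image


# Three spots; the pixels between them receive interpolated values.
centers = [(10, 10), (10, 20), (20, 15)]
expression = [5.0, 0.0, 2.0]
image = reconstruct_image(centers, expression, height=32, width=32, sigma=3.0)
```

Because each density integrates to one, the image total stays close to the summed expression of the spots, while pixels in the gaps between spots receive smoothly interpolated values instead of being empty.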
The spatial transcriptome information analysis device (100) may further include an image generator (140) that generates two-dimensional images (T1, …, TK) visualizing the spatial distribution of the plurality of transcripts (A1, …, AM) from the transcript distribution information.
The transcriptome information analysis unit (130) may include a feature extraction unit (132) that extracts feature values of the reconstructed data, and a clustering unit (134) that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the feature values.
The feature extraction unit (132) may extract the feature values by reducing the reconstructed data to low-dimensional data.
The feature extraction unit (132) may include an artificial neural network model that compresses the reconstructed data into low-dimensional data.
The artificial neural network model may use the reconstructed data as training data.
The feature value may be a latent vector value expressed as the low-dimensional data.
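As one way to picture the reduction to low-dimensional feature values, the sketch below projects each flattened image onto its first principal component. It is an illustration only: it substitutes principal component analysis, computed here by power iteration, for the artificial neural network model, and the function name `first_principal_component` and the toy data are hypothetical.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return the leading principal axis and each row's score along it.

    rows: one flattened image per gene, all of equal length.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]

    rng = random.Random(seed)
    axis = [rng.random() for _ in range(d)]
    for _ in range(iterations):
        # Multiply by the (unscaled) covariance matrix: X^T (X v)
        proj = [sum(x[j] * axis[j] for j in range(d)) for x in centered]
        nxt = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(a * a for a in nxt))
        axis = [a / norm for a in nxt]

    scores = [sum(x[j] * axis[j] for j in range(d)) for x in centered]
    return axis, scores


# Four toy "images" of two pixels each, lying on the line pixel2 = 2 * pixel1.
flattened_images = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
axis, scores = first_principal_component(flattened_images)
```

Each score is a one-dimensional feature value for one gene. A neural network encoder would typically produce a latent vector with several components instead.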
The clustering unit (134) may perform clustering using an unsupervised learning-based clustering algorithm.
The clustering unit (134) may derive a gene set (G) associated with the cluster (CLT).
The clustering unit (134) may make the final selection of genes to be included in the gene set (G) based on at least one of the silhouette value and the correlation coefficient of the cluster (CLT).
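The silhouette-based selection can be sketched as follows. This is a minimal illustration under stated assumptions: cluster labels are taken as already assigned by some unsupervised algorithm, distances are Euclidean distances between latent vectors, and the threshold of 0.5, the gene names, and the function names are hypothetical. The source does not specify a threshold or a distance measure.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value of each point: (b - a) / max(a, b).

    a is the mean distance to the other members of the point's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    """
    values = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            values.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


def select_gene_set(genes, points, labels, cluster, threshold=0.5):
    """Keep the genes of one cluster whose silhouette value exceeds the threshold."""
    values = silhouette_values(points, labels)
    return [g for g, lab, s in zip(genes, labels, values) if lab == cluster and s > threshold]


genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
latent = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (5, 5)]
labels = [0, 0, 0, 1, 1, 0]
gene_set = select_gene_set(genes, latent, labels, cluster=0)
```

In this toy example `g5` sits halfway between the two clusters, so its silhouette value is low and it is left out of the gene set for cluster 0.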
The image generator (140) may generate the two-dimensional images (T1, …, TK) for each of different tissue images (TI).
The spatial transcriptome information analysis device (100) may further include a spatial normalization unit (150) that performs spatial normalization on the two-dimensional images (T1, …, TK) to generate spatial normalized images (S1, …, SK).
The transcriptome information analysis unit (130) may compare the spatial normalized images (S1, …, SK) of the different tissue images (TI) with one another to compare and analyze the gene expression patterns between the different tissue images (TI).
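Once images share a common space, the comparison reduces to arithmetic on matching pixels. The sketch below is an assumption-laden illustration: it takes two images that have already been spatially normalized to the same shape (the registration itself is not shown), and the function names and toy values are hypothetical.

```python
def pixelwise_difference(image_a, image_b):
    """Per-pixel difference (image_b - image_a) of two spatially normalized images."""
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must share the same shape after spatial normalization")
    return [
        [b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def mean_absolute_difference(image_a, image_b):
    """Single summary number: average size of the per-pixel change."""
    diff = pixelwise_difference(image_a, image_b)
    n_pixels = sum(len(row) for row in diff)
    return sum(abs(v) for row in diff for v in row) / n_pixels


# Toy 2x2 normalized images for one gene in two tissues.
control = [[0.0, 1.0], [2.0, 3.0]]
treated = [[0.0, 2.0], [2.0, 5.0]]
difference_map = pixelwise_difference(control, treated)
summary = mean_absolute_difference(control, treated)
```

The difference map shows where expression changed between the two tissues, and the shape check guards against comparing images that were not brought into the same space first.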
In another aspect, the present invention discloses a spatial transcriptome information analysis system (1000) comprising the spatial transcriptome information analysis device (100) and a user terminal (300) connected to the spatial transcriptome information analysis device (100) through a network.
In another aspect, the present invention discloses a spatial transcriptome information analysis method using the spatial transcriptome information analysis device (100).
In another aspect, the present invention discloses a computer-executable spatial transcriptome information analysis program for performing the spatial transcriptome information analysis method.
In another aspect, the present invention discloses an image generation device (200) that generates two-dimensional images (T1, …, TK) of transcript distribution using reconstructed data obtained by reconstructing spatial transcriptome data, the spatial transcriptome data consisting of location information of a plurality of spots (P1, …, PN) spaced apart on a tissue image (TI) and transcript information (R1, …, RN) corresponding to each of the plurality of spots (P1, …, PN).
The reconstructed data may include transcript distribution information reconstructed on the assumption that the expression level of each of the plurality of transcripts (A1, …, AM) is distributed according to a continuous probability distribution centered on the central coordinates (C1, …, CN) of the plurality of spots (P1, …, PN).
In another aspect, the present invention discloses a genetic screening method for extracting genes with similar spatial distribution using the two-dimensional images (T1, …, TK) generated by the two-dimensional image generation device (200).
In another aspect, the present invention discloses an inter-tissue gene expression comparative analysis method that compares the two-dimensional images (T1, …, TK) generated by the two-dimensional image generation device (200) for tissue images (TI) of different tissues in order to compare and analyze the gene expression patterns between the tissue images (TI) of the different tissues.
The spatial transcriptome information analysis device and the analysis method using the same according to the present invention infer information in the empty space between spots, where no transcript information exists, so that gene sets showing similar spatial expression patterns can be selected and gene expression patterns can be readily compared between different tissues, which has the advantage of providing better biological and functional understanding and new insights.
Specifically, the present invention can infer gene expression values in empty spaces without transcript information based on gene expression and spatial information within the tissue, and thereby generate two-dimensional images of transcript distribution information (gene expression patterns). It can also find genes that are spatially distributed in a similar way and select genes that show an expression pattern similar to that of a desired gene or characteristic.
In addition, the present invention can thereby obtain genetic information spatially associated with a specific target substance or molecule, or enable comparison between different tissues by normalizing the transcript distribution information (gene expression patterns), imaged as two-dimensional images, of tissues that exist in different spaces.
Furthermore, the present invention can be used to compare changes in transcript distribution information (gene expression patterns) caused by diseases or drugs even between different tissues, and can be actively used and applied in various pathophysiology studies and in new drug development.
FIG. 1 is a conceptual diagram showing a spatial transcriptome information analysis system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the spatial transcriptome information analysis apparatus of FIG. 1.
FIG. 3 is a flowchart showing a spatial transcriptome information analysis method performed in the spatial transcriptome information analysis system of FIG. 1.
FIG. 4 is a conceptual diagram showing the spots that make up spatial transcriptome data.
FIG. 5 shows an image in which spatial transcriptome data is visualized in the conventional manner.
FIG. 6 illustrates the process of clustering reconstructed data in the spatial transcriptome information analysis apparatus of FIG. 2.
FIG. 7 illustrates the principle of reconstructing spatial transcriptome data into reconstructed data.
FIG. 8 shows an image in which the reconstructed data is visualized.
FIG. 9 shows genes clustered based on the similarity of the feature values of the reconstructed data.
FIG. 10 shows images of the spatial expression patterns of a clustered gene set.
FIG. 11 is a graph showing an evaluation of the correlation between genes in the clustered gene sets compared with simulated gene sets.
FIG. 12 is a graph showing an evaluation of the discriminative power of the clustered gene sets compared with simulated gene sets.
FIGS. 13a and 13b show the fiber tract of an anatomical tissue and two-dimensional images of the gene set (G) that matches the corresponding spatial region.
FIG. 14 shows two-dimensional images of genes extracted in relation to the molecular pathological features and functions of a tissue.
FIG. 15 shows clusters with similar gene expression patterns, delineated in tissue space.
FIG. 16 shows two-dimensional images generated for the main gene(s) that characterize the clusters of FIG. 15.
FIGS. 17a and 17b show normalized images obtained by spatially normalizing the two-dimensional images of five different tissues.
FIG. 18a shows normalized images of the gene expression patterns of five tissues exposed to different heme concentrations, and FIG. 18b shows a spatially similar gene set selected by correlation analysis of the pixel values of the normalized images of FIG. 18a.
Hereinafter, the spatial transcriptome information analysis system (1000) according to the present invention is described with reference to the accompanying drawings.
The spatial transcriptome information analysis system (1000) may be a system that uses spatial transcriptome information to generate a two-dimensional image of the transcript distribution, to analyze the gene expression pattern of a tissue, or to compare and analyze gene expression patterns between different tissues.
As an example, as shown in FIG. 1, the spatial transcriptome information analysis system (1000) according to the present invention may include a user terminal (300) and an image generating device (200) that is connected to the user terminal (300) through a network and generates a two-dimensional image of the transcript distribution using spatial transcriptome information.
The user terminal (300) is a computing device connected through a network to the image generating device (200) described below. It may be implemented as, for example, a desktop, laptop, tablet PC, or smartphone, and may include a network interface for connecting to the image generating device (200) and a user input/output interface.
As an example, the user terminal (300) may be a mobile terminal connected to the image generating device (200) through cellular or Wi-Fi communication.
As another example, the user terminal (300) may be a desktop connected to the image generating device (200) through the Internet.
The image generating device (200) is connected to the user terminal (300) through a network and can receive requests or commands from the user terminal (300) or send requests or commands to it. It is a server, which may be configured in various ways, for generating two-dimensional images (T1, …, TK) of the transcript distribution using reconstructed data obtained by reconstructing spatial transcriptome data, where the spatial transcriptome data consists of the position information of a plurality of spots (P1, …, PN) spaced apart on a tissue image (TI) and the transcript information (R1, …, RN) corresponding to each of the spots.
As shown in FIG. 4, the spatial transcriptome data may be the entire data set consisting of the position information of a plurality of spots (P1, …, PN, where N is a natural number equal to the total number of spots) spaced apart on a tissue image (TI) and the transcript information (R1, …, RN) corresponding to each spot.
A spot (P1, …, PN) is a small region on the tissue image (TI), and each spot may be associated with its own transcript information (R1, …, RN) as gene expression information.
Spatial transcriptome data = {(Pn, Rn) | 1 ≤ n ≤ N, where N is a natural number equal to the total number of spots}
Here, the transcript information (R1, …, RN) may include information on the expression level of each of a plurality of transcripts (A1, …, AM, where M is the total number of transcripts). The expression level of each transcript (A1, …, AM) may be the expression level of the corresponding gene.
The spots (P1, …, PN) are spaced apart from one another, and the regions between them may be empty space (V) with no transcript information (R1, …, RN).
That is, because the transcript information in the empty space (V) between the spots (P1, …, PN) is unknown, biological understanding and visual interpretation based on spatial transcriptome data are limited.
FIG. 5 visualizes spatial transcriptome data as an image in the conventional manner: a circle (or a polygon such as a hexagon) is drawn around the center of each spot (P1, …, PN), and its color or intensity is varied according to the transcript expression level (gene expression level).
Since there is no information on the transcript expression level (gene expression level) between the spots (P1, …, PN), the data is sparsely distributed when viewed as an image.
The reconstructed data is spatial transcriptome data that has been reconstructed so that the empty space (V) between the spots (P1, …, PN), which has no transcript information (R1, …, RN), is interpolated.
The principle of reconstructing the spatial transcriptome data into reconstructed data is to infer the transcript information of the empty space (V) between the spots (P1, …, PN).
As an example, the reconstructed data may be data reconstructed under the assumption that the expression level of each of the transcripts (A1, …, AM) is distributed according to a continuous probability distribution centered on the central coordinates (C1, …, CN) of the spots (P1, …, PN).
The reconstructed data may include transcript distribution information, which may refer to the expression level (gene expression level) of each transcript (A1, …, AM).
Assuming that the expression levels of the transcripts (A1, …, AM) are distributed according to a continuous probability distribution centered on the central coordinates (C1, …, CN) of the spots (P1, …, PN), and summing the contributions of all spots (P1, …, PN), the transcript distribution information for each transcript (A1, …, AM) can be obtained as the reconstructed data.
The continuous probability distribution may be, but is not limited to, a normal distribution centered on the central coordinates (C1, …, CN) with a preset variance.
FIG. 7 is a schematic diagram showing the principle of reconstructing spatial transcriptome data into reconstructed data, where the reconstructed data may be continuously distributed when viewed as an image. Referring to FIG. 7, the transcript expression level (gene expression level) measured at a specific spot (Pn) is assumed to follow a continuous probability distribution in space (for example, a normal distribution); that is, the expression level attributed to the spot (Pn) is assumed to decrease with distance from its central coordinate (Cn). Summing these contributions over all spots (P1, …, PN) reconstructs the spatial transcriptome data, which consists of sparse coordinates, into a dense two-dimensional matrix from which an image can be obtained.
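The reconstruction principle of FIG. 7 can be sketched for a single transcript as follows. This is a minimal illustration, not the implementation of the invention: the function name `reconstruct_image`, the pixel grid size, the unnormalized Gaussian kernel, and the value of `sigma` (standing in for the preset variance) are all assumptions made for the example.

```python
import math

def reconstruct_image(centers, expression, height, width, sigma=1.5):
    """Spread each spot's expression over the pixel grid with a Gaussian
    kernel centered on the spot, then sum the contributions of all spots.

    centers    -- list of (x, y) spot center coordinates (C1, ..., CN)
    expression -- expression level of one transcript at each spot
    sigma      -- assumed standard deviation of the kernel, in pixels
    """
    image = [[0.0] * width for _ in range(height)]
    for (cx, cy), value in zip(centers, expression):
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                # Contribution decays with distance from the spot center.
                image[y][x] += value * math.exp(-d2 / (2.0 * sigma ** 2))
    return image

# Three hypothetical spots on a 10 x 10 grid.
centers = [(2, 2), (7, 2), (4, 7)]
expression = [10.0, 0.0, 5.0]
image = reconstruct_image(centers, expression, height=10, width=10)
```

Every pixel now carries a value, including pixels in the empty space between spots, and the value falls off with distance from spots that have high expression. Repeating this for each transcript would give one dense matrix per transcript.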
The image generating device (200) can generate two-dimensional images (T1, …, TK, where K is the number of two-dimensional images) that visualize the reconstructed data by varying the color or intensity according to the transcript expression level (gene expression level). One two-dimensional image can be created per transcript (gene), so that each of 20,000 or more genes can be represented as a two-dimensional image.
That is, a two-dimensional image (T1, …, TK) can be generated for each transcript (A1, …, AM). As an example, M two-dimensional images (T1, …, TM) may be generated, one for each of the M transcripts (A1, …, AM). An embodiment in which a single two-dimensional image contains transcript distribution information for several transcripts (A1, …, AM) is of course also possible.
FIG. 8 is an example of a two-dimensional image generated by the image generating device (200), in which the reconstructed data is visualized by varying the color or intensity at each position according to the transcript expression level (gene expression level). In particular, FIG. 8 is a two-dimensional image (T1, …, TK) visualizing the reconstructed data obtained from the spatial transcriptome data of FIG. 5, and shows the result of representing the transcript expression level (gene expression level) as a pixel-wise two-dimensional matrix and rendering it as an image.
In other words, when the transcript information (gene expression information) of the spots (P1, …, PN) is reconstructed in two-dimensional space by the method of the present invention, under the assumption that it appears as probability distribution values in that space, the image of FIG. 5 can be reconstructed into a two-dimensional image such as that of FIG. 8.
Because the two-dimensional images (T1, …, TK) generated by the image generating device (200) carry continuous transcript distribution information, they can be used effectively for biological understanding and visual interpretation of tissue.
To generate the two-dimensional images (T1, …, TK), the image generating device (200) may receive reconstructed data from an external database (DB) or the user terminal (300), or may itself reconstruct spatial transcriptome data into reconstructed data.
In the present invention, reconstructing spatial transcriptome data into reconstructed data and generating two-dimensional images means changing the data structure of the spatial transcriptome data to that of a multidimensional image. This makes it possible to propose methods for clustering transcript distribution information (gene expression information) and for spatially comparing different tissues, and provides a base technology for problems that previously could not be solved. The present invention is therefore useful not only to companies with spatial transcriptome data production and analysis technology but also to companies that can develop new drugs using the derived candidate substances (markers).
As an example of use, the two-dimensional images (T1, …, TK) generated by the two-dimensional image generating device (200) can be used for gene screening, which extracts genes with similar spatial distributions. That is, the present invention can lead to a method of clustering tens of thousands of two-dimensional images so that similar images, that is, genes with similar spatial expression, are grouped together.
The gene screening method performed using the two-dimensional images (T1, …, TK) generated by the two-dimensional image generating device (200) extracts genes with similar spatial distributions from those images.
More specifically, the gene screening method may include a feature value extraction step of extracting feature values of the two-dimensional images (T1, …, TK), a clustering step of generating clusters (CLT) by clustering based on the similarity of the feature values, and a gene extraction step of deriving a gene set (G) associated with each cluster (CLT).
The feature values represent the image characteristics of the two-dimensional images (T1, …, TK) and may be data obtained by reducing the reconstructed data to low-dimensional data. For example, they may be extracted using a dimensionality reduction algorithm (PCA, LDA, etc.) or an artificial neural network model (ANN).
The artificial neural network model (ANN) may be a model trained in an unsupervised manner with the reconstructed data as training data so that it can output feature values for the two-dimensional images (T1, …, TK).
As an example, the artificial neural network model (ANN) may include a first neural network (ANNa) that compresses the reconstructed data into low-dimensional data and a second neural network (ANNb) that restores the compressed low-dimensional data to the original dimension and outputs the reconstructed data.
In this case, the feature value may be the latent vector represented by the low-dimensional data.
The clustering step performs clustering using an unsupervised clustering algorithm, and various clustering algorithms may be used.
For example, the clustering algorithm may be any of various unsupervised algorithms such as K-means clustering, ISODATA, mean shift, Gaussian mixture model, DBSCAN, or self-organizing map. The optimal number of clusters may be determined using various known techniques.
In one embodiment, when the clustering algorithm is K-means clustering, the optimal number of clusters may be determined using various techniques such as the elbow method, the silhouette method, or a loss function, and is not limited to any specific method.
Through the clustering algorithm, one or more clusters (CLT) that group the two-dimensional images (T1, …, TK) can be generated.
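The clustering step can be illustrated with a small K-means sketch over per-gene feature vectors. The gene names, the two-dimensional feature values, and the deterministic initialization from the first k points are assumptions chosen for the example; the source does not prescribe them, and in practice the feature vectors would be the extracted feature values (for example, latent vectors).

```python
def kmeans(points, k, iterations=50):
    """Plain K-means on a list of feature vectors.
    Returns (labels, centroids). Initialization uses the first k points,
    which is an illustrative choice and not a recommendation."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical feature vectors, one per gene image.
features = {
    "geneA": [0.1, 0.2],
    "geneB": [0.0, 0.3],
    "geneC": [5.0, 5.1],
    "geneD": [5.2, 4.9],
}
genes = list(features)
labels, centroids = kmeans([features[g] for g in genes], k=2)

# Group gene names by cluster to form candidate gene sets.
gene_sets = {}
for gene, label in zip(genes, labels):
    gene_sets.setdefault(label, []).append(gene)
```

Each value of `gene_sets` is then a candidate gene set (G) for one cluster (CLT), which later steps may filter further.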
The gene extraction step derives the gene set (G) associated with a cluster (CLT), where a gene set (G) may mean the set of genes (transcripts (A1, …, AM)) associated with the two-dimensional images (T1, …, TK) belonging to the same cluster (CLT).
Genes (transcripts (A1, …, AM)) belonging to a gene set (G) derived from the same cluster (CLT) have similar spatial distribution patterns and can be understood as genes with anatomical, pathological, or functional similarity.
The gene extraction step may further include a step of making a final selection of the genes to be included in the gene set (G) based on an evaluation index for the cluster (CLT).
The evaluation index is a validity index for the cluster (CLT), that is, an index for quantifying the quality of the clustering. As a means of evaluating how tightly the data within a cluster (CLT) are grouped, how well clusters (CLT) are separated from one another, and the connectivity within a cluster (CLT), various indices such as silhouette values or correlation coefficients may be used.
The evaluation index is an optimization tool for deriving genes (transcripts) whose spatial distributions are more similar. For example, silhouette values may be calculated and only genes (transcripts) with positive values included in the cluster (CLT).
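The silhouette-based filtering described above could look like the following sketch. The feature vectors, labels, and gene names are hypothetical, Euclidean distance is an assumed choice of metric, and a point that is alone in its cluster is given a silhouette of 0 by convention.

```python
import math

def silhouette_values(points, labels):
    """Silhouette value of each point: (b - a) / max(a, b), where a is the
    mean distance to the other members of its own cluster and b is the
    smallest mean distance to the members of any other cluster."""
    values = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        others = [sum(d) / len(d) for lab, d in dists.items() if lab != labels[i]]
        if not own or not others:
            values.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(others)
        values.append((b - a) / max(a, b))
    return values

# Hypothetical gene features; g5 is deliberately assigned to the wrong cluster.
genes = ["g1", "g2", "g3", "g4", "g5"]
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (4.5, 5.0)]
labels = [0, 0, 1, 1, 0]

scores = silhouette_values(points, labels)
kept = [g for g, s in zip(genes, scores) if s > 0]  # keep positive silhouettes only
```

Here `g5` sits next to cluster 1 but is labeled as cluster 0, so its silhouette is negative and it is dropped from the gene set.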
In addition, the gene extraction step may further include a step of measuring the correlation between gene pairs within a cluster (CLT) by calculating a correlation coefficient, in order to make use of the gene expression levels (transcript expression levels).
For example, the correlation coefficient may be calculated as the Spearman correlation coefficient, and transcripts (genes) satisfying r > 0.1 and p-value < 0.001 may be selected to finally derive the gene set (G) for each cluster.
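As an illustration of the correlation step, the sketch below computes the Spearman coefficient as the Pearson correlation of average ranks. It covers only the coefficient r; the p-value test mentioned above is omitted and would normally come from a statistics library. The expression vectors are made-up values.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector has no defined rank correlation
    return cov / (var_x * var_y) ** 0.5

# Hypothetical expression of two genes across five spots.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 1.0, 4.0, 3.0, 5.0]
r = spearman(gene_a, gene_b)
passes_threshold = r > 0.1  # the r cutoff given in the text; p-value not checked here
```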
The statistical significance level used in the optimization step of the gene set (G) may be based on commonly used statistical cutoffs. For example, the criterion may be a p-value less than 0.05, 0.01, 0.005, or 0.001, or an equivalent threshold.
FIG. 9 is a t-SNE plot in which the feature values (latent vectors) of the two-dimensional images (T1, …, TK) for each gene (transcript) are reduced to two dimensions, visualizing the groups formed by clustering into several clusters (CLT) with similar feature values.
FIG. 10 shows images visualizing the spatial distribution patterns of genes (for example, ACTA2, DES, IGHA2, and MYH11) that belong to the same cluster (CLT) in FIG. 9 and whose spatial distribution patterns resemble the representative image of that cluster (CLT).
FIG. 11 is a graph showing an evaluation of the correlation between genes in the gene sets (G) within the clusters (CLT) compared with simulated gene sets.
In FIG. 11, the simulation starts from the 2,000 most variable genes, extracted as the genes with the greatest variation between spots using spatial transcriptome data of normal lung tissue with segmentation annotation. For each of the seven clusters (CLT) derived according to the present invention, a random gene set was created containing the same number of genes as that cluster. When the correlation of gene pairs within a cluster (CLT) was compared between the gene sets (G), found by clustering the feature values compressed into low-dimensional data to identify genes (transcripts) with similar spatial distributions, and the randomly drawn simulated gene sets, the gene sets (G) derived according to the present invention showed higher correlation among their genes (transcripts). This shows that the gene sets (G) derived according to the present invention have spatially similar expression patterns.
FIG. 12 shows the result of evaluating the discriminative power of the gene sets (G) within the clusters (CLT) against the simulated gene sets of FIG. 11. Each gene set was treated as a single signature, a signature score was calculated per gene set for every spot, and the mean square (MS) and F ratio were calculated by analysis of variance (ANOVA) against the known segmentation information. Comparing the signature scores of the gene sets (G) derived according to the present invention with those of the simulated gene sets shows that the former discriminate the segmented regions better. It can therefore be seen that the gene sets (G) derived according to the present invention consist of genes (transcripts) with similar spatial distribution patterns and are highly enriched for biological structure or function.
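The evaluation in FIG. 12 can be sketched as a one-way ANOVA on per-spot signature scores. The source does not state how the signature score is computed, so taking the mean expression of the genes in the set is an assumption here, as are the toy expression values and region labels.

```python
def signature_scores(expression, gene_set):
    """Per-spot signature score, assumed here to be the mean expression
    of the genes in the set (expression maps gene -> list over spots)."""
    n_spots = len(next(iter(expression.values())))
    return [sum(expression[g][s] for g in gene_set) / len(gene_set)
            for s in range(n_spots)]

def anova_f(scores, groups):
    """One-way ANOVA: returns (MS_between, MS_within, F ratio)."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    n, k = len(scores), len(by_group)
    grand_mean = sum(scores) / n
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in by_group.values())
    ss_within = sum((s - sum(v) / len(v)) ** 2
                    for v in by_group.values() for s in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between, ms_within, ms_between / ms_within

# Six hypothetical spots annotated with two known regions.
regions = ["a", "a", "a", "b", "b", "b"]
expression = {
    "gene1": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],   # follows the region boundary
    "gene2": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "gene3": [1.0, 8.0, 3.0, 7.0, 2.0, 9.0],   # does not follow it
}

f_derived = anova_f(signature_scores(expression, ["gene1", "gene2"]), regions)[2]
f_random = anova_f(signature_scores(expression, ["gene3"]), regions)[2]
```

A larger F ratio means the signature score separates the annotated regions more clearly, which is the sense in which one gene set discriminates better than another.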
FIGS. 13a and 13b show a case in which the present invention was used to convert the spatial distribution pattern of each gene (transcript) into a two-dimensional image and then extract genes with similar distribution patterns, selecting a gene set (G) related to anatomical and functional features.
First, the region shown in FIG. 13a is the white matter region, including the fiber tracts of the mouse brain, extracted when the spots were clustered based on their transcriptome data. FIG. 13b shows the result of applying the gene screening method according to the present invention, which derives genes with similar spatial distribution patterns, to the mouse brain spatial transcriptome data of FIG. 13a. The expression patterns of the genes in the gene set (G) corresponding to one cluster (CLT) matched the white matter region including the fiber tracts shown in FIG. 13a. In addition, gene ontology analysis of this gene set (G) identified a myelination-related gene group (GO:0042552).
Thus, the gene set (G) derived through the present invention matched a specific spatial region of the mouse brain, and its genes correspond to myelination-related genes expressed in the fiber tracts, an anatomical region of the mouse brain.
This confirms that a gene set (G) with similar spatial distribution derived using the present invention is enriched for genes related to anatomical structure and functional characteristics.
FIG. 14 shows a case in which the present invention was used to convert the spatial distribution pattern of each gene (transcript) into a two-dimensional image and then extract genes with similar distribution patterns, selecting a gene set (G) related to pathological and functional features.
FIG. 14 uses publicly available spatial transcriptome data of the mouse brain (Buzzi et al., 2021). The data were obtained after exposing mouse brains to various concentrations of heme, and a gene set that can explain molecular pathological features such as heme exposure has been published with it.
Using this mouse brain spatial transcriptome data, gene sets (G) with similar spatial distribution patterns were extracted by the gene screening method according to the present invention, and the gene set (G) of one cluster (CLT) was found to contain 15 of the top 20 genes of the heme exposure signature reported by Buzzi et al. It can therefore be seen that a gene set (G) with similar spatial distribution derived using the present invention can be used as a gene set (G) with molecular pathological and functional characteristics.
As another example, the two-dimensional images (T1, …, TK) generated by the two-dimensional image generation apparatus (200) can be used for inter-tissue gene expression comparative analysis, which compares gene expression patterns between tissue images (TI) of different tissues.
The inter-tissue gene expression comparative analysis method uses the two-dimensional images (T1, …, TK) generated by the two-dimensional image generation apparatus (200) to compare and analyze gene expression patterns between tissue images (TI) of different tissues.
More specifically, the inter-tissue gene expression comparative analysis method may include a spatial normalization step of performing spatial normalization on the two-dimensional images (T1, …, TK) generated for each of the different tissue images (TI) to produce spatially normalized images (S1, …, SK), and a comparative analysis step of comparing the spatially normalized images (S1, …, SK) of the different tissue images (TI) with one another so that the gene expression patterns of the different tissue images (TI) are compared pixel by pixel.
Here, each two-dimensional image (T1, …, TK) is an image that visualizes a spatial distribution pattern using the reconstructed data. It may visualize the spatial distribution pattern of a specific gene (transcript), or the distribution pattern obtained by summing the expression levels of genes belonging to a cluster with similar spatial distribution patterns.
The genes belonging to a cluster with similar spatial distribution patterns may be genes selected as the key genes that characterize clustered spots after the spots (P1, …, PN) constituting the spatial transcriptome data have been clustered, or they may be genes with similar spatial distribution patterns extracted by the gene screening method described above.
FIG. 15 shows clusters with spatially similar gene expression patterns so that they can be distinguished from one another. FIG. 16 shows the two-dimensional images (T1, …, TK) generated for the key gene(s) that characterize the clusters of FIG. 15, one for each of four different tissues.
Because FIG. 16 shows two-dimensional images (T1, …, TK) of four different tissues, direct spatial comparison between them is difficult. The present invention normalizes the two-dimensional images (T1, …, TK) so that they can be compared with one another, which makes comparison between different tissues possible.
Spatial normalization of the two-dimensional images (T1, …, TK) is not limited to a specific method; as an example, the symmetric image normalization method (SyN) may be applied.
FIG. 17a shows normalized images (S1, …, SK) generated by normalizing the two-dimensional images (T1, …, TK) of five different tissues. After normalization, the five tissues can be compared with one another pixel by pixel.
FIG. 17b likewise shows normalized images (S1, …, SK) obtained by normalizing the two-dimensional images (T1, …, TK) of the five different tissues of FIG. 17a. These images enable comparative analysis of expression patterns between tissues for a single gene or a gene set.
FIG. 18a uses spatial transcriptome data (Buzzi et al., 2021) from five different mouse brains exposed to different heme concentrations. FIG. 18a also shows that spatial transcriptome data lying in different spaces can be compared through spatial normalization.
FIG. 18a shows, for five tissues that lie in different spaces and were exposed to different heme concentrations (Sham, Heme 50, Heme 125, Heme 500, Heme 1000), the spatial expression distribution of the Hmox1 gene, which corresponds to heme, and the differences in that distribution between the tissues. For the top genes of the heme exposure signature, Hmox1, Mt2, Timp1, and S100a6, the pattern described by Buzzi et al. was confirmed in the data from the five different mouse brains.
The inter-tissue gene expression comparative analysis method according to the present invention can compare and analyze the spatial gene expression patterns of different tissues by comparing the normalized images (S1, …, SK) of the different tissues pixel by pixel.
In addition, referring to FIG. 18b, a correlation analysis using the Pearson correlation coefficient was performed on the pixel values of the normalized images (S1, …, SK) of Hmox1, Mt2, Timp1, and S100a6, the top genes of the heme exposure signature in FIG. 18a. The correlations were significant (p < 0.05) with R > 0.3. This shows that the correlation coefficient of the pixel values of the normalized images (S1, …, SK) (or of the two-dimensional images (T1, …, TK)) can be used to select gene sets with spatially similar distribution patterns. In other words, to select gene groups that are distributed similarly in space, correlation coefficients can be computed from pixel values and a cutoff can be applied at a significant p-value and R value.
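The pixel-value correlation screening described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the toy images, and the use of a permutation test to obtain the p-value are assumptions (the description does not state how the p-value is computed). The cutoffs R > 0.3 and p < 0.05 are the ones mentioned in the text.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two flattened pixel arrays."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def permutation_p(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the correlation (assumed test)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    observed = abs(pearson_r(x, y))
    hits = sum(abs(pearson_r(x, rng.permutation(y))) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def screen_genes(images, reference, r_cut=0.3, p_cut=0.05):
    """Return genes whose image correlates with the reference gene's image."""
    ref = images[reference]
    selected = []
    for name, img in images.items():
        if name == reference:
            continue
        # Keep a gene only if it passes both the R and the p-value cutoff.
        if pearson_r(ref, img) > r_cut and permutation_p(ref, img) < p_cut:
            selected.append(name)
    return selected

# Toy images (hypothetical gene names, random data for illustration only).
rng = np.random.default_rng(42)
base = rng.random((16, 16))
images = {
    "gene_a": base,
    "gene_b": base + 0.05 * rng.random((16, 16)),  # nearly the same pattern
    "gene_c": rng.random((16, 16)),                # unrelated pattern
}
selected = screen_genes(images, "gene_a")
```

With real data, `images` would hold one normalized image per gene, and the reference could be any gene of interest, such as a signature gene.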
The gene screening method and the inter-tissue gene expression comparative analysis method described above may be performed on a separate computing device or on the two-dimensional image generation apparatus (200) described above.
In addition, the two-dimensional image generation method, the gene screening method, and the inter-tissue gene expression comparative analysis method described above can of course be implemented as a computer-executable program.
As another example, the spatial transcriptome information analysis system (1000) according to the present invention may include, as shown in FIG. 1, a user terminal (300) and a spatial transcriptome information analysis apparatus (100) connected to the user terminal (300) through a network.
The spatial transcriptome information analysis apparatus (100) is an analysis apparatus for performing at least one of the two-dimensional image generation method, the gene screening method, and the inter-tissue gene expression comparative analysis method described above, and can provide a system capable of integrated analysis of spatial transcriptome information.
As shown in FIG. 2, the spatial transcriptome information analysis apparatus (100) may include an information receiving unit (110) that receives spatial transcriptome data consisting of position information of a plurality of spots (P1, …, PN) spaced apart on a tissue image (TI) and transcript information (R1, …, RN) corresponding to each of the plurality of spots (P1, …, PN); a data reconstruction unit (120) that computes reconstructed data by reconstructing the spatial transcriptome data so that the empty space between the plurality of spots (P1, …, PN), where no transcript information (R1, …, RN) exists, is interpolated; and a transcriptome information analysis unit (130) that analyzes gene expression patterns based on the reconstructed data.
The information receiving unit (110) receives the spatial transcriptome data and may be configured in various ways. It may receive the spatial transcriptome data from the user terminal (300) or from a separate database (DB, 400). In addition to receiving spatial transcriptome data, the information receiving unit (110) may also receive commands or requests from the user terminal (300).
The data reconstruction unit (120) computes the reconstructed data by reconstructing the spatial transcriptome data so that the empty space between the plurality of spots (P1, …, PN), where no transcript information (R1, …, RN) exists, is interpolated, and may be configured in various ways.
The principle of computing the reconstructed data was described in detail above in connection with the two-dimensional image generation apparatus (200), so the overlapping description is omitted here.
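As a reminder of that principle, the reconstruction can be sketched in a few lines, assuming (as in claims 3 and 4) that each transcript's expression level is spread around the spot's center coordinates according to a normal distribution with a preset variance. The function name `reconstruct`, the grid size, the unnormalized Gaussian weight, and the value of `sigma` are illustrative assumptions, not details taken from the specification.

```python
import numpy as np

def reconstruct(centers, expression, shape, sigma=2.0):
    """Spread each spot's expression over a pixel grid with a Gaussian weight.

    centers:    (N, 2) spot center coordinates as (row, col)
    expression: (N,) expression level of one transcript at each spot
    shape:      (height, width) of the output grid
    sigma:      preset standard deviation of the normal distribution (assumed)
    """
    centers = np.asarray(centers, dtype=float)
    expression = np.asarray(expression, dtype=float)
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape, dtype=float)
    for (r0, c0), value in zip(centers, expression):
        # Squared distance of every pixel from this spot's center.
        d2 = (rows - r0) ** 2 + (cols - c0) ** 2
        image += value * np.exp(-d2 / (2.0 * sigma ** 2))
    return image

# Three hypothetical spots with made-up expression levels.
centers = [(5, 5), (5, 15), (15, 10)]
expression = [10.0, 0.0, 4.0]
image = reconstruct(centers, expression, shape=(20, 20))
```

Pixels between spots receive nonzero values from nearby spots, which is the interpolation of empty space that the data reconstruction unit (120) is described as performing.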
The transcriptome information analysis unit (130) analyzes gene expression patterns based on the reconstructed data and may be configured in various ways.
The transcriptome information analysis unit (130) may use the reconstructed data to analyze gene expression patterns within a tissue or to compare gene expression patterns between different tissues.
That is, the gene expression pattern here may be a gene expression pattern of the same tissue or of a different tissue.
Referring to FIG. 2, the transcriptome information analysis unit (130) may include a feature extraction unit (132) that extracts feature values of the reconstructed data, and a clustering unit (134) that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the feature values.
As an example, the feature extraction unit (132) may extract the feature values by reducing the reconstructed data to low-dimensional data.
As an example, the feature extraction unit (132) may extract the feature values using a dimensionality reduction algorithm that reduces the reconstructed data to low-dimensional data.
The dimensionality reduction algorithm is not limited to a specific algorithm and may include, for example, PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis).
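As an illustration of the PCA option, the following sketch reduces flattened per-gene images to a few feature values using a singular value decomposition. The helper name `pca_features`, the number of components, and the random toy images are assumptions made for this example only.

```python
import numpy as np

def pca_features(data, n_components=2):
    """Project the rows of `data` onto their top principal components."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Six hypothetical genes, each with an 8x8 reconstructed image (random toy data).
rng = np.random.default_rng(0)
gene_images = rng.random((6, 8, 8))
flat = gene_images.reshape(len(gene_images), -1)  # one row of pixels per gene
features = pca_features(flat, n_components=2)
```

Each row of `features` is a low-dimensional feature vector for one gene, which could then be passed to a clustering step.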
As another example, referring to FIG. 6, the feature extraction unit (132) is a feature extractor that extracts feature values of the reconstructed data and may include an artificial neural network model (ANN) that compresses the reconstructed data into low-dimensional data.
The feature value may be a latent vector value expressed as the low-dimensional data.
The clustering unit (134) may perform clustering using an unsupervised clustering algorithm and may derive a gene set (G) associated with a cluster (CLT).
In addition, the clustering unit (134) may make the final selection of genes to be included in the gene set (G) based on at least one of the silhouette value and the correlation coefficient of the cluster (CLT).
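One way to read the silhouette-based final selection is sketched below: compute a silhouette value for every gene from its feature vector and cluster label, then keep only the genes of a cluster whose value reaches a cutoff. The functions, the cutoff of 0.5, and the toy features are assumptions; the specification does not give a threshold or a formula, and the standard silhouette definition is used here.

```python
import numpy as np

def silhouette_values(features, labels):
    """Per-sample silhouette values; assumes at least two clusters."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    values = np.zeros(len(features))
    for i in range(len(features)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette is defined as 0
        a = dist[i, same].mean()  # mean distance within the own cluster
        b = min(dist[i, labels == other].mean()  # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        values[i] = (b - a) / max(a, b)
    return values

def select_genes(names, features, labels, cluster, cutoff=0.5):
    """Keep genes of `cluster` whose silhouette value is at least `cutoff`."""
    scores = silhouette_values(features, labels)
    return [n for n, lab, s in zip(names, labels, scores)
            if lab == cluster and s >= cutoff]

# Hypothetical genes: g3 is labeled 0 but sits between the two clusters.
names = ["g0", "g1", "g2", "g3", "g4", "g5"]
features = [[0, 0], [0.1, 0], [0, 0.1], [2.4, 2.6], [5, 5], [5.1, 5]]
labels = [0, 0, 0, 0, 1, 1]
gene_set = select_genes(names, features, labels, cluster=0)
```

In this toy case the borderline gene is dropped from the gene set because its silhouette value is close to zero, while the tightly grouped genes are kept.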
Meanwhile, the spatial transcriptome information analysis apparatus (100) may further include an image generation unit (140) that generates, from the transcript distribution information of the reconstructed data, two-dimensional images (T1, …, TK) visualizing the spatial distribution of the plurality of transcripts (A1, …, AM).
The image generation unit (140) serves to visualize the spatial distribution of the plurality of transcripts (A1, …, AM). Because gene expression patterns can be analyzed from the reconstructed data alone, a visualization image is not always needed, and in that case the image generation unit (140) may of course be omitted.
The image generation unit (140) may be configured identically or similarly to the two-dimensional image generation apparatus (200) described in detail above, so a detailed description is omitted to the extent that it overlaps.
When comparative analysis of gene expression patterns between different tissues is required, the image generation unit (140) may generate the two-dimensional images (T1, …, TK) for each of the different tissue images (TI).
In this case, the spatial transcriptome information analysis apparatus (100) may further include a spatial normalization unit (150) that performs spatial normalization on the two-dimensional images (T1, …, TK) to generate spatially normalized images (S1, …, SK).
The spatial normalization unit (150) performs spatial normalization on the two-dimensional images (T1, …, TK) to generate the spatially normalized images (S1, …, SK) and may be configured in various ways. Spatial normalization of the two-dimensional images (T1, …, TK) is not limited to a specific method; as an example, the symmetric image normalization method (SyN) may be applied.
The transcriptome information analysis unit (130) may then compare the spatially normalized images (S1, …, SK) of the different tissue images (TI) with one another to compare the gene expression patterns of the different tissue images (TI) pixel by pixel.
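Once the images share a common space, the pixel-by-pixel comparison can be as simple as subtracting a reference tissue's image from the others. The sketch below assumes the spatial normalization has already been done elsewhere; the function name, tissue labels, and constant-valued toy images are hypothetical.

```python
import numpy as np

def pixelwise_difference(normalized, reference):
    """Subtract the reference tissue's image from every other tissue's image."""
    ref = np.asarray(normalized[reference], dtype=float)
    return {name: np.asarray(img, dtype=float) - ref
            for name, img in normalized.items() if name != reference}

# Toy normalized images of one gene in three hypothetical tissues.
normalized = {
    "tissue_a": np.zeros((4, 4)),
    "tissue_b": np.full((4, 4), 0.5),
    "tissue_c": np.full((4, 4), 2.0),
}
diff = pixelwise_difference(normalized, "tissue_a")
```

Each entry of `diff` is a difference map with the same shape as the normalized images, so regions of increased or decreased expression relative to the reference can be located directly.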
The spatial normalization of two-dimensional images by the spatial normalization unit (150) and the comparative analysis of gene expression patterns by the transcriptome information analysis unit (130) were described in detail above for the inter-tissue gene expression comparative analysis method, so a detailed description is omitted to the extent that it overlaps.
In addition, the spatial transcriptome information analysis apparatus (100) may further include an information transmission unit (160) for transmitting the gene expression pattern analysis results to the database (DB, 400) or the user terminal (300).
The spatial transcriptome information analysis apparatus (100) described above reconstructs spatial transcriptome data into reconstructed data and analyzes it, and can thereby provide an integrated means of analyzing spatial transcriptome information using the reconstructed data.
That is, the spatial transcriptome information analysis method using the spatial transcriptome information analysis apparatus (100) may include at least one of the two-dimensional image generation method, the gene screening method, and the inter-tissue gene expression comparative analysis method, each of which uses the reconstructed data.
Referring to FIG. 3, as one example, the spatial transcriptome analysis method according to the present invention is a two-dimensional image generation method and may include a receiving step (S301) of receiving spatial transcriptome data, a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstructed data, and a two-dimensional image generation step (S302) of using the reconstructed data to generate a two-dimensional image that visualizes transcript distribution information (gene distribution information).
As another example, the spatial transcriptome analysis method according to the present invention is a gene screening method and may include a receiving step (S301) of receiving spatial transcriptome data, a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstructed data, and a gene extraction step (S304) of using the reconstructed data to extract genes having similar spatial distributions. The gene screening method may further include a two-dimensional image generation step (S302) of using the reconstructed data to generate two-dimensional images (T1, …, TK) that visualize transcript distribution information (gene distribution information).
As yet another example, the spatial transcriptome analysis method according to the present invention is an inter-tissue gene expression comparative analysis method and may include a receiving step (S301) of receiving spatial transcriptome data for each of the different tissue images (TI), a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstructed data, a two-dimensional image generation step (S302) of generating the two-dimensional images (T1, …, TK) for each of the different tissue images (TI), a spatial normalization step of performing spatial normalization on the two-dimensional images (T1, …, TK) to generate spatially normalized images (S1, …, SK), and a comparative analysis step (S305) of comparing the spatially normalized images (S1, …, SK) of the different tissue images (TI) with one another so that the gene expression patterns of the different tissue images (TI) are compared pixel by pixel.
The spatial transcriptome information analysis method performed with the spatial transcriptome information analysis apparatus (100) described above may be implemented as a computer-executable spatial transcriptome information analysis program.
The foregoing describes only some of the preferred embodiments that can be implemented by the present invention. As is well known, the scope of the present invention should therefore not be construed as limited to the above embodiments, and any technical idea that shares its basis with the technical idea of the present invention described above falls within the scope of the present invention.

Claims (20)

  1. A spatial transcriptome information analysis apparatus (100) comprising: an information receiving unit (110) that receives spatial transcriptome data consisting of position information of a plurality of spots (P1, …, PN) spaced apart on a tissue image (TI) and transcript information (R1, …, RN) corresponding to each of the plurality of spots (P1, …, PN);
    a data reconstruction unit (120) that computes reconstructed data by reconstructing the spatial transcriptome data so that the empty space between the plurality of spots (P1, …, PN), where no transcript information (R1, …, RN) exists, is interpolated; and
    a transcriptome information analysis unit (130) that analyzes gene expression patterns based on the reconstructed data,
    wherein the transcript information (R1, …, RN) includes information on the expression level of each of a plurality of transcripts (A1, …, AM).
  2. The spatial transcriptome information analysis apparatus (100) according to claim 1,
    wherein the gene expression pattern is a gene expression pattern of the same tissue as the tissue image or a gene expression pattern of a different tissue.
  3. The spatial transcriptome information analysis apparatus (100) according to claim 1,
    wherein the reconstructed data includes transcript distribution information reconstructed on the assumption that, for each of the plurality of transcripts (A1, …, AM), the expression level is distributed according to a continuous probability distribution centered on the center coordinates (C1, …, CN) of the plurality of spots (P1, …, PN).
  4. The spatial transcriptome information analysis apparatus (100) according to claim 2,
    wherein the continuous probability distribution is a normal distribution centered on the center coordinates (C1, …, CN) and having a preset variance.
  5. The spatial transcriptome information analysis apparatus (100) according to claim 3,
    further comprising an image generation unit (140) that generates, from the transcript distribution information, two-dimensional images (T1, …, TK) visualizing the spatial distribution of the plurality of transcripts (A1, …, AM).
  6. The spatial transcriptome information analysis apparatus (100) according to claim 1,
    wherein the transcriptome information analysis unit (130) includes a feature extraction unit (132) that extracts feature values of the reconstructed data, and a clustering unit (134) that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the feature values.
  7. The spatial transcriptome information analysis apparatus (100) according to claim 6,
    wherein the feature extraction unit (132) extracts the feature values by reducing the reconstructed data to low-dimensional data.
  8. The spatial transcriptome information analysis apparatus (100) according to claim 6,
    wherein the feature extraction unit (132) includes an artificial neural network model that compresses the reconstructed data into low-dimensional data, and
    the artificial neural network model uses the reconstructed data as training data.
  9. The spatial transcriptome information analysis apparatus (100) according to claim 8,
    wherein the feature value is a latent vector value expressed as the low-dimensional data.
  10. The spatial transcriptome information analysis apparatus (100) according to claim 6,
    wherein the clustering unit (134) performs clustering using an unsupervised clustering algorithm.
  11. The spatial transcriptome information analysis apparatus (100) according to claim 6,
    wherein the clustering unit (134) derives a gene set (G) associated with the cluster (CLT).
  12. The spatial transcriptome information analysis apparatus (100) according to claim 11,
    wherein the clustering unit (134) makes the final selection of genes to be included in the gene set (G) based on at least one of the silhouette value and the correlation coefficient of the cluster (CLT).
  13. 청구항 5에 있어서,In claim 5,
    상기 이미지생성부(140)는, 서로 다른 조직이미지(TI)들 각각에 대해 상기 2차원이미지(T1, …, TK)를 생성하며,The image generator 140 generates the two-dimensional images (T 1 , ..., T K ) for each of the different tissue images (TI),
    상기 공간전사체정보분석장치(100)는, 상기 2차원이미지(T1, …, TK)에 대해 공간정규화를 수행하여 공간정규화이미지(S1, …, SK)를 생성하는 공간정규화부(150)를 추가로 포함하며,The spatial transcriptome information analysis device 100 includes a spatial normalization unit that performs spatial normalization on the two-dimensional images (T 1 , …, T K ) to generate spatial normalized images (S 1 , …, S K ). Additionally comprising (150),
    상기 전사체정보분석부(130)는, 상기 서로 다른 조직이미지(TI)들에 대해 상기 공간정규화이미지(S1, …, SK)를 상호 비교하여 상기 서로 다른 조직이미지(TI)들 사이의 유전자발현패턴을 비교 분석하는 것을 특징으로 하는 공간전사체정보 분석장치(100).The transcriptome information analysis unit 130 compares the spatial normalized images (S 1 , ..., S K ) for the different tissue images (TI) to determine the difference between the different tissue images (TI). A spatial transcriptome information analysis device (100) characterized by comparative analysis of gene expression patterns.
  14. 청구항 1 내지 청구항 13 중 어느 하나의 항에 따른 공간전사체정보분석장치(100)와;A spatial transcriptome information analysis device (100) according to any one of claims 1 to 13;
    상기 공간전사체정보 분석장치(100)와 네트워크를 통해 연결되는 사용자단말(300)을 포함하는 것을 특징으로 하는 공간전사체정보 분석시스템(1000).A spatial transcriptome information analysis system (1000) comprising a user terminal (300) connected to the spatial transcriptome information analysis device (100) through a network.
  15. 청구항 1 내지 청구항 13 중 어느 하나의 항에 따른 공간전사체정보 분석장치(100)를 이용한 공간전사체정보 분석방법.A spatial transcriptome information analysis method using the spatial transcriptome information analysis device 100 according to any one of claims 1 to 13.
  16. A computer-executable spatial transcriptome information analysis program for performing the spatial transcriptome information analysis method according to claim 15.
  17. An image generation apparatus (200) that generates two-dimensional images (T1, …, TK) of transcript distribution using reconstructed data obtained by reconstructing spatial transcriptome data, the spatial transcriptome data consisting of position information of a plurality of spots (P1, …, PN) spaced apart on a tissue image (TI) and transcriptome information (R1, …, RN) corresponding to each of the plurality of spots (P1, …, PN),
    wherein the reconstructed data is data reconstructed such that the empty space between the plurality of spots (P1, …, PN), for which no transcriptome information (R1, …, RN) exists, is interpolated.
  18. The two-dimensional image generation apparatus (200) of claim 17, wherein
    the reconstructed data includes transcript distribution information reconstructed on the assumption that the expression level of each of the plurality of transcripts (A1, …, AM) is distributed according to a continuous probability distribution centered on the center coordinates (C1, …, CN) of the plurality of spots (P1, …, PN).
  19. A gene screening method that extracts genes having similar spatial distributions using the two-dimensional images (T1, …, TK) generated by the two-dimensional image generation apparatus (200) according to any one of claims 17 and 18.
  20. An inter-tissue gene expression comparative analysis method that compares the two-dimensional images (T1, …, TK) generated for tissue images (TI) of different tissues by the two-dimensional image generation apparatus (200) according to any one of claims 17 and 18, thereby comparatively analyzing gene expression patterns between the tissue images (TI) of the different tissues.
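
The reconstruction recited in claims 17 and 18 and the similarity screening of claim 19 can be illustrated with a minimal sketch. The claims only require "a continuous probability distribution" centered on each spot's center coordinates; the isotropic Gaussian used here, the `sigma` parameter, the function names `render_transcript_image` and `spatial_correlation`, and the toy spot layout are all assumptions made for illustration and are not taken from the patent. Pearson correlation between images is likewise only one possible reading of "similar spatial distribution".

```python
import numpy as np


def render_transcript_image(centers, expression, shape, sigma):
    """Spread each spot's expression over a pixel grid.

    Assumption: each spot contributes an isotropic 2-D Gaussian centered on
    its (row, col) center coordinate, scaled by the spot's expression level.
    Pixels between spots receive the sum of the overlapping tails, which
    fills (interpolates) the empty space between spots.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape, dtype=float)
    norm = 2.0 * np.pi * sigma ** 2  # normalizing constant of a 2-D Gaussian
    for (cy, cx), value in zip(centers, expression):
        sq_dist = (rows - cy) ** 2 + (cols - cx) ** 2
        image += value * np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm
    return image


def spatial_correlation(image_a, image_b):
    """Pearson correlation between two images, compared pixel by pixel."""
    return float(np.corrcoef(image_a.ravel(), image_b.ravel())[0, 1])


# Hypothetical toy data: four spots on a 32 x 32 grid, three genes.
centers = [(8, 8), (8, 24), (24, 8), (24, 24)]
gene_a = [10.0, 0.0, 0.0, 5.0]
gene_b = [20.0, 0.0, 0.0, 10.0]  # same spatial pattern as gene_a, scaled
gene_c = [0.0, 8.0, 8.0, 0.0]    # expressed in the complementary spots

image_a = render_transcript_image(centers, gene_a, (32, 32), sigma=3.0)
image_b = render_transcript_image(centers, gene_b, (32, 32), sigma=3.0)
image_c = render_transcript_image(centers, gene_c, (32, 32), sigma=3.0)

corr_ab = spatial_correlation(image_a, image_b)
corr_ac = spatial_correlation(image_a, image_c)
print(f"a vs b: {corr_ab:.3f}, a vs c: {corr_ac:.3f}")
```

In this sketch, genes whose images correlate strongly (here `gene_a` and `gene_b`) would be candidates for the same gene set, while `gene_c` would not. Comparing images from different tissues, as in claim 20, would additionally require the spatial normalization step of claim 13, which is not shown here.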
PCT/KR2022/005223 2022-04-06 2022-04-11 Spatial transcriptome information analysis apparatus and analysis method using same WO2023195564A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020220042884A KR102483745B1 (en) 2022-04-06 2022-04-06 Spatial transcriptome analysis apparatus and method using the same
KR10-2022-0042884 2022-04-06

Publications (1)

Publication Number Publication Date
WO2023195564A1 (en) 2023-10-12

Family

ID=84924945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/005223 WO2023195564A1 (en) 2022-04-06 2022-04-11 Spatial transcriptome information analysis apparatus and analysis method using same

Country Status (2)

Country Link
KR (1) KR102483745B1 (en)
WO (1) WO2023195564A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102170297B1 (en) * 2019-12-16 2020-10-26 주식회사 루닛 Method and system for providing interpretation information on pathomics data
US20210199584A1 (en) * 2019-12-17 2021-07-01 Applied Materials, Inc. System and method for acquisition and processing of multiplexed fluorescence in-situ hybridization images
KR20220033484A (en) * 2019-06-14 2022-03-16 바이오 래드 래버러토리스 인코오포레이티드 Systems and Methods for Automated Single Cell Processing and Analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BERGENSTRÅHLE LUDVIG; HE BRYAN; BERGENSTRÅHLE JOSEPH; ABALO XESÚS; MIRZAZADEH REZA; THRANE KIM; JI ANDREW L.; ANDERSSON ALMA; LARS: "Super-resolved spatial transcriptomics by deep data fusion", NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP US, NEW YORK, vol. 40, no. 4, 29 November 2021 (2021-11-29), New York, pages 476 - 479, XP037799136, ISSN: 1087-0156, DOI: 10.1038/s41587-021-01075-3 *
DONG KANGNING, ZHANG SHIHUA: "Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder", NATURE COMMUNICATIONS, vol. 13, no. 1, 1 April 2022 (2022-04-01), pages 1739, XP093096032, DOI: 10.1038/s41467-022-29439-6 *

Also Published As

Publication number Publication date
KR102483745B1 (en) 2023-01-04

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22936618

Country of ref document: EP

Kind code of ref document: A1