WO2023195564A1

WO2023195564A1 - Spatial transcriptome information analysis apparatus and analysis method using same

Info

Publication number: WO2023195564A1
Application number: PCT/KR2022/005223
Authority: WO
Inventors: 서미경; 이대승; 최홍윤
Original assignee: 주식회사 포트래이
Priority date: 2022-04-06
Filing date: 2022-04-11
Publication date: 2023-10-12
Also published as: KR102483745B1

Abstract

The present invention relates to a spatial transcriptome information analysis apparatus and a spatial transcriptome information analysis method using the same, and more particularly to an apparatus that uses reconstituted data, in which spatial transcriptome data is reconstituted so that empty spaces lacking transcriptome information on a tissue image (TI) are interpolated, and to an analysis method using the same. The present invention discloses a spatial transcriptome information analysis apparatus (100) comprising: an information receptor (110) for receiving spatial transcriptome data consisting of location information for a plurality of spots (P₁, …, P_N) spaced from each other on a tissue image (TI) and transcriptome information (R₁, …, R_N) respectively corresponding to the plurality of spots (P₁, …, P_N); a data reconstituting unit (120) for generating reconstituted data in which the spatial transcriptome data is reconstituted so that the empty space lacking the transcriptome information (R₁, …, R_N) between the plurality of spots (P₁, …, P_N) is interpolated; and a spatial transcriptome information analyzer (130) for analyzing a gene expression pattern on the basis of the reconstituted data.

Description

Spatial transcriptome information analysis device and analysis method using the same

The present invention relates to a spatial transcriptome information analysis device and an analysis method using the same, and more particularly to an analysis device that uses reconstructed data, in which spatial transcriptome information is reconstructed so that empty spaces without transcript information in a tissue image are interpolated, and to an analysis method using the same.

Spatial transcriptome data refers to data that combines spatial location information with transcriptome information (gene expression information). It consists of hundreds to tens of thousands of spots, where each spot is a very small region of the tissue. In other words, spatial transcriptome data is composed of location information within a tissue and the expression information of genes at those locations.

Spatial transcriptome data requires analyzing expression information for tens of thousands of genes across up to tens of thousands of small spatial regions (spots), with the location information of each spot added on top, so an appropriate analysis method is required.

In addition, the gaps between spots containing transcript information are not all filled, so gene expression information for empty spaces without transcript information cannot be calculated, which limits biological understanding and visual interpretation.

In addition, when comparing gene expression information between different tissues or between multiple tissue samples located in different spaces, the location and shape of the samples are not the same. This makes comparative analysis difficult for data such as tissues treated with a drug at various concentrations, developmental time-course data, or other data acquired under differing conditions.

Therefore, there is a great need for spatial transcriptome data analysis technology that allows selection of gene groups showing spatially similar patterns or comparison of gene expression patterns between different tissues.

The purpose of the present invention, in recognition of the problems and needs described above, is to provide a spatial transcriptome information analysis device, and a spatial transcriptome information analysis method using the same, that infer information in the empty space between spots, where no spatial transcriptome information exists, so that gene sets showing spatially similar expression patterns can be selected or gene expression patterns can be easily compared between different tissues.

To achieve the object described above, the present invention discloses a spatial transcriptome information analysis device 100 including: an information receiving unit 110 that receives spatial transcriptome data consisting of location information of a plurality of spots (P₁, …, P_N) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the plurality of spots (P₁, …, P_N); a data reconstruction unit 120 that calculates reconstructed data in which the spatial transcriptome data is reconstructed so that empty spaces between the plurality of spots (P₁, …, P_N) without the transcript information (R₁, …, R_N) are interpolated; and a transcriptome information analysis unit 130 that analyzes gene expression patterns based on the reconstructed data.

The transcript information (R₁, …, R_N) may include information about the expression level of each of a plurality of transcripts (A₁, …, A_M).

The gene expression pattern may be a gene expression pattern of the same tissue as the tissue image (TI) or a gene expression pattern of a different tissue.

The reconstructed data may include transcript distribution information reconstructed by assuming that, for each of the plurality of transcripts (A₁, …, A_M), the expression level is distributed according to a continuous probability distribution centered on the central coordinates (C₁, …, C_N) of the plurality of spots (P₁, …, P_N).

The continuous probability distribution may be a normal distribution whose median lies at the central coordinates (C₁, …, C_N) and whose variance is preset.

The spatial transcriptome information analysis device 100 may additionally include an image generator 140 that generates, from the transcript distribution information, two-dimensional images (T₁, …, T_K) visualizing the spatial distribution of the plurality of transcripts (A₁, …, A_M).

The transcriptome information analysis unit 130 may include a feature extraction unit 132 that extracts characteristic values of the reconstructed data, and a clustering unit 134 that generates a cluster (CLT) by clustering the reconstructed data based on the similarity of the characteristic values.

The feature extraction unit 132 may extract the feature values by reducing the reconstructed data to low-dimensional data.

The feature extraction unit 132 may include an artificial neural network model that compresses the reconstructed data into low-dimensional data.

The artificial neural network model may use the reconstruction data as learning data.

The characteristic value may be a latent vector value expressed as the low-dimensional data.

The clustering unit 134 may perform clustering using an unsupervised learning-based clustering algorithm.

The clustering unit 134 can derive a gene set (G) associated with the cluster (CLT).

The clustering unit 134 may finally select genes to be included in the gene set (G) based on at least one of the silhouette value and the correlation coefficient of the cluster (CLT).

The image generator 140 may generate the two-dimensional images (T₁, …, T_K) for each of different tissue images (TI).

The spatial transcriptome information analysis device 100 may additionally include a spatial normalization unit 150 that performs spatial normalization on the two-dimensional images (T₁, …, T_K) to generate spatially normalized images (S₁, …, S_K).

The transcriptome information analysis unit 130 can compare the spatially normalized images (S₁, …, S_K) for the different tissue images (TI) to comparatively analyze gene expression patterns between the different tissue images (TI).

In another aspect, the present invention discloses a spatial transcriptome information analysis system (1000) including the spatial transcriptome information analysis device (100) and a user terminal (300) connected to the spatial transcriptome information analysis device (100) through a network.

In another aspect, the present invention discloses a spatial transcriptome information analysis method using the spatial transcriptome information analysis device 100.

In another aspect, the present invention discloses a computer-executable spatial transcriptome information analysis program for performing a spatial transcriptome information analysis method.

In another aspect, the present invention discloses an image generation device 200 that generates two-dimensional images (T₁, …, T_K) of transcript distribution using reconstructed data, obtained by reconstructing spatial transcriptome data composed of location information of a plurality of spots (P₁, …, P_N) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the plurality of spots (P₁, …, P_N).

The reconstructed data may include transcript distribution information reconstructed by assuming that the expression level of each of the plurality of transcripts (A₁, …, A_M) is distributed according to a continuous probability distribution centered on the central coordinates (C₁, …, C_N) of the plurality of spots (P₁, …, P_N).

In another aspect, the present invention discloses a genetic screening method for extracting genes with similar spatial distribution using the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device 200.

In another aspect, the present invention discloses a method for comparative analysis of gene expression between tissues, which compares the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device 200 for tissue images (TI) of different tissues.

The spatial transcriptome information analysis device and the analysis method using the same according to the present invention infer information in the empty space between spots without transcript information, so that gene sets showing spatially similar expression patterns can be selected and gene expression patterns can be easily compared between different tissues. This has the advantage of providing better biological and functional understanding and new insights.

Specifically, the present invention can generate a two-dimensional image of transcript distribution information (gene expression pattern) by inferring gene expression values in empty spaces without transcript information, based on gene expression and spatial information in the tissue. It can also find genes that are spatially distributed in a similar way, and can select genes that show expression patterns similar to a desired gene or characteristic.

In addition, the present invention can obtain genetic information spatially associated with a specific target substance or molecule of interest. For comparison between different tissues, it can also spatially normalize tissues that lie in different spaces, using the transcript distribution information (gene expression pattern) rendered as two-dimensional images, which makes comparison between different tissues possible.

Furthermore, the present invention can be used as a method to compare, between different tissues, changes in transcript distribution information (gene expression pattern) caused by diseases, drugs, and the like, and can be applied in various pathophysiology research and new drug development.

Figure 1 is a conceptual diagram showing a spatial transcriptome information analysis system according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the spatial transcriptome information analysis device of FIG. 1.

FIG. 3 is a flow chart showing a spatial transcriptome information analysis method performed in the spatial transcriptome information analysis system of FIG. 1.

Figure 4 is a conceptual diagram showing spots constituting spatial transcriptome data.

Figure 5 is a diagram showing a visualization image in which spatial transcriptome data is visualized in a conventional manner.

FIG. 6 is a diagram illustrating the process of clustering reconstructed data in the spatial transcriptome information analysis device of FIG. 2.

Figure 7 is a diagram explaining the principle of reconstructing spatial transcriptome data into reconstruction data.

Figure 8 is a diagram showing a visualization image visualizing reconstructed data.

Figure 9 is a diagram showing genes clustered based on the similarity of characteristic values of reconstructed data.

Figure 10 is a diagram illustrating the spatial expression pattern of the clustered gene set.

Figure 11 is a graph showing correlation evaluation between genes in the clustered gene set compared to the simulated gene set.

Figure 12 is a graph showing the evaluation of the discrimination power of the clustered gene set compared to the simulated gene set.

Figures 13a and 13b are diagrams showing two-dimensional images of the gene set (G) matching the fiber tract of anatomical tissue and the corresponding spatial region.

Figure 14 is a diagram showing two-dimensional images of genes extracted related to the molecular and pathological characteristics and functions of tissues.

Figure 15 is a diagram illustrating clusters with similar gene expression patterns in tissue space.

Figure 16 is a diagram showing a two-dimensional image generated for the main gene(s) representing the characteristics of the cluster in Figure 15.

Figures 17a and 17b are diagrams showing normalized images obtained by performing spatial normalization on two-dimensional images of five different tissues.

Figure 18a is a normalized image showing the gene expression patterns of five tissues exposed to different heme concentrations, and Figure 18b is a diagram showing spatially similar gene sets selected by analyzing the correlation with the pixel values of the normalized image of Figure 18a.

Hereinafter, the spatial transcriptome information analysis system 1000 according to the present invention will be described with reference to the attached drawings.

The spatial transcriptome information analysis system 1000 may be a system that uses spatial transcriptome information to generate a two-dimensional image of transcript distribution, to analyze the gene expression pattern of a tissue, or to comparatively analyze gene expression patterns between different tissues.

As an example, as shown in FIG. 1, the spatial transcriptome information analysis system 1000 according to the present invention may include a user terminal 300 and an image generating device 200 that is connected to the user terminal 300 through a network and generates a two-dimensional image of transcript distribution using spatial transcriptome information.

The user terminal 300 corresponds to a computing device connected through a network to the image generating device 200, which will be described later. It can be implemented as, for example, a desktop, laptop, tablet PC, or smartphone, and may include a network interface for network connection to the image generating device 200 and a user input/output interface for user input/output.

For example, the user terminal 300 may correspond to a mobile terminal and may be connected to the image generating device 200 through cellular communication or Wi-Fi communication.

As another example, the user terminal 300 may correspond to a desktop and may be connected to the image generating device 200 through the Internet.

The image generating device 200 is connected to the user terminal 300 through a network and can receive requests or commands from the user terminal 300 or transmit requests or commands to the user terminal 300. Various configurations are possible for the image generating device 200 as a server that generates two-dimensional images (T₁, …, T_K) of transcript distribution using reconstructed data, obtained by reconstructing spatial transcriptome data composed of location information of a plurality of spots (P₁, …, P_N) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the plurality of spots (P₁, …, P_N).

As shown in FIG. 4, the spatial transcriptome data may be the total data consisting of location information of a plurality of spots (P₁, …, P_N, where N is a natural number, the total number of spots) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the spots.

The spots (P₁, …, P_N) refer to small areas on the tissue image (TI), and each spot (P₁, …, P_N) corresponds to one item of transcript information (R₁, …, R_N), which serves as its gene expression information.

Spatial transcriptome data = {(P_n, R_n) | 1 ≤ n ≤ N}, where N is a natural number (the total number of spots)

Here, the transcript information (R₁, …, R_N) may include information about the expression level of each of a plurality of transcripts (A₁, …, A_M, where M is the total number of transcripts). The information about the expression level of each transcript (A₁, …, A_M) may be information about the expression level of each gene.
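
The data layout described above can be sketched as a small container. This is only an illustration: the class name `SpatialTranscriptome`, its field names, and the toy values are assumptions introduced here and do not appear in the original text.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpatialTranscriptome:
    """Illustrative container for {(P_n, R_n) | 1 <= n <= N} (hypothetical name)."""

    coords: np.ndarray      # shape (N, 2): central coordinate C_n of each spot P_n
    expression: np.ndarray  # shape (N, M): row n is R_n, one value per transcript A_m
    genes: list             # length M: transcript (gene) names

    def __post_init__(self):
        n_spots, n_genes = self.expression.shape
        if self.coords.shape != (n_spots, 2):
            raise ValueError("coords must have shape (N, 2)")
        if len(self.genes) != n_genes:
            raise ValueError("genes must have length M")


# Toy example: N = 3 spots, M = 2 transcripts (values are made up).
data = SpatialTranscriptome(
    coords=np.array([[10.0, 10.0], [10.0, 30.0], [27.0, 20.0]]),
    expression=np.array([[5.0, 0.0], [2.0, 1.0], [0.0, 7.0]]),
    genes=["ACTA2", "DES"],
)
n_spots, n_genes = data.expression.shape
```

Storing the expression levels as one N × M matrix keeps each spot's transcript information R_n in a single row, which is convenient for the per-transcript reconstruction described later.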

The plurality of spots (P₁, …, P_N) are spaced apart from each other, and an empty space (V), that is, an empty area, lies between the spots (P₁, …, P_N).

In other words, the transcript information (R₁, …, R_N) in the empty space (V) between the spots (P₁, …, P_N) is unknown, which limits biological understanding and visual interpretation based on spatial transcriptome data.

Figure 5 is a drawing in which spatial transcriptome data is visualized as an image using conventional technology, by drawing a circle (polygons such as hexagons are also possible) around the midpoint of each spot (P₁, …, P_N) and varying the color or density according to the transcript expression level (gene expression level).

Since there is no information on the transcript expression level (gene expression level) between the spots (P₁, …, P_N), the data is sparsely distributed from an image perspective.

The reconstructed data is obtained by reconstructing the spatial transcriptome data, and may be data reconstructed so that the empty space (V) between the plurality of spots (P₁, …, P_N), where there is no transcript information (R₁, …, R_N), is interpolated.

The principle of reconstructing the spatial transcriptome data into reconstructed data is to infer the transcript information of the empty space (V) between the plurality of spots (P₁, …, P_N).

As an example, the reconstructed data may be data reconstructed by assuming that the expression level of each of the plurality of transcripts (A₁, …, A_M) is distributed according to a continuous probability distribution centered on the central coordinates (C₁, …, C_N) of the plurality of spots (P₁, …, P_N).

The reconstructed data may include transcript distribution information, which may mean the expression level (gene expression level) of each transcript (A₁, …, A_M).

Assuming that the expression levels of the plurality of transcripts (A₁, …, A_M) are distributed according to a continuous probability distribution centered on the central coordinates (C₁, …, C_N) of the plurality of spots (P₁, …, P_N), the contributions of all spots (P₁, …, P_N) are summed to obtain transcript distribution information for each transcript (A₁, …, A_M) as the reconstructed data.

The continuous probability distribution may be a normal distribution whose median lies at the central coordinates (C₁, …, C_N) and whose variance is preset, but is not limited thereto.
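
Under the normal-distribution assumption, the summation over spots for a single transcript A_m can be written as follows. The symbols R_{n,m} (expression level of A_m at spot P_n), σ² (the preset variance, assumed here to be isotropic), and T_m(x) (reconstructed value at pixel position x) are notation introduced for this illustration only.

```latex
T_m(\mathbf{x}) \;=\; \sum_{n=1}^{N} R_{n,m}\,
  \frac{1}{2\pi\sigma^{2}}
  \exp\!\left(-\frac{\lVert \mathbf{x} - C_n \rVert^{2}}{2\sigma^{2}}\right)
```

Each term is a two-dimensional normal density centered on C_n and scaled by the expression level measured at that spot, so pixels in the empty space (V) receive contributions from nearby spots.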

Figure 7 is a schematic diagram showing the principle of reconstructing spatial transcriptome data into reconstructed data; the reconstructed data may be continuously distributed from an image perspective. Referring to Figure 7, the transcript expression level (gene expression level) obtained from a specific spot (P_n) is assumed to follow a continuous spatial probability distribution (e.g., a normal distribution); that is, its contribution is assumed to decrease as the distance from the central coordinate (C_n) of the spot (P_n) increases. By adding these contributions for all spots (P₁, …, P_N), spatial transcriptome data composed of sparse coordinates can be reconstructed into a dense two-dimensional matrix, from which an image can be obtained.
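
The summation over spots can be sketched in NumPy as follows. This is a minimal sketch rather than the disclosed implementation: the function name `reconstruct_image`, the isotropic Gaussian, the value `sigma=4.0`, and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np


def reconstruct_image(coords, values, shape, sigma):
    """Sum one isotropic Gaussian per spot, weighted by that spot's expression.

    coords : (N, 2) spot centers as (row, col) in pixels
    values : (N,) expression level of one transcript at each spot
    shape  : (height, width) of the output image
    sigma  : preset standard deviation in pixels (assumed value)
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape, dtype=float)
    norm = 1.0 / (2.0 * np.pi * sigma ** 2)
    for (r, c), v in zip(coords, values):
        sq_dist = (rows - r) ** 2 + (cols - c) ** 2
        # Contribution decays as the distance from the spot center grows.
        image += v * norm * np.exp(-sq_dist / (2.0 * sigma ** 2))
    return image


# Three sparse spots become a dense 40 x 40 matrix.
coords = np.array([[10.0, 10.0], [10.0, 30.0], [27.0, 20.0]])
values = np.array([5.0, 2.0, 0.0])
image = reconstruct_image(coords, values, shape=(40, 40), sigma=4.0)
```

Pixels between the first two spots receive a nonzero value even though no spot was measured there, which is the interpolation of the empty space (V) that the text describes.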

The image generating device 200 can generate two-dimensional images (T₁, …, T_K, where K is the number of two-dimensional images) that visualize the reconstructed data by varying the color or density according to the transcript expression level (gene expression level). One two-dimensional image can be created per transcript (gene), so each of more than 20,000 genes can be displayed as its own two-dimensional image.

That is, the two-dimensional images (T₁, …, T_K) can be generated for each transcript (A₁, …, A_M). As an example, M two-dimensional images (T₁, …, T_M) corresponding to the M transcripts (A₁, …, A_M) may be generated. Of course, an embodiment in which a single two-dimensional image includes transcript distribution information for several transcripts (A₁, …, A_M) is also possible.
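
Generating one image per transcript can be sketched by computing the per-spot kernels once and combining them with the N × M expression matrix. The function names, the `einsum` formulation, and the 8-bit scaling used for display are assumptions for illustration; the original text does not specify how density or color is mapped.

```python
import numpy as np


def gaussian_kernels(coords, shape, sigma):
    """Return an (N, H, W) stack with one normalized Gaussian per spot."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    sq_dist = ((rows[None] - coords[:, 0, None, None]) ** 2
               + (cols[None] - coords[:, 1, None, None]) ** 2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)


def images_per_transcript(coords, expression, shape, sigma):
    """Return an (M, H, W) stack: one reconstructed image per transcript."""
    kernels = gaussian_kernels(coords, shape, sigma)      # (N, H, W)
    return np.einsum("nm,nhw->mhw", expression, kernels)  # weighted sum over spots


def to_uint8(image):
    """Scale one image to 0..255 so density encodes the expression level."""
    peak = image.max()
    if peak <= 0:
        return np.zeros(image.shape, dtype=np.uint8)
    return np.round(255.0 * image / peak).astype(np.uint8)


coords = np.array([[10.0, 10.0], [10.0, 30.0], [27.0, 20.0]])
expression = np.array([[5.0, 0.0], [2.0, 1.0], [0.0, 7.0]])  # N = 3 spots, M = 2
stack = images_per_transcript(coords, expression, shape=(40, 40), sigma=4.0)
preview = to_uint8(stack[0])
```

For tens of thousands of genes the (N, H, W) kernel stack may be too large to hold in memory at once, in which case the transcripts or spots would need to be processed in batches.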

Figure 8 is an example of a two-dimensional image generated by the image generating device 200, in which the reconstructed data is visualized by varying the color or density at each location according to the transcript expression level (gene expression level). In particular, Figure 8 is a two-dimensional image (T₁, …, T_K) visualizing the reconstructed data obtained by reconstructing the spatial transcriptome data of Figure 5, and shows the transcript expression level (gene expression level) expressed pixel by pixel as a two-dimensional matrix and rendered as an image.

When the transcript information (gene expression information) of the spots (P₁, …, P_N) is assumed to appear as probability distribution values in a two-dimensional space and is reconstructed in that space through the method of the present invention, the image of Figure 5 is reconstructed into a two-dimensional image as shown in Figure 8.

Since the two-dimensional images (T₁, …, T_K) generated by the image generating device 200 have continuous transcript distribution information, they can be used effectively for biological understanding and visual interpretation of tissues.

The image generating device 200 may receive reconstructed data from an external database (DB) or the user terminal 300 and generate the two-dimensional images (T₁, …, T_K) from it, or it may receive spatial transcriptome data and reconstruct that data into reconstructed data itself.

In the present invention, generating a two-dimensional image by reconstructing spatial transcriptome data into reconstructed data means changing the data structure of the spatial transcriptome data to that of a multidimensional image. This enables clustering of transcript distribution information (gene expression information) and spatial comparison of different tissues, and provides a base technology for problems that have so far remained unresolved. The present invention can therefore be useful not only to companies that have spatial transcriptome data production and analysis technology, but also to companies that develop new drugs using the derived candidate substances (markers).

As an example of use, the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device 200 can be used for genetic screening to extract genes with similar spatial distribution. In other words, the present invention provides a way to cluster genes whose images are similar, that is, genes with similar spatial expression, across tens of thousands of two-dimensional images.

The genetic screening method is a method of extracting genes with similar spatial distribution using the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device 200.

More specifically, the genetic screening method may include a characteristic value extraction step of extracting characteristic values of the two-dimensional images (T₁, …, T_K), a clustering step of generating a cluster (CLT) by clustering based on the similarity of the characteristic values, and a gene extraction step of deriving a gene set (G) associated with the cluster (CLT).

The characteristic values represent the image characteristics of the two-dimensional images (T₁, …, T_K) and may be data obtained by reducing the reconstructed data to low-dimensional data. For example, they can be extracted using a dimension reduction algorithm (PCA, LDA, etc.) or an artificial neural network model (ANN).
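
As one of the dimension reduction options named above, PCA can be sketched with a singular value decomposition. The function name `pca_features`, the number of components, and the random stand-in images are assumptions for illustration only.

```python
import numpy as np


def pca_features(images, n_components):
    """Flatten each image and project it onto the top principal components.

    images : (K, H, W) stack of two-dimensional images, one per gene
    returns: (K, n_components) low-dimensional characteristic values
    """
    flat = images.reshape(images.shape[0], -1).astype(float)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
images = rng.random((6, 8, 8))  # stand-in for six reconstructed gene images
features = pca_features(images, n_components=3)
```

Each row of `features` is a compact description of one gene's image, so genes with similar spatial patterns end up close to each other in this low-dimensional space.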

The artificial neural network model (ANN) may be a model that is trained in an unsupervised manner using the reconstructed data as training data and that outputs characteristic values for the two-dimensional images (T₁, …, T_K).

For example, the artificial neural network model (ANN) may include a first neural network (ANNa) that compresses the reconstructed data into low-dimensional data, and a second neural network (ANNb) that restores the compressed low-dimensional data to the original dimension and outputs the reconstructed data.

At this time, the characteristic value may be a latent vector value expressed as the low-dimensional data.
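
The compress-and-restore arrangement can be sketched as a very small autoencoder written in NumPy. The original text does not specify the architecture, so the single tanh hidden layer, plain gradient descent, the learning rate, the epoch count, and the synthetic data are all assumptions chosen to keep the sketch short.

```python
import numpy as np


def train_autoencoder(x, latent_dim, epochs=2000, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder with gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, (n_features, latent_dim))  # plays the role of ANNa
    w_dec = rng.normal(0.0, 0.1, (latent_dim, n_features))  # plays the role of ANNb
    losses = []
    for _ in range(epochs):
        z = np.tanh(x @ w_enc)   # latent vectors (characteristic values)
        x_hat = z @ w_dec        # restored data
        err = x_hat - x
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err / err.size               # dLoss / dx_hat
        grad_dec = z.T @ grad_out
        grad_pre = (grad_out @ w_dec.T) * (1.0 - z ** 2)  # back through tanh
        grad_enc = x.T @ grad_pre
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses


# Synthetic stand-in: 30 flattened "images" that lie near a 2-D subspace.
rng = np.random.default_rng(1)
x = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 16))
x = x / np.abs(x).max()
w_enc, w_dec, losses = train_autoencoder(x, latent_dim=2)
latent = np.tanh(x @ w_enc)  # one latent vector per input row
```

After training, only the encoder is needed to obtain the latent vectors that are passed on to clustering. A practical model for full-size images would more likely be a deeper or convolutional network built with a deep learning framework.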

The clustering step is a step of performing clustering using an unsupervised learning-based clustering algorithm, and various clustering algorithms can be used.

For example, the clustering algorithm can be any of a variety of unsupervised learning-based algorithms such as K-means clustering, ISODATA, mean shift, Gaussian mixture model, DBSCAN, and self-organizing map. The optimal number of clusters can be calculated using various conventionally known techniques.

As an example, when the clustering algorithm is K-means clustering, various techniques such as the elbow technique, the silhouette technique, and a loss function can be used to determine the optimal number of clusters, and the determination is not limited to a specific method.
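
Choosing the number of clusters with the silhouette technique can be sketched as follows. The K-means loop, the deterministic farthest-point initialization, the candidate range, and the synthetic three-group data are assumptions made so that the example is self-contained and repeatable.

```python
import numpy as np


def init_centers(x, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centers = [x[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    return np.array(centers)


def kmeans(x, k, n_iter=100):
    centers = init_centers(x, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def silhouette_values(x, labels):
    """Per-sample silhouette (b - a) / max(a, b); singletons get 0."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    values = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = dist[i, same].sum() / (same.sum() - 1)
        b = min(dist[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        values[i] = (b - a) / max(a, b)
    return values


# Synthetic characteristic values forming three well-separated groups.
rng = np.random.default_rng(0)
blobs = [rng.normal(loc=c, scale=0.5, size=(15, 2))
         for c in ([0.0, 0.0], [10.0, 0.0], [0.0, 10.0])]
x = np.vstack(blobs)
scores = {k: silhouette_values(x, kmeans(x, k)).mean() for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

The candidate with the highest mean silhouette is taken as the number of clusters. The elbow technique would instead look at how the within-cluster sum of squares falls as k grows.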

Through the clustering algorithm, at least one cluster (CLT) that groups the two-dimensional images (T₁, …, T_K) can be generated.

The gene extraction step is a step of deriving a gene set (G) associated with a cluster (CLT), where the gene set (G) refers to the set of genes (transcripts (A₁, …, A_M)) whose two-dimensional images (T₁, …, T_K) belong to the same cluster (CLT).

Genes (transcripts (A₁, …, A_M)) belonging to the gene set (G) derived from the same cluster (CLT) have similar spatial distribution patterns and can be understood as sharing anatomical, pathological, or functional similarities.

In addition, the gene extraction step may further include a step of final selection of genes to be included in the gene set (G) based on the evaluation index for the cluster (CLT).

The evaluation index is a validity index for the cluster (CLT), that is, an index for quantifying the quality of clustering. Various indicators such as silhouette values and correlation coefficients can be used to evaluate the degree to which the data within a cluster (CLT) are aggregated, the degree of separation between clusters (CLT), and the connectivity within a cluster (CLT).

The evaluation index is an optimization tool for deriving genes (transcripts) with more similar spatial distribution. For example, silhouette values can be calculated so that only genes (transcripts) with positive values are included in the cluster (CLT).

In addition, the gene extraction step may further include measuring the correlation between gene pairs within a cluster (CLT) by calculating a correlation coefficient in order to utilize the gene expression level (transcript expression level).

As an example, the correlation coefficient may be calculated as the Spearman correlation coefficient, transcripts (genes) that satisfy a correlation coefficient r > 0.1 and a p-value < 0.001 are selected, and a gene set (G) for each cluster is finally derived.
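
The two selection criteria, a positive silhouette value followed by the Spearman thresholds, can be sketched with `scipy.stats.spearmanr`. The text does not say how pairwise results decide whether an individual gene is kept, so the rule used here (keep a gene if it forms at least one qualifying pair) is an assumption, as are the function name and the toy data.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr


def select_gene_set(genes, expression, silhouette, r_min=0.1, p_max=0.001):
    """Filter the genes of one cluster.

    genes      : list of gene names in the cluster
    expression : (n_spots, n_genes) expression levels of those genes
    silhouette : per-gene silhouette values from the clustering step
    """
    # Step 1: keep only genes with a positive silhouette value.
    candidates = [j for j, s in enumerate(silhouette) if s > 0]
    # Step 2 (assumed rule): keep a gene if it forms at least one pair
    # with r > r_min and p < p_max.
    kept = set()
    for i, j in combinations(candidates, 2):
        r, p = spearmanr(expression[:, i], expression[:, j])
        if r > r_min and p < p_max:
            kept.update((i, j))
    return [genes[j] for j in sorted(kept)]


# Toy cluster: g1 and g2 rise together, g3 runs opposite, g4 fails silhouette.
rng = np.random.default_rng(0)
t = np.arange(50.0)
expression = np.column_stack([t, t + rng.normal(0.0, 1.0, 50), -t, t])
selected = select_gene_set(
    ["g1", "g2", "g3", "g4"], expression, silhouette=[0.5, 0.4, 0.3, -0.2]
)
```

The thresholds default to the values given in the text (r > 0.1, p-value < 0.001) and can be replaced by any of the other commonly used cutoffs.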

Here, the statistical significant difference used in the optimization step of the gene set (G) can be based on a commonly used statistical cutoff. For example, a statistically significant difference may be less than or equivalent to a p-value of 0.05, 0.01, 0.005, or 0.001.

In relation to this, Figure 9 is a visualization (t-SNE) in which the characteristic values (latent vector values) of the two-dimensional images (T₁, …, T_K) for each gene (transcript) are reduced to two dimensions, showing several clusters (CLT) of genes grouped by similar characteristic values.

Figure 10 shows images visualizing the spatial distribution patterns of genes (e.g., ACTA2, DES, IGHA2, MYH11) that belong to the same cluster (CLT) in Figure 9 and have spatial distribution patterns similar to the representative image of that cluster (CLT).

Figure 11 is a graph showing correlation evaluation between genes in the gene set (G) within the cluster (CLT) compared to the simulated gene set.

In Figure 11, the simulation starts from the 2,000 genes with the greatest deviation (the 2,000 genes with the greatest deviation between spots were extracted from spatial transcriptome data of normal lung tissue with segmentation annotation). For each of the seven clusters (CLT) derived according to the present invention, a random gene set was created containing the same number of genes as that cluster (CLT). The correlation of gene pairs within a cluster (CLT) was then compared between the gene set (G), obtained by clustering the characteristic values compressed into low-dimensional data to find genes (transcripts) with similar spatial distribution, and the gene sets randomly selected through simulation. The gene set (G) derived according to the present invention showed higher correlation between genes (transcripts) within the cluster (CLT). This shows that the gene set (G) derived according to the present invention has spatially similar expression patterns.
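
The comparison against simulated gene sets can be sketched as follows. The text does not state which correlation coefficient or how many random draws were used for Figure 11, so the Pearson coefficient from `np.corrcoef`, the 100 random sets, and the synthetic gene pool are assumptions for illustration.

```python
import numpy as np


def mean_pairwise_correlation(expression):
    """Mean correlation over all distinct gene pairs.

    expression : (n_spots, n_genes) expression levels of one gene set
    """
    corr = np.corrcoef(expression, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)
    return float(corr[upper].mean())


def random_baseline(pool, set_size, n_sets, seed=0):
    """Score random gene sets of the same size drawn from the gene pool."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_sets):
        idx = rng.choice(pool.shape[1], size=set_size, replace=False)
        scores.append(mean_pairwise_correlation(pool[:, idx]))
    return np.array(scores)


# Synthetic pool: 30 genes over 60 spots; the first 5 share a spatial signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=(60, 1))
pool = rng.normal(size=(60, 30))
pool[:, :5] = signal + 0.1 * rng.normal(size=(60, 5))

cluster_score = mean_pairwise_correlation(pool[:, :5])
baseline = random_baseline(pool, set_size=5, n_sets=100)
```

A derived gene set whose score lies well above the distribution of `baseline` scores indicates that its genes are more strongly co-expressed across spots than a same-sized random selection.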

Figure 12 shows the results of evaluating the discrimination power of the gene set (G) within the cluster (CLT) compared to the simulated gene sets of Figure 11. For Figure 12, each gene set was treated as one signature, a signature score was calculated for each gene set at every spot, and the mean square (MS) and F ratio were calculated using an analysis of variance (ANOVA) test against the already known segmentation information. Comparing the signature scores of the gene set (G) derived according to the present invention with those of the simulated gene sets shows that the signature score of the gene set (G) distinguishes the segmented regions better than the signature scores of the simulated gene sets. Therefore, the gene set (G) derived according to the present invention consists of genes (transcripts) with similar spatial distribution patterns and is highly enriched for a biological structure or function.
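
The discrimination-power evaluation can be sketched with a one-way ANOVA computed directly in NumPy. The text does not define how the signature score is calculated, so taking the mean expression of the set's genes at each spot is an assumption, as are the function names and the synthetic two-segment data.

```python
import numpy as np


def signature_score(expression, gene_idx):
    """Assumed signature score: mean expression of the set's genes per spot."""
    return expression[:, gene_idx].mean(axis=1)


def anova_f_ratio(scores, segments):
    """One-way ANOVA: return (between-group mean square, F ratio)."""
    groups = [scores[segments == g] for g in np.unique(segments)]
    grand_mean = scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(scores) - len(groups))
    return ms_between, ms_between / ms_within


# Synthetic data: 40 spots in two known segments, 4 genes.
rng = np.random.default_rng(0)
segments = np.repeat([0, 1], 20)
expression = rng.normal(size=(40, 4))
expression[:, :2] += 5.0 * segments[:, None]  # genes 0 and 1 track the segments

_, f_derived = anova_f_ratio(signature_score(expression, [0, 1]), segments)
_, f_random = anova_f_ratio(signature_score(expression, [2, 3]), segments)
```

A larger F ratio means the signature score varies more between the annotated segments than within them, which is the sense in which one gene set distinguishes the segmented regions better than another.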

Figures 13a and 13b show one case in which the spatial distribution pattern of each gene (transcript) is rendered as a two-dimensional image using the present invention, and genes with similar distribution patterns are then extracted to select a gene set (G) related to anatomical and functional characteristics.

First, the region shown in Figure 13a is the white matter region containing the fiber tracts of the mouse brain, extracted by clustering based on the transcriptome data of the spots. Figure 13b shows the result of applying the genetic screening method for deriving genes with similar spatial distribution patterns according to the present invention to the mouse brain spatial transcriptome data of Figure 13a. As a result, the expression patterns of the genes belonging to the gene set (G) corresponding to one cluster (CLT) were consistent with the white matter region containing the fiber tracts of the mouse brain in Figure 13a. Additionally, when gene ontology analysis was performed on the gene set (G), a gene group related to myelination (GO:0042552) was identified.

Therefore, it can be seen that the gene set (G) derived through the present invention matched a specific spatial region of the mouse brain, and that its genes correspond to genes related to the myelination function expressed in the fiber tracts, which are an anatomical region of the mouse brain.

Through this, it can be confirmed that the gene set (G) with similar spatial distribution derived using the present invention is enriched for genes related to anatomical structure and functional characteristics.

Figure 14 shows an example in which the spatial distribution pattern of each gene (transcript) is rendered as a two-dimensional image using the present invention, and genes with similar distribution patterns are then extracted to select a gene set (G) related to pathological and functional characteristics.

Figure 14 uses publicly available spatial transcriptome data of the mouse brain (Buzzi et al., 2021). For this data, gene sets that can explain molecular pathological characteristics such as heme exposure, obtained after exposing the mouse brain to various concentrations of heme, have been published.

Using the spatial transcriptome data of the mouse brain, gene sets (G) with similar spatial distribution patterns were extracted using the genetic screening method according to the present invention, and it was confirmed that the gene set (G) of one cluster (CLT) contained 15 of the top 20 heme exposure signature genes proposed by Buzzi et al. Therefore, it can be seen that the gene set (G) with similar spatial distribution derived using the present invention can be used as a gene set (G) with molecular pathological and functional characteristics.

As another example, the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device 200 can be used for comparative analysis of gene expression between tissues, in which gene expression patterns are compared and analyzed between tissue images (TI) of different tissues.

The method for comparative analysis of gene expression between tissues is a method of comparing and analyzing gene expression patterns between tissue images (TI) of different tissues using the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device (200).

More specifically, the method for comparative analysis of gene expression between tissues may include a spatial normalization step of performing spatial normalization on the two-dimensional images (T₁, …, T_K) generated for each of the different tissue images (TI) to generate spatial normalized images (S₁, …, S_K), and a comparative analysis step of comparing the spatial normalized images (S₁, …, S_K) of the different tissue images (TI) to compare and analyze, pixel by pixel, the gene expression patterns between the different tissue images (TI).

Here, the two-dimensional image (T₁, …, T_K) is an image visualizing a spatial distribution pattern using the reconstruction data. It may be an image visualizing the spatial distribution pattern of a specific gene (transcript), or an image visualizing the distribution pattern obtained by summing the expression levels of genes belonging to a cluster with similar spatial distribution patterns.

Here, the genes belonging to a cluster with similar spatial distribution patterns may be genes selected as major genes characterizing the clustered spots after the spots (P₁, …, P_N) constituting the spatial transcriptome data are clustered, or genes with similar spatial distribution patterns extracted by the genetic screening method described above.

Figure 15 shows clusters with similar spatial gene expression patterns in a distinguishable manner, and Figure 16 shows two-dimensional images (T₁, …, T_K) generated, for each of four different tissues, for the main gene(s) characterizing the clusters of Figure 15.

Since Figure 16 shows two-dimensional images (T₁, …, T_K) of four different tissues, direct spatial comparison between them is difficult. The present invention enables comparison between different tissues by normalizing the two-dimensional images (T₁, …, T_K) so that they can be compared with one another.

Spatial normalization of the two-dimensional images (T₁, …, T_K) is not limited to a specific method; as an example, the symmetric image normalization method (SyN) can be applied.

Figure 17a shows normalized images (S₁, …, S_K) generated by normalizing the two-dimensional images (T₁, …, T_K) of five different tissues, respectively. It can be seen that the five tissues can be compared and analyzed pixel by pixel.

Figure 17b also shows normalized images (S₁, …, S_K) obtained by normalizing the two-dimensional images (T₁, …, T_K) of five different tissues, as in Figure 17a, through which comparative analysis of expression patterns between tissues for a single gene or a gene set becomes possible.

Figure 18a uses spatial transcriptome data (Buzzi et al., 2021) from five different mouse brains exposed to different heme concentrations. Figure 18a is also a diagram showing that spatial transcriptome data in different spaces can be compared through spatial normalization.

Figure 18a shows the spatial expression distribution of the Hmox1 gene, which corresponds to heme exposure, and the differences in its expression distribution among five tissues (Sham, Heme 50, Heme 125, Heme 500, Heme 1000) that are located in different spaces and were exposed to different heme concentrations. For the top genes of the heme exposure signature, Hmox1, Mt2, Timp1, and S100a6, the pattern reported by Buzzi et al. was confirmed in the five different mouse brain datasets.

The method for comparative analysis of gene expression between tissues according to the present invention can compare and analyze spatial gene expression patterns of different tissues by comparing normalized images (S ₁ , ..., S _K ) of different tissues for each pixel.
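A minimal sketch of the pixel-by-pixel comparison is given below. It assumes that the two images have already been spatially normalized to the same shape (for example by SyN, which is not implemented here) and uses a simple per-pixel difference as the comparison; the description does not fix a particular comparison operator, and the function name and toy values are hypothetical.

```python
def pixelwise_difference(img_a, img_b):
    """Per-pixel difference (a - b) of two aligned images with identical shape.

    Images are lists of rows; each row is a list of pixel values.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must be spatially normalized to the same shape")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


# Toy normalized images (hypothetical values) for one gene in two tissues.
sham = [[1.0, 2.0], [3.0, 4.0]]
heme = [[1.5, 2.0], [5.0, 4.0]]

diff = pixelwise_difference(heme, sham)
```

Because both images share one coordinate system after normalization, each entry of `diff` refers to the same anatomical location in both tissues.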

In addition, referring to Figure 18b, correlation analysis was performed with the Pearson correlation coefficient on the pixel values of the normalized images (S₁, …, S_K) of Hmox1, Mt2, Timp1, and S100a6, which are the top genes of the heme exposure signature in Figure 18a. As a result, the correlations were confirmed to be significant (P-value < 0.05) with R > 0.3. This shows that gene sets with spatially similar distribution patterns can be selected by using the correlation coefficient of the pixel values of the normalized images (S₁, …, S_K) (or the two-dimensional images (T₁, …, T_K)). In other words, to select gene groups that are spatially distributed similarly, the correlation coefficient can be obtained from the pixel values and a cutoff can be applied on the p-value and the R value.
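The selection by cutoff may be sketched as follows. The cutoffs R > 0.3 and p < 0.05 are taken from the description; everything else is an assumption for illustration. In particular, the p-value is estimated here with a permutation test using only the standard library, which is a substitute for whatever significance test was actually used, and the function names and toy images are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def flatten(image):
    """Turn a list-of-rows image into one flat list of pixel values."""
    return [value for row in image for value in row]


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def similar_pairs(images, r_cutoff=0.3, p_cutoff=0.05):
    """Gene pairs whose pixel values pass both the R cutoff and the p-value cutoff."""
    names = sorted(images)
    selected = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            x, y = flatten(images[a]), flatten(images[b])
            r = pearson(x, y)
            if r > r_cutoff and permutation_p_value(x, y) < p_cutoff:
                selected.append((a, b, r))
    return selected


# Toy 3x3 images (hypothetical): geneA and geneB share a gradient, geneC does not.
images = {
    "geneA": [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    "geneB": [[0, 2, 4], [6, 8, 10], [12, 14, 17]],
    "geneC": [[5, 0, 7], [1, 8, 2], [6, 3, 4]],
}
pairs = similar_pairs(images)
```

Neighboring pixels of an interpolated image are not independent, so a p-value computed by treating pixels as independent samples tends to be optimistic; the R cutoff is therefore the more conservative of the two criteria in this sketch.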

The above-described genetic screening method and the comparative analysis method of gene expression between tissues may be performed in a separate computing device or may be performed in the two-dimensional image generating device 200 described above.

In addition, of course, the above-described two-dimensional image generation method, genetic screening method, and inter-tissue gene expression comparative analysis method can be implemented as a program executable on a computer.

As another example, the spatial transcriptome information analysis system 1000 according to the present invention, as shown in FIG. 1, may include a user terminal 300 and a spatial transcriptome information analysis device 100 connected to the user terminal 300 through a network.

The spatial transcriptome information analysis device 100 is an analysis device for performing at least one analysis method among the above-described two-dimensional image generation method, genetic screening method, and inter-tissue gene expression comparative analysis method, and can provide a system capable of integrated analysis of spatial transcriptome information.

As shown in FIG. 2, the spatial transcriptome information analysis device 100 may include an information receiving unit 110 that receives spatial transcriptome data consisting of location information of a plurality of spots (P₁, …, P_N) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the plurality of spots (P₁, …, P_N); a data reconstruction unit 120 that calculates reconstruction data by reconstructing the spatial transcriptome data so that the empty space between the plurality of spots (P₁, …, P_N) lacking the transcript information (R₁, …, R_N) is interpolated; and a transcriptome information analysis unit 130 that analyzes gene expression patterns based on the reconstruction data.

The information receiving unit 110 is configured to receive spatial transcriptome data and can be configured in various ways. It can receive spatial transcriptome data from the user terminal 300 or from a separate database (DB, 400). The information receiving unit 110 may also receive commands or requests from the user terminal 300 in addition to spatial transcriptome data.

The data reconstruction unit 120 is configured to calculate, from the spatial transcriptome data, reconstruction data in which the empty spaces between the plurality of spots (P₁, …, P_N) lacking the transcript information (R₁, …, R_N) are interpolated, and various configurations are possible.
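One way such reconstruction data could be computed for a single transcript is sketched below, following the claimed assumption that the expression level of each spot is distributed along a normal distribution centered on the spot's central coordinates with a preset variance. The isotropic two-dimensional Gaussian, the summation of overlapping contributions, the grid size, and all names and values are assumptions for illustration only.

```python
import math


def reconstruct_image(centers, expression, width, height, sigma=1.0):
    """Spread each spot's expression level over a pixel grid.

    Each spot contributes an isotropic 2D Gaussian centered on its coordinates
    (x, y) with standard deviation `sigma`; contributions are summed per pixel,
    so pixels between spots receive interpolated values.
    """
    image = [[0.0] * width for _ in range(height)]
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    for (cx, cy), level in zip(centers, expression):
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                image[y][x] += level * norm * math.exp(-d2 / (2.0 * sigma ** 2))
    return image


# Toy input (hypothetical): two spots on a 9x5 grid with an empty gap between them.
centers = [(2, 2), (6, 2)]
expression = [10.0, 4.0]

image = reconstruct_image(centers, expression, width=9, height=5, sigma=1.5)
```

In this toy case the pixel at column 4, row 2 lies in the gap between the two spots and still receives a positive value, which is the interpolation effect described above. Repeating the computation for each transcript would give one image per transcript.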

The principle of calculating the reconstructed data has been described in detail when explaining the two-dimensional image generating device 200, so overlapping parts will be omitted.

The transcriptome information analysis unit 130 can be configured in various configurations to analyze gene expression patterns based on reconstruction data.

The transcriptome information analysis unit 130 can use the reconstruction data to analyze gene expression patterns within tissues or to compare and analyze gene expression patterns between different tissues.

That is, the gene expression pattern here may be a gene expression pattern of the same tissue or a gene expression pattern of a different tissue.

Referring to FIG. 2, the transcriptome information analysis unit 130 may include a feature extraction unit 132 that extracts feature values of the reconstructed data, and a clustering unit 134 that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the feature values.

For example, the feature extraction unit 132 may extract the feature value by reducing the reconstructed data to low-dimensional data.

As an example, the feature extraction unit 132 may extract the feature value using a dimension reduction algorithm that reduces the reconstructed data to low-dimensional data.

The dimensionality reduction algorithm is not limited to a specific algorithm, and examples may include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), etc.

As another example, referring to FIG. 6, the feature extraction unit 132 is a feature extractor that extracts feature values of the reconstructed data, and may include an artificial neural network model (ANN) that compresses the reconstructed data into low-dimensional data.

The clustering unit 134 can perform clustering using an unsupervised learning-based clustering algorithm and derive a gene set (G) associated with the cluster (CLT).
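As one possible instance of an unsupervised clustering algorithm, the sketch below applies plain k-means (Lloyd's algorithm) to per-gene feature vectors and collects the genes of each cluster into a gene set. The choice of k-means, the farthest-first initialization, the two-dimensional feature vectors, and all names are assumptions for illustration; the description does not name a specific algorithm.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic initialization: start at the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means. Returns one cluster index per point."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy feature vectors (hypothetical), one per gene.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
features = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 4.9)]

labels = kmeans(features, k=2)
gene_sets = {}
for gene, lab in zip(genes, labels):
    gene_sets.setdefault(lab, []).append(gene)
```

Each entry of `gene_sets` plays the role of a gene set (G) associated with one cluster (CLT) in this sketch.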

Additionally, the clustering unit 134 may finally select genes to be included in the gene set (G) based on at least one of the silhouette value and correlation coefficient of the cluster (CLT).
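The silhouette-based selection may be sketched as follows. The silhouette value of a gene is computed as (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its smallest mean distance to the members of any other cluster. The threshold of 0.5, the Euclidean distance on feature vectors, and all names and values are assumptions for illustration, and the sketch assumes at least two clusters.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value of every point given its cluster label.

    Points alone in their cluster get 0.0. Requires at least two clusters.
    """
    values = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        dists = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(other, []).append(math.dist(p, q))
        if lab not in dists:
            values.append(0.0)
            continue
        a = sum(dists[lab]) / len(dists[lab])
        b = min(sum(d) / len(d) for other, d in dists.items() if other != lab)
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Toy feature vectors (hypothetical); geneE sits halfway between the two clusters.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 0.5)]
labels = [0, 0, 1, 1, 0]

sil = silhouette_values(features, labels)
threshold = 0.5  # hypothetical cutoff, not taken from the description
kept = [g for g, s in zip(genes, sil) if s > threshold]
```

In this toy case `geneE` is equally far from both clusters, so its silhouette value is zero and it is excluded from the final gene set, while the remaining genes are kept.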

Meanwhile, the spatial transcriptome information analysis device 100 may additionally include an image generator 140 that generates two-dimensional images (T₁, …, T_K).

The image generator 140 is configured to visualize the spatial distribution of the plurality of transcripts (A₁, …, A_M). Gene expression pattern analysis is possible with the reconstructed data alone, so there are cases where a visualization image is not necessary. In such cases, the image generator 140 can of course be omitted.

Here, the image generator 140 may be configured identically or similarly to the two-dimensional image generator 200 described in detail above, so detailed description will be omitted to the extent of overlap.

When comparative analysis of gene expression patterns between different tissues is required, the image generator 140 can generate the two-dimensional images (T₁, …, T_K) for each of the different tissue images (TI).

At this time, the spatial transcriptome information analysis device 100 may additionally include a spatial normalization unit 150 that performs spatial normalization on the two-dimensional images (T₁, …, T_K) to generate spatial normalized images (S₁, …, S_K).

The spatial normalization unit 150 is configured to perform spatial normalization on the two-dimensional images (T₁, …, T_K) to generate spatial normalized images (S₁, …, S_K), and can be configured in various ways. Spatial normalization of the two-dimensional images (T₁, …, T_K) is not limited to a specific method. As an example of a spatial normalization method, the symmetric image normalization method (SyN) can be applied.

At this time, the transcriptome information analysis unit 130 can compare the spatial normalized images (S₁, …, S_K) of the different tissue images (TI) to compare and analyze, pixel by pixel, the gene expression patterns between the different tissue images (TI).

The method of spatially normalizing the two-dimensional images through the spatial normalization unit 150 and the method of comparatively analyzing gene expression patterns through the transcriptome information analysis unit 130 were described in detail above for the inter-tissue gene expression comparative analysis method, so detailed explanation is omitted to the extent of overlap.

In addition, the spatial transcriptome information analysis device 100 may further include an information transmission unit 160 for transmitting the gene expression pattern analysis results to a database (DB, 400) or a user terminal 300.

The above-mentioned spatial transcriptome information analysis device 100 is a device for providing a method of analyzing spatial transcriptome data by reconstructing it into reconstructed data, and can provide a means for integrated analysis of spatial transcriptome information using the reconstructed data.

That is, the spatial transcriptome information analysis method using the spatial transcriptome information analysis device 100 may include at least one of a two-dimensional image generation method using reconstructed data, a genetic screening method, and a comparative analysis method of gene expression between tissues.

Referring to Figure 3, as an example, the spatial transcriptome analysis method according to the present invention is a two-dimensional image generation method, and may include a receiving step (S301) of receiving spatial transcriptome data, a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstruction data, and a two-dimensional image generation step (S302) of generating a two-dimensional image visualizing transcript distribution information (gene distribution information) using the reconstruction data.

In addition, as another example, the spatial transcriptome analysis method according to the present invention is a genetic screening method, and may include a receiving step (S301) of receiving spatial transcriptome data, a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstruction data, and a gene extraction step (S304) of extracting genes with similar spatial distribution using the reconstruction data. At this time, the genetic screening method may additionally include a two-dimensional image generation step (S302) of generating two-dimensional images (T₁, …, T_K) visualizing the transcript distribution information (gene distribution information) using the reconstruction data.

In addition, as another example, the spatial transcriptome analysis method according to the present invention is a comparative analysis method of gene expression between tissues, and may include a receiving step (S301) of receiving spatial transcriptome data for each of different tissue images (TI), a data reconstruction step (S302) of reconstructing the spatial transcriptome data into reconstruction data, a two-dimensional image generation step (S302) of generating the two-dimensional images (T₁, …, T_K) for each of the different tissue images (TI), a spatial normalization step of performing spatial normalization on the two-dimensional images (T₁, …, T_K) to generate spatial normalized images (S₁, …, S_K), and a comparative analysis step (S305) of comparing the spatial normalized images (S₁, …, S_K) of the different tissue images (TI) to compare and analyze, pixel by pixel, the gene expression patterns between the different tissue images (TI).

The spatial transcriptome information analysis method performed using the spatial transcriptome information analysis device 100 described above can be implemented through a computer-executable spatial transcriptome information analysis program.

Since the above is only a description of some of the preferred embodiments that can be implemented by the present invention, the scope of the present invention should not, as is well known, be construed as limited to the above embodiments, and the technical idea of the present invention described above and the technical ideas underlying it are all to be regarded as included in the scope of the present invention.

Claims

A spatial transcriptome information analysis device (100) comprising:

an information receiving unit (110) that receives spatial transcriptome data consisting of location information of a plurality of spots (P₁, …, P_N) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the plurality of spots (P₁, …, P_N);

a data reconstruction unit (120) that calculates reconstruction data in which the spatial transcriptome data is reconstructed so that empty spaces between the plurality of spots (P₁, …, P_N) lacking the transcript information (R₁, …, R_N) are interpolated; and

a transcriptome information analysis unit (130) that analyzes gene expression patterns based on the reconstruction data,

wherein the transcript information (R₁, …, R_N) includes information on the expression level of each of a plurality of transcripts (A₁, …, A_M).
In claim 1,

The spatial transcriptome information analysis device (100), wherein the gene expression pattern is a gene expression pattern of the same tissue as the tissue image or a gene expression pattern of a different tissue.
In claim 1,

The spatial transcriptome information analysis device (100), wherein the reconstruction data includes transcript distribution information reconstructed by assuming that, for each of the plurality of transcripts (A₁, …, A_M), the expression level is distributed along a continuous probability distribution centered on the central coordinates (C₁, …, C_N) of the plurality of spots (P₁, …, P_N).
In claim 2,

The spatial transcriptome information analysis device (100), wherein the continuous probability distribution is a normal distribution having the central coordinates (C₁, …, C_N) as its center values and a preset variance value.
In claim 3,

The spatial transcriptome information analysis device (100), additionally comprising an image generator (140) that generates two-dimensional images (T₁, …, T_K) visualizing the spatial distribution of the plurality of transcripts (A₁, …, A_M) from the transcript distribution information.
In claim 1,

The spatial transcriptome information analysis device (100), wherein the transcriptome information analysis unit (130) comprises a feature extraction unit (132) that extracts feature values of the reconstructed data, and a clustering unit (134) that generates clusters (CLT) by clustering the reconstructed data based on the similarity of the feature values.
In claim 6,

The spatial transcriptome information analysis device (100), wherein the feature extraction unit (132) extracts the feature value by reducing the reconstructed data to low-dimensional data.
In claim 6,

The spatial transcriptome information analysis device (100), wherein the feature extraction unit (132) includes an artificial neural network model that compresses the reconstructed data into low-dimensional data,

and the artificial neural network model uses the reconstruction data as learning data.
In claim 8,

The spatial transcriptome information analysis device (100), wherein the characteristic value is a latent vector value expressed by the low-dimensional data.
In claim 6,

The spatial transcriptome information analysis device (100), wherein the clustering unit (134) performs clustering using an unsupervised learning-based clustering algorithm.
In claim 6,

The spatial transcriptome information analysis device (100), wherein the clustering unit (134) derives a gene set (G) associated with the cluster (CLT).
In claim 11,

The spatial transcriptome information analysis device (100), wherein the clustering unit (134) finally selects genes to be included in the gene set (G) based on at least one of the silhouette value and the correlation coefficient of the cluster (CLT).
In claim 5,

The spatial transcriptome information analysis device (100), wherein the image generator (140) generates the two-dimensional images (T₁, …, T_K) for each of different tissue images (TI),

the spatial transcriptome information analysis device (100) additionally comprises a spatial normalization unit (150) that performs spatial normalization on the two-dimensional images (T₁, …, T_K) to generate spatial normalized images (S₁, …, S_K),

and the transcriptome information analysis unit (130) compares the spatial normalized images (S₁, …, S_K) of the different tissue images (TI) to comparatively analyze the gene expression patterns between the different tissue images (TI).
A spatial transcriptome information analysis system (1000) comprising:

a spatial transcriptome information analysis device (100) according to any one of claims 1 to 13; and

a user terminal (300) connected to the spatial transcriptome information analysis device (100) through a network.
A spatial transcriptome information analysis method using the spatial transcriptome information analysis device 100 according to any one of claims 1 to 13.
A computer-executable spatial transcriptome information analysis program for performing the spatial transcriptome information analysis method according to claim 15.
A two-dimensional image generating device (200) that generates two-dimensional images (T₁, …, T_K) of transcript distribution using reconstructed data obtained by reconstructing spatial transcriptome data consisting of location information of a plurality of spots (P₁, …, P_N) spaced apart on a tissue image (TI) and transcript information (R₁, …, R_N) corresponding to each of the plurality of spots (P₁, …, P_N),

wherein the reconstructed data is data in which the empty space between the plurality of spots (P₁, …, P_N) lacking the transcript information (R₁, …, R_N) is interpolated.
In claim 17,

The two-dimensional image generating device (200), wherein the reconstructed data includes transcript distribution information reconstructed by assuming that the expression level of each of the plurality of transcripts (A₁, …, A_M) is distributed along a continuous probability distribution centered on the central coordinates (C₁, …, C_N) of the plurality of spots (P₁, …, P_N).
A genetic screening method for extracting genes with similar spatial distribution using the two-dimensional images (T₁, …, T_K) generated by the two-dimensional image generating device (200) according to any one of claims 17 and 18.
A method for comparative analysis of gene expression between tissues, which compares and analyzes gene expression patterns between tissue images (TI) of different tissues by comparing the two-dimensional images (T₁, …, T_K) generated for the tissue images (TI) of the different tissues by the two-dimensional image generating device (200) according to any one of claims 17 and 18.