CN115797926A

CN115797926A - Space region typing method and device of mass spectrum imaging graph and electronic equipment

Info

Publication number: CN115797926A
Application number: CN202211431340.0A
Authority: CN
Inventors: 谢桂纲; 陆子含; 黄银
Original assignee: Suzhou Bionovogene Biomedical Technology Co ltd
Current assignee: Suzhou Bionovogene Biomedical Technology Co ltd
Priority date: 2022-11-14
Filing date: 2022-11-14
Publication date: 2023-03-14

Abstract

Embodiments of the application provide a method and a device for spatial region typing of a mass spectrometry imaging map, an electronic device, and a computer-readable storage medium, and relate to the field of image processing. The method comprises: obtaining a feature vector of each pixel point according to the metabolic detection result of the pixel point; clustering the pixel points according to their feature vectors to obtain a plurality of clusters, each cluster comprising at least one pixel point; taking pixel points whose cluster is undetermined as target pixel points, and determining the cluster of each target pixel point according to the distance and the composition similarity between the target pixel point and other pixel points within a surrounding preset range and the clusters in which the other pixel points are located; and determining a spatial region typing result according to the cluster in which each pixel point is located. The embodiments can improve clustering accuracy and produce a spatial typing result that is low in noise and consistent with the real morphological division of biological tissue.

Description

Space region typing method and device of mass spectrum imaging graph and electronic equipment

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for typing a spatial region of a mass spectrometry imaging diagram, an electronic device, and a storage medium.

Background

Mass spectrometry is one of the leading-edge technologies in the field of precision instrument analysis and has developed rapidly in clinical detection in recent years. It can replace conventional methods in many fields, such as biochemical immunity, drug metabolism, microbiology, pathological diagnosis, and molecular testing. Compared with gene sequencing, for example, mass spectrometry is suitable for detecting many kinds of molecules, including biomacromolecules such as nucleic acids and polypeptides, small biological molecules such as metabolites, hormones, and vitamins, and trace inorganic elements, and it can qualitatively and quantitatively determine thousands of markers simultaneously.

In a mass spectrometry imaging map, a single pixel point approximates a single cell, and each cell contains a plurality of metabolite ions. The mass spectrometry imaging map is an image formed by stacking the data of different ion layers, and each ion layer contains the signal intensity of one metabolite ion in each cell of the biological sample. The raw data of the mass spectrometry imaging map forms the data basis for spatial metabolomics data analysis. In that analysis, tissue-specific spatial regions need to be divided automatically by some method, and the corresponding omics analysis is then carried out on the divided result. However, the typing results currently obtained are not accurate, which affects the subsequent omics analysis.

Disclosure of Invention

Embodiments of the present application provide a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for typing a spatial region of a mass spectrometry imaging map, which can solve the above problems in the prior art. The technical scheme is as follows:

According to an aspect of the embodiments of the present application, there is provided a method for spatial region typing of a mass spectrometry imaging map, each pixel point in the mass spectrometry imaging map being used for characterizing a metabolic detection result of a single cell in a biological sample, the metabolic detection result including a signal intensity of at least one metabolite ion, the method including:

obtaining a feature vector of the pixel point according to the metabolism detection result of the pixel point;

clustering the pixel points according to the feature vector of each pixel point to obtain a plurality of clusters, wherein each cluster comprises at least one pixel point;

taking pixel points whose cluster is undetermined as target pixel points, and determining the cluster of each target pixel point according to the distance and composition similarity between the target pixel point and other pixel points within a surrounding preset range and the clusters in which the other pixel points are located;

and determining a space region typing result according to the cluster where each pixel point is located.

As an optional embodiment, determining the cluster where the target pixel point is located according to the distance and the composition similarity between the target pixel point and other pixel points in the surrounding preset range, and the cluster where the other pixel points are located includes:

sorting the distances between the target pixel point and the other pixel points, and determining the sorting result of each of the other pixel points;

obtaining corresponding weights according to the sorting results of the other pixel points, wherein the weights are inversely proportional to the sorting results, and obtaining the association degrees between the other pixel points and the target pixel point according to the weights and the composition similarity of the other pixel points;

and determining the cluster where the target pixel point is located according to the cluster where the other pixel point with the highest association degree is located.

As an optional embodiment, obtaining the feature vector of the pixel point according to the metabolic detection result of the pixel point includes:

determining the distribution of the signal intensity of the metabolite ions according to the signal intensity of the metabolite ions in each cell, clipping the signal intensity of the metabolite ions according to the distribution, and normalizing the clipped signal intensity to obtain the updated signal intensity of the metabolite ions;

and taking the updated signal intensity of each metabolite ion corresponding to the pixel point as an initial feature vector of the pixel point, and performing dimensionality reduction on the initial feature vector to obtain the feature vector of the pixel point, wherein the feature vector is used for representing the difference of the metabolite ion composition between the pixel point and other pixel points.

As an optional embodiment, the composition similarity is a similarity between the types of metabolite ions included in the target pixel and other pixels or a similarity between feature vectors of the target pixel and other pixels.

As an alternative embodiment, determining the distribution of the signal intensity of the metabolite ions according to the signal intensity of the metabolite ions in each cell, and clipping the signal intensity of the metabolite ions according to the distribution comprises:

for each metabolite ion, performing data equal-width binning according to the signal intensity of the metabolite ion in each cell, wherein each bin is used for counting the proportion of the number of cells in a signal intensity range to the total number of cells;

and determining the upper limit value of the signal intensity of the metabolite ion according to the proportion corresponding to each bin, and clipping the signal intensity of the metabolite ion according to the upper limit value.

As an optional embodiment, clustering the pixel points according to the feature vector of each pixel point to obtain a plurality of clusters includes:

determining the similarity between the feature vectors of every two pixel points;

obtaining a similarity matrix according to the similarity between the feature vectors of every two pixel points, wherein elements in the similarity matrix are used for representing the similarity between the feature vectors of every two pixel points;

taking the element with the similarity larger than the similarity threshold value in the similarity matrix as a target element, and constructing a relationship network graph according to the target element, wherein two nodes with a connection relationship in the relationship network graph are used for representing two pixel points in one target element;

and clustering the nodes in the relational network graph according to a preset graph clustering algorithm.

As an alternative embodiment, determining the upper limit value of the signal intensity of the metabolite ion according to the proportion corresponding to each bin includes:

and accumulating the proportions corresponding to the bins in order of signal intensity range from small to large, and, when the accumulated value reaches a preset value, taking the minimum value of the signal intensity in the bin corresponding to the last accumulated proportion as the upper limit value.

As an optional embodiment, determining a spatial region classification result according to the traversed cluster where each pixel point is located includes:

reducing the dimension of the feature vector of each pixel point to a three-dimensional feature vector, and determining the position of the pixel point in a three-dimensional virtual space according to the three-dimensional feature vector;

determining the display style of each cluster in the three-dimensional virtual space;

and drawing a three-dimensional effect graph as a space region typing result according to the position of each pixel point in the three-dimensional space system and the display style corresponding to the clustering where the pixel point is located.

According to another aspect of the embodiments of the present application, there is provided a device for spatially classifying a mass spectrometry imaging map, each pixel point in the mass spectrometry imaging map being used for characterizing a metabolic detection result of a single cell in a biological sample, the metabolic detection result including a signal intensity of at least one metabolite ion, the device including:

the characteristic vector extraction module is used for obtaining the characteristic vector of the pixel point according to the metabolism detection result of the pixel point;

the clustering module is used for clustering the pixel points according to the characteristic vector of each pixel point to obtain a plurality of clustering clusters, and each clustering cluster comprises at least one pixel point;

the filling module is used for taking pixel points whose cluster is undetermined as target pixel points, and determining the cluster of each target pixel point according to the distance and the composition similarity between the target pixel point and other pixel points within a surrounding preset range and the clusters in which the other pixel points are located;

and the typing module is used for determining a spatial region typing result according to the cluster where each pixel point is located.

According to another aspect of embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the method for spatial region typing of a mass spectrometry imaging map provided by the above aspect.

According to a further aspect of embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method for spatial region typing of a mass spectrometry imaging map provided by the above aspect.

According to an aspect of embodiments of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method for spatial region typing of a mass spectrometry imaging map provided by the above aspect.

The technical scheme provided by the embodiment of the application has the following beneficial effects:

According to the embodiments of the application, the feature vector of each pixel point is obtained according to the metabolic detection result of the pixel point, and the pixel points are clustered based on their feature vectors to obtain a plurality of clusters. The cluster of each target pixel point is then determined from the clusters in which other pixel points within a surrounding preset range are located, together with the distance and composition similarity between those other pixel points and the target pixel point. This can improve clustering accuracy and produce a spatial typing result that is low in noise and consistent with the real morphological division of biological tissue.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.

Fig. 1 is a schematic diagram of a system architecture for implementing a spatial region typing method for a mass spectrometry imaging map according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of a spatial region typing method for a mass spectrometry imaging map according to an embodiment of the present application;

Fig. 3 is a schematic diagram of dimensionality reduction according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of a spatial region typing method according to another embodiment of the present application;

Fig. 5 is a schematic structural diagram of a spatial region typing apparatus according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein indicates at least one of the items defined by the term, e.g., "A and/or B" may be implemented as "A", or as "B", or as "A and B".

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Several terms referred to in this application will first be introduced and explained:

Mass spectrogram: metabolite ions with different mass-to-charge ratios (m/z) are separated by a mass analyzer, detected and recorded by a detector, and, after computer processing, represented in the form of a mass spectrogram. In a mass spectrogram, the abscissa represents the mass-to-charge ratio of the metabolite ions, with the value increasing from left to right; for singly charged ions, the value on the abscissa is the mass of the metabolite ion. The ordinate represents the signal intensity of the ion current.

In the space metabonomics technology, an instrument is used for scanning cells on a biological tissue slice according to a certain space arrangement sequence, and each cell can be scanned to obtain a mass spectrogram. Each mass spectrogram is equivalent to a pixel point on a two-dimensional plane on a mass spectrometry imaging graph. And arranging the mass spectrogram set with the spatial sequence coordinate information obtained by scanning on a two-dimensional plane according to the spatial coordinate information to generate mass spectrum imaging graph data in the spatial metabonomics.

Dimensionality reduction: the process of converting a high-dimensional data set into a comparable low-dimensional space. Common dimensionality reduction methods include principal component analysis, independent component analysis, factor analysis, linear discriminant analysis, and the like.

Clustering: the process of dividing a collection of physical or abstract objects into classes composed of similar objects. A cluster generated by clustering is a collection of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters.

Spatial region typing: segmenting the image plane of a biological organ tissue slice into regions according to certain data differences, so as to delineate biological tissue regions of different types.

The present application provides a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for typing a spatial region of a mass spectrometry imaging chart, which are intended to solve the above technical problems in the prior art.

The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below through descriptions of several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps, etc. in different embodiments is not repeated.

Fig. 1 is a schematic diagram of a system architecture for implementing the spatial region typing method for a mass spectrometry imaging map according to an embodiment of the present application, which may be applied to a medical analysis scenario. The system includes a terminal 101, a mass spectrometer 102, and a network server 103. A user performs mass spectrometry on a biological sample to be analyzed with the mass spectrometer 102, and the mass spectrometer 102 sends the mass spectrometry result, namely a mass spectrometry imaging map, to the terminal 101. The terminal 101 sends the mass spectrometry imaging map to the server 103, the server 103 obtains a spatial region typing result according to the spatial region typing method of the embodiments of the present application and returns it to the terminal 101, and the terminal 101 displays the spatial region typing result to the user.

In the embodiment of the present application, a method for typing a spatial region of a mass spectrometry imaging map is provided. As shown in Fig. 2, the method includes S101 to S104.

S101, obtaining a feature vector of the pixel point according to the metabolism detection result of the pixel point.

Because the metabolic detection result comprises the signal intensity of at least one metabolite ion, all metabolite ions existing in the mass spectrum imaging graph can be counted, then each metabolite ion corresponds to one dimension of the feature vector, and the signal intensity of the metabolite ion is used as the feature value of the dimension, so that the feature vector of the pixel point is obtained.

S102, clustering the pixel points according to the characteristic vectors of the pixel points to obtain a plurality of clustering clusters, wherein each clustering cluster comprises at least one pixel point.

After the feature vectors of the pixel points are obtained, clustering can be performed on these low-dimensional feature vectors. The embodiment of the present application does not limit the specific clustering method. For example, a graph clustering method can be adopted: each row of the dimension-reduced matrix is used as a data object, and the cosine similarity between every pair of rows is calculated to obtain a similarity matrix. Elements of the similarity matrix that are below a certain threshold are deleted, and the remaining elements generate a network graph in which the network nodes are the pixel points of the mass spectrometry imaging data, that is, the rows of the original matrix. The whole network graph is then processed with a network community discovery algorithm to obtain the clustering result of the pixel points.
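The graph clustering flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names and the threshold of 0.8 are hypothetical, and because no particular community discovery algorithm is specified, the sketch substitutes the simplest possible stand-in (labelling connected components of the thresholded graph).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_graph(rows, threshold):
    """Connect pixel i and pixel j when their cosine similarity exceeds the threshold."""
    graph = {i: set() for i in range(len(rows))}
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if cosine_similarity(rows[i], rows[j]) > threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def connected_components(graph):
    """Assumed stand-in for community discovery: one label per connected component."""
    labels = {}
    next_label = 0
    for start in sorted(graph):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor not in labels:
                    labels[neighbor] = next_label
                    stack.append(neighbor)
        next_label += 1
    return labels


# Each row is the dimension-reduced feature vector of one pixel point (toy data).
rows = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.0, 0.2, 0.9],
]
labels = connected_components(build_similarity_graph(rows, threshold=0.8))
```

In this toy data the first two pixel points fall into one cluster and the last two into another. A real community discovery algorithm would additionally split densely connected sub-groups inside one connected component, which this stand-in does not do.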

In an embodiment, after clustering, all the pixels can be traversed, and for each traversed pixel, the cluster of the pixel is updated according to the cluster where other pixels in the preset range around the pixel are located until the cluster of all the pixels is not updated any more.

It should be noted that the clustering result determined by the clustering algorithm may be relatively coarse: some noise pixel points are scattered within a neighboring area, and their cluster categories are relatively isolated. Therefore, in the embodiment of the present application, for each pixel point, the clusters in which all other pixel points within a preset radius around the pixel point are located are counted, and the cluster counted the most times is taken as the updated cluster of the pixel point. It should be understood that the clusters of pixel points may change as the traversal proceeds. In the embodiment of the present application, the pixel points may be traversed in row order and then in column order until the clusters of all pixel points are no longer updated, at which point the cluster in which each pixel point is located is finally determined.
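A minimal sketch of this neighbourhood update is given below. Several details are assumptions made for illustration only: the preset radius is treated as a square window, the traversal is row-major only, ties keep the current cluster when it is among the most frequent ones, and `max_passes` is a hypothetical safety bound because convergence is not guaranteed for arbitrary inputs.

```python
from collections import Counter


def smooth_clusters(labels, radius=1, max_passes=100):
    """Replace each pixel's cluster by the most frequent cluster among its neighbours."""
    height, width = len(labels), len(labels[0])
    grid = [row[:] for row in labels]  # work on a copy, updated in place during traversal
    for _ in range(max_passes):
        changed = False
        for y in range(height):
            for x in range(width):
                counts = Counter()
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        if (ny, nx) != (y, x):
                            counts[grid[ny][nx]] += 1
                if not counts:
                    continue
                top = max(counts.values())
                if counts[grid[y][x]] == top:
                    continue  # current cluster is already among the most frequent
                grid[y][x] = min(label for label, c in counts.items() if c == top)
                changed = True
        if not changed:
            break  # no cluster was updated in a full pass
    return grid


# One isolated noise pixel (cluster 1) inside a region of cluster 0.
noisy = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
smoothed = smooth_clusters(noisy)
```

Here the isolated pixel is absorbed into the surrounding cluster after one pass, and a second pass confirms that nothing else changes.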

S103, taking the pixel points of the undetermined cluster as target pixel points, and determining the cluster of the target pixel points according to the distance and the composition similarity between the target pixel points and other pixel points in the surrounding preset range and the cluster of the other pixel points.

The clustering algorithm cannot guarantee that every pixel point is assigned to a cluster, and a small number of unclassified pixel points remain. These pixel points, whose cluster is undetermined, are used as target pixel points. For each target pixel point, the cluster in which it is located is determined based on the clusters in which other pixel points within the surrounding preset range are located and on the distance and composition similarity between those other pixel points and the target pixel point, which can improve clustering accuracy.

The composition similarity in the embodiment of the present application can be characterized by the similarity between the types of metabolite ions included in the pixel points. For example, if a target pixel point includes three metabolite ions A, B, and C, and another pixel point includes four metabolite ions A, B, C, and D, the similarity between the two is 75%.
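The text does not name the measure behind the 75% figure. One reading that reproduces it is the ratio of shared ion types to all ion types present in either pixel point (the Jaccard index); the sketch below assumes that reading, and the function name is hypothetical.

```python
def composition_similarity(ions_a, ions_b):
    """Shared ion types divided by all ion types present in either pixel (Jaccard index)."""
    set_a, set_b = set(ions_a), set(ions_b)
    union = set_a | set_b
    if not union:
        return 0.0  # neither pixel contains any metabolite ion
    return len(set_a & set_b) / len(union)


# Example from the text: {A, B, C} versus {A, B, C, D}.
similarity = composition_similarity({"A", "B", "C"}, {"A", "B", "C", "D"})
```

Three ion types are shared out of four in total, so the function returns 0.75.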

S104, determining a spatial region typing result according to the cluster where each pixel point is located.

After the cluster where each pixel point is located is determined, each pixel point in the mass spectrometry imaging map can be rendered based on the preset display style of its cluster, thereby obtaining the spatial region typing result. In one embodiment, different colors can be set for different clusters, and each pixel point in the mass spectrometry imaging map is rendered in the color of the cluster where it is located. The resulting spatial region typing result has the same size as the mass spectrometry imaging map, and because each pixel point is colored, the user can see the differences between regions of the biological sample at a glance.
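As a rough illustration of the color rendering, the sketch below maps a grid of cluster labels to a grid of RGB tuples of the same size. The palette values, the fallback color, and the function name are all hypothetical choices.

```python
def render_typing_map(cluster_grid, palette, unassigned_color=(0, 0, 0)):
    """Return an RGB image with the same shape as the cluster grid."""
    return [
        [palette.get(label, unassigned_color) for label in row]
        for row in cluster_grid
    ]


cluster_grid = [
    [0, 0, 1],
    [0, 1, 1],
]
palette = {0: (230, 25, 75), 1: (60, 180, 75)}  # one assumed color per cluster
image = render_typing_map(cluster_grid, palette)
```

The returned nested list can be passed to any image library for display; the output has exactly one color per pixel point of the input.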

On the basis of the foregoing embodiments, as an optional embodiment, determining a cluster in which the target pixel point is located according to the distance and composition similarity between the target pixel point and other pixel points in a surrounding preset range, and the cluster in which the other pixel points are located includes:

S201, sorting the distances between the target pixel point and the other pixel points, and determining the sorting result of each of the other pixel points;

S202, obtaining corresponding weights according to the sorting results of the other pixel points, wherein the weights are inversely proportional to the sorting results, and obtaining the association degrees between the other pixel points and the target pixel point according to the weights and the composition similarity of the other pixel points;

S203, determining the cluster where the target pixel point is located according to the cluster where the other pixel point with the highest association degree is located.

For a target pixel point, the distances between the target pixel point and the other pixel points are first sorted to determine the sorting result of each of the other pixel points. It should be understood that the 8 pixel points immediately surrounding a central pixel point are all 1 unit distance from it, so their sorting results are the same and are all 1; there are 16 other pixel points at 2 unit distances from the central pixel point, and their sorting results are likewise the same and are all 2.

Once the sorting result is determined, the corresponding weight can be determined. The reciprocal of the sorting result can be used as the weight; for example, if the sorting result is 2, the weight is 1/2. The farther another pixel point is from the target pixel point, the lower its weight and the smaller its influence on the cluster of the target pixel point.

In the embodiment of the present application, the composition similarity is weighted by the weight to obtain the association degree between each other pixel point and the target pixel point. Specifically, the association degree can be determined by the formula (1/rank(D)) × cos, where D represents the distance between the target pixel point and the other pixel point, rank(D) represents the sorting result of that distance, and cos represents the composition similarity.

In an embodiment, the cluster where the other pixel point with the highest association degree is located may be used as the cluster where the target pixel point is located.

In an embodiment, the association degrees may be sorted in descending order. If the total number of other pixel points is K, the top K/2 other pixel points in the sorted queue are taken, and the cluster that accounts for the largest share of them is used as the cluster of the target pixel point.
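The filling step S201 to S203, including both variants above, can be sketched as follows. The sketch rests on several assumptions: distance is measured in rings (Chebyshev distance), which matches the counts of 8 and 16 pixel points given above; composition similarity is taken as the cosine similarity of feature vectors; other pixel points that are themselves unassigned are skipped; and K counts only the other pixel points that have a cluster. The function names and the `vote` flag are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_cluster(target, labels, features, radius=2, vote=False):
    """Pick a cluster for the unassigned pixel at target = (row, col)."""
    ty, tx = target
    height, width = len(labels), len(labels[0])
    neighbours = []  # (ring distance, row, col) of every other pixel in range
    for y in range(max(0, ty - radius), min(height, ty + radius + 1)):
        for x in range(max(0, tx - radius), min(width, tx + radius + 1)):
            if (y, x) != (ty, tx):
                neighbours.append((max(abs(y - ty), abs(x - tx)), y, x))
    # S201: dense ranking of distances, nearest ring is rank 1, next ring rank 2, ...
    rank_of = {d: r for r, d in enumerate(sorted({n[0] for n in neighbours}), start=1)}
    scored = []
    for d, y, x in neighbours:
        if labels[y][x] is None:
            continue  # assumption: unassigned neighbours cannot vote
        # S202: association degree = (1 / rank(D)) * cos
        association = (1.0 / rank_of[d]) * cosine(features[ty][tx], features[y][x])
        scored.append((association, labels[y][x]))
    if not scored:
        return None
    scored.sort(key=lambda item: item[0], reverse=True)
    if not vote:
        return scored[0][1]  # S203: cluster of the single best-associated pixel
    top_half = scored[: max(1, len(scored) // 2)]  # variant: majority among the top K/2
    return Counter(label for _, label in top_half).most_common(1)[0][0]


labels = [
    [0, 0, 1],
    [0, None, 1],
    [0, 1, 1],
]
features = [
    [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]],
    [[1.0, 0.0], [0.1, 1.0], [0.0, 1.0]],
    [[1.0, 0.1], [0.1, 0.9], [0.0, 1.0]],
]
best = assign_cluster((1, 1), labels, features, radius=1)
voted = assign_cluster((1, 1), labels, features, radius=1, vote=True)
```

In this toy grid the unassigned center pixel has a feature vector close to those of cluster 1, so both variants assign it to cluster 1.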

On the basis of the foregoing embodiments, as an optional embodiment, obtaining the feature vector of the pixel point according to the metabolic detection result of the pixel point includes:

S301, determining the distribution of the signal intensity of the metabolite ions according to the signal intensity of the metabolite ions in each cell, clipping the signal intensity of the metabolite ions according to the distribution, and normalizing the clipped signal intensity to obtain the updated signal intensity of the metabolite ions;

S302, using the updated signal intensity of each metabolite ion corresponding to the pixel point as an initial feature vector of the pixel point, and performing dimensionality reduction on the initial feature vector to obtain the feature vector of the pixel point, wherein the feature vector is used for representing the difference of the metabolite ion composition between the pixel point and other pixel points.

It should be understood that each metabolite ion in the embodiments of the present application has a unique mass-to-charge ratio. Because a metabolite ion exists in a plurality of cells, the signal intensity of each metabolite ion in each cell can be counted to determine the distribution of the signal intensity of that metabolite ion. Abnormal signals that are too high or too low are then identified and clipped based on the distribution, so that the clipped signal intensities represent the signal distribution characteristics of most pixel points.

In the embodiment of the present application, the distribution of a metabolite ion can be represented by counting the number of pixel points in different signal intensity ranges; the counts can also be processed further, with the processing result used to represent the distribution. The embodiment of the present application is not specifically limited in this respect.
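One way to turn such counts into a clipping limit is the equal-width binning described in the optional embodiments: accumulate the per-bin proportions from low to high intensity and, once the accumulated value reaches a preset value, use the lower bound of the last accumulated bin as the upper limit. The sketch below follows that description under assumptions: the "minimum value of the signal intensity in the bin" is read as the bin's lower edge, and the bin count of 100 and the preset value of 0.95 are hypothetical.

```python
def upper_limit_by_binning(intensities, n_bins=100, preset=0.95):
    """Upper clipping limit for one metabolite ion from equal-width binning."""
    low, high = min(intensities), max(intensities)
    if high == low:
        return float(high)  # all cells share one intensity, nothing to clip
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in intensities:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        counts[min(int((value - low) / width), n_bins - 1)] += 1
    seen = 0
    for index, count in enumerate(counts):
        seen += count
        if seen / len(intensities) >= preset:
            return low + index * width  # lower edge of the last accumulated bin
    return float(high)


# 100 ordinary cells plus one abnormally high signal.
intensities = list(range(100)) + [1000]
limit = upper_limit_by_binning(intensities, n_bins=100, preset=0.95)
clipped = [min(value, limit) for value in intensities]
```

With this toy input the limit comes out at 90.0, so the outlier of 1000 and the few values just above 90 are all clipped to 90.0 while the bulk of the distribution is unchanged.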

After the signal intensity clipping is completed, the embodiment of the present application may normalize the signal intensity; for example, the normalization may be implemented based on a total peak area normalization method. In one embodiment, for a metabolite ion, the signal intensity of the metabolite ion in each row of pixel points of the mass spectrometry imaging map can be normalized, so that all signal intensities, originally distributed in the range of 100 to 1e8, are finally normalized to [0, 1].
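The exact normalization procedure is not spelled out here. As a simple illustration of mapping clipped intensities into [0, 1], the sketch below uses min-max scaling per metabolite ion; this is a substituted technique chosen for brevity, not the total peak area normalization named above, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale the intensities of one metabolite ion linearly into [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant signal carries no contrast
    span = high - low
    return [(value - low) / span for value in values]


clipped = [100.0, 1e4, 1e6, 1e8]
normalized = min_max_normalize(clipped)
```

The smallest intensity maps to 0.0 and the largest to 1.0, which removes the difference in order of magnitude between metabolite ions.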

Through the processing of the above steps, the metabolic detection result represented by each pixel point comprises the normalized signal intensity (namely, the updated signal intensity) of each metabolite ion. The updated signal intensity of one metabolite ion is used as the feature of one dimension of the pixel point, so the number of dimensions of the initial feature vector of a pixel point equals the number of all metabolite ions. Typically, the initial feature vector has more than ten thousand dimensions and is therefore high-dimensional data. If the initial feature vector is used directly for cluster analysis of the pixel points, overfitting is likely, so the embodiment of the present application reduces the dimensionality of the initial feature vector.

It can be understood that, if a certain metabolite ion does not appear in a pixel, the feature value of the corresponding dimension of the metabolite ion of the pixel is a preset value, for example, 0. For example, if 10000 metabolite ions are present in the mass spectrometry imaging graph and 700 metabolite ions are present in a certain pixel point, the initial feature vector of the pixel point is 10000 dimensions, wherein the feature value of 9300 dimensions is a preset value, and the feature value of 700 dimensions is the updated signal intensity of the corresponding 700 metabolite ions, so that the initial feature vector can be found to be relatively sparse.
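The construction of the initial feature vector can be sketched as below, assuming each pixel point is stored as a mapping from a metabolite ion identifier to its updated signal intensity. The identifiers and the function name are hypothetical.

```python
def initial_feature_vector(pixel_ions, all_ions, preset=0.0):
    """One dimension per metabolite ion; absent ions take the preset value."""
    return [pixel_ions.get(ion, preset) for ion in all_ions]


# All metabolite ions observed anywhere in the imaging map (toy example).
all_ions = ["mz1", "mz2", "mz3", "mz4"]
# Updated signal intensities of the ions actually present in one pixel point.
pixel = {"mz2": 0.8, "mz4": 0.3}
vector = initial_feature_vector(pixel, all_ions)
```

The resulting vector is `[0.0, 0.8, 0.0, 0.3]`: ions missing from the pixel point fill their dimensions with the preset value, which is why real vectors of this kind are sparse.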

The specific method of dimension reduction in the embodiment of the present application is not particularly limited, and may be, for example, Uniform Manifold Approximation and Projection (UMAP), Principal Component Analysis (PCA), or t-Distributed Stochastic Neighbor Embedding (t-SNE).

The feature vector after dimensionality reduction is used for representing the difference in metabolite ion composition between the pixel point and other pixel points. The feature value of each dimension no longer simply represents the signal intensity of one metabolite ion, and the number of dimensions is greatly reduced; in one embodiment, the dimensionality can be reduced from ten thousand to 400 dimensions.

Referring to fig. 3, which schematically illustrates a dimension reduction according to an embodiment of the present application: as shown in the figure, mz1, mz2, ..., mzi represent i metabolite ions, and pixel1, pixel2, ..., pixelj represent j pixel points in a mass spectrum imaging graph. Before dimension reduction, the signal intensities of all metabolite ions in all pixel points may form an original matrix, where each element represents the signal intensity of one metabolite ion in one pixel point, and the original matrix is a sparse matrix. The columns of the matrix after dimension reduction no longer correspond to metabolite ions, but are updated to dim1, dim2, ..., dim800, namely feature values of 800 dimensions. Although the meaning represented by each column has changed, the feature values can still be used for subsequent clustering among pixel points.
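To make the matrix transformation concrete, the sketch below reduces a small pixel-by-ion matrix with PCA, one of the methods the embodiment lists. It is an illustrative assumption rather than the patented procedure: the function name `pca_reduce` is hypothetical, and the power-iteration approach is only suitable for small matrices (a real pipeline would use a numerical library).

```python
import math
import random


def pca_reduce(rows, n_components, n_iter=100, seed=0):
    """Project rows onto their top principal components.

    Uses power iteration with re-orthogonalization against components that
    were already found; adequate for small illustrative matrices only.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(n_iter):
            for c in components:  # stay orthogonal to earlier components
                dot = sum(a * b for a, b in zip(v, c))
                v = [a - dot * b for a, b in zip(v, c)]
            # Multiply v by X^T X, where X is the centered matrix.
            scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
            v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
            norm = math.sqrt(sum(a * a for a in v))
            if norm == 0.0:
                break
            v = [a / norm for a in v]
        components.append(v)
    return [[sum(a * b for a, b in zip(row, c)) for c in components] for row in centered]


# Four pixel points, three metabolite-ion columns, reduced to one dimension.
matrix = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
reduced = pca_reduce(matrix, n_components=1)
print(reduced)
```

The rows of `reduced` still correspond to pixel points, while the single remaining column no longer corresponds to any one metabolite ion.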

In the spatial region typing method of the embodiment of the present application, the signal intensity of the metabolite ions is clipped and the clipped signal intensity is normalized to obtain the updated signal intensity of the metabolite ions, which resolves abnormal signal intensities and eliminates differences in magnitude. The updated signal intensity of each metabolite ion corresponding to a pixel point is used as the initial feature vector of the pixel point, and dimension reduction is performed on the initial feature vector to obtain the feature vector of the pixel point. This lays a foundation for avoiding overfitting in subsequent clustering and can effectively improve the accuracy of clustering.

On the basis of the above embodiments, as an optional embodiment, the composition similarity is a similarity between the types of metabolite ions included in the target pixel point and other pixel points or a similarity between feature vectors of the target pixel point and other pixel points. It should be noted that the feature vector in the embodiment of the present application refers to a feature vector after the dimension reduction processing.

On the basis of the above embodiments, as an alternative embodiment, determining the distribution of the signal intensity of the metabolite ions according to the signal intensity of the metabolite ions in each cell, and tailoring the signal intensity of the metabolite ions according to the distribution comprises:

For each metabolite ion, a plurality of bins may be set according to the distribution interval of the signal intensity of the metabolite ion across the cells. For example, if the distribution interval of the signal intensity is 10 to 10000, the sub-interval 0 to 1000 may be set as the first bin, the sub-interval 1001 to 2000 as the second bin, ..., and the sub-interval 9001 to 10000 as the last bin. After the bins are set, the bin to which the signal intensity of the metabolite ion in each cell belongs is determined. Once this has been done for all cells, the proportion of the number of cells in each bin to the total number of cells is counted, and the proportions are accumulated in order of the sub-intervals from small to large. According to a preset distribution probability threshold, for example 0.65, the upper limit value of the signal intensity is obtained from the bin at which the accumulated probability reaches 0.65, and all signal intensities higher than the upper limit value are clipped to the upper limit value.
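The binning described here can be sketched as below, assuming equal-width bins spanning the observed minimum and maximum intensity of one metabolite ion. The function name `bin_proportions` and the sample intensities are hypothetical.

```python
def bin_proportions(intensities, n_bins):
    """Split the intensity range into equal-width bins.

    Returns (edges, proportions), where proportions[i] is the share of
    cells whose intensity falls into bin i.
    """
    lo, hi = min(intensities), max(intensities)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values match
    counts = [0] * n_bins
    for x in intensities:
        idx = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    total = len(intensities)
    return edges, [c / total for c in counts]


# Six cells with intensities between 10 and 10000, split into ten bins.
edges, props = bin_proportions([10, 500, 1500, 2500, 9000, 10000], 10)
print(props)
```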

On the basis of the above embodiments, as an alternative embodiment, the determining the upper limit value of the signal intensity of the metabolite ion according to the corresponding proportion of each bin includes:

accumulating the proportions corresponding to the bins in order of signal intensity range from small to large, and, when the accumulated value reaches a preset value, taking the minimum original signal value in the bin corresponding to the last accumulated proportion as the upper limit value.

For example, suppose there are 6 bins, bin 1 to bin 6, whose sub-distribution intervals increase in that order, with proportions of 15%, 18%, 19%, 23%, 15% and 10% respectively, and the preset value is 65%. Accumulation starts from the proportion of bin 1. After bin 3 is accumulated, the accumulated proportion is 52%, which does not reach the preset value; after bin 4 is accumulated, the accumulated proportion is 75%, which exceeds 65%. Bin 4 is therefore the bin corresponding to the last accumulated proportion, and the minimum signal intensity in bin 4 is taken as the upper limit value. Assuming that the upper limit value of a metabolite ion is 10000, a signal intensity of 9000 for that metabolite ion in a cell requires no adjustment, whereas a signal intensity of 11000 needs to be updated to 10000.
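The worked example above can be reproduced with a short sketch. The function names and the per-bin minimum intensities in `bin_minimums` are hypothetical; the minimums were chosen only so that bin 4 yields the upper limit of 10000 used in the text.

```python
def upper_limit_from_bins(bin_minimums, proportions, preset=0.65):
    """Accumulate bin proportions from the lowest bin upward and return the
    minimum signal intensity of the bin at which the running total first
    reaches the preset value."""
    cumulative = 0.0
    for minimum, share in zip(bin_minimums, proportions):
        cumulative += share
        if cumulative >= preset:
            return minimum
    return bin_minimums[-1]


def clip(intensities, upper):
    """Replace every intensity above the upper limit with the upper limit."""
    return [min(x, upper) for x in intensities]


proportions = [0.15, 0.18, 0.19, 0.23, 0.15, 0.10]
bin_minimums = [100, 2500, 5000, 10000, 20000, 40000]  # hypothetical values

upper = upper_limit_from_bins(bin_minimums, proportions)
print(upper)                        # 10000 (minimum of bin 4)
print(clip([9000, 11000], upper))   # [9000, 10000]
```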

On the basis of the foregoing embodiments, as an optional embodiment, clustering the pixels according to the feature vectors of the pixels to obtain a plurality of cluster clusters includes:

s401, determining the similarity between the feature vectors of every two pixel points;

s402, obtaining a similarity matrix according to the similarity between the feature vectors of every two pixel points, wherein elements in the similarity matrix are used for representing the similarity between the feature vectors of every two pixel points;

s403, taking the element with the similarity larger than the similarity threshold value in the similarity matrix as a target element, and constructing a relationship network graph according to the target element, wherein two nodes with a connection relationship in the relationship network graph are used for representing two pixel points in one target element;

s404, clustering the nodes in the relational network graph according to a preset graph clustering algorithm.

The method for calculating the similarity between feature vectors in the embodiment of the present application is not particularly limited; for example, cosine similarity, Euclidean distance, Mahalanobis distance, and the like may be used. It should be noted that, in one embodiment, cosine similarity is used to measure the similarity between the feature vectors of pixel points, and cosine similarity can effectively reduce the misjudgment caused by using the Jaccard similarity.

After the similarity between the feature vectors of every two pixel points is obtained, a similarity matrix can be obtained. It can be understood that, if the total number of pixel points is N, the size of the similarity matrix is N × N, and the elements in the similarity matrix represent the similarity between the feature vectors of every two pixel points.

After the similarity matrix is obtained, each element in the similarity matrix can be screened based on a preset similarity threshold, and the target elements larger than the similarity threshold are selected from the similarity matrix. Each target element corresponds to two pixel points, and a relational network graph can be established based on all the pixel points corresponding to all the target elements. The nodes in the relational network graph are the pixel points in the target elements, and each node records the coordinates of its pixel point. The similarity between two connected nodes is greater than the preset threshold, and the edge connecting the two nodes records that similarity.
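Steps S401 to S403 can be sketched as follows, assuming cosine similarity. The function names and the sample feature vectors are hypothetical, and node coordinates are omitted for brevity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_graph(features, threshold):
    """Return the N x N similarity matrix and the edges of the relational graph.

    edges maps a node pair (i, j) with i < j to its similarity, and contains
    only pairs whose similarity is greater than the threshold.
    """
    n = len(features)
    sim = [[cosine_similarity(features[i], features[j]) for j in range(n)]
           for i in range(n)]
    edges = {(i, j): sim[i][j]
             for i in range(n) for j in range(i + 1, n)
             if sim[i][j] > threshold}
    return sim, edges


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
sim, edges = build_graph(features, threshold=0.8)
print(sorted(edges))  # [(0, 1)]
```

Storing the full N × N matrix is quadratic in the number of pixel points, so a practical implementation would probably compute similarities in blocks or keep only the edges.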

The graph clustering algorithm in the embodiment of the present application is not particularly limited, and may be, for example, the Louvain algorithm or another graph node community clustering algorithm.
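For illustration only, the sketch below groups graph nodes by connected components. This is a deliberately simple stand-in and is not the Louvain algorithm: it merely shows how an edge list from the relational network graph turns into one cluster label per pixel point. The function name is hypothetical.

```python
def connected_components(n_nodes, edges):
    """Assign a cluster label to every node; nodes joined by a path share a label."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Attach the larger root to the smaller, so each label is the
            # smallest node index in its component.
            parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(n_nodes)]


labels = connected_components(5, [(0, 1), (1, 2), (3, 4)])
print(labels)  # [0, 0, 0, 3, 3]
```

A community detection method such as Louvain would additionally split a connected component into several clusters when its internal edges are unevenly dense.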

On the basis of the foregoing embodiments, as an optional embodiment, determining a spatial region classification result according to a cluster in which each traversed pixel point is located includes:

s501, reducing the dimension of the feature vector of each pixel point to a three-dimensional feature vector, and determining the position of the pixel point in a three-dimensional virtual space according to the three-dimensional feature vector;

s502, determining a display style of each cluster in a three-dimensional virtual space;

s503, drawing a three-dimensional effect graph as a space region classification result according to the position of each pixel point in the three-dimensional space system and the display style corresponding to the clustering cluster where the pixel points are located.

The spatial region typing result obtained in the foregoing embodiments is a two-dimensional image, in which the cell characterized by each pixel point is the same as the cell characterized by the same pixel point in the original mass spectrometry imaging graph; the user therefore sees an image in which the different regions of the biological sample have been typed. In this embodiment, the feature vectors of the pixel points are further reduced to three dimensions, that is, to three-dimensional feature vectors. Since the coordinate system of the three-dimensional virtual space is a three-dimensional coordinate system, each dimension of a three-dimensional feature vector can be mapped to one coordinate axis, so that the unique position of each three-dimensional feature vector in the three-dimensional virtual space is determined.

It should be understood that in the scattergram after dimension reduction, the positions of the points are directly related to the differences between the data points and other data points, and the larger the difference between two data points is, the farther the positions are, whereas the higher the similarity between two data points is, the closer the positions are to each other.

On the basis of the foregoing embodiments, as an optional embodiment, updating the cluster of the pixel point according to the cluster where other pixel points in the peripheral preset range of the pixel point are located includes:

determining a preset number of reference pixel points with the highest similarity within a preset range around the pixel point;

and counting the cluster in which each reference pixel point is located, and updating the cluster of the pixel point to the cluster that is counted most often.

According to the embodiment of the application, for the currently traversed pixel points, K reference pixel points with the highest similarity (K is a positive integer) in a surrounding preset range can be determined according to the similarity result obtained in the previous step, then the cluster where the K reference pixel points are located is counted, and the cluster with the highest counting frequency is updated to be the cluster where the pixel points are located.
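A minimal sketch of this majority-vote update is shown below. It assumes that the preset range is a square window measured in grid steps (Chebyshev distance), which the source does not specify; all names and sample data are hypothetical.

```python
from collections import Counter


def smooth_label(pixel, coords, labels, similarity, radius, k):
    """Return the majority cluster among the k most similar pixel points
    located within `radius` grid steps of `pixel`."""
    px, py = coords[pixel]
    neighbours = [
        q for q, (qx, qy) in enumerate(coords)
        if q != pixel and max(abs(qx - px), abs(qy - py)) <= radius
    ]
    neighbours.sort(key=lambda q: similarity[pixel][q], reverse=True)
    votes = Counter(labels[q] for q in neighbours[:k])
    # Keep the current label when no neighbour lies within range.
    return votes.most_common(1)[0][0] if votes else labels[pixel]


coords = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
labels = [1, 0, 0, 0, 1]
similarity = [[1.0, 0.9, 0.8, 0.7, 0.99]]  # only row 0 is needed for this call
print(smooth_label(0, coords, labels, similarity, radius=1, k=3))  # 0
```

Pixel point 4 is the most similar to pixel point 0 but lies outside the window, so it does not vote; the three in-range neighbours all belong to cluster 0.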

Referring to fig. 4, a schematic flow chart of a spatial region classification method according to another embodiment of the present application is exemplarily shown, and as shown, the method includes:

s601, for each metabolite ion, performing data equal-width box separation according to the signal intensity of the metabolite ion in each cell, wherein each box separation is used for counting the proportion of the number of cells in a signal intensity range to the total number of cells;

s602, determining an upper limit value of the signal intensity of the metabolite ions according to the proportion corresponding to each sub-box, and cutting the signal intensity of the metabolite ions according to the upper limit value;

s603, normalizing the trimmed signal intensity to obtain the updated signal intensity of the metabolite ions;

s604, taking the updated signal intensity of each metabolite ion corresponding to the pixel point as an initial feature vector of the pixel point, and performing dimension reduction processing on the initial feature vector to obtain a feature vector of the pixel point;

s605, determining the similarity between the feature vectors of every two pixel points;

s606, obtaining a similarity matrix according to the similarity between the feature vectors of every two pixel points, wherein elements in the similarity matrix are used for representing the similarity between the feature vectors of every two pixel points;

s607, taking the element with the similarity larger than the similarity threshold value in the similarity matrix as a target element, and constructing a relationship network graph according to the target element, wherein two nodes with a connection relationship in the relationship network graph are used for representing two pixel points in one target element;

s608, clustering each node in the relational network graph according to a preset graph clustering algorithm;

s609, traversing all the pixel points, and determining a preset number of reference pixel points with highest similarity in a preset range around the pixel points for each traversed pixel point;

s610, counting the cluster where each reference pixel point is located, updating the cluster with the most counted times to the cluster where the pixel point is located, and updating the cluster of the pixel point until the cluster of the pixel point is not updated any more;

s611, taking the pixel points of the undetermined cluster as target pixel points, sequencing the distances between the target pixel points and other pixel points, and determining the sequencing result of other pixel points;

s612, obtaining corresponding weights according to the sorting results of other pixel points, wherein the weights are in direct proportion to the sorting results, and obtaining the association degrees of the other pixel points and the target pixel point according to the weights and the composition similarity of the other pixel points;

s613, determining the cluster where the target pixel point is located according to the cluster where other pixel points with the highest association degree are located;

s614, reducing the dimension of the feature vector of each pixel point to a three-dimensional feature vector, and determining the position of the pixel point in a three-dimensional virtual space according to the three-dimensional feature vector;

s615, determining the display style of each clustering cluster in the three-dimensional virtual space, and drawing a three-dimensional effect graph as a space region typing result according to the position of each pixel point in the three-dimensional space system and the display style corresponding to the clustering cluster in which the pixel point is located.
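Steps S611 to S613 can be sketched as follows. The source states only that the weight is proportional to the sorting result; this sketch assumes that the other pixel points are sorted from farthest to nearest so that the nearest one receives the largest weight, and that the association degree is the product of weight and composition similarity. All names and sample data are hypothetical.

```python
import math


def assign_cluster(target, candidates, coords, labels, similarity):
    """Pick a cluster for `target` from candidates that already have one."""
    tx, ty = coords[target]

    def distance(q):
        qx, qy = coords[q]
        return math.hypot(qx - tx, qy - ty)

    # Farthest first, so rank 1 is the farthest and rank n the nearest.
    ordered = sorted(candidates, key=distance, reverse=True)
    if not ordered:
        return None
    n = len(ordered)
    best, best_score = None, -math.inf
    for rank, q in enumerate(ordered, start=1):
        weight = rank / n                       # assumption: weight grows with rank
        score = weight * similarity[target][q]  # association degree
        if score > best_score:
            best, best_score = q, score
    return labels[best]


coords = [(0, 0), (0, 1), (3, 0)]
labels = [None, "A", "B"]
similarity = [[1.0, 0.6, 0.9]]  # composition similarity of pixel 0 to the others
print(assign_cluster(0, [1, 2], coords, labels, similarity))  # A
```

In this example the nearer pixel point wins (1.0 × 0.6 = 0.6) over the farther but more similar one (0.5 × 0.9 = 0.45), which shows how distance and composition similarity are traded off.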

The embodiment of the present application provides a spatial region typing device for a mass spectrometry imaging chart, where each pixel point in the mass spectrometry imaging chart is used to characterize a metabolic detection result of a single cell in a biological sample, and the metabolic detection result includes signal intensity of at least one metabolite ion, as shown in fig. 5, the device may include a feature vector extraction module 501, a clustering module 502, a filling module 503, and a typing module 504, specifically:

a feature vector extraction module 501, configured to obtain a feature vector of the pixel point according to the metabolic detection result of the pixel point;

a clustering module 502, configured to cluster the pixels according to the feature vector of each pixel, to obtain a plurality of clusters, where each cluster includes at least one pixel;

the filling module 503 is configured to take a pixel point whose cluster has not been determined as a target pixel point, and determine the cluster in which the target pixel point is located according to the distance and composition similarity between the target pixel point and other pixel points in a surrounding preset range, and the cluster in which the other pixel points are located;

and a typing module 504, configured to determine a spatial region typing result according to the cluster where each pixel point is located.

The apparatus in the embodiment of the present application may execute the method provided in the embodiment of the present application, and the implementation principle is similar, the actions executed by the modules in the apparatus in the embodiments of the present application correspond to the steps in the method in the embodiments of the present application, and for the detailed functional description of the modules in the apparatus, reference may be made to the description in the corresponding method shown in the foregoing, and details are not repeated here.

In an embodiment of the present application, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory, where the processor executes the computer program to implement the steps of the method for spatial region typing of a mass spectrometry imaging graph. Compared with the related art, the following can be achieved: a feature vector of each pixel point is obtained according to the metabolic detection result of the pixel point; the pixel points are clustered based on their feature vectors to obtain a plurality of cluster clusters; and the cluster in which a target pixel point is located is determined according to the distance and composition similarity between the target pixel point and other pixel points in a surrounding preset range and the clusters in which the other pixel points are located. This enhances clustering accuracy and generates a low-noise spatial typing result that accords with the real morphological division of biological tissue.

In an alternative embodiment, an electronic device is provided, as shown in fig. 6, the electronic device 4000 shown in fig. 6 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.

The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or other programmable logic device, transistor logic device, hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.

Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.

The memory 4003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer, and is not limited herein.

The memory 4003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 4001 to execute. The processor 4001 is used to execute computer programs stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.

Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, and when being executed by a processor, the computer program may implement the steps and corresponding contents of the foregoing method embodiments.

Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.

The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.

It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as needed, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. Under the scenario that the execution time is different, the execution sequence of the sub-steps or phases may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.

The foregoing is only an optional implementation manner of a part of implementation scenarios in the present application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of the present application are also within the protection scope of the embodiments of the present application without departing from the technical idea of the present application.

Claims

1. A method for typing a spatial region of an image of a mass spectrum, wherein each pixel point in the image of the mass spectrum is used for characterizing a metabolic detection result of a single cell in a biological sample, and the metabolic detection result includes a signal intensity of at least one metabolite ion, the method comprising:

clustering the pixels according to the characteristic vector of each pixel to obtain a plurality of clustering clusters, wherein each clustering cluster comprises at least one pixel;

taking a pixel point whose cluster has not been determined as a target pixel point, and determining the cluster in which the target pixel point is located according to the distance and composition similarity between the target pixel point and other pixel points in a surrounding preset range and the cluster in which the other pixel points are located;

2. The method according to claim 1, wherein the determining the cluster where the target pixel point is located according to the distance and the composition similarity between the target pixel point and other pixel points in a surrounding preset range, and the cluster where the other pixel points are located comprises:

and determining the cluster where the target pixel point is located according to the cluster where other pixel points with the highest association degree are located.

3. The method according to claim 1, wherein the obtaining the feature vector of the pixel point according to the metabolic detection result of the pixel point comprises:

and taking the updated signal intensity of each metabolite ion corresponding to the pixel point as an initial feature vector of the pixel point, and performing dimension reduction processing on the initial feature vector to obtain the feature vector of the pixel point, wherein the feature vector is used for representing the difference of the metabolite ion composition between the pixel point and other pixel points.

4. The method according to any one of claims 1 to 3, wherein the composition similarity is a similarity between the types of metabolite ions included in the target pixel and other pixels or a similarity between feature vectors of the target pixel and other pixels.

5. The method of claim 3, wherein the determining the distribution of the signal intensity of the metabolite ions according to the signal intensity of the metabolite ions in each cell, and the tailoring of the signal intensity of the metabolite ions according to the distribution comprises:

and determining the upper limit value of the signal intensity of the metabolite ions according to the proportion corresponding to each sub-box, and cutting the signal intensity of the metabolite ions according to the upper limit value.

6. The method of claim 3, wherein the clustering the pixels according to the feature vector of each pixel to obtain a plurality of clusters comprises:

7. The method of claim 5, wherein determining the upper limit value of the signal intensity of the metabolite ion according to the ratio corresponding to each bin comprises:

8. The method of claim 1, wherein the determining a spatial region classification result according to the traversed cluster in which each pixel point is located comprises:

and drawing a three-dimensional effect graph as a space region typing result according to the position of each pixel point in the three-dimensional space system and the display style corresponding to the clustering cluster where the pixel point is located.

9. An apparatus for determining a spatial region of an image of a mass spectrum, wherein each pixel point in the image of the mass spectrum is used to characterize a metabolic detection of a single cell in a biological sample, and the metabolic detection comprises a signal intensity of at least one metabolite ion, the apparatus comprising:

the clustering module is used for clustering the pixels according to the characteristic vector of each pixel to obtain a plurality of clustering clusters, and each clustering cluster comprises at least one pixel;

and the typing module is used for determining a spatial region typing result according to the clustering cluster where each pixel point is located.

10. An electronic device comprising a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the method for spatial region typing of an image of a mass spectrum of any one of claims 1 to 8.

11. A computer-readable storage medium on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method for spatial region typing of an image for mass spectrometry according to any one of claims 1 to 8.