CN117496279A - Image classification model building method and device, and classification method, device and system - Google Patents

Image classification model building method and device, and classification method, device and system

Info

Publication number
CN117496279A
CN117496279A (application CN202410004749.7A)
Authority
CN
China
Prior art keywords: medical image, transcriptome sequencing, bulk, cell, data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410004749.7A
Other languages
Chinese (zh)
Other versions
CN117496279B (en)
Inventor
张睿
吴红艳
蔡云鹏
黎慧君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority claimed from application CN202410004749.7A
Publication of CN117496279A
Application granted
Publication of CN117496279B
Legal status: Active (granted)


Classifications

    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06F 18/23: Pattern recognition; analysing; clustering techniques
    • G06F 18/25: Pattern recognition; analysing; fusion techniques
    • G06N 3/042: Neural networks; knowledge-based neural networks; logical representations of neural networks
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/0464: Neural networks; convolutional networks [CNN, ConvNet]
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern (e.g. edges, contours, corners); connectivity analysis
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G16B 30/10: Sequence alignment; homology search
    • G16B 40/30: Unsupervised data analysis (ICT for biostatistics and bioinformatics-related machine learning)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The specification relates to the technical field of medical auxiliary examination, and provides an image classification model building method and device, together with a classification method, device and system. The method comprises the following steps: establishing and training a first neural network model from a single-cell transcriptome sequencing standard data set, a first medical image data set and the first Bulk transcriptome sequencing data set corresponding to that image data set; then, based on a second medical image data set, performing transfer learning with the feature weight parameters of the first neural network model, establishing a second training sample set, and building and training an image classification model that predicts the number, type and probability value of target image markers in medical image data. According to the embodiments of the specification, suitable genome data can be matched to a medical image, and the medical image together with its corresponding genome data can be used to obtain classification and identification information for the image markers in the medical image, thereby providing richer and more complete image examination reference information for medical staff.

Description

Image classification model building method and device, and classification method, device and system
Technical Field
The present disclosure relates to the field of medical auxiliary examination technologies, and in particular, to a method and apparatus for establishing an image classification model, and a classification method, apparatus and system.
Background
Image examination is an auxiliary medical means that enables doctors to inspect the internal condition of a patient's body. In principle, it works by emitting energy in different forms (such as X-rays, sound waves, radioactive particles or magnetic fields) through the body and obtaining a medical image from the way body tissues alter that energy, so as to display the internal structure and functional condition of the body. With the development of artificial intelligence, current techniques can automatically classify and identify medical images to realize automatic screening examination. However, conventional AI-based medical image classification builds image-marker labels from the medical images alone for supervised model training, without considering other related information, so the resulting image classification models face high classification difficulty and low classification accuracy.
Disclosure of Invention
In view of the fact that classification and identification of image markers in current medical images is realized only from image features of a single dimension, ignoring the intrinsic connection between the medical image and genome data, errors arise in judging the number, type and probability value of the image markers in a medical image. The present scheme is proposed to overcome, or at least partially solve, these problems.
In one aspect, some embodiments of the present disclosure provide a method for creating an image classification model, the method comprising:
receiving a single cell transcriptome sequencing standard dataset, a first medical image dataset and a corresponding first Bulk transcriptome sequencing dataset thereof, and a second medical image dataset;
generating first Bulk single-cell transcriptome sequencing data according to the first Bulk transcriptome sequencing data set and a single-cell transcriptome sequencing standard data set;
performing self-supervised clustering on the first Bulk transcriptome sequencing data set based on the first Bulk single-cell transcriptome sequencing data to obtain a clustering result;
establishing a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result;
training a first neural network by using the first training sample set to obtain a first neural network model;
performing transfer learning according to the feature weight parameters in the first neural network model, and determining second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set;
establishing a second training sample set according to the second medical image data set and second Bulk single-cell transcriptome sequencing data;
and training a second neural network by using the second training sample set to obtain a second neural network model, and taking the trained second neural network model as an image classification model to be used for predicting the number, the type and the probability value of the target image markers in the medical image data.
Further, generating the first Bulk single-cell transcriptome sequencing data from the first Bulk transcriptome sequencing data set and the single-cell transcriptome sequencing standard data set comprises:
inputting the first Bulk transcriptome sequencing data set into a pre-trained adaptive deconvolution model to obtain target transcriptome features, wherein the adaptive deconvolution model is trained based on the single-cell transcriptome sequencing standard data set and a second Bulk transcriptome sequencing data set; and
generating the first Bulk single-cell transcriptome sequencing data according to the target transcriptome features.
Further, the target transcriptome features include cell type information and the proportion information between different cell types.
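The patent does not disclose the internal architecture of the adaptive deconvolution model. As a purely illustrative sketch of the general idea behind such deconvolution, the following assumes a signature matrix of per-cell-type mean expression derived from a single-cell reference, and recovers cell-type proportions from a bulk profile by ordinary least squares; real deconvolution methods typically add non-negativity constraints and noise modelling, and all names and shapes here are hypothetical:

```python
import numpy as np

def estimate_cell_fractions(bulk_expr, signature):
    """Recover cell-type proportions from one bulk expression profile.

    bulk_expr: (n_genes,) bulk expression vector
    signature: (n_genes, n_cell_types) mean expression per cell type,
               derived from a single-cell reference data set
    """
    # plain least squares; dedicated deconvolution tools use
    # non-negative least squares or learned (adaptive) models
    coeffs, *_ = np.linalg.lstsq(signature, bulk_expr, rcond=None)
    coeffs = np.clip(coeffs, 0.0, None)
    total = coeffs.sum()
    return coeffs / total if total > 0 else coeffs

# toy check: 5 genes, 3 cell types, noiseless mixture
rng = np.random.default_rng(0)
signature = rng.random((5, 3))
true_frac = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_frac
frac = estimate_cell_fractions(bulk, signature)
```

On noiseless data the mixture is recovered exactly; with real sequencing noise the estimate is only approximate, which is one motivation for training an adaptive model as the patent describes.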
Further, based on the first Bulk single-cell transcriptome sequencing data, performing self-supervised clustering on the first Bulk transcriptome sequencing data set to obtain a clustering result, including:
establishing a corresponding graph network according to the first Bulk single-cell transcriptome sequencing data;
and performing self-supervised clustering on the first Bulk transcriptome sequencing data set by using the graph network to obtain a clustering result.
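The specification does not detail the graph network or the self-supervised clustering objective. As a minimal stand-in for the graph-construction step, assuming each sample is represented by its deconvolved cell-composition vector, one can build a k-nearest-neighbour graph and take connected components as clusters; a real implementation would use a graph neural network trained with a self-supervised loss:

```python
import numpy as np

def knn_graph(features, k=2):
    """Symmetric adjacency matrix of a k-nearest-neighbour graph."""
    n = len(features)
    dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # no self-edges
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        adj[i, np.argsort(dist[i])[:k]] = True
    return adj | adj.T                       # symmetrise

def components(adj):
    """Cluster labels via connected components (placeholder for the
    self-supervised graph clustering described in the text)."""
    n = len(adj)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        stack = [seed]
        while stack:                         # depth-first flood fill
            u = stack.pop()
            if labels[u] >= 0:
                continue
            labels[u] = cur
            stack.extend(np.flatnonzero(adj[u] & (labels < 0)))
        cur += 1
    return labels

# two well-separated groups of cell-composition vectors
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = components(knn_graph(features, k=2))
```

With the toy data above, the two spatial groups come out as two distinct cluster labels.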
Further, establishing a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result comprises:
and taking the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result as inputs, and taking the image marker corresponding to the first medical image data set as a target output to establish a first training sample set.
Further, performing transfer learning according to the feature weight parameters in the first neural network model and determining the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set includes:
establishing a transfer mapping function between the first medical image data set and the first Bulk single-cell transcriptome sequencing data by using the feature weight parameters; and
determining the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set according to the transfer mapping function.
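As a sketch of the transfer-mapping idea, suppose the feature weights amount to a linear map between image-feature space and transcriptome-feature space: the map can be fitted on the paired first data set and then applied to unpaired second-data-set image features. The shapes and names below are hypothetical; in the patent the mapping function comes from the trained network's weights rather than a fresh least-squares fit:

```python
import numpy as np

def fit_transfer_map(img_feats, transcriptome_feats):
    """Least-squares linear map: image features -> transcriptome features."""
    W, *_ = np.linalg.lstsq(img_feats, transcriptome_feats, rcond=None)
    return W

# paired first data set: 20 samples, 4 image features, 3 transcriptome features
rng = np.random.default_rng(1)
img_first = rng.random((20, 4))
W_true = rng.random((4, 3))                 # unknown ground-truth relation
trans_first = img_first @ W_true
W = fit_transfer_map(img_first, trans_first)

# unpaired second data set: predict its transcriptome features
img_second = rng.random((5, 4))
trans_second_pred = img_second @ W
```

Because the toy data are noiseless and full-rank, the fitted map reproduces the true relation exactly; real image-to-genome mappings are nonlinear and only approximate.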
Further, establishing a second training sample set according to the second medical image data set and second Bulk single cell transcriptome sequencing data, comprising:
and taking the second medical image data set and the second Bulk single-cell transcriptome sequencing data as input, and taking the image marker corresponding to the second medical image data set as target output to establish a second training sample set.
Further, the image markers corresponding to the second medical image dataset are a subset of the image markers corresponding to the first medical image dataset.
In another aspect, some embodiments of the present disclosure further provide an image classification model building apparatus, where the apparatus includes:
the receiving module is used for receiving the single-cell transcriptome sequencing standard data set, the first medical image data set and the corresponding first Bulk transcriptome sequencing data set, and the second medical image data set;
the generation module is used for generating first Bulk single-cell transcriptome sequencing data according to the first Bulk transcriptome sequencing data set and the single-cell transcriptome sequencing standard data set;
the clustering module is used for performing self-supervised clustering on the first Bulk transcriptome sequencing data set based on the first Bulk single-cell transcriptome sequencing data to obtain a clustering result;
the first sample establishing module is used for establishing a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result;
the training module is used for training the first neural network by using the first training sample set to obtain a first neural network model;
the migration module is used for performing transfer learning according to the feature weight parameters in the first neural network model and determining the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set;
the second sample establishing module is used for establishing a second training sample set according to the second medical image data set and second Bulk single-cell transcriptome sequencing data;
the modeling module is used for training the second neural network by using the second training sample set to obtain a second neural network model, and taking the trained second neural network model as an image classification model to be used for predicting the number, the type and the probability value of the target image markers in the medical image data.
Based on the same inventive concept, some embodiments of the present disclosure provide an image classification method, which includes:
receiving medical image data to be classified;
inputting the medical image data to be classified into an image classification model trained by the method according to any of the embodiments to obtain a medical image classification result corresponding to the medical image data to be classified; the medical image classification result comprises the number, the type and the probability value of the target image markers in the medical image data to be classified.
Based on the same inventive concept, in another aspect, some embodiments of the present disclosure further provide an image classification apparatus, the apparatus including:
the acquisition module is used for receiving medical image data to be classified;
the classification module is used for inputting the medical image data to be classified into an image classification model trained by the method in any embodiment so as to obtain a medical image classification result corresponding to the medical image data to be classified; the medical image classification result comprises the number, the type and the probability value of the target image markers in the medical image data to be classified.
In another aspect, some embodiments of the present disclosure further provide an image classification system, the system comprising: an image classifier, a medical image imaging device and a display;
the image classifier is connected to the medical image imaging device and to the display respectively, and is used for executing the image classification method of any of the above embodiments on the medical image generated by the medical image imaging device, so as to classify the image and obtain the number, type and probability value of the image markers in the current medical image;
the display is used for displaying the medical image generated by the medical image imaging device and the image classification result of the image classifier.
In another aspect, some embodiments of the present description also provide a computer device including a memory, a processor, and a computer program stored on the memory which, when executed by the processor, carries out the above method.
In another aspect, some embodiments of the present description also provide a computer storage medium having stored thereon a computer program which, when executed by a processor of a computer device, carries out the above method.
One or more technical solutions provided in some embodiments of the present disclosure at least have the following technical effects:
according to the embodiment of the specification, firstly, a single-cell transcriptome sequencing standard data set, a first medical image data set and a first Bulk transcriptome sequencing data set corresponding to the single-cell transcriptome sequencing standard data set are automatically received, and a second medical image data set, first Bulk transcriptome sequencing data are generated according to the single-cell transcriptome sequencing standard data set and the first Bulk transcriptome sequencing data set, so that single-cell gene expression and formation information in the first Bulk are predicted, then clustering precision is improved by using the first Bulk transcriptome sequencing data, self-supervision clustering is conducted on the first Bulk transcriptome sequencing data set, clustering results are obtained, the first medical image data set, the first Bulk transcriptome sequencing data and the clustering results are used for constructing a first training sample set, input of a first neural network model is constructed from multiple dimensions, the first neural network model is obtained by using the first training sample set, transfer learning is conducted according to characteristic weight parameters of the first neural network model on the basis, the second medical image data set corresponding to the second Bulk transcriptome sequencing data set is obtained, the second medical image data set is predicted according to the second medical image data set, the second medical image data set is predicted according to the image data sequence of the first Bulk transcriptome sequencing data set, the second medical image data set is corresponding to the second medical image data set, and the medical image data is predicted by using the second medical image data set, and the second image data is predicted by using the second neural network model is predicted by the second training model is different according to the first neural network model is different.
The foregoing is merely an overview of the technical solutions of some embodiments of the present disclosure. In order to make the technical means of these embodiments clearer, so that they may be implemented in accordance with the contents of the specification, and in order to make the above and other objects, features and advantages of some embodiments more readily apparent, the specific embodiments are set forth below.
Drawings
In order to more clearly illustrate some embodiments of the present description or the technical solutions in the prior art, the drawings required in the description of the embodiments or of the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments described in the present specification, and that a person skilled in the art may obtain other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of an implementation system of an image classification model building method according to some embodiments of the present disclosure;
FIG. 2 is a flow chart illustrating a method of image classification model creation in some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of steps for generating first Bulk single cell transcriptome sequencing data according to some embodiments of the present disclosure;
FIG. 4 is a schematic diagram of steps for self-supervised clustering in some embodiments of the present disclosure;
FIG. 5 is a schematic diagram illustrating steps of performing transfer learning according to feature weight parameters according to some embodiments of the present disclosure;
FIG. 6 is a flowchart illustrating a method of classifying images according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram illustrating an image classification system according to some embodiments of the present disclosure;
FIG. 8 is a schematic diagram illustrating an image classification model creation device according to some embodiments of the present disclosure;
FIG. 9 is a schematic structural diagram of an image classification device according to some embodiments of the present disclosure;
FIG. 10 is a schematic diagram of a computer device provided in some embodiments of the present disclosure.
[ reference numerals description ]
101. a terminal;
102. a server;
701. an image classifier;
702. medical imaging equipment;
703. a display;
801. a receiving module;
802. a generating module;
803. a clustering module;
804. a first sample creation module;
805. a training module;
806. a migration module;
807. a second sample creation module;
808. a modeling module;
901. an acquisition module;
902. a classification module;
1002. a computer device;
1004. a processor;
1006. a memory;
1008. a driving mechanism;
1010. an input/output interface;
1012. an input device;
1014. an output device;
1016. a presentation device;
1018. a graphical user interface;
1020. a network interface;
1022. a communication link;
1024. a communication bus.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be described clearly and completely below with reference to the drawings in some embodiments. It is apparent that the described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without inventive effort shall fall within the scope of protection of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description, the claims and the drawings above are used to distinguish similar objects and do not necessarily describe a particular sequence or chronological order. It is to be understood that data so designated may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, apparatus, article or device that comprises a list of steps or elements is not necessarily limited to those steps or elements but may include other steps or elements not expressly listed or inherent to it. It should also be noted that, in the technical solution of this application, the acquisition, storage, use and processing of data all comply with the relevant laws and regulations.
Fig. 1 is a schematic diagram of an implementation system of the method for creating an image classification model according to an embodiment of the present invention. The system may include a terminal 101 and a server 102 that communicate with each other through a network; the network may include a local area network (LAN), a wide area network (WAN), the Internet, or a combination thereof, connecting websites, user equipment (e.g., computing devices) and back-end systems. A staff member can send an image classification model establishment request to the server 102 through the terminal 101. After receiving the request, the server 102 retrieves from a database the single-cell transcriptome sequencing standard data set, the first medical image data set with its corresponding first Bulk transcriptome sequencing data set, and the second medical image data set, performs the computation, and sends the resulting modeling result to the terminal 101, so that the staff member can handle the business according to the modeling result.
In this embodiment of the present disclosure, the server 102 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data and artificial-intelligence platforms.
In an alternative embodiment, the terminal 101 may include, but is not limited to, a self-service terminal device, a desktop computer, a tablet computer, a notebook computer, a smart wearable device, and the like. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like. Of course, the terminal 101 is not limited to an electronic device with a physical entity; it may also be software running on such an electronic device.
In addition, it should be noted that, fig. 1 is only an application environment provided by the present disclosure, and in practical application, a plurality of terminals 101 may also be included, which is not limited in this specification.
Fig. 2 is a flowchart of a method for creating an image classification model according to an embodiment of the present invention. The method includes the steps described in the examples or flowcharts, but may include more or fewer steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only one. When an actual system or apparatus product executes the method, the steps may be performed sequentially or in parallel according to the method shown in the embodiments or drawings. As shown in fig. 2, the method may include:
S201: receiving a single cell transcriptome sequencing standard dataset, a first medical image dataset and a corresponding first Bulk transcriptome sequencing dataset thereof, and a second medical image dataset;
S202: generating first Bulk single-cell transcriptome sequencing data according to the first Bulk transcriptome sequencing data set and the single-cell transcriptome sequencing standard data set;
S203: performing self-supervised clustering on the first Bulk transcriptome sequencing data set based on the first Bulk single-cell transcriptome sequencing data to obtain a clustering result;
S204: establishing a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result;
S205: training a first neural network by using the first training sample set to obtain a first neural network model;
S206: performing transfer learning according to the feature weight parameters in the first neural network model, and determining second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set;
S207: establishing a second training sample set according to the second medical image data set and the second Bulk single-cell transcriptome sequencing data;
S208: training a second neural network by using the second training sample set to obtain a second neural network model, and taking the trained second neural network model as the image classification model for predicting the number, type and probability value of the target image markers in the medical image data.
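The classifier's final output format, the number, type and probability value of the target image markers, can be illustrated with a small post-processing helper. The marker names and the 0.5 threshold below are hypothetical and not taken from the patent:

```python
def marker_report(probs, marker_names, threshold=0.5):
    """Convert per-marker probabilities into the (count, types,
    probability values) result that the method describes."""
    hits = [(name, float(p)) for name, p in zip(marker_names, probs)
            if p >= threshold]
    return {"count": len(hits),
            "types": [name for name, _ in hits],
            "probabilities": dict(hits)}

# hypothetical per-marker probabilities from the second neural network
report = marker_report([0.91, 0.18, 0.73],
                       ["marker_A", "marker_B", "marker_C"])
```

With these invented probabilities, two markers clear the threshold and are reported with their probability values.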
Through steps S201 to S208, the single-cell transcriptome sequencing standard data set, the first medical image data set with its corresponding first Bulk transcriptome sequencing data set, and the second medical image data set are first received automatically. First Bulk single-cell transcriptome sequencing data are generated from the single-cell transcriptome sequencing standard data set and the first Bulk transcriptome sequencing data set, thereby predicting the single-cell gene expression and composition information within the first Bulk data. The first Bulk single-cell transcriptome sequencing data are then used to improve clustering precision: self-supervised clustering is performed on the first Bulk transcriptome sequencing data set to obtain a clustering result. The first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result are used to construct a first training sample set, so that the input of the first neural network model is built from multiple dimensions, and the first neural network model is trained with this sample set. On that basis, transfer learning is performed according to the feature weight parameters of the first neural network model to obtain the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set; a second training sample set is established from these, and a second neural network model is trained as the image classification model for predicting the number, type and probability value of the target image markers in medical image data. In this way, suitable genome data can be matched to a medical image even when no sequencing data are available for it, providing richer and more complete image examination reference information.
It will be appreciated that, in some embodiments, single-cell RNA sequencing (Single-cell RNA-sequencing) refers to a novel technique for high-throughput sequencing and analysis of RNA at the single-cell level. Unlike the results obtained from conventional tissue or cell-population sequencing (which are simply the average expression level of a large number of cells), single-cell sequencing can mine cell-specific information in depth. At present, single-cell sequencing is widely applied to research on tumor heterogeneity, the immune microenvironment, neurology, embryonic development and cell differentiation, among other fields. Bulk transcriptome sequencing, in contrast, measures the total RNA (mRNA) of all cells, representing the total expression of each gene; since cells of the same kind can show differential expression depending on their physiological state, Bulk transcriptome sequencing data, compared with single-cell transcriptome sequencing data, cannot capture such differential expression.
Further, in some embodiments, medical imaging refers to the techniques and processes for obtaining images of the internal tissue of the human body, or of part of it, in a non-invasive manner for medical treatment or medical research; the medical image dataset may be one or more of CT image data, PET image data and NMRI image data. The single-cell transcriptome sequencing standard dataset comprises a large volume of historical single-cell transcriptome sequencing data of patients, where "standard" is understood to mean large data volume and high data reliability. The correspondence between the first medical image dataset and its corresponding first Bulk transcriptome sequencing dataset means that Bulk transcriptome sequencing is performed on the internal tissue represented by the current first medical image dataset to obtain the corresponding first Bulk transcriptome sequencing data; the correspondence is based on internal tissue, not necessarily the internal tissue of the same patient. In an actual image examination scenario, the second medical image dataset is typically a medical image dataset without corresponding Bulk transcriptome sequencing data and single-cell transcriptome sequencing data; in prior model training, the second medical image dataset may have corresponding Bulk transcriptome sequencing data and single-cell transcriptome sequencing data for iterative verification and testing of the model. By analyzing the potential image-genome relationships among the single-cell transcriptome sequencing standard dataset, the first medical image dataset and its corresponding first Bulk transcriptome sequencing dataset, the genomic information corresponding to the second medical image dataset can be predicted, so as to build and train the second neural network model from the second medical image dataset and its corresponding genomic information.
It should be noted that, in some embodiments, the second neural network model is used to predict the number, type and probability value of target image markers in medical image data; specifically, any medical image data may contain zero, one or more target image markers together with the corresponding type and probability value information. Although the output result may indicate the existence, location and size of suspected target image markers, since the final image classification model outputs a probability evaluation, the model output cannot be used as a direct disease diagnosis and cannot establish any cause; it is auxiliary examination information for reference rather than a diagnostic examination. Whether medical staff make a tumor diagnosis, or whether a puncture biopsy is required, is determined with the aid of examinations such as physical examination and laboratory assays; in most cases a biochemical examination such as a biopsy is the only method for diagnosing cancer, and measuring the blood level of tumor markers (substances secreted into the blood by some tumors) may also serve as additional evidence supporting or refuting a cancer diagnosis.
Referring to fig. 3, in some embodiments, generating first Bulk single-cell transcriptome sequencing data from a first Bulk transcriptome sequencing dataset and a single-cell transcriptome sequencing standard dataset may include:
S301: inputting the first Bulk transcriptome sequencing dataset into a pre-trained adaptive deconvolution model to obtain target transcriptome features; wherein the adaptive deconvolution model is trained based on the single cell transcriptome sequencing standard dataset and a second Bulk transcriptome sequencing dataset;
S302: generating first Bulk single-cell transcriptome sequencing data according to the target transcriptome features.
It will be appreciated that, in some embodiments, the target transcriptome features include cell type information and proportion information between different cell types. Studies indicate that every tumor cell is unique: the gene features of tumor cells are highly heterogeneous both between tumor cells (somatic mutations) and within the tumor microenvironment (infiltration by other cell types). The first Bulk transcriptome sequencing dataset in this embodiment is used to neutralize tumor cell heterogeneity when the first medical image dataset is enhanced. Specifically, the adaptive deconvolution model for predicting the cell type information of the first Bulk transcriptome sequencing dataset and the proportion information between different cell types is first built using the following steps:
Step a: matching a single-cell transcriptome sequencing standard data subset from the single-cell transcriptome sequencing standard dataset according to the data type of the first Bulk transcriptome sequencing dataset; the data type is used to determine the specified single cell types. The single-cell transcriptome sequencing standard data subset is expressed as a cell type reference matrix based on cell type and transcriptome sequencing expression level; for the first Bulk transcriptome sequencing dataset, the gene expression information is expressed as a one-dimensional vector based on the transcriptome sequencing expression level (for convenience of subsequent expression, denoted as x_b);
Step b: preprocessing the matched single-cell transcriptome sequencing standard data subset to obtain a transcriptome sequencing average expression matrix; the rows and columns of the transcriptome sequencing average expression matrix represent the cell types and the average transcriptome sequencing expression levels, respectively;
Step c: establishing a generative adversarial network comprising a generator network and a discriminator network; the generator network is used to extract features from the input data and to generate pseudo-Bulk data based on them; the discriminator network is used to judge the pseudo-Bulk data generated by the generator network and to drive back-propagation training; the input data here is the single-cell transcriptome sequencing standard data subset;
Step d: ending training when the loss function of the generative adversarial network converges.
It can be understood that, in some embodiments, the preprocessing reduces the dimensionality of the high-dimensional expression data based on the RSVD (randomized SVD) algorithm, and the reduced-dimension single-cell data can be clustered with the graph-based clustering algorithm PhenoGraph; PhenoGraph is an existing open-source single-cell clustering algorithm, and other single-cell clustering algorithms may also be used. The purpose of this step is to cluster the cell types as accurately as possible so that cluster centroid features can be extracted, i.e. the average gene expression vector of each cell type (reflecting the average expression level of the cells within the cluster) is taken as the gene expression of that cell type. By fusing multiple single-cell clusters and extracting centroid features, the cell heterogeneity information within each clustered cell type is weakened, so that the cell type reference matrix expresses tumor homogeneity information.
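The centroid-extraction step can be sketched as follows (an illustrative sketch only: the function name and toy data are hypothetical, and the cluster labels are assumed to come from an upstream algorithm such as PhenoGraph after RSVD dimensionality reduction):

```python
import numpy as np

def build_reference_matrix(sc_expr, labels):
    """Average the expression of all cells in each cluster to obtain one
    centroid vector per cell type; stacking the centroids gives the
    cell type reference matrix."""
    types = np.unique(labels)
    return np.vstack([sc_expr[labels == t].mean(axis=0) for t in types])

# toy example: 4 cells, 3 genes, 2 clusters (cell types)
sc_expr = np.array([[2.0, 0.0, 1.0],
                    [4.0, 0.0, 1.0],
                    [0.0, 6.0, 2.0],
                    [0.0, 8.0, 2.0]])
labels = np.array([0, 0, 1, 1])
ref = build_reference_matrix(sc_expr, labels)   # shape (2 types, 3 genes)
```

Averaging within clusters is what weakens per-cluster heterogeneity, so the reference matrix carries the homogeneity information described above.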
In some embodiments, after the transcriptome sequencing average expression matrix H̄ and the one-dimensional vector x_b are determined, the existing deconvolution approach is to solve the linear equation x_b = H̄^T p for the cell proportion vector p using the least squares method. However, this approach usually assumes that the transcriptome sequencing expression level of each cell type in the first Bulk is the same as in the reference single-cell transcriptome sequencing standard data subset, whereas actual Bulk samples often differ from the cell type distribution of the reference single-cell dataset due to sampling bias; moreover, a constant H̄ cannot meet the heterogeneity requirement of the tumor environment, i.e. the diversity of single cells in the single-cell transcriptome sequencing standard data subset may not cover the variety of cell types present in the first Bulk. The adaptive deconvolution model constructed in the present application can adaptively correct the cell type distribution difference in Bulk and fine-tune the cell type reference matrix for different Bulk samples (i.e. the first Bulk transcriptome sequencing dataset), so as to obtain a more accurate cell type expression matrix for each Bulk sample.
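The classical least-squares baseline that the adaptive model improves upon can be sketched as follows (a hedged toy illustration: the reference matrix and mixture are hypothetical, and `np.linalg.lstsq` stands in for the least-squares solver):

```python
import numpy as np

def deconvolve_lstsq(x_b, ref):
    """Classical deconvolution baseline: solve x_b ~= ref.T @ p for the
    cell-type proportion vector p by least squares, then clip negative
    entries and renormalize so the proportions sum to 1."""
    p, *_ = np.linalg.lstsq(ref.T, x_b, rcond=None)
    p = np.clip(p, 0.0, None)
    return p / p.sum()

ref = np.array([[3.0, 0.0, 1.0],    # reference expression of cell type A
                [0.0, 7.0, 2.0]])   # reference expression of cell type B
true_p = np.array([0.25, 0.75])
x_b = ref.T @ true_p                # synthetic Bulk mixture
p_hat = deconvolve_lstsq(x_b, ref)
```

With noise-free synthetic data the true proportions are recovered exactly; the limitations discussed above (sampling bias, a constant reference matrix) only appear with real Bulk samples.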
In particular, the embodiments of the present description employ a sampler based on a generative adversarial network, taking x_b as a hidden-layer expression vector input into the generator, and adaptively sampling the cell type expression matrix and the cell type proportion vector. Since Bulk samples do not have corresponding single-cell transcriptome sequencing data to serve as supervision, during training the generator samples part of the single-cell gene expression data and sums it to form pseudo-Bulk data, and the sampled single-cell proportion p and expression matrix H are then used as supervisory signals to guide the decomposition process of the pseudo-Bulk data x̂_b. It should be noted that single-cell transcriptome sequencing expression is understood as gene expression.
Further, in some embodiments, to maintain numerical stability, the present invention predicts the difference between the pseudo-Bulk cell type expression matrix and the overall average expression matrix, while bringing the predicted distribution close to the true distribution, using the following loss function:

$$\mathcal{L}_{adv}=\mathbb{E}_{(p,H)\sim P_{data}}\big[\log D(p,H)\big]+\mathbb{E}_{\hat{x}_b}\Big[\log\Big(1-D\big(G_{p}(\hat{x}_b),\,\bar{H}+G_{\Delta H}(\hat{x}_b)\big)\Big)\Big]$$ (1)

where \mathcal{L}_{adv} represents the loss function, D the discriminator, G the generator, \mathbb{E} the mathematical expectation, p the single-cell proportion, H the single-cell gene expression matrix, \sim the conventional generative-adversarial-network symbol referring to the distribution of real data P_{data}, \hat{x}_b the pseudo-Bulk data, \bar{H} the average cell-type gene expression matrix, G_{p} the cell type proportion generator, and G_{\Delta H} the cell type expression difference matrix generator; the cell type expression matrix corresponding to the Bulk sample is obtained by adding the difference matrix to the average matrix. It should be noted that the comma in the discriminator D indicates that the discriminator accepts two inputs, namely the deconvolved cell type proportion and cell type gene expression; inside the discriminator the two inputs are projected to the same dimension and then concatenated.
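As a hedged numerical sketch of how the two sides of an adversarial loss of the form of formula (1) can be evaluated (the function and probability values are hypothetical and not the patented network; the discriminator is assumed to output probabilities in (0, 1)):

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """d_real: discriminator outputs on real (proportion, expression) pairs;
    d_fake: discriminator outputs on generated pairs.
    The discriminator maximizes log D(real) + log(1 - D(fake)), so its loss
    is the negation; the generator minimizes the log(1 - D(fake)) term."""
    d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    g_loss = np.mean(np.log(1.0 - d_fake))
    return d_loss, g_loss

# a discriminator that is fairly confident on both real and fake samples
d_loss, g_loss = gan_losses(np.array([0.9, 0.8]), np.array([0.2, 0.1]))
```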
Further, in some embodiments, in order to generate a cell type proportion and an expression matrix that conform to the original Bulk distribution, thereby fusing the cell type reference matrix reflecting homogeneity information with the Bulk data information reflecting heterogeneity information, the following reconstruction objective function is introduced on top of the loss function:

$$\mathcal{L}_{rec}=\Big\|\hat{x}_b-\big(\bar{H}+G_{\Delta H}(\hat{x}_b)\big)^{T}G_{p}(\hat{x}_b)\Big\|_2^{2}$$ (2)

where \mathcal{L}_{rec} represents the objective function, G_{p} the cell type proportion generator, G_{\Delta H} the cell type expression difference matrix generator, \bar{H} the average cell-type gene expression matrix, and \hat{x}_b the pseudo-Bulk data. After the training of the generative model converges, only a true Bulk sample (i.e. the first Bulk transcriptome sequencing dataset) x_b needs to be input to generate the corresponding cell type expression matrix H_c and cell proportion vector p.
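The reconstruction requirement, i.e. that the deconvolved proportions and cell-type expression must re-compose the observed Bulk profile, can be sketched as follows (toy values; the function and variable names are hypothetical):

```python
import numpy as np

def reconstruction_loss(x_b, p_hat, H_hat):
    """Squared error between the Bulk profile x_b and its recomposition
    H_hat.T @ p_hat from the deconvolution outputs."""
    x_rec = H_hat.T @ p_hat
    return float(np.sum((x_b - x_rec) ** 2))

H_hat = np.array([[3.0, 0.0, 1.0],
                  [0.0, 7.0, 2.0]])   # deconvolved cell-type expression
p_hat = np.array([0.5, 0.5])          # deconvolved proportions
x_b = H_hat.T @ p_hat                 # an exactly recomposable Bulk profile
loss_exact = reconstruction_loss(x_b, p_hat, H_hat)
loss_wrong = reconstruction_loss(x_b, np.array([1.0, 0.0]), H_hat)
```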
Further, in some embodiments, although Bulk deconvolution can resolve the different cell types and proportions of Bulk tissue, finer-grained single-cell data are still unavailable when generating the first Bulk single-cell transcriptome sequencing data from the target transcriptome features. To solve this problem, the invention adopts a conditional generative model that, given a cell type expression vector, learns to generate cell samples conforming to that cell type, so as to simulate single-cell gene expression data; to further learn a disentangled hidden-space representation, the invention designs a conditional generator based on the β-VAE model of disentangled representation, so as to facilitate subsequent analysis of common patterns of cell expression. Specifically, the model optimizes the following variational lower-bound objective function:
$$\mathcal{L}(\theta,\phi;x,z,c)=\mathbb{E}_{q_{\phi}(z\mid x,c)}\big[\log p_{\theta}(x\mid z,c)\big]-\beta\,D_{KL}\big(q_{\phi}(z\mid x,c)\,\big\|\,p(z)\big)$$ (3)

where \mathcal{L}(\theta,\phi;x,z,c) is the variational lower-bound objective function; \theta and \phi are the parameters of the decoder and encoder of the variational auto-encoder model, respectively; x is the single-cell gene expression matrix obtained by decoding a single Bulk vector x_b; c is the gene expression matrix of the cell type; z is the compressed code learned by the model's hidden (bottleneck) layer, i.e. the hidden vector; p_{\theta}(x\mid z,c) is the probability of generating a real data sample given the hidden variable (probability decoder); D_{KL} is the KL divergence; q_{\phi}(z\mid x,c) is the estimated posterior probability function (probability encoder); \beta is the Lagrangian-multiplier hyper-parameter of the mathematical constraint; and the prior p(z)=\mathcal{N}(0,I) is an isotropic Gaussian distribution. The parameter \beta controls the degree of independence of the components of the hidden vector, so as to achieve disentanglement of the hidden vector.
After the variational auto-encoder is trained, only the sampled hidden vector z and the corresponding cell type representation c need to be input to simulate, through the decoder p_θ(x|z,c), the generation of single-cell data in the corresponding proportion; that is, the single-cell gene expression matrix is obtained by decoding from a single Bulk vector x_b.
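For a Gaussian probability encoder and an isotropic N(0, I) prior, the KL term of the β-VAE objective has a closed form; a minimal numerical sketch follows (hypothetical function name, with squared error standing in for the reconstruction term):

```python
import numpy as np

def beta_vae_loss(x, x_rec, mu, log_var, beta=4.0):
    """Negative ELBO of a beta-VAE: reconstruction error plus beta times
    KL(N(mu, exp(log_var)) || N(0, I)); beta > 1 encourages a disentangled
    hidden vector z."""
    rec = float(np.sum((x - x_rec) ** 2))
    kl = 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))
    return rec + beta * kl

x = np.array([1.0, 2.0])
loss_prior = beta_vae_loss(x, x, mu=np.zeros(3), log_var=np.zeros(3))  # posterior equals prior
loss_shift = beta_vae_loss(x, x, mu=np.ones(3), log_var=np.zeros(3))   # shifted posterior pays beta*KL
```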
Referring to fig. 4, in some embodiments, performing self-supervised clustering on the first Bulk transcriptome sequencing dataset based on the first Bulk single cell transcriptome sequencing data to obtain a clustered result may include:
S401: establishing a corresponding graph network according to the first Bulk single-cell transcriptome sequencing data;
S402: performing self-supervised clustering on the first Bulk transcriptome sequencing dataset using the graph network to obtain a clustering result.
It will be appreciated that, in some embodiments, a conventional graph convolution algorithm performs convolution with an estimated order, which makes the node representations over-smoothed or under-smoothed and impairs node clustering performance; a corresponding graph network is therefore established according to the first Bulk single-cell transcriptome sequencing data to improve the subsequent clustering effect. Specifically, a similarity graph network based on the first Bulk single-cell transcriptome sequencing data is constructed with the following formula:

$$G=(V,E),\qquad E=\big\{(v_i,v_j)\,\big|\,f(v_i,v_j)>\tau\big\},\qquad L=I-D^{-1/2}AD^{-1/2}$$ (4)

where G is the graph network with attribute graph nodes v_i (the i-th attribute graph node) and v_j (the j-th attribute graph node), E is the edge set of the graph adjacency matrix A, f(\cdot) is the output method measuring node similarity (with threshold \tau), L is the symmetric normalized Laplacian matrix, and D is the degree matrix; the transpose of the eigenvector matrix of the graph Laplacian appears in its spectral decomposition L=U\Lambda U^{T}.
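A similarity graph of this kind can be sketched as follows (a k-nearest-neighbour construction on the expression vectors is assumed here for illustration; the function name and toy data are hypothetical):

```python
import numpy as np

def knn_similarity_graph(X, k=2):
    """Build a symmetric k-nearest-neighbour similarity graph over cells and
    return its adjacency A and symmetric normalized Laplacian
    L = I - D^-1/2 A D^-1/2."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # a node is not its own neighbour
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:   # connect each node to its k nearest
            A[i, j] = A[j, i] = 1.0
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt
    return A, L

X = np.array([[0.0], [0.1], [5.0], [5.1]])   # two tight groups of cells
A, L = knn_similarity_graph(X, k=1)
```

The eigenvalues of the symmetric normalized Laplacian lie in [0, 2], which is what the subsequent low-pass filtering relies on.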
Further, in some embodiments, the gene feature matrix X can be decomposed into m graph signals:

$$X=\big[f_1,f_2,\ldots,f_m\big],\qquad f_l=\sum_{s=1}^{n} z_{s}^{(l)}\,u_{s}$$ (5)

where f_l is an attribute graph signal, m represents the number of graph signals, and the magnitude of the coefficient z_{s}^{(l)} is proportional to the strength of the corresponding graph signal component. The basis vectors u_s (the eigenvectors of the Laplacian matrix) can be distinguished by their smoothness:

$$\Omega(u_s)=\frac{1}{2}\sum_{(i,j)\in E}a_{ij}\left(\frac{u_s(i)}{\sqrt{d_i}}-\frac{u_s(j)}{\sqrt{d_j}}\right)^{2}=u_s^{T}L\,u_s=\lambda_s$$ (6)

where \Omega(u_s) is the smoothness of the eigenvector u_s, u_s(i) and u_s(j) are its i-th and j-th elements, a_{ij} is the element in row i and column j of the adjacency matrix, d_i and d_j are the degrees of the i-th and j-th nodes, T denotes matrix transposition, L is the Laplacian matrix, and \lambda_s is its s-th eigenvalue; a smaller eigenvalue corresponds to a smoother basis vector.
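The identity in formula (6), that a Laplacian eigenvector's smoothness equals its eigenvalue, can be checked numerically (a toy path graph with the unnormalized Laplacian is used here for simplicity):

```python
import numpy as np

# Smoothness of a graph signal f is Omega(f) = f.T @ L @ f; for a unit-norm
# Laplacian eigenvector u_s this equals its eigenvalue lambda_s, so the
# low-eigenvalue basis vectors are the smooth ("low-frequency") ones.
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])   # path graph on 3 nodes
lam, U = np.linalg.eigh(L)           # eigenvalues ascending: 0, 1, 3
smoothness = np.array([U[:, s] @ L @ U[:, s] for s in range(3)])
```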
The more orders of low-pass filtering the graph signal is convolved with, the smoother the graph signal becomes; the low-pass filtering process uses the following formula:

$$\bar{X}=\big(U\,p(\Lambda)\,U^{T}\big)^{k}X,\qquad p(\lambda_s)=1-\frac{\lambda_s}{2}$$ (7)

where X is the gene feature matrix, \bar{X} is its filtered version (whose l-th column \bar{f}_l is the filtered version of the l-th graph signal f_l of X), U is the eigenvector matrix, \Lambda is the diagonal matrix of eigenvalues, p(\cdot) is the function filtering the diagonal matrix \Lambda, and k is a constant filter order; the response p(\lambda_s)=1-\lambda_s/2 retains the low-frequency signal and removes the high-frequency signal by scaling the coefficients z_{s}^{(l)} of the attribute graph signals.
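The k-order low-pass filtering step can be sketched as follows (toy graph and signals; the filter order is hypothetical):

```python
import numpy as np

def low_pass_filter(X, L, k=1):
    """Apply the frequency response p(lambda) = 1 - lambda/2 k times,
    i.e. X_bar = (I - L/2)^k X: smooth (low-frequency) components pass,
    rough (high-frequency) components are attenuated."""
    G = np.eye(L.shape[0]) - 0.5 * L
    for _ in range(k):
        X = G @ X
    return X

L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])    # path graph on 3 nodes
const = np.ones((3, 1))               # perfectly smooth signal (frequency 0)
f = np.array([[1.0], [0.0], [-1.0]])  # a rougher signal
f_bar = low_pass_filter(f, L, k=1)
```

The constant signal is an eigenvector with eigenvalue 0, so it passes unchanged, while the rough signal's smoothness value f^T L f drops after filtering.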
Further, the order of the graph convolutional neural network can be measured through the smoothness of the graph signals so that an optimal choice can be made, eliminating under-smoothing and over-smoothing in the graph convolution algorithm and allowing the single-cell gene expression data to be better represented and clustered; combined with the upstream variational auto-encoding module, the low-dimensional hidden-variable Gaussian distribution of the cells is dynamically adjusted through back-propagation, which mitigates the batch-effect problem of the single-cell data.
Since the clustering algorithm preferentially divides data into clusters of similar size, node pairs that are far apart may still belong to the same cluster, while node pairs that are close together may belong to different clusters. For the patient clustering problem, distance is not the only criterion for judging whether patients are of the same type, and additional supervision strategies are needed. To address this, the system improves clustering precision in a self-supervised manner using the following formulas:
$$\mathcal{L}_{clu}=\mathcal{L}_{intra}+\mathcal{L}_{inter}$$ (8)

where \mathcal{L}_{clu} is the clustering loss, \mathcal{L}_{intra} is the intra-cluster loss function, and \mathcal{L}_{inter} is the inter-cluster loss function, given by:

$$\mathcal{L}_{intra}=\frac{1}{n}\sum_{c\in C}\;\sum_{v_i,v_j\in c}\big\|h_i-h_j\big\|_2^{2}$$ (9)

$$\mathcal{L}_{inter}=-\frac{1}{n}\sum_{c,c'\in C,\;c\neq c'}\;\sum_{v_i\in c,\;v_j\in c'}\big\|h_i-h_j\big\|_2^{2}$$ (10)

where C is the set of all clusters after clustering, c and c' are clusters of the set of all clusters after clustering, v_i is the i-th attribute graph node, v_j is the j-th attribute graph node, h_i is the feature learned from the i-th row of raw data, h_j is the feature learned from the j-th row of raw data, and n is a constant. Specifically, the intra-class and inter-class loss functions defined by formulas (8), (9) and (10) avoid the clustering algorithm's tendency to divide the data too evenly, and clustering accuracy is improved through continuous self-supervised optimization of the intra-class and inter-class loss judgments.
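Losses of the kind in formulas (8) to (10) can be sketched numerically as follows (the normalization constant n and the sign convention are assumptions of this sketch):

```python
import numpy as np

def cluster_losses(H, labels, n=4.0):
    """Intra-class loss pulls same-cluster embeddings together; inter-class
    loss (negative) rewards separation between clusters; their sum is the
    self-supervised clustering loss to minimize."""
    intra, inter = 0.0, 0.0
    m = len(labels)
    for i in range(m):
        for j in range(m):
            d = float(np.sum((H[i] - H[j]) ** 2))
            if labels[i] == labels[j]:
                intra += d
            else:
                inter += d
    l_intra = intra / n
    l_inter = -inter / n
    return l_intra + l_inter, l_intra, l_inter

H = np.array([[0.0], [0.0], [10.0], [10.0]])  # two compact, separated clusters
labels = [0, 0, 1, 1]
total, l_intra, l_inter = cluster_losses(H, labels)
```

With compact, well-separated clusters the intra term vanishes and the inter term rewards the separation, so the total loss decreases as clusters move apart.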
Further, in some embodiments, establishing a first training sample set from the first medical image dataset, first Bulk single-cell transcriptome sequencing data, and the clustering result comprises:
and taking the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result as inputs, and taking the image marker corresponding to the first medical image data set as a target output to establish a first training sample set.
It may be understood that, in some embodiments, when the first training sample set is established, the image marker corresponding to the first medical image dataset is used as the label; furthermore, the medical image data and the genomic data (i.e. the Bulk single-cell transcriptome sequencing data) are used jointly to determine the probability, number and type of label occurrences. Introducing the clustering result enriches the dimensionality of the input features and improves model training precision. The first neural network is then trained with the first training sample set to obtain the first neural network model; during training, the feature weight parameters in the model are continuously and iteratively updated, e.g. by back-propagation, until the model converges. It should be noted that, since the first Bulk single-cell transcriptome sequencing data and the clustering result are values predicted from the first Bulk transcriptome sequencing dataset, they can be iteratively updated during model iteration.
Referring to fig. 5, in some embodiments, performing migration learning according to the feature weight parameters in the first neural network model, determining second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset may include:
s501: establishing a migration mapping function between the first medical image dataset and the first Bulk single-cell transcriptome sequencing data by utilizing the characteristic weight parameters;
s502: and determining second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set according to the migration mapping function.
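A minimal sketch of steps S501 and S502 follows (the class, the linear form of the mapping and all data are illustrative assumptions; the embodiments only specify that the mapping is built from the feature weight parameters):

```python
import numpy as np

class MigrationMapper:
    """Sketch of the migration mapping: the feature-weight matrix W learned
    by the first neural network is frozen and reused as a mapping from
    medical-image features to predicted Bulk single-cell transcriptome
    features of a new dataset."""
    def __init__(self, W):
        self.W = W                      # frozen feature weights from model 1

    def predict(self, img_feats):
        return img_feats @ self.W       # migration mapping function

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))             # assumed learned on the first datasets
mapper = MigrationMapper(W)
second_imgs = rng.normal(size=(2, 4))   # features of the second image dataset
pred_sc = mapper.predict(second_imgs)   # predicted second Bulk single-cell data
```

Because W is frozen, the mapping learned on the first (image, sequencing) pairs is transferred unchanged to images that have no sequencing data of their own.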
It may be understood that, in some embodiments, the feature weight parameters characterize the importance of each feature and its influence on the result. Using the feature weight parameters in the first neural network model, the correlation between the first medical image dataset on one side and, on the other, the first Bulk single-cell transcriptome sequencing data, the clustering result and the image marker corresponding to the first medical image dataset can be determined. Since the clustering result can be understood as being mapped from the first medical image dataset and the first Bulk single-cell transcriptome sequencing data, the feature weight parameters of the first neural network model can also be understood as determining the correlation between the first medical image dataset and its corresponding image marker. On this basis, a migration mapping function can be constructed from the feature weight parameters to perform transfer learning; in some embodiments, the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset, as well as its clustering result, can also be obtained through the transfer-learning mapping.
Further, in some embodiments, establishing a second training sample set from the second medical image dataset and second Bulk single cell transcriptome sequencing data comprises:
and taking the second medical image data set and the second Bulk single-cell transcriptome sequencing data as input, and taking the image marker corresponding to the second medical image data set as target output to establish a second training sample set.
It may be understood that, in some embodiments, after the second Bulk single-cell transcriptome sequencing data are obtained, the second training sample set may take only the second medical image dataset and the second Bulk single-cell transcriptome sequencing data as input, with the image marker corresponding to the second medical image dataset as the target output, without using a clustering result as model input as was done when constructing the first training sample set. In other embodiments, the second medical image dataset, the second Bulk single-cell transcriptome sequencing data and a clustering result of the second Bulk transcriptome sequencing data corresponding to the second medical image dataset may all be used as input when constructing the second training sample set. Further, in some embodiments, the image marker corresponding to the second medical image dataset may be the same as or different from the image marker corresponding to the first medical image dataset; when they differ, the following must be satisfied: the image markers corresponding to the second medical image dataset are a subset of the image markers corresponding to the first medical image dataset, so that the feature weight parameters in the trained first neural network model can be used for transfer learning to obtain the second Bulk single-cell transcriptome sequencing data. In other words, the trained first neural network model not only meets the need of determining the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset, but can further generate Bulk single-cell transcriptome sequencing data corresponding to other medical image datasets for establishing classification models of those datasets, which improves the expandability of image classification model establishment.
Based on the same inventive concept, some embodiments of the present disclosure further provide an image classification method, referring to fig. 6, and in some embodiments, the method may include:
S601: receiving medical image data to be classified;
S602: inputting the medical image data to be classified into an image classification model trained by the method according to any of the above embodiments to obtain a medical image classification result corresponding to the medical image data to be classified; the medical image classification result comprises the number, type and probability value of the target image markers in the medical image data to be classified.
It should be noted that although the operations of the method of the present invention are described in a particular order in the above embodiments and the accompanying drawings, this does not require or imply that the operations must be performed in the particular order or that all of the illustrated operations be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
In correspondence to the above image classification method, some embodiments of the present disclosure further provide an image classification system, as shown in fig. 7, and in some embodiments, the system may include:
An image classifier 701, a medical image imaging device 702, and a display 703;
the image classifier 701 is respectively connected to the medical image imaging device 702 and the display 703, and is configured to perform the image classification method according to any of the foregoing embodiments according to the medical image generated by the medical image imaging device 702, so as to perform image classification, thereby obtaining the number, the type and the probability value of the image markers in the current medical image;
the display 703 is used for displaying the medical image generated by the medical image imaging device 702 and the image classification result of the image classifier 701.
It will be appreciated that in some embodiments, the image classification result of the image classifier 701 includes the number, type, and probability value of the target image markers present in the medical image generated by the medical image imaging device 702, for reference by the medical staff.
Corresponding to the above image classification model building method, some embodiments of the present disclosure further provide an image classification model building apparatus, as shown in fig. 8, and in some embodiments, the apparatus may include:
a receiving module 801, configured to receive a single cell transcriptome sequencing standard dataset, a first medical image dataset and a corresponding first Bulk transcriptome sequencing dataset, and a second medical image dataset;
A generating module 802, configured to generate first Bulk single-cell transcriptome sequencing data according to the first Bulk transcriptome sequencing data set and a single-cell transcriptome sequencing standard data set;
a clustering module 803, configured to perform self-supervised clustering on the first Bulk transcriptome sequencing dataset based on the first Bulk single-cell transcriptome sequencing data, to obtain a clustering result;
a first sample establishing module 804, configured to establish a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data, and the clustering result;
a training module 805, configured to train the first neural network with the first training sample set to obtain a first neural network model;
the migration module 806 is configured to perform migration learning according to the feature weight parameter in the first neural network model, and determine second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset;
a second sample creation module 807 for creating a second training sample set from the second medical image dataset and second Bulk single-cell transcriptome sequencing data;
the modeling module 808 is configured to train the second neural network with the second training sample set to obtain a second neural network model, and use the trained second neural network model as an image classification model to predict the number, the type and the probability value of the target image markers in the medical image data.
In correspondence to the above-mentioned image classification method, some embodiments of the present disclosure further provide an image classification apparatus, as shown in fig. 9, and in some embodiments, the apparatus may include:
an acquisition module 901, configured to receive medical image data to be classified;
the classification module 902 is configured to input the medical image data to be classified into an image classification model trained by the method according to any of the foregoing embodiments, so as to obtain a medical image classification result corresponding to the medical image data to be classified; the medical image classification result comprises the number, the type and the probability value of the target image markers in the medical image data to be classified.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
In the embodiments of the present disclosure, the user information (including, but not limited to, user device information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) are information and data that are authorized by the user and are sufficiently authorized by each party.
Embodiments of the present description also provide a computer device. As shown in fig. 10, in some embodiments of the present description, the computer device 1002 may include one or more processors 1004, such as one or more Central Processing Units (CPUs) or Graphics Processors (GPUs), each of which may implement one or more hardware threads. The computer device 1002 may further comprise any memory 1006 for storing any kind of information, such as code, settings, data, etc., in a specific embodiment a computer program on the memory 1006 and executable on the processor 1004, which computer program, when executed by the processor 1004, may perform the instructions of the method according to any of the embodiments described above. For example, and without limitation, memory 1006 may include any one or more of the following combinations: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may store information using any technique. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 1002. In one case, when the processor 1004 executes associated instructions stored in any memory or combination of memories, the computer device 1002 can perform any of the operations of the associated instructions. The computer device 1002 also includes one or more drive mechanisms 1008, such as a hard disk drive mechanism, an optical disk drive mechanism, and the like, for interacting with any memory.
The computer device 1002 may also include an input/output interface 1010 (I/O) for receiving various inputs (via input device 1012) and for providing various outputs (via output device 1014). One particular output mechanism may include a presentation device 1016 and an associated graphical user interface 1018 (GUI). In other embodiments, where the computer device serves as just one device in a network, the input/output interface 1010 (I/O), input device 1012, and output device 1014 may be omitted. Computer device 1002 may also include one or more network interfaces 1020 for exchanging data with other devices via one or more communication links 1022. One or more communication buses 1024 couple the above-described components together.
The communication link 1022 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the internet), a point-to-point connection, etc., or any combination thereof. Communication links 1022 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), computer-readable storage media, and computer program products according to some embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processor to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processor, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processor to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processor to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computer device. Computer-readable media, as defined in this specification, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the embodiments of the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
It should also be understood that, in the embodiments of the present specification, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. For example, "A and/or B" may represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant parts, reference may be made to the description of the method embodiments.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present specification. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features thereof, provided that no contradiction arises.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (14)

1. An image classification model building method is characterized by comprising the following steps:
receiving a single-cell transcriptome sequencing standard dataset, a first medical image dataset and a first Bulk transcriptome sequencing dataset corresponding thereto, and a second medical image dataset;
generating first Bulk single-cell transcriptome sequencing data according to the first Bulk transcriptome sequencing data set and a single-cell transcriptome sequencing standard data set;
performing self-supervised clustering on the first Bulk transcriptome sequencing dataset based on the first Bulk single-cell transcriptome sequencing data to obtain a clustering result;
establishing a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result;
training a first neural network by using a first training sample set to obtain a first neural network model;
performing migration learning according to feature weight parameters in the first neural network model to determine second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset;
establishing a second training sample set according to the second medical image data set and second Bulk single-cell transcriptome sequencing data;
and training a second neural network by using the second training sample set to obtain a second neural network model, and taking the trained second neural network model as an image classification model to be used for predicting the number, the type and the probability value of the target image markers in the medical image data.
2. The method of claim 1, wherein generating the first Bulk single-cell transcriptome sequencing data from the first Bulk transcriptome sequencing dataset and the single-cell transcriptome sequencing standard dataset comprises:
inputting the first Bulk transcriptome sequencing dataset into a pre-trained adaptive deconvolution model to obtain target transcriptome features; wherein the adaptive deconvolution model is trained based on the single cell transcriptome sequencing standard dataset and a second Bulk transcriptome sequencing dataset;
generating first Bulk single cell transcriptome sequencing data according to the target transcriptome feature.
3. The method of claim 2, wherein the target transcriptome characteristics include cell type information and ratio information between different cell types.
4. The method of claim 1, wherein performing self-supervised clustering on the first Bulk transcriptome sequencing dataset based on the first Bulk single-cell transcriptome sequencing data to obtain a clustering result comprises:
establishing a corresponding graph network according to the first Bulk single-cell transcriptome sequencing data;
and performing self-supervised clustering on the first Bulk transcriptome sequencing dataset by using the graph network to obtain a clustering result.
5. The method of claim 1, wherein establishing a first training sample set from the first medical image dataset, first Bulk single-cell transcriptome sequencing data, and the clustering result comprises:
and taking the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result as inputs, and taking the image marker corresponding to the first medical image data set as a target output to establish a first training sample set.
6. The method of claim 1, wherein performing migration learning according to the feature weight parameters in the first neural network model to determine the second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset comprises:
establishing a migration mapping function between the first medical image dataset and the first Bulk single-cell transcriptome sequencing data by utilizing the feature weight parameters;
and determining second Bulk single-cell transcriptome sequencing data corresponding to the second medical image data set according to the migration mapping function.
7. The method of claim 5, wherein establishing a second training sample set from the second medical image dataset and second Bulk single cell transcriptome sequencing data comprises:
and taking the second medical image data set and the second Bulk single-cell transcriptome sequencing data as input, and taking the image marker corresponding to the second medical image data set as target output to establish a second training sample set.
8. The method of claim 6, wherein the image markers corresponding to the second medical image dataset are a subset of the image markers corresponding to the first medical image dataset.
9. An image classification method, the method comprising:
receiving medical image data to be classified;
inputting the medical image data to be classified into an image classification model trained by the method of any one of claims 1-8 to obtain a medical image classification result corresponding to the medical image data to be classified; the medical image classification result comprises the number, the type and the probability value of the target image markers in the medical image data to be classified.
10. An image classification model building apparatus, the apparatus comprising:
the receiving module is used for receiving the single-cell transcriptome sequencing standard data set, the first medical image data set and the corresponding first Bulk transcriptome sequencing data set, and the second medical image data set;
the generation module is used for generating first Bulk single-cell transcriptome sequencing data according to the first Bulk transcriptome sequencing data set and the single-cell transcriptome sequencing standard data set;
the clustering module is used for performing self-supervised clustering on the first Bulk transcriptome sequencing dataset based on the first Bulk single-cell transcriptome sequencing data to obtain a clustering result;
the first sample establishing module is used for establishing a first training sample set according to the first medical image data set, the first Bulk single-cell transcriptome sequencing data and the clustering result;
the training module is used for training the first neural network by using the first training sample set to obtain a first neural network model;
the migration module is used for performing migration learning according to the feature weight parameters in the first neural network model to determine second Bulk single-cell transcriptome sequencing data corresponding to the second medical image dataset;
the second sample establishing module is used for establishing a second training sample set according to the second medical image data set and the second Bulk single-cell transcriptome sequencing data;
the modeling module is used for training the second neural network by using the second training sample set to obtain a second neural network model, and taking the trained second neural network model as an image classification model to be used for predicting the number, the type and the probability value of the target image markers in the medical image data.
11. An image classification apparatus, the apparatus comprising:
the acquisition module is used for receiving medical image data to be classified;
the classification module is used for inputting the medical image data to be classified into an image classification model trained by the method of any one of claims 1-8 so as to obtain a medical image classification result corresponding to the medical image data to be classified; the medical image classification result comprises the number, the type and the probability value of the target image markers in the medical image data to be classified.
12. An image classification system, the system comprising: an image classifier, a medical image imaging device and a display;
the image classifier is connected to the medical image imaging device and the display, respectively, and is used for executing the above method on the medical image generated by the medical image imaging device, so as to perform image classification and obtain the number, the type and the probability value of the image markers in the current medical image;
the display is used for displaying the medical image generated by the medical image imaging equipment and the image classification result of the image classifier.
13. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the computer program, when being executed by the processor, performs the instructions of the method according to any one of claims 1-9.
14. A computer storage medium having stored thereon a computer program, which, when executed by a processor of a computer device, performs the instructions of the method according to any of claims 1-9.
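The two-stage pipeline recited in claim 1 — deconvolving Bulk data against a single-cell reference, clustering the result, training a first model, and transferring its weights to derive single-cell data for a second image dataset — can be sketched in miniature as follows. This is purely an illustrative sketch, not the claimed implementation: the function names (`deconvolve`, `cluster`) and all data are hypothetical, plain least squares stands in for the pre-trained adaptive deconvolution model, a toy k-means stands in for the graph-based self-supervised clustering, and linear maps stand in for the neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def deconvolve(bulk_expr, signature):
    # Least-squares stand-in for the adaptive deconvolution model:
    # estimates per-sample cell-type proportions from Bulk expression.
    props, *_ = np.linalg.lstsq(signature, bulk_expr.T, rcond=None)
    props = np.clip(props.T, 0.0, None)
    return props / props.sum(axis=1, keepdims=True)  # rows sum to 1

def cluster(features, k=2, iters=10):
    # Toy k-means as a stand-in for graph-based self-supervised clustering.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((features[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Synthetic stand-ins for the datasets received in claim 1.
n_genes, n_types, n_samples = 50, 3, 40
signature = rng.random((n_genes, n_types))          # single-cell standard dataset
true_props = rng.dirichlet(np.ones(n_types), n_samples)
bulk = true_props @ signature.T                     # first Bulk transcriptome dataset

props = deconvolve(bulk, signature)                 # first Bulk single-cell data
labels = cluster(props)                             # clustering result

# First model: a linear map from image features to cell-type proportions,
# fitted on the first training sample set.
image_feats = rng.random((n_samples, 8))            # first medical image dataset
W1, *_ = np.linalg.lstsq(image_feats, props, rcond=None)

# Transfer step: reuse the learned weights W1 as the migration mapping to
# obtain pseudo single-cell data for the second medical image dataset.
image_feats2 = rng.random((20, 8))                  # second medical image dataset
props2 = image_feats2 @ W1                          # second Bulk single-cell data
```

Here `props2` plays the role of the second Bulk single-cell transcriptome sequencing data of claim 6: it is produced not by deconvolving new Bulk data but by pushing the second image dataset through the mapping learned on the first.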
CN202410004749.7A 2024-01-03 2024-01-03 Image classification model building method and device, and classification method, device and system Active CN117496279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410004749.7A CN117496279B (en) 2024-01-03 2024-01-03 Image classification model building method and device, and classification method, device and system


Publications (2)

Publication Number Publication Date
CN117496279A true CN117496279A (en) 2024-02-02
CN117496279B CN117496279B (en) 2024-04-26

Family

ID=89667643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410004749.7A Active CN117496279B (en) 2024-01-03 2024-01-03 Image classification model building method and device, and classification method, device and system

Country Status (1)

Country Link
CN (1) CN117496279B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200167914A1 (en) * 2017-07-19 2020-05-28 Altius Institute For Biomedical Sciences Methods of analyzing microscopy images using machine learning
US20210350530A1 (en) * 2020-05-08 2021-11-11 Richard Ricci Dental Images Correlated to the Human Genome with Artificial Intelligence
CN114496083A (en) * 2022-01-26 2022-05-13 腾讯科技(深圳)有限公司 Cell type determination method, device, equipment and storage medium
CN115082747A (en) * 2022-08-23 2022-09-20 紫东信息科技(苏州)有限公司 Zero-sample gastric ulcer classification system based on block confrontation
CN115564749A (en) * 2022-10-21 2023-01-03 华中科技大学 Method for constructing multi-class texture surface defect detection model based on lifelong learning
CN115984629A (en) * 2023-02-14 2023-04-18 成都泰莱生物科技有限公司 Lung nodule classification method and product based on lung CT and 5mC marker fusion
CN116189761A (en) * 2022-12-09 2023-05-30 浙江大学 Accurate prediction method and device for curative effect of liver cancer DEB-TACE combined PD-1 inhibitor based on multiple sets of chemical data
CN116978452A (en) * 2023-02-06 2023-10-31 腾讯科技(深圳)有限公司 Cell type annotation method and device, electronic equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BIN LI ET AL: ""Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution"", 《NATURE METHODS》, vol. 19, 16 May 2022 (2022-05-16), pages 662 - 670, XP037897927, DOI: 10.1038/s41592-022-01480-9 *
JI CHAOJIE ET AL: ""Smoothness Sensor: Adaptive Smoothness-Transition Graph Convolutions for Attributed Graph Clustering"", 《IEEE TRANSACTIONS ON CYBERNETICS》, vol. 52, no. 12, 27 December 2021 (2021-12-27), pages 12771 - 12784, XP011927987, DOI: 10.1109/TCYB.2021.3088880 *
WANG RAN ET AL: ""Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia"", 《BRIEFINGS IN BIOINFORMATICS》, vol. 23, no. 2, 7 February 2022 (2022-02-07), pages 1 - 14 *

Also Published As

Publication number Publication date
CN117496279B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
US11416716B2 (en) System and method for automatic assessment of cancer
Che et al. Improved deep learning-based macromolecules structure classification from electron cryo-tomograms
Ghazal et al. Feature optimization and identification of ovarian cancer using internet of medical things
Heidari et al. A new lung cancer detection method based on the chest CT images using Federated Learning and blockchain systems
JP2022553906A (en) Systems, methods and programs for developing disease detection models
US20230162818A1 (en) Methods of determining correspondences between biological properties of cells
CN112543934A (en) Method for determining degree of abnormality, corresponding computer readable medium and distributed cancer analysis system
CN113392894A (en) Cluster analysis method and system for multi-group mathematical data
Chi et al. Deep semisupervised multitask learning model and its interpretability for survival analysis
Valdebenito et al. Machine learning approaches to study glioblastoma: A review of the last decade of applications
CN116189887A (en) Tumor survival prediction method, device, electronic equipment and storage medium
KR20220069943A (en) Single-cell RNA-SEQ data processing
CN117422704A (en) Cancer prediction method, system and equipment based on multi-mode data
Jiang et al. A Bayesian modified Ising model for identifying spatially variable genes from spatial transcriptomics data
CN115272797A (en) Training method, using method, device, equipment and storage medium of classifier
CN113764101A (en) CNN-based breast cancer neoadjuvant chemotherapy multi-modal ultrasonic diagnosis system
CN112466401B (en) Method and device for analyzing multiple types of data by utilizing artificial intelligence AI model group
CN117496279B (en) Image classification model building method and device, and classification method, device and system
CN116403701A (en) Method and device for predicting TMB level of non-small cell lung cancer patient
Patra et al. Deep learning methods for scientific and industrial research
CN115762796A (en) Target model acquisition method, prognosis evaluation value determination method, device, equipment and medium
CN112086174B (en) Three-dimensional knowledge diagnosis model construction method and system
CN114171199A (en) Survival prediction method, system, terminal and storage medium for brain malignant tumor patient
Li et al. iEnhance: a multi-scale spatial projection encoding network for enhancing chromatin interaction data resolution
Harastani et al. Methods for analyzing continuous conformational variability of biomolecules in cryo electron subtomograms: HEMNMA-3D vs. traditional classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant