CN113918786A - Intelligent cell subtype judgment method - Google Patents

Intelligent cell subtype judgment method

Publication number
CN113918786A
Authority
CN
China
Prior art keywords
cell
data
new
cell type
gene expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111218373.2A
Other languages
Chinese (zh)
Inventor
宗杰
张博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Novelbio Co ltd
Original Assignee
Shanghai Novelbio Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Novelbio Co ltd filed Critical Shanghai Novelbio Co ltd
Priority to CN202111218373.2A priority Critical patent/CN113918786A/en
Publication of CN113918786A publication Critical patent/CN113918786A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics

Abstract

The invention discloses an intelligent judgment method for cell subtypes. A database containing multiple cell types is established from single-cell data of known cell types, and the single-cell data in the database are classified by cell type; a characteristic expression scoring model for each cell type is computed from the gene expression of that single-cell data; new cell data are then imported into the characteristic expression scoring model for identification and scoring, yielding the gene expression profile of the new cell; this profile is compared with the single-cell data recorded in the database to obtain the cell type of the new cell, completing its classification. With a comprehensive database in place, input data can be identified and scored automatically by the characteristic expression scoring model and the different cell types then judged, so that cell types can be determined accurately.

Description

Intelligent cell subtype judgment method
Technical Field
The invention relates to the technical field of cell classification, in particular to an intelligent judgment method for cell subtypes.
Background
As artificial intelligence techniques mature, machine learning is applied ever more widely, for example in data mining, natural language processing, and DNA sequence prediction. In biological research, a great deal of effort goes into identifying and classifying biological features, which is a heavy burden on researchers. Among biological features, the analysis of cellular features is both fundamental and important.
At present, cell analysis relies mainly on researchers classifying and identifying cells by eye, and fatigue-induced errors occur easily when large amounts of sample data are examined. Alternatively, common cell staining methods are used to classify and identify cells, but the staining reaction is limited by the characteristics of the cells themselves, so different cells may show the same or similar colors, which hinders research. For example, in disease diagnosis, a pathological image is stained and a doctor then determines from the stained image whether a cell is diseased. However, when a doctor examines pathological images, the diagnosis is prone to error because of problems such as work pressure and visual fatigue. To improve on this, an intelligent cell subtype judgment method is provided.
Disclosure of Invention
In order to solve the above technical problems, the invention provides the following technical scheme:
The method for intelligently judging cell subtypes comprises the following steps:
step 1, establishing a database containing multiple cell types from single-cell data of known cell types, and classifying the single-cell data in the database by cell type;
step 2, computing, from the gene expression of the single-cell data of known cell types, a characteristic expression scoring model for each cell type;
step 3, importing the new cell data into the characteristic expression scoring model for identification and scoring to obtain the gene expression profile of the new cell;
step 4, comparing the gene expression profile of the new cell with the single-cell data in the database containing multiple cell types to obtain the gene expression comparison result for the new cell;
step 5, judging the cell type of the new cell based on the gene expression comparison result, thereby completing the classification of the new cell.
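The five steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the toy expression vectors, the function names (`build_scoring_model`, `score_cell`, `classify_cell`), and the use of a per-type mean profile scored by negative Euclidean distance are all hypothetical stand-ins, not the scoring model the invention itself describes.

```python
import math

# Step 1: a toy reference database of cells whose types are already known.
# The three-gene expression vectors are invented for illustration.
reference_db = [
    {"cell_type": "T cell", "expression": [5.0, 0.2, 0.1]},
    {"cell_type": "T cell", "expression": [4.6, 0.4, 0.0]},
    {"cell_type": "B cell", "expression": [0.3, 4.8, 0.2]},
    {"cell_type": "B cell", "expression": [0.1, 5.2, 0.4]},
]


def build_scoring_model(db):
    """Step 2 (stand-in): one mean expression profile per cell type."""
    sums, counts = {}, {}
    for record in db:
        cell_type, vec = record["cell_type"], record["expression"]
        if cell_type not in sums:
            sums[cell_type] = [0.0] * len(vec)
            counts[cell_type] = 0
        sums[cell_type] = [s + v for s, v in zip(sums[cell_type], vec)]
        counts[cell_type] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def score_cell(model, expression):
    """Step 3 (stand-in): score a new cell against every profile.

    A higher score means a closer match (negative Euclidean distance).
    """
    return {t: -math.dist(profile, expression) for t, profile in model.items()}


def classify_cell(scores):
    """Steps 4 and 5: compare the scores and report the best-matching type."""
    return max(scores, key=scores.get)


model = build_scoring_model(reference_db)
new_cell = [4.9, 0.3, 0.2]
scores = score_cell(model, new_cell)
predicted_type = classify_cell(scores)
print(predicted_type)
```

In this sketch the database, the model, and the comparison are deliberately kept as three separate functions so that each maps onto one stage of the method.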
The characteristic expression scoring model of each cell type is obtained by using the Cell BLAST algorithm to derive a characteristic expression matrix for each cell type. Specifically: data in which the cell types have been identified are found, and the identification results are used to generate a data set; suitable variable genes are selected for model training, and unsupervised dimension reduction is performed by fitting a DIRECTi model; data normalization and clustering are then carried out automatically inside the function, and the cells are projected onto a cell embedding space; the clustering of cells of the same type is evaluated visually after dimension reduction, that is, data that do not cluster together with cells of the same type are discarded; and batch correction is performed according to the original samples. Based on this evaluation, multiple models are trained repeatedly, each with its own random seed, and combined with the original reference data set to form a cell_blast database; finally, a suitable data set to be tested is loaded. The algorithm obtains an initial matching result for the data set to be tested through efficient nearest-neighbor search based on Euclidean distance in the latent space.
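The nearest-neighbor matching step can be illustrated with a brute-force search. The sketch below is not the Cell BLAST package API; the two-dimensional coordinates, the reference names, and the function `nearest_neighbors` are hypothetical, and a real latent space would come from the trained model rather than being typed in by hand.

```python
import math


def nearest_neighbors(query, reference, k=3):
    """Return the k reference cells closest to `query` by Euclidean distance.

    `reference` is a list of (name, latent_coordinates, cell_type) tuples.
    """
    ranked = sorted(reference, key=lambda item: math.dist(query, item[1]))
    return ranked[:k]


# Toy latent embedding of reference cells (coordinates invented for illustration).
reference_embedding = [
    ("ref_1", (0.0, 0.0), "T cell"),
    ("ref_2", (0.2, 0.1), "T cell"),
    ("ref_3", (3.0, 3.0), "B cell"),
    ("ref_4", (3.1, 2.9), "B cell"),
    ("ref_5", (0.1, 0.3), "T cell"),
]

query_cell = (0.1, 0.1)
hits = nearest_neighbors(query_cell, reference_embedding, k=3)
hit_names = [name for name, _, _ in hits]
hit_types = [cell_type for _, _, cell_type in hits]
print(hit_names, hit_types)
```

Sorting every reference cell is fine for a toy example; with a large reference, an index structure would normally replace the full sort.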
As a preferred technical solution of the present invention, new cell data are imported into the characteristic expression scoring model for identification and scoring by merging the results for the new cell data across the multiple models and filtering by p-value. This yields a query result in the form of a Python dict keyed by cell name; for each cell, the result gives the number of hits in the latent space, the total for the currently predicted cell type in that space, and the p-value of the prediction. The final cell type prediction is obtained by combining these test results.
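One way to picture the merge, filter, and vote sequence is the sketch below. It assumes each model has already returned a list of (cell type, p-value) hits per query cell; the function `merge_and_vote`, the 0.05 cutoff, the `min_hits` threshold, and the majority-vote rule are illustrative assumptions rather than details given in the text.

```python
from collections import Counter


def merge_and_vote(hits_per_model, p_cutoff=0.05, min_hits=2):
    """Merge hits from several models, drop weak ones, and vote on a type."""
    # Pool the hits of all models, keeping only those that pass the p-value filter.
    kept = [
        cell_type
        for hits in hits_per_model
        for cell_type, p_value in hits
        if p_value <= p_cutoff
    ]
    if len(kept) < min_hits:
        return {"n_hits": len(kept), "prediction": "rejected", "support": 0.0}
    best_type, best_count = Counter(kept).most_common(1)[0]
    return {
        "n_hits": len(kept),
        "prediction": best_type,
        "support": best_count / len(kept),
    }


# Hypothetical hits for two query cells, one inner list per trained model.
query_hits = {
    "new_cell_1": [
        [("T cell", 0.01), ("T cell", 0.03), ("B cell", 0.20)],
        [("T cell", 0.02), ("B cell", 0.04)],
    ],
    "new_cell_2": [
        [("B cell", 0.30)],
        [("T cell", 0.40)],
    ],
}

# The result is a dict keyed by cell name, as in the description.
results = {name: merge_and_vote(hits) for name, hits in query_hits.items()}
print(results)
```

Cells with too few significant hits are marked "rejected" here instead of being forced into a type, which is one plausible way to handle cells that resemble nothing in the reference.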
As a preferred technical solution of the present invention, after the cell type of the new cell is obtained, cell annotation is performed on the newly input cell data.
As a preferred technical scheme of the invention, cell subtypes are generally classified by marker genes.
As a preferred technical scheme of the invention, the single-cell data in the database are classified by cell type using marker genes.
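Marker-gene classification can be sketched as scoring each cell type by the mean expression of its markers. The marker sets below (CD3D/CD3E for T cells, CD79A/MS4A1 for B cells) are commonly used examples chosen for illustration, not a list taken from this document, and `classify_by_markers` and the expression values are hypothetical.

```python
# Illustrative marker sets; a real database would hold many more types and genes.
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
}


def classify_by_markers(expression, markers=MARKERS):
    """Score each cell type by the mean expression of its marker genes.

    `expression` maps gene name to expression value; missing genes count as 0.
    Returns the best-scoring type together with all scores.
    """
    scores = {
        cell_type: sum(expression.get(gene, 0.0) for gene in genes) / len(genes)
        for cell_type, genes in markers.items()
    }
    return max(scores, key=scores.get), scores


cell = {"CD3D": 4.0, "CD3E": 3.0, "CD79A": 0.0, "MS4A1": 0.5}
label, marker_scores = classify_by_markers(cell)
print(label, marker_scores)
```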
The beneficial effects of the invention are as follows: the intelligent cell subtype judgment method establishes a database containing multiple cell types from single-cell data of known cell types and classifies the single-cell data in the database by cell type; a characteristic expression scoring model for each cell type is computed from the gene expression of that single-cell data; the data of a new cell are then imported into the characteristic expression scoring model for identification and scoring to obtain the gene expression profile of the new cell; this profile is compared with the single-cell data in the database to obtain the cell type of the new cell, thereby completing its classification. With a comprehensive database in place, input data can be identified and scored automatically by the characteristic expression scoring model, and the different cell types are then judged and labeled, so that cell types can be determined accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of an intelligent determination method for cell subtype according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1:
An intelligent judgment method for cell subtypes: as shown in FIG. 1, a database containing multiple cell types is constructed from existing single-cell data of known cell types, and the data for each cell in the database are then partitioned to determine the cell type classification;
for the single-cell data of known cell types, cell gene expression is computed with the Cell BLAST algorithm: suitable variable genes are selected for model training, and unsupervised dimension reduction is performed by fitting a DIRECTi model; data normalization and clustering are then carried out automatically inside the function, the cells are projected onto a cell embedding space, the clustering of cells of the same type is evaluated visually after dimension reduction, and batch correction is performed according to the original samples; based on this evaluation, multiple models are trained repeatedly, and finally a suitable data set to be tested is loaded. The algorithm obtains an initial matching result for the data set to be tested through efficient nearest-neighbor search based on Euclidean distance in the latent space, giving the characteristic expression scoring model of each cell type;
the data of the new cell are imported into the characteristic expression scoring model for identification and scoring, the results for the newly input cell data across the multiple models are merged, and the p-values are filtered to obtain the gene expression profile of the new cell;
the gene expression profile of the new cell is compared with the single-cell data in the database containing multiple cell types to obtain the gene expression comparison result for the new cell;
the cell type of the new cell is judged based on the gene expression comparison result, thereby completing the classification of the new cell. As a result, with a comprehensive database in place, input data can be identified and scored automatically by the characteristic expression scoring model, and the different cell types are then judged and labeled, so that cell types can be determined accurately.
After the cell type of the new cell is obtained, cell annotation is performed on the newly input cell data.
Cell subtypes are generally classified by marker genes.
Cell population identification based on multiple data sets: when multiple data sets are used for identification, different data sets judge different cells. Some specifically judge fibroblasts, some judge immune cells, and others judge T cells, B cells, dendritic cells, macrophages, epithelial cells, fibroblasts, endothelial cells, natural killer cells, erythrocytes, adipocytes, stem cells, malignant cells, and so on. Because the data sets serve different functions, 2 to 3 data sets can be applied simultaneously, and after judgment is finished the scores from the different data sets differ. For example, data set A may successfully identify T cells and B cells, with CD4 and CD8 cells present among the T cells, while another data set B may successfully identify epithelial cells and fibroblasts.
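Combining the calls from several reference data sets might look like the sketch below. The text says only that the scores from different data sets differ; choosing the call with the highest score is an assumption made here for illustration, and the data set names, scores, and `combine_dataset_calls` are hypothetical.

```python
def combine_dataset_calls(calls):
    """Pick one call for a cell from several reference data sets.

    `calls` maps data set name to (cell_type or None, score); None means the
    data set could not identify the cell. Returns None if no data set did.
    """
    confident = {name: call for name, call in calls.items() if call[0] is not None}
    if not confident:
        return None
    # Assumed rule: keep the call from the data set with the highest score.
    best = max(confident, key=lambda name: confident[name][1])
    cell_type, score = confident[best]
    return {"dataset": best, "cell_type": cell_type, "score": score}


# Hypothetical calls for one cell from three data sets with different coverage.
calls_for_cell = {
    "dataset_A": ("T cell", 0.92),
    "dataset_B": (None, 0.0),
    "dataset_C": ("epithelial cell", 0.35),
}
combined = combine_dataset_calls(calls_for_cell)
print(combined)
```

Recording which data set produced the final call makes it possible to trace a label back to the reference that supported it.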
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. An intelligent cell subtype judgment method, characterized by comprising the following steps:
step 1, establishing a database containing various cell types according to the existing single cell data of the determined cell types, and classifying the cell types of the single cell data in the database;
step 2, calculating according to the single cell data of the determined cell types through the gene expression condition to obtain a characteristic expression scoring model of each cell type;
step 3, importing the new cell data into a characteristic expression scoring model for identification scoring to obtain the gene expression condition of the new cell;
step 4, comparing the gene expression condition of the obtained new cell with single cell data in a database containing various cell types to obtain the gene expression information comparison condition of the new cell;
step 5, judging the cell type of the new cell based on the comparison of the gene expression information of the new cell, thereby finishing the classification of the new cell.
2. The method for intelligently determining cell subtypes according to claim 1, wherein the characteristic expression scoring model of each cell type is obtained by using the Cell BLAST algorithm to derive a characteristic expression matrix for each cell type, the specific process being as follows: data in which the cell types have been identified are found, and the identification results are used to generate a data set; suitable variable genes are selected for model training, and unsupervised dimension reduction is performed by fitting a DIRECTi model; data normalization and clustering are then carried out automatically inside the function, and the cells are projected onto a cell embedding space; the clustering of cells of the same type is evaluated visually, that is, data that do not cluster together with cells of the same type are discarded; batch correction is performed according to the original samples; based on this evaluation, multiple models are trained repeatedly, each with its own random seed, and combined with the original reference data set to form a cell_blast database; finally, a suitable data set to be tested is loaded; and the algorithm obtains an initial matching result for the data set to be tested through efficient nearest-neighbor search based on Euclidean distance in the latent space.
3. The method for intelligently determining cell subtypes according to claim 1, wherein new cell data are imported into the characteristic expression scoring model for identification and scoring by merging the results for the new cell data across the multiple models and filtering by p-value to obtain a cell name query result (a Python dict keyed by cell name), the result giving the number of hits of the current cell in the latent space, the total for the currently predicted cell type in that space, and the p-value of the prediction, and the final cell type prediction being obtained by combining these test results.
4. The method as claimed in claim 1, wherein after the cell type of the newly input cell is obtained, cell annotation is performed on the newly input cell data.
5. The method according to claim 1, wherein the cell subtypes are generally classified by marker genes.
6. The method according to claim 1, wherein the single-cell data in the database are classified by cell type using marker genes.
CN202111218373.2A 2021-10-20 2021-10-20 Intelligent cell subtype judgment method Pending CN113918786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111218373.2A CN113918786A (en) 2021-10-20 2021-10-20 Intelligent cell subtype judgment method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111218373.2A CN113918786A (en) 2021-10-20 2021-10-20 Intelligent cell subtype judgment method

Publications (1)

Publication Number Publication Date
CN113918786A true CN113918786A (en) 2022-01-11

Family

ID=79241595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111218373.2A Pending CN113918786A (en) 2021-10-20 2021-10-20 Intelligent cell subtype judgment method

Country Status (1)

Country Link
CN (1) CN113918786A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117116364A (en) * 2023-10-25 2023-11-24 智泽童康(广州)生物科技有限公司 Single cell database and associated cell subgroup automatic recommendation method thereof
CN117116364B (en) * 2023-10-25 2024-02-20 智泽童康(广州)生物科技有限公司 Single cell database and associated cell subgroup automatic recommendation method thereof

Similar Documents

Publication Publication Date Title
US8605981B2 (en) Centromere detector and method for determining radiation exposure from chromosome abnormalities
CN109952614B (en) Biological particle classification system and method
CN109300111B (en) Chromosome recognition method based on deep learning
CN113454733A (en) Multi-instance learner for prognostic tissue pattern recognition
CN104809476B (en) A kind of multi-target evolution Fuzzy Rule Classification method based on decomposition
US10769432B2 (en) Automated parameterization image pattern recognition method
CN114283407A (en) Self-adaptive automatic leukocyte segmentation and subclass detection method and system
CN111079620A (en) Leukocyte image detection and identification model construction method based on transfer learning and application
CN103177266A (en) Intelligent stock pest identification system
KR102362872B1 (en) Method for refining clean labeled data for artificial intelligence training
CN110084314A (en) A kind of false positive gene mutation filter method for targeted capture gene sequencing data
CN114596467A (en) Multimode image classification method based on evidence deep learning
CN111652095A (en) CTC image identification method and system based on artificial intelligence
CN110797084A (en) Deep neural network-based cerebrospinal fluid protein prediction method
CN113918786A (en) Intelligent cell subtype judgment method
US20150242676A1 (en) Method for the Supervised Classification of Cells Included in Microscopy Images
CN106326914B (en) A kind of more classification methods of pearl based on SVM
EP4214675A1 (en) Methods and systems for predicting neurodegenerative disease state
CN113160891A (en) Microsatellite instability detection method based on transcriptome sequencing
TW201913565A (en) Evaluation method for embryo images and system thereof
CN108182676A (en) A kind of sperm fragment rate detection method, device, equipment and readable storage medium storing program for executing
CN116665210A (en) Cell classification method and device based on multichannel information fusion
CN116072302A (en) Medical unbalanced data classification method based on biased random forest model
CN110942808A (en) Prognosis prediction method and prediction system based on gene big data
KR101913952B1 (en) Automatic Recognition Method of iPSC Colony through V-CNN Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination