CN117198399A - Microsatellite locus, system and kit for predicting MSI state - Google Patents
- Publication number
- CN117198399A (application No. CN202311224358.8A)
- Authority
- CN
- China
- Prior art keywords
- microsatellite
- msi
- stability
- sample
- tagged
- Legal status
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The application discloses marker microsatellite loci, a system and a kit for predicting the overall MSI status of a sample, and belongs to the technical field of MSI detection. A regression model built on known microsatellite loci is used to predict the stability of 22 marker microsatellite loci, and a machine learning model is then built on those stabilities to predict the overall MSI status. The application predicts the overall MSI status from next-generation sequencing (NGS) data across different NGS platforms and cancer types without a matched normal sample. The method is stable and fast, gives accurate and highly reproducible results, and reduces the limitations of detection.
Description
Technical Field
The application belongs to the technical field of MSI detection and relates in particular to marker microsatellite loci, a system and a kit for predicting MSI status.
Background
Microsatellite instability (MSI) is a phenomenon in which the length of a microsatellite (MS) sequence changes because of insertion or deletion mutations during DNA replication; it is often caused by deficient mismatch repair (MMR). MS sequences are short repetitive DNA sequences, generally with repeat units of 1 to 6 nucleotides arranged in tandem; common types are the dinucleotide CA/GA/GT repeats and the mononucleotide A/T repeats. MS sequences can lie in important non-coding regions or in coding regions of genes; they are distributed throughout the genome and are highly polymorphic between individuals. During genome replication, cells repair mismatched regions through the mismatch repair system. When a mismatch repair gene such as MLH1, MSH2 or MSH6 loses its function, the cell can no longer repair such mismatches and develops an MSI phenotype, in which the number of repeat units of a microsatellite may increase or decrease.
Microsatellite instability is relatively common in tumors. MSI status is informative about the causes and development of a tumor and can play an important role in assisting diagnosis and guiding medication across cancer types. MSI status is generally classified as microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L) or microsatellite stable (MSS). In cancers such as colorectal, endometrial and gastric cancer, MSI-H and non-MSI-H patients differ significantly in survival, drug response and prognosis under palliative treatment. MSI testing helps physicians characterize a cancer fully and choose an appropriate diagnosis and treatment plan.
MSI testing also aids screening for Lynch syndrome, a hereditary syndrome caused by germline mutations in mismatch repair genes and characterized by colorectal, endometrial and gastric cancers that occur at a younger age, together with related cancers in the family. MSI status is correlated with Lynch syndrome: most patients with microsatellite-stable tumors do not have Lynch syndrome. Traditionally, when a patient presents with the features described above, the physician considers detailed testing. Immunohistochemistry requires joint interpretation by two experienced pathologists and has comparatively low accuracy, whereas MSI testing is more convenient and more accurate. When a patient is determined to be MSI-H, comprehensive genetic diagnosis can follow to establish whether the patient has Lynch syndrome.
Because of the sequence characteristics of microsatellite loci, the result at a single locus is unreliable, so NGS-based MSI detection usually examines hundreds of loci; microsatellite status can be determined by multi-locus detection or by genome sequencing. The multi-locus method is the one most commonly used in current MSI testing: it checks whether the lengths of 5 to several tens of microsatellite regions closely associated with MSI status have changed, classifies each locus as stable or unstable according to the degree of change, and the higher the proportion of unstable loci, the more likely the cells are MSI. This method is inexpensive and has a short experimental workflow. However, many current tools that predict MSI status from NGS data require normal-sample data, such as mSINGS (a baseline built from normal samples) and MANTIS (a matched tumor-normal pair).
Disclosure of Invention
The main purpose of the application is to provide a method for predicting the overall MSI status from liquid-phase capture next-generation sequencing data of microsatellite loci without a matched normal sample. The method is computationally convenient, stable and fast, gives accurate and highly reproducible results, and reduces the limitations of detection.
In order to achieve the above purpose, the technical scheme provided by the application is as follows:
A first aspect of the application provides marker microsatellite loci for determining the overall MSI status of a sample to be tested, the marker microsatellite loci comprising:
A microsatellite (MS) is a short tandem repeat (STR) in the human genome, i.e., a sequence in which a short base motif is repeated in tandem, including mononucleotide, dinucleotide and longer repeats. Microsatellite instability (MSI) refers to any change in the length of a microsatellite in tumor tissue, relative to normal tissue, caused by insertion or deletion of repeat units. In the application, the overall MSI status refers to the status of all microsatellites in the sample to be tested; microsatellite loci are specific microsatellites; and the marker microsatellite loci are the microsatellites used to determine the overall MSI status of the sample to be tested.
Microsatellites containing other mononucleotide and/or dinucleotide repeats may also be selected as, or included among, the marker microsatellite loci of the application.
In the application, the standard value of the length of a marker microsatellite locus is its number of repeats, i.e., the standard repeat number given in the table above.
Because the length of a microsatellite changes through insertion or deletion of whole repeat units, its length in the application is expressed as the number of times the repeat unit occurs. For example, a microsatellite whose 2-base repeat unit is repeated 10 times spans 20 bases, but its length is taken as 10.
Each repeat unit of a microsatellite locus can be in one of three states, deleted, unchanged or inserted, i.e., each repeat unit can be present 0, 1 or 2 times. Consequently, the length of each microsatellite locus lies in the range $[0, 2L]$, where $L$ is the standard value of its length.
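As a minimal sketch of this length convention, the Python below converts a microsatellite's span in bases into a repeat-count length and enumerates the length classes $i = 1, \dots, n$ with $n = 2L$ that the feature legends below refer to. The function names are hypothetical and are not taken from the application.

```python
def repeat_length(span_bases: int, unit_size: int) -> int:
    """Length of a microsatellite expressed as its repeat count.

    A 2-base unit repeated 10 times spans 20 bases but has length 10.
    """
    if unit_size <= 0 or span_bases < 0:
        raise ValueError("unit_size must be positive and span_bases non-negative")
    if span_bases % unit_size:
        raise ValueError("span is not a whole number of repeat units")
    return span_bases // unit_size


def length_classes(standard_repeats: int) -> list[int]:
    """Length classes i = 1..n with n = 2L (L = standard repeat count).

    Each of the L repeat units may be deleted (0 copies), unchanged (1) or
    duplicated (2), so an allele can be up to 2L repeats long.
    """
    if standard_repeats <= 0:
        raise ValueError("standard_repeats must be positive")
    return list(range(1, 2 * standard_repeats + 1))


print(repeat_length(20, 2))     # 10
print(len(length_classes(10)))  # 20
```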
In a second aspect, the application provides the use of a detection reagent for the marker microsatellite loci of the first aspect in the preparation of a kit for determining the overall MSI status of a sample to be tested.
A third aspect of the application provides a system for determining the overall MSI status of a sample to be tested, comprising the following modules:
a marker microsatellite locus data input module, for receiving stability data of the marker microsatellite loci of the first aspect for the sample to be tested, each locus being either stable or unstable;
an MSI overall status storage module, for storing the marker-locus stability data and the overall MSI status data of a second population sample, the overall MSI status being MSI-H, MSI-L or MSS;
an MSI overall status determination module, connected to the data input module and to the MSI overall status storage module, for constructing a prediction model from the marker-locus stability data and overall MSI status data of the second population sample, and for determining the overall MSI status of the sample to be tested from its marker-locus stability data obtained from the marker microsatellite locus stability prediction module.
In some embodiments, the stability data of a marker microsatellite locus indicate whether its length has changed. If the length of the locus changes significantly because of insertion or deletion of repeat units, the locus is unstable. Conversely, if its length is unchanged, i.e., no repeat unit has been inserted or deleted, or any insertion or deletion changes the length only insignificantly, the locus is stable.
In the application, MSI-H, MSI-L and MSS have the following meanings:
(1) MSS: microsatellite stable; all microsatellite loci are stable.
(2) MSI-L: low-frequency microsatellite instability; some microsatellite loci are unstable, but fewer than a preset threshold.
(3) MSI-H: high-frequency microsatellite instability; the unstable microsatellite loci reach or exceed the preset threshold.
In some embodiments, the preset threshold may be an absolute number of loci or a proportion, for example 30%.
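With the threshold expressed as a proportion, these definitions reduce to a simple labelling rule. The Python sketch below applies it with the 30% example value; the function name is hypothetical, and the rule shows how the MSI-H, MSI-L and MSS labels are defined, not the machine learning model the application uses to predict them.

```python
def msi_status(unstable_flags: list[bool], threshold: float = 0.30) -> str:
    """Label a sample from per-locus instability flags and a proportion threshold.

    MSS: no unstable loci; MSI-L: some unstable loci, proportion below the
    threshold; MSI-H: proportion of unstable loci at or above the threshold.
    """
    if not unstable_flags:
        raise ValueError("no loci supplied")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    fraction = sum(unstable_flags) / len(unstable_flags)
    if fraction == 0:
        return "MSS"
    return "MSI-H" if fraction >= threshold else "MSI-L"


# 22 marker loci: 7 unstable gives 7/22 ≈ 0.318, 6 unstable gives 6/22 ≈ 0.273
print(msi_status([True] * 7 + [False] * 15))  # MSI-H
print(msi_status([True] * 6 + [False] * 16))  # MSI-L
```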
In some embodiments, constructing the prediction model in the MSI overall status determination module from the marker-locus stability data of the second population sample comprises the following steps:
S21: randomly divide the marker-locus stability data of the second population sample into two groups, a second training set and a second test set, each containing marker-locus stability data of MSI-H, MSI-L and MSS samples;
S22: construct an overall MSI status prediction model with a machine learning algorithm, using the second training set;
S23: validate the resulting overall MSI status prediction model on the second test set.
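Step S21 only requires the split to be random and each group to contain MSI-H, MSI-L and MSS samples. One simple way to guarantee the second condition is to shuffle within each class (a stratified split), sketched below in Python. Stratification, the 30% test fraction and the function name are assumptions for illustration, not details from the application.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into (train, test) with every class in both groups.

    Shuffling happens within each class, so MSI-H, MSI-L and MSS samples all
    appear in both the training and the test set (each class needs >= 2 samples).
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label, indices in by_class.items():
        if len(indices) < 2:
            raise ValueError(f"class {label!r} needs at least 2 samples")
        rng.shuffle(indices)
        # at least one test sample, but keep at least one for training
        n_test = min(len(indices) - 1, max(1, round(len(indices) * test_fraction)))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["MSI-H"] * 10 + ["MSI-L"] * 4 + ["MSS"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 24 10
```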
In some embodiments, the machine learning algorithm is any one of the following:
random forest, neural network, support vector machine, Bayesian classifier, gradient boosting, k-nearest neighbors and decision tree.
In some preferred embodiments, the overall MSI status prediction model is trained as a random forest model.
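To show what step S22 involves with the preferred algorithm, the Python below is a deliberately small, dependency-free random forest: depth-limited Gini trees grown on bootstrap samples of the per-locus stability flags (0 = stable, 1 = unstable), a random feature subset at each node, and majority voting. All names and hyperparameters (`n_trees`, `max_depth`, `max_features`) are illustrative assumptions; a production model would more likely use an established library such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, k, depth, rng):
    """Grow a depth-limited tree on binary features (0 = stable, 1 = unstable)."""
    if depth == 0 or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), k):  # random feature subset per node
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(y)
    _, f, left, right = best
    return (f,
            build_tree([X[i] for i in left], [y[i] for i in left], k, depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], k, depth - 1, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, left, right)
        f, left, right = node
        node = left if row[f] == 0 else right
    return node  # leaf: a class label string


def fit_forest(X, y, n_trees=50, max_depth=4, max_features=None, seed=0):
    """Bagged trees, each grown on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = min(n_features, max_features or max(1, int(n_features ** 0.5)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], k, max_depth, rng))
    return forest


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])


# Toy data with 4 loci, invented for illustration only
X = [[0, 0, 0, 0]] * 5 + [[1, 0, 0, 0]] * 5 + [[1, 1, 1, 1]] * 5
y = ["MSS"] * 5 + ["MSI-L"] * 5 + ["MSI-H"] * 5
forest = fit_forest(X, y, n_trees=25, max_features=4, seed=1)
print(predict_forest(forest, [1, 0, 0, 0]))
```

With 22 marker loci, the default `max_features` would consider 4 loci per node, which is the usual square-root heuristic for classification forests.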
In some embodiments, the marker-locus stability data received by the marker microsatellite locus data input module are obtained by a PCR-based method.
Further, the system comprises:
a sequencing data input module, for receiving capture-sequencing data of a target region containing the marker microsatellite loci in the sample to be tested, and for obtaining the peak count, kurtosis, skewness, standard deviation and standard offset of the length of each marker microsatellite locus;
a microsatellite locus storage module, for storing the peak count, kurtosis, skewness, standard deviation and standard offset of the lengths of known microsatellite loci of a first population sample, together with their stability data;
a marker microsatellite locus stability prediction module, connected to the sequencing data input module, the microsatellite locus storage module and the marker microsatellite locus data input module, for constructing a marker microsatellite locus stability prediction model from the peak count, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite loci of the first population sample; predicting the stability of each marker microsatellite locus from the peak count, kurtosis, skewness, standard deviation and standard offset of its length obtained from the sequencing data input module; and outputting the resulting stability data to the marker microsatellite locus data input module.
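The module wiring above amounts to a two-stage pipeline: per-locus features pass through a locus-level stability model, and the resulting 0/1 stability vector passes through a sample-level status model. The Python sketch below is hypothetical (the class, field and locus names are not from the application) and takes both models as plain callables, so any trained classifier could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# (peak count, kurtosis, skewness, standard deviation, standard offset)
LocusFeatures = Sequence[float]


@dataclass
class MsiStatusSystem:
    """Hypothetical wiring of the modules described in the text.

    locus_model stands in for the marker microsatellite locus stability
    prediction module (True = unstable); status_model stands in for the
    MSI overall status determination module.
    """
    marker_loci: Sequence[str]
    locus_model: Callable[[LocusFeatures], bool]
    status_model: Callable[[Sequence[int]], str]

    def stability_vector(self, features: Mapping[str, LocusFeatures]) -> list[int]:
        # Output of the stability prediction module, in marker-locus order
        return [int(self.locus_model(features[name])) for name in self.marker_loci]

    def predict(self, features: Mapping[str, LocusFeatures]) -> str:
        return self.status_model(self.stability_vector(features))


system = MsiStatusSystem(
    marker_loci=["locus_a", "locus_b", "locus_c"],
    locus_model=lambda f: f[0] >= 2,  # toy rule: two or more peaks = unstable
    status_model=lambda s: "MSI-H" if sum(s) / len(s) >= 0.3 else ("MSI-L" if any(s) else "MSS"),
)
features = {"locus_a": [2, 2.1, 0.4, 0.10, 1.2],
            "locus_b": [1, 3.0, 0.0, 0.20, 0.1],
            "locus_c": [1, 3.2, 0.1, 0.20, 0.0]}
print(system.predict(features))  # MSI-H (1 of 3 loci unstable, 0.33 >= 0.3)
```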
In some embodiments of the application, the known microsatellite loci comprise:
In some embodiments, constructing the marker microsatellite locus stability prediction model from the peak count, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite loci comprises the following steps:
S11: randomly divide the peak count, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite loci of the first population sample into two groups, a first training set and a first test set;
S12: construct a marker microsatellite locus stability prediction model with a regression algorithm, using the first training set;
S13: validate the resulting marker microsatellite locus stability prediction model on the first test set.
In some embodiments of the application, the regression algorithm is selected from any one of the following algorithms: logistic regression algorithm, linear regression algorithm.
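Steps S11 to S13 with logistic regression can be sketched without external dependencies. The Python below standardizes the five features (peak count, kurtosis, skewness, standard deviation, standard offset) with training-set statistics and fits the weights by batch gradient descent on the log-loss. The learning rate, epoch count, 0.5 cutoff, function names and toy data are assumptions; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math


def fit_scaler(X):
    """Per-feature mean and population SD from the training set (SD 0 -> 1)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
           for col, m in zip(zip(*X), means)]
    return means, sds


def scale(row, means, sds):
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; y is 1 = unstable, 0 = stable."""
    means, sds = fit_scaler(X)
    Z = [scale(row, means, sds) for row in X]
    w, b, m = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wj - lr * g / m for wj, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b, means, sds


def prob_unstable(model, row):
    w, b, means, sds = model
    z = scale(row, means, sds)
    return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)


# Toy feature rows (peaks, kurtosis, skewness, SD, offset), invented for illustration
X = [[1, 8.0, 2.5, 0.20, 0.1], [1, 7.5, 2.4, 0.21, 0.2], [1, 9.0, 2.6, 0.22, 0.1],
     [3, 3.0, 1.0, 0.10, 1.5], [2, 3.5, 1.2, 0.11, 1.2], [3, 2.8, 0.9, 0.09, 1.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_logistic(X, y)
print([round(prob_unstable(model, row), 2) for row in X])
```

For step S13, the same `prob_unstable` call would be applied to the held-out first test set, scaled with the training-set statistics stored in the model.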
In some embodiments, for each marker microsatellite locus, the number of peaks in its length distribution is first counted with a peak-finding algorithm, and the skewness (Skew), kurtosis (Kurt), standard deviation (S) and standard offset (P) of its length are then calculated:
Skewness measures the asymmetry of a probability distribution relative to its mean; the skewness coefficient gives the degree and direction of asymmetry of the length distribution of the marker microsatellite locus. Skewness is measured relative to the normal distribution, whose skewness is 0, so a symmetric length distribution has a skewness of 0. A skewness greater than 0 indicates a right-skewed distribution (a long tail on the right); a skewness less than 0 indicates a left-skewed distribution (a long tail on the left). The larger the absolute value of the skewness, the more skewed the distribution.
The skewness is calculated as:
where $x_i$ is the number of reads in which the marker microsatellite locus has length $i$, $i \in [1, n]$; $\bar{x}$ is the mean of the read counts over the different lengths of the marker microsatellite locus; $n$ is the number of length classes of the marker microsatellite locus, i.e., the number of different lengths, $n = 2L$; and $L$ is the standard value of the length of the marker microsatellite locus.
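The skewness formula itself is not preserved in this text. The Python sketch below follows one literal reading of the symbol legend: the population skewness of the per-length read counts $x_1, \dots, x_n$ about their mean $\bar{x}$. Both this reading and the function name are assumptions; the published formula may differ, for example in bias correction.

```python
def skewness(reads: list[float]) -> float:
    """Population skewness of the per-length read counts x_1..x_n.

    Assumed reading of the legend: moments are taken about the mean read
    count over the n = 2L length classes. Not the verified published formula.
    """
    n = len(reads)
    if n == 0:
        raise ValueError("no length classes supplied")
    mean = sum(reads) / n
    m2 = sum((x - mean) ** 2 for x in reads) / n  # second central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all read counts are equal")
    m3 = sum((x - mean) ** 3 for x in reads) / n  # third central moment
    return m3 / m2 ** 1.5


print(round(skewness([10, 0, 0, 0]), 3))  # 1.155: one dominant length class
```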
Kurtosis is a statistic describing how peaked or flat a distribution is; the kurtosis coefficient indicates whether the length distribution of the marker microsatellite locus is more sharply peaked or flatter than a normal distribution. A kurtosis of 3 means the length distribution is as peaked as a normal distribution; a kurtosis greater than 3 means it is more sharply peaked (tall and narrow); a kurtosis less than 3 means it is flatter (short and broad). The kurtosis is calculated as:
where $x_i$ is the number of reads in which the marker microsatellite locus has length $i$, $i \in [1, n]$; $\bar{x}$ is the mean of the read counts over the different lengths of the marker microsatellite locus; $n$ is the number of length classes of the marker microsatellite locus, $n = 2L$; and $L$ is the standard value of the length of the marker microsatellite locus.
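As with skewness, the kurtosis formula did not survive extraction. The sketch below assumes the same literal reading of the legend and uses the non-excess form $m_4 / m_2^2$, so that a normal distribution scores 3, matching the reference value in the text. The function name and this interpretation are assumptions.

```python
def kurtosis(reads: list[float]) -> float:
    """Population (non-excess) kurtosis m4 / m2**2 of the per-length read counts.

    Non-excess form, so a normal distribution scores 3. Assumed reading of
    the legend, not the verified published formula.
    """
    n = len(reads)
    if n == 0:
        raise ValueError("no length classes supplied")
    mean = sum(reads) / n
    m2 = sum((x - mean) ** 2 for x in reads) / n  # second central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all read counts are equal")
    m4 = sum((x - mean) ** 4 for x in reads) / n  # fourth central moment
    return m4 / m2 ** 2


print(kurtosis([0, 1]))  # 1.0
```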
The standard deviation reflects the dispersion of the lengths of the marker microsatellite locus: the larger the value, the more dispersed the data, i.e., the greater the differences between the different lengths of the marker microsatellite locus.
The standard deviation calculation formula is:
where $x_i$ is the number of reads in which the marker microsatellite locus has length $i$, $i \in [1, n]$; $\bar{x}$ is the mean of the read counts over the different lengths of the marker microsatellite locus; $p_i$ is the normalized read count at length $i$; $\bar{p}$ is the mean of $p_i$; $n$ is the number of length classes of the marker microsatellite locus, $n = 2L$; and $L$ is the standard value of the length of the marker microsatellite locus.
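The legend defines the standard deviation through normalized read counts $p_i$ and their mean $\bar{p}$, but the normalization itself is not preserved. The sketch below assumes $p_i = x_i / \sum_j x_j$ (the fraction of reads at length $i$) and takes the population standard deviation of the $p_i$; both the normalization and the function names are hypothetical.

```python
import math


def normalise(reads: list[float]) -> list[float]:
    """Hypothetical normalisation: p_i = x_i / sum_j x_j (fraction of reads at length i)."""
    total = sum(reads)
    if total <= 0:
        raise ValueError("no reads at this locus")
    return [x / total for x in reads]


def standard_deviation(reads: list[float]) -> float:
    """Population SD of the normalised counts p_i about their mean p_bar."""
    p = normalise(reads)
    p_bar = sum(p) / len(p)
    return math.sqrt(sum((pi - p_bar) ** 2 for pi in p) / len(p))


print(standard_deviation([1, 0]))  # 0.5
print(standard_deviation([5, 5]))  # 0.0
```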
The standard offset measures how far the lengths of the marker microsatellite locus deviate from the standard value of its length.
The standard offset calculation formula is:
where $x_i$ is the number of reads in which the marker microsatellite locus has length $i$, $i \in [1, n]$; $p_i$ is the normalized read count at length $i$; $n$ is the number of length classes of the marker microsatellite locus, $n = 2L$; and $L$ is the standard value of the length of the marker microsatellite locus.
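The exact standard-offset formula is not preserved either. A minimal sketch consistent with its description (deviation of the observed lengths from the standard length $L$, weighted by read fraction) is the hypothetical $P = \sum_{i=1}^{n} p_i \lvert i - L \rvert$ with $p_i = x_i / \sum_j x_j$. Both the formula and the function name are assumptions for illustration.

```python
def standard_offset(reads: list[float], standard_repeats: int) -> float:
    """Hypothetical standard offset P = sum_i p_i * |i - L|.

    reads[i - 1] is the read count at length i (i = 1..n, n = 2L), and
    p_i = x_i / sum_j x_j is the fraction of reads at length i.
    """
    if len(reads) != 2 * standard_repeats:
        raise ValueError("expected n = 2L length classes")
    total = sum(reads)
    if total <= 0:
        raise ValueError("no reads at this locus")
    return sum((x / total) * abs(i - standard_repeats)
               for i, x in enumerate(reads, start=1))


print(standard_offset([0, 0, 10, 0, 0, 0], 3))  # 0.0: all reads at the standard length
print(standard_offset([0, 5, 0, 5, 0, 0], 3))   # 1.0: half at L-1, half at L+1
```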
In some embodiments, the sequencing data input module calculates the peak count, kurtosis, skewness, standard deviation and standard offset of a marker microsatellite locus only when the capture-sequencing depth of the target region reaches 400×. Specifically, the number of reads at each length of the marker microsatellite locus is obtained from the result file of MSIsensor2.
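The application does not specify its peak-finding algorithm, so the Python below uses a simple local-maximum rule on the per-length read counts: a flat run of equal counts is a peak if it is strictly higher than both neighbours (lengths outside the range count as 0) and holds at least a hypothetical `min_fraction` of the reads. The depth gate uses the total reads at the locus as a stand-in for the 400× capture depth, which is an assumption. Parsing the MSIsensor2 result file is not shown, since its format is not described here.

```python
def count_peaks(reads: list[float], min_fraction: float = 0.05) -> int:
    """Count local maxima in the per-length read counts.

    A flat run of equal counts is one peak if it is strictly higher than the
    counts on both sides (out-of-range neighbours count as 0) and holds at
    least `min_fraction` of all reads (a hypothetical noise filter).
    """
    total = sum(reads)
    if total <= 0:
        return 0
    peaks, i, n = 0, 0, len(reads)
    while i < n:
        j = i
        while j + 1 < n and reads[j + 1] == reads[i]:
            j += 1  # extend over a flat run reads[i..j]
        left = reads[i - 1] if i > 0 else 0
        right = reads[j + 1] if j + 1 < n else 0
        if reads[i] > left and reads[i] > right and reads[i] / total >= min_fraction:
            peaks += 1
        i = j + 1
    return peaks


def deep_enough(reads: list[float], min_depth: int = 400) -> bool:
    """Assumed depth gate: total reads spanning the locus must reach min_depth."""
    return sum(reads) >= min_depth


print(count_peaks([0, 10, 2, 0, 8, 0]))  # 2
print(deep_enough([200, 250]))           # True
```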
In a fourth aspect, the application provides a kit for determining the overall MSI status of a sample to be tested, comprising a detection reagent for the marker microsatellite loci of the first aspect.
In the application, "first population sample" and "second population sample" are labels used only to distinguish two datasets. The data of the first population sample comprise, for each sample, the peak count, kurtosis, skewness, standard deviation and standard offset of the lengths of the known microsatellite loci, together with their stability data, and are used to construct the regression-based marker microsatellite locus stability prediction model. The data of the second population sample comprise the marker-locus stability data and the overall MSI status data, and are used to construct the overall MSI status prediction model.
In the application, the sample to be tested is of human origin, preferably a tumor sample, such as fresh tissue or a formalin-fixed paraffin-embedded (FFPE) tissue block.
Beneficial Effects
Compared with the prior art, the application has the following beneficial effects:
the application predicts the overall MSI status from next-generation sequencing data across different NGS platforms and cancer types without a matched normal sample.
The method for predicting the overall MSI status is stable and fast, gives accurate and highly reproducible results, and reduces the limitations of detection.
Drawings
FIG. 1 shows the length distribution of a microsatellite locus in Example 3 of the application.
FIG. 2 shows MSI-PCR capillary electrophoresis results for a microsatellite locus in Example 4 of the application. A: tumor sample; B: blood sample.
FIG. 3 shows the performance evaluation of the logistic regression model established in Example 4 for predicting microsatellite locus stability.
FIG. 4 shows the performance evaluation of the random forest model established in Example 5 for predicting the overall MSI status.
FIG. 5 shows the validation results on the external dataset in Example 6 of the application.
FIG. 6 is a schematic diagram of the system for determining the overall MSI status of a sample to be tested constructed in Example 7 of the application.
Detailed Description
Unless otherwise indicated, implied from the context, or common denominator in the art, all parts and percentages in the present application are based on weight and the test and characterization methods used are synchronized with the filing date of the present application. Where applicable, the disclosure of any patent, patent application, or publication referred to in this application is incorporated by reference in its entirety, and the equivalent patents to those cited in this application are incorporated by reference, particularly as if they were set forth in the relevant terms of art. If the definition of a particular term disclosed in the prior art is inconsistent with any definition provided in the present application, the definition of the term provided in the present application controls.
The numerical ranges in the present application are approximations, so that it may include the numerical values outside the range unless otherwise indicated. The numerical range includes all values from the lower value to the upper value that increase by 1 unit, provided that there is a spacing of at least 2 units between any lower value and any higher value. For ranges containing values less than 1 or containing fractions greater than 1 (e.g., 1.1,1.5, etc.), then 1 unit is suitably considered to be 0.0001,0.001,0.01, or 0.1. For a range containing units of less than 10 (e.g., 1 to 5), 1 unit is generally considered to be 0.1. These are merely specific examples of what is intended to be provided, and all possible combinations of numerical values between the lowest value and the highest value enumerated are to be considered to be expressly stated in this disclosure.
The terms "comprises", "comprising", "including" and their derivatives do not exclude the presence of any other component, step or process, whether or not such other component, step or process is disclosed in the present application. For the avoidance of doubt, unless expressly stated otherwise, any use herein of the terms "comprising", "including" or "having" may encompass any additional additive, adjuvant or compound. In contrast, the term "consisting essentially of" excludes from the scope of any subsequent recitation any other component, step or process, except those that are not essential to operability. The term "consisting of" excludes any component, step or process not specifically described or listed. Unless explicitly stated otherwise, the term "or" refers to the listed members individually as well as in any combination.
To make the technical problems solved, the technical solutions and the beneficial effects of the application clearer, the application is described in further detail below with reference to the examples.
Examples
The following examples are presented herein to demonstrate preferred embodiments of the present application. It will be appreciated by those skilled in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function in the practice of the application, and thus can be considered to constitute preferred modes for its practice. Those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit or scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the application described herein. Such equivalents are intended to be encompassed by the claims.
Unless otherwise specified, the experimental methods in the following examples are conventional methods, the instruments used are conventional laboratory instruments, and the test materials used were purchased from conventional biochemical reagent suppliers.
Example 1 selection of microsatellite loci
Tissue samples from 800 clinical tumor patients, covering colorectal cancer, lung cancer and other cancer types, were collected for MSI prediction.
For the selection of microsatellite loci, this example used a large gene panel (Table 1) developed on VariantBaits™ technology, covering 169 genes associated with solid tumors such as colorectal, lung, endometrial and gastric cancer. DNA extracted from each sample was hybridization-captured with target-region probes and used to construct a library, and the library was subjected to next-generation sequencing to obtain raw sequencing data (raw data) in FASTQ format.
Table 1 List of the 169 genes
Example 2 sample library-based sequencing and sequence alignment
Quality control of the raw data obtained in Example 1 was performed with the quality-control software picard: sequencing adapters, low-quality bases, erroneous sequencing fragments and the like were filtered out to obtain high-quality data (clean data).
The clean data were aligned with the sequence alignment software bwa-MEM to obtain the genomic position of each read (reference genome hg19), and the alignments were sorted and deduplicated with samtools and sambamba to obtain the sample's original BAM file.
The sample's original BAM file was analyzed with Msisensor2 to obtain the number of reads at each length of the repeat sequence in the target region, i.e., the sample dataset.
Example 3 microsatellite locus stability State prediction
The sequencing depth of the target region must be at least 400×; the subsequent calculations are performed only when this depth requirement is met.
Take the microsatellite locus chr2_47641559_CAGGT_27[A]_GGGTT as an example (see Table 2).
Table 2 Information on chr2_47641559_CAGGT_27[A]_GGGTT
The length distribution is shown in Table 3 and FIG. 1.
Table 3 Length distribution of chr2_47641559_CAGGT_27[A]_GGGTT
In this example, MSI status is determined from five features of the length distribution of the repeat sequence in the target region: the number of peaks N, skewness Skew, kurtosis Kurt, standard deviation S and standard offset P.
For a specific repeat sequence, these features are calculated as follows.
Peak count calculation:
The read depth at each length in the target region is counted, and the number of peaks in the resulting microsatellite length distribution is determined with a peak-finding algorithm. Specifically, this is implemented with SciPy: the peak prominence is calculated to calibrate the peak count and remove noise.
In this example, the number of peaks of chr2_47641559_CAGGT_27[A]_GGGTT is 1.
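The peak-counting step can be sketched in plain Python as below. It mirrors the prominence idea used by SciPy (where the equivalent call would be `scipy.signal.find_peaks(counts, prominence=threshold)`) but is written without SciPy so that it runs standalone. Only strict local maxima are considered (flat-topped plateaus are ignored), and the relative threshold `rel_prominence`, 5% of the tallest bin, is a hypothetical noise cut-off rather than a value given in this application. The function names are likewise illustrative.

```python
from typing import Dict, Sequence


def peak_prominences(counts: Sequence[float]) -> Dict[int, float]:
    """Map each strict interior local maximum to its prominence.

    Prominence follows the usual definition (as in SciPy with no window):
    walk left and right from the peak until a strictly higher sample or the
    array edge, take the minimum on each side, and subtract the larger of
    the two minima from the peak height.
    """
    n = len(counts)
    result: Dict[int, float] = {}
    for i in range(1, n - 1):
        if not (counts[i] > counts[i - 1] and counts[i] > counts[i + 1]):
            continue
        left_min = counts[i]
        j = i - 1
        while j >= 0 and counts[j] <= counts[i]:
            left_min = min(left_min, counts[j])
            j -= 1
        right_min = counts[i]
        j = i + 1
        while j < n and counts[j] <= counts[i]:
            right_min = min(right_min, counts[j])
            j += 1
        result[i] = counts[i] - max(left_min, right_min)
    return result


def count_peaks(counts: Sequence[float], rel_prominence: float = 0.05) -> int:
    """Count peaks whose prominence is at least rel_prominence * max(counts).

    rel_prominence is a hypothetical noise threshold chosen for illustration.
    """
    if not counts:
        return 0
    threshold = rel_prominence * max(counts)
    return sum(1 for p in peak_prominences(counts).values() if p >= threshold)
```

For a distribution with one dominant length mode, such as `[0, 2, 50, 300, 120, 20, 5, 1, 0]`, `count_peaks` returns 1. For `[0, 10, 100, 12, 15, 13, 80, 5, 0]` it returns 2, because the small bump at 15 has a prominence of only 2 and is discarded as noise.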
Using the known stable (standard) length $L$ of the microsatellite locus (27 for this locus), the theoretical maximum length of the locus is taken as $2L$, i.e., 54. Thus, for chr2_47641559_CAGGT_27[A]_GGGTT, the number of length classes is $n = 54$ and $i \in [1, 54]$.
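The following is a sketch of turning per-length read counts into the fixed vector $x_1, \dots, x_n$ with $n = 2L$. It assumes that the input is a mapping from repeat length to read count (for example, parsed from the Msisensor2 output, whose exact format is not described here) and that reads longer than $2L$ are discarded. The function name `length_vector` is hypothetical.

```python
from typing import Dict, List


def length_vector(reads_by_length: Dict[int, int], standard_length: int) -> List[int]:
    """Return [x_1, ..., x_n] with n = 2 * standard_length.

    x_i is the number of reads whose repeat length is i. Lengths outside
    [1, n] are ignored, which is an assumption of this sketch.
    """
    n = 2 * standard_length
    return [reads_by_length.get(i, 0) for i in range(1, n + 1)]
```

For the locus above, `length_vector(counts, 27)` yields a 54-element list whose 27th entry (index 26) holds the reads at the stable length.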
The deflection calculation formula is:
the kurtosis calculation formula is:
the standard deviation calculation formula is:
the standard offset calculation formula is:
where $x_i$ is the number of reads at microsatellite locus length $i$; $\bar{x}$ is the mean number of reads over the different lengths of the locus; $p_i$ is the normalized number of reads at length $i$; and $\bar{p}$ is the mean of $p_i$.
According to the above formulas, the skewness, kurtosis, standard deviation and standard offset of the length of chr2_47641559_CAGGT_27[A]_GGGTT are 2.80, 9.87, 0.05 and 19.82, respectively.
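Because the four formulas above are not reproduced in this text, the sketch below uses assumed textbook definitions so that the features can be computed for illustration:

* population skewness and (non-excess, Pearson) kurtosis of the read counts $x_i$ about $\bar{x}$;
* $p_i = x_i / \sum_j x_j$ for normalization;
* the population standard deviation of $p_i$.

The standard offset P is omitted because its definition cannot be recovered here. The function name `moment_features` is hypothetical, and this sketch should not be assumed to reproduce the example values 2.80, 9.87 and 0.05 given above.

```python
import math
from typing import Dict, Sequence


def moment_features(x: Sequence[float]) -> Dict[str, float]:
    """Skewness, kurtosis and standard deviation under ASSUMED definitions.

    * skew and kurt are population moments of the read counts x_i about
      their mean; kurt is the non-excess (Pearson) form.
    * p_i = x_i / sum(x), and "sd" is the population SD of the p_i.
    The standard offset P is not computed (formula not available).
    """
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("read-count vector is empty or all zero")
    x_bar = total / n
    m2 = sum((v - x_bar) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("read counts are constant; moments undefined")
    m3 = sum((v - x_bar) ** 3 for v in x) / n
    m4 = sum((v - x_bar) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    p = [v / total for v in x]  # normalized read counts
    p_bar = sum(p) / n
    sd = math.sqrt(sum((v - p_bar) ** 2 for v in p) / n)
    return {"skew": skew, "kurt": kurt, "sd": sd}
```

As a quick check, a single-bin spike over $n$ classes has skewness $(n-2)/\sqrt{n-1}$ under these definitions, which is about 7.14 for $n = 54$.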
Example 4 establishing a logistic regression model to predict the stability of microsatellite loci
In this example, a model is built from 6 microsatellite loci with known stability data and is used to predict the stability of microsatellite loci whose stability is unknown. The 6 loci with known stability data are listed in Table 4.
Table 4 The 6 microsatellite loci with known stability data
To build the microsatellite locus stability model and to verify and evaluate the MSI predictions, the inventors performed MSI-PCR capillary electrophoresis on the samples, used the results as the gold standard, and measured the concordance between the MSI predictions and the capillary electrophoresis results. The kit used for MSI-PCR capillary electrophoresis was the Tung Tree microsatellite instability (MSI) detection kit (multiplex fluorescence PCR-capillary electrophoresis method). From the MSI-PCR capillary electrophoresis results, the state of each microsatellite locus in each sample and the MSI overall state of each sample were obtained.
Specifically, each microsatellite locus is classified as either stable or unstable according to the MSI-PCR capillary electrophoresis results. If a locus has changed, i.e., its length has changed significantly because of insertion or deletion of repeat units, it is labeled unstable. Conversely, if its length is unchanged, or any length change caused by insertion or deletion of repeat units is not significant, it is labeled stable.
As shown in Fig. 2, the MSI-PCR capillary electrophoresis result for the BAT25 locus (chr4:55598211) shows that the locus is homozygous and that, compared with the paired sample, the tumor sample carries an additional set of shifted peaks (dotted line in the figure). The locus is therefore unstable in the tumor sample and stable in the paired sample (blood).
Further, according to the stability of its microsatellite loci, the MSI overall state of a sample is classified into three states, MSI-H, MSI-L and MSS, corresponding to high microsatellite instability, low microsatellite instability and microsatellite stability, respectively.
MSS: all microsatellite loci are stable.
MSI-L: fewer than 30% of the microsatellite loci (but at least one) are unstable.
MSI-H: 30% or more of the microsatellite loci are unstable.
In this example, MSI-PCR capillary electrophoresis data were obtained for 800 samples, including the stability data of the 6 microsatellite loci and the MSI overall state of each sample.
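The three-state rule above can be written as a small function. This sketch takes one boolean instability call per locus (for example, from the MSI-PCR panel or from a locus stability model). The function name `msi_overall_state` is hypothetical.

```python
from typing import Sequence


def msi_overall_state(locus_unstable: Sequence[bool], h_cutoff: float = 0.30) -> str:
    """Classify a sample from per-locus stability calls.

    MSS   : no locus is unstable
    MSI-L : some loci are unstable, but the unstable fraction is < h_cutoff
    MSI-H : the unstable fraction is >= h_cutoff
    """
    if not locus_unstable:
        raise ValueError("at least one locus call is required")
    n_unstable = sum(1 for u in locus_unstable if u)
    if n_unstable == 0:
        return "MSS"
    fraction = n_unstable / len(locus_unstable)
    return "MSI-H" if fraction >= h_cutoff else "MSI-L"
```

With the 6-locus panel, one unstable locus (about 17%) gives MSI-L and two unstable loci (about 33%) give MSI-H.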
The MSI-PCR capillary electrophoresis stability data for the 6 microsatellite loci of these 800 samples were processed, and samples whose NGS results failed quality control were removed, yielding 3780 microsatellite locus stability records in total. These were divided into a training set and a test set at a ratio of 7:3 by stratified random sampling. For the training set, the lengths of the 6 microsatellite loci were calculated from the Msisensor2 result files obtained in Example 2, and the peak number N, kurtosis Kurt, skewness Skew, standard deviation S and standard offset P of each locus length were obtained by the method of Example 3. Unstable loci were labeled 1 and stable loci 0, forming training set 1. Test set 1 was generated in the same way. A logistic regression model was then trained on training set 1.
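The 7:3 stratified random split can be sketched as follows. This is a plain-Python illustration; scikit-learn users would typically call `train_test_split(X, y, test_size=0.3, stratify=y)` instead, and the application does not say which tool was used. The split is done per class, so the class ratio is preserved in both sets. The function name and the fixed `seed` are assumptions added for reproducibility.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) with class proportions preserved.

    Within each class, indices are shuffled and round(test_fraction * size)
    of them go to the test set.
    """
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(test_fraction * len(idxs))
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)
```

With 150 stable (0) and 10 unstable (1) records, the test set receives 45 stable and 3 unstable records.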
Using the logistic regression model obtained from training set 1, the stability of the microsatellite loci in test set 1 was predicted and its accuracy was assessed against the MSI-PCR capillary electrophoresis results. The model was then refined by removing noise and adjusting the model parameters and solver. In the final model, the regularization penalty is L2, the maximum number of iterations is 5000, and the task is binary classification. Because unstable loci in the training set are far fewer than stable ones (stable to unstable about 15:1), the class weighting of the loss function is set to balanced, and the L-BFGS algorithm is used as the solver. This yields the preferred logistic regression model1, whose performance evaluation is shown in Fig. 3.
Based on logistic regression model1 obtained from training set 1, the stability of all 22 selected microsatellite loci was calculated, giving the microsatellite locus dataset shown in Table 5.
Table 5 The 22 microsatellite loci
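The parameter choices above map naturally onto scikit-learn's `LogisticRegression(penalty="l2", max_iter=5000, class_weight="balanced", solver="lbfgs")`. The application does not name the library, so this mapping is an assumption. The sketch below shows in plain Python what "balanced" weighting does. Each class c gets the weight n_samples / (n_classes × n_c), so with a 15:1 stable-to-unstable ratio each unstable locus counts 15 times as much as a stable one in the loss. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight for class c = n_samples / (n_classes * count_c).

    This is the heuristic scikit-learn documents for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}
```

With 150 stable and 10 unstable loci, the weights are about 0.533 and 8.0. The summed weight of each class is then 80, so both classes contribute equally to the loss.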
Example 5 Establishing a random forest model to predict the MSI overall state of samples
The 22-locus microsatellite dataset was divided into a training set and a test set at a ratio of 7:3 by stratified random sampling, giving training set 2 and test set 2. A random forest model was trained on training set 2, with the class weighting of the loss function set to balanced, to obtain the random forest model for training set 2.
Using the random forest model obtained from training set 2, the MSI overall state of test set 2 was predicted, and the accuracy of the predicted MSI overall states was evaluated against the MSI-PCR capillary electrophoresis results. These steps were repeated until the preferred random forest model2 was obtained. Its performance evaluation is shown in Fig. 4.
Example 6 application of predictive model
NGS sequencing data of 143 samples were analyzed to obtain an external dataset. The microsatellite locus stability and MSI overall state of the external dataset were predicted with logistic regression model1 and random forest model2. The predictions were validated against the MSI overall state data obtained by MSI-PCR capillary electrophoresis as in Example 4, and the results are shown in Table 6.
Table 6 Validation results for the 143 external samples
Accordingly, the prediction model of the application achieved a sensitivity of 1 and a specificity of 0.92.
In clinical medication decisions, MSI-L and MSS are generally treated as one category, MSI-L/MSS. The ROC curve of the resulting model is shown in Fig. 5, with an AUC of 0.99.
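Sensitivity and specificity follow the usual definitions, TP/(TP+FN) and TN/(TN+FP). The sketch below assumes that MSI-H is the positive class and that MSI-L and MSS together form the negative class, which is consistent with the MSI-L/MSS grouping used in clinical practice. The counts in Table 6 are not reproduced here, so the check uses made-up toy labels. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[str], pred: Sequence[str],
                            positive: str = "MSI-H") -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating `positive` as the positive class.

    Every other label (e.g. MSI-L, MSS) is counted as negative.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(truth, pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Note that an MSI-L sample predicted as MSS (or the reverse) is a true negative under this convention.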
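To reproduce this kind of evaluation, the three-class labels are first collapsed into MSI-H versus MSI-L/MSS. The AUC can then be computed from any per-sample score, for example a predicted MSI-H probability. This is an assumption, because the application does not say which score was used for the ROC curve. The sketch computes AUC from its pairwise (Mann-Whitney) definition and counts ties as one half. The function names are hypothetical.

```python
from typing import Sequence


def collapse_label(state: str) -> int:
    """1 for MSI-H, 0 for MSI-L or MSS (the MSI-L/MSS grouping)."""
    if state not in ("MSI-H", "MSI-L", "MSS"):
        raise ValueError(f"unknown MSI state: {state}")
    return 1 if state == "MSI-H" else 0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie), by pairwise comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, but that is negligible for a cohort of 143 samples.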
Example 7 System for determining MSI Overall State of sample under test
The system of this example for determining the MSI overall state of a sample to be tested is built on the method described above and, as shown in Fig. 6, comprises:
(1) A sequencing data input module, used to input capture sequencing data of the target region containing the 22 microsatellite loci of the sample to be tested, and to obtain the peak number, kurtosis, skewness, standard deviation and standard offset of the lengths of the 22 microsatellite loci.
Quality control of the raw capture sequencing data is performed with the quality-control software picard: sequencing adapters, low-quality bases, erroneous sequencing fragments and the like are filtered out to obtain high-quality data (clean data).
The clean data are aligned with the sequence alignment software bwa-MEM to obtain the genomic position of each read (reference genome hg19), and the alignments are sorted and deduplicated with samtools and sambamba to obtain the sample's original BAM file.
The sample's original BAM file is analyzed with Msisensor2 to obtain the lengths of the repeat sequences (microsatellite loci) in the target region.
For each microsatellite locus, the number of peaks in the locus length distribution is first counted with a peak-finding algorithm, and the skewness, kurtosis, standard deviation and standard offset of the locus length are then calculated with the following formulas:
The skewness is calculated as:
The kurtosis is calculated as:
The standard deviation is calculated as:
The standard offset is calculated as:
where $x_i$ is the number of reads at microsatellite locus length $i$, $i \in [1, n]$; $\bar{x}$ is the mean number of reads over the different lengths of the locus; $p_i$ is the normalized number of reads at length $i$; $\bar{p}$ is the mean of $p_i$; $n$ is the number of length classes of the locus, $n = 2L$; and $L$ is the standard length of the locus.
(2) A microsatellite locus storage module, used to store the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the lengths of the 6 microsatellite loci of the first population sample.
(3) A tagged microsatellite locus stability prediction module, connected respectively to the sequencing data input module, the microsatellite locus storage module and the tagged microsatellite locus data input module. It is used to construct a stability prediction model for the 22 tagged microsatellite loci from the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the lengths of the 6 microsatellite loci of the first population sample. It then predicts the stability of the 22 microsatellite loci from the peak number, kurtosis, skewness, standard deviation and standard offset of their lengths obtained from the sequencing data input module, and outputs the stability data of the 22 microsatellite loci to the tagged microsatellite locus data input module.
Constructing the tagged microsatellite locus stability prediction model from the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite loci comprises the following steps:
S11: randomly divide the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite locus lengths of the first population sample into two groups, one serving as a first training set and the other as a first test set;
S12: construct a tagged microsatellite locus stability prediction model with the first training set data based on a logistic regression algorithm;
S13: verify the obtained tagged microsatellite locus stability prediction model on the first test set.
(4) A tagged microsatellite locus data input module, used to receive the stability data of the 22 microsatellite loci of the sample to be tested, where each locus is either stable or unstable.
(5) An MSI overall state storage module, used to store the stability data of the tagged microsatellite loci and the MSI overall state data of a second population sample, the MSI overall state comprising MSI-H, MSI-L and MSS.
(6) An MSI overall state determination module, connected respectively to the tagged microsatellite locus data input module and the MSI overall state storage module. It is used to construct a prediction model from the stability data of the tagged microsatellite loci of the second population sample, and to determine the MSI overall state of the sample to be tested from the stability data of its tagged microsatellite loci obtained from the tagged microsatellite locus stability prediction module.
Constructing the prediction model from the stability data of the tagged microsatellite loci of the second population sample comprises the following steps:
S21: randomly divide the stability data of the tagged microsatellite loci of the second population sample into two groups, one serving as a second training set and the other as a second test set, each group containing stability data of the tagged microsatellite loci of MSI-H, MSI-L and MSS samples;
S22: construct an MSI overall state prediction model with the second training set data based on a random forest algorithm;
S23: verify the obtained MSI overall state prediction model on the second test set.
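The data flow between modules (1), (3) and (6) amounts to a two-stage composition. Per-locus features pass through the locus stability model, and the resulting vector of stability calls passes through the sample-level model. In the sketch below, both models are plain callables that stand in for the trained logistic regression and random forest. The names `predict_msi_state`, `locus_model` and `sample_model` are hypothetical, as is the feature order (N, Kurt, Skew, S, P).

```python
from typing import Callable, Dict, List, Sequence

# Per-locus feature vector, assumed order: (peak count N, Kurt, Skew, S, P)
Features = Sequence[float]


def predict_msi_state(
    locus_features: Dict[str, Features],
    locus_order: List[str],
    locus_model: Callable[[Features], int],
    sample_model: Callable[[List[int]], str],
) -> str:
    """Stage 1: call each tagged locus stable (0) or unstable (1).
    Stage 2: map the ordered 0/1 vector to MSI-H, MSI-L or MSS."""
    missing = [name for name in locus_order if name not in locus_features]
    if missing:
        raise KeyError(f"missing features for loci: {missing}")
    calls = [locus_model(locus_features[name]) for name in locus_order]
    return sample_model(calls)
```

Keeping the locus order fixed matters here, because a sample-level model trained on a 22-element vector expects each position to correspond to the same locus.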
All documents mentioned in this disclosure are incorporated by reference in this disclosure as if each were individually incorporated by reference. Further, it will be appreciated that various changes and modifications may be made by those skilled in the art after reading the above teachings, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Claims (10)
1. Tagged microsatellite loci for determining the MSI overall state of a sample to be tested, the tagged microsatellite loci comprising:
2. Use of a detection reagent for the tagged microsatellite loci according to claim 1 in the preparation of a kit for determining the MSI overall state of a sample to be tested.
3. A system for determining the MSI overall state of a sample to be tested, comprising the following modules:
a tagged microsatellite locus data input module for receiving stability data of the tagged microsatellite loci of claim 1 of a sample to be tested, the stability being either stable or unstable;
an MSI overall state storage module for storing stability data of the tagged microsatellite loci and MSI overall state data of a second population sample, the MSI overall state comprising MSI-H, MSI-L and MSS;
an MSI overall state determination module, connected respectively to the tagged microsatellite locus data input module and the MSI overall state storage module, for constructing a prediction model using the stability data of the tagged microsatellite loci of the second population sample, and for determining the MSI overall state of the sample to be tested based on the stability data of its tagged microsatellite loci obtained from the tagged microsatellite locus stability prediction module.
4. The system according to claim 3, wherein, in the MSI overall state determination module, constructing the prediction model using the stability data of the tagged microsatellite loci of the second population sample comprises the following steps:
S21: randomly dividing the stability data of the tagged microsatellite loci of the second population sample into two groups, one being a second training set and the other a second test set, each group comprising stability data of the tagged microsatellite loci of MSI-H samples, MSI-L samples and MSS samples;
S22: constructing an MSI overall state prediction model based on a machine learning algorithm using the second training set data;
S23: verifying the obtained prediction model on the second test set.
5. The system of claim 4, wherein the machine learning algorithm is selected from any one of the following algorithms:
random forest algorithm, neural network algorithm, support vector machine algorithm, Bayesian classification algorithm, gradient boosting algorithm, K-nearest neighbor algorithm and decision tree algorithm.
6. The system according to claim 3, wherein the stability data of the tagged microsatellite loci received by the tagged microsatellite locus data input module are obtained by a PCR-based method.
7. The system according to claim 3, further comprising:
a sequencing data input module for inputting capture sequencing data of the target region of the tagged microsatellite loci of the sample to be tested, and for obtaining the peak number, kurtosis, skewness, standard deviation and standard offset of the tagged microsatellite locus lengths;
a microsatellite locus storage module for storing the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite locus lengths of a first population sample; the known microsatellite loci include:
a tagged microsatellite locus stability prediction module, connected respectively to the sequencing data input module, the microsatellite locus storage module and the tagged microsatellite locus data input module, for constructing a tagged microsatellite locus stability prediction model using the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite locus lengths of the first population sample; for predicting the stability of the tagged microsatellite loci based on the peak number, kurtosis, skewness, standard deviation and standard offset of the tagged microsatellite locus lengths obtained from the sequencing data input module; and for outputting the stability data of the tagged microsatellite loci to the tagged microsatellite locus data input module.
8. The system according to claim 7, wherein, in the tagged microsatellite locus stability prediction module, constructing the tagged microsatellite locus stability prediction model comprises the following steps:
S11: randomly dividing the peak number, kurtosis, skewness, standard deviation, standard offset and stability data of the known microsatellite locus lengths of the first population sample into two groups, one being a first training set and the other a first test set;
S12: constructing a tagged microsatellite locus stability prediction model based on a regression algorithm using the first training set data;
S13: verifying the obtained tagged microsatellite locus stability prediction model on the first test set.
9. The system according to claim 7 or 8, wherein, for a particular tagged microsatellite locus:
the number of peaks of the tagged microsatellite locus length is counted with a peak-finding algorithm, and the skewness, kurtosis, standard deviation and standard offset of the tagged microsatellite locus length are calculated with the following formulas:
The skewness is calculated as:
The kurtosis is calculated as:
The standard deviation is calculated as:
The standard offset is calculated as:
wherein $x_i$ is the number of reads at tagged microsatellite locus length $i$, $i \in [1, n]$; $\bar{x}$ is the mean number of reads over the different lengths of the tagged microsatellite locus; $p_i$ is the normalized number of reads at length $i$; $\bar{p}$ is the mean of $p_i$; $n$ is the number of length classes of the tagged microsatellite locus, $n = 2L$; and $L$ is the standard length of the tagged microsatellite locus.
10. A kit for determining the MSI overall state of a sample to be tested, comprising the detection reagent for the tagged microsatellite loci according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311224358.8A CN117198399B (en) | 2023-09-21 | 2023-09-21 | Microsatellite locus, system and kit for predicting MSI state |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117198399A (en) | 2023-12-08 |
CN117198399B CN117198399B (en) | 2024-07-19 |
Family
ID=88997703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311224358.8A Active CN117198399B (en) | 2023-09-21 | 2023-09-21 | Microsatellite locus, system and kit for predicting MSI state |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117198399B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111304303A (en) * | 2020-02-18 | 2020-06-19 | 福建和瑞基因科技有限公司 | Method for predicting instability of microsatellite and application thereof |
US20200202978A1 (en) * | 2017-09-06 | 2020-06-25 | Geneseeq Technology Inc. | Sequencing data analysis method, device and computer-readable medium for microsatellite instability |
CN112442540A (en) * | 2021-01-27 | 2021-03-05 | 上海仁东医学检验所有限公司 | Microsatellite instability detection method, marker combination, kit and application |
US20210098078A1 (en) * | 2019-08-01 | 2021-04-01 | Tempus Labs, Inc. | Methods and systems for detecting microsatellite instability of a cancer in a liquid biopsy assay |
CN114464257A (en) * | 2022-03-15 | 2022-05-10 | 郑州安图生物工程股份有限公司 | Microsatellite instability detection method and device based on next-generation sequencing |
CN115132327A (en) * | 2022-05-25 | 2022-09-30 | 中国医学科学院肿瘤医院 | Microsatellite instability prediction system, construction method thereof, terminal equipment and medium |
CN115595371A (en) * | 2022-12-07 | 2023-01-13 | 元码基因科技(北京)股份有限公司(Cn) | Method for determining colorectal cancer patient MSI state through single-sample detection based on secondary sequencing platform and application |
Non-Patent Citations (2)
Title |
---|
JIANG ZHENXING ET AL.: "Development and Interpretation of a Clinicopathological-Based Model for the Identification of Microsatellite Instability in Colorectal Cancer", DISEASE MARKERS, vol. 2023, 18 February 2023 (2023-02-18), pages 1 - 12 * |
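WANG Qi, LI Xue, ZHANG Yunyan, GAO Hui, WANG Zhe, FU Songbin: "Detection of microsatellite instability in primary gastric cancer using fluorescence-labeled multiplex PCR" (应用荧光标记多重PCR对原发性胃癌进行微卫星不稳定性检测), Acta Genetica Sinica (遗传学报), no. 03, 10 March 2004 (2004-03-10), pages 241 - 245 * |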
Also Published As
Publication number | Publication date |
---|---|
CN117198399B (en) | 2024-07-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||