CN116083425B

CN116083425B - Primer combination, kit and library construction method for detecting endometrial cancer

Info

Publication number: CN116083425B
Application number: CN202211415746.XA
Authority: CN
Inventors: 姚小明; 张雅妮; 徐世美
Original assignee: Shenzhen Kairuisi Medical Technology Co ltd
Current assignee: Zhejiang Kairuisi Medical Technology Co ltd
Priority date: 2022-11-11
Filing date: 2022-11-11
Publication date: 2023-09-29
Anticipated expiration: 2042-11-11
Also published as: CN116083425A

Abstract

Primer combinations, kits and library construction methods for detecting endometrial cancer, wherein the primer combinations are used for targeted amplification of sites in at least one of the following genes: PTEN, TP53, PIK3CA, PIK3R1, KRAS, CTNNB1, FGFR2, RNF43, POLE, PPP2R1A, FBXW7, AKT1, APC, BRAF, CDKN2A, EGFR, NRAS, MAPK1, GNAS, HRAS. The application can simultaneously detect mutation sites in multiple regions of 20 endometrial cancer oncogenes and directly report the specific mutation sites of the relevant genes.

Description

Primer combination, kit and library construction method for detecting endometrial cancer

Technical Field

The application relates to the field of gene detection, in particular to a primer combination, a kit and a library construction method for detecting endometrial cancer.

Background

Endometrial cancer is an epithelial malignancy arising in the endometrium. It is one of the three common malignant tumors of the female genital tract and the sixth most common cancer in women, and it occurs mainly in perimenopausal and postmenopausal women. With increasing average life expectancy and changing lifestyles, the incidence of endometrial cancer has continued to rise over the past 20 years, and patients are becoming younger. More than 417,000 cases of endometrial cancer were recorded worldwide in 2020. In developed countries, endometrial cancer has the highest incidence among malignant tumors of the female reproductive system. In China, endometrial cancer is the second most common gynecological malignancy after cervical cancer, accounting for about 20%-30% of gynecological malignancies, and in some developed cities its incidence already ranks first among gynecological malignancies. Mortality from endometrial cancer has increased over the past few decades. Treatment often requires surgical removal of the uterus, and sometimes of the ovaries and fallopian tubes, so patients lose their fertility, and the resulting loss of estrogen accelerates aging in young patients.

The exact cause of endometrial cancer is unknown. Overweight or obesity greatly increases a woman's risk of endometrial cancer. Other risk factors include age, family history, a diagnosis of polycystic ovary syndrome, and previous use of the breast cancer drug tamoxifen. About 80% of endometrial cancers are adenocarcinomas, meaning that the cancer arises in cells of the endometrial glands. The scientific community currently holds that all cancers are caused by somatic mutations (mutations in any cell of the body other than germ cells), although little is yet known about the mutational processes involved. Cancer is essentially a genetic disease: all cancers arise from gene mutations, but not all gene mutations are inherited from the parents, and many environmental factors, such as viral infection, chemicals, and radiation, can cause gene mutations and thereby cancer. Studies have shown a variety of gene mutations in endometrial cancer. Driver mutations have been found in many different genes, including ERBB2, ERBB3, FGFR2, HRA, KRAS, BRAF, PIK3CA, PIK3R1, ARHGAP35, RRAS2, NF1, PPP2R1A, PTEN, ZFHX3, FOXA2, KMT2D, ARID B, and FBXW7, and a number of studies have reported that gene mutation detection may be usable for screening and diagnosis of endometrial cancer. Endometrial cancer has different clinical stages with markedly different prognoses. Endometrial cancer detected early is highly curable, so timely prevention and early detection are all the more necessary. Clinically, there is currently no reliable and effective method for screening the general population for endometrial cancer.
The sensitivity and specificity of existing clinical screening methods for endometrial cancer are not ideal. Specifically: 1) Physician-performed sampling is an invasive intrauterine procedure that may cause pain, bleeding, perforation, and other complications; the sampling yield is often unsatisfactory, and small lesions or lesions at the uterine cornua are easily missed. 2) Pipelle biopsy evaluates only part of the endometrial tissue, so endometrial lesions at the uterine cornua are difficult to evaluate accurately, and it is expensive. 3) Ultrasound and MRI can only show the size of the uterus, the thickness of the endometrium, whether the echo is heterogeneous, whether there is a mass in the uterine cavity, and whether and to what degree the myometrium is infiltrated; they are preliminary screening methods that cannot distinguish benign from malignant disease and cannot effectively detect early endometrial cancer lesions. 5) Hysteroscopy may cause iatrogenic dissemination, because the fluid pressure may carry cancer cells from the uterus into the abdominal cavity. 6) Tumor marker tests have low accuracy, and specific tumor markers for endometrial cancer are lacking. In addition, pathological typing alone cannot meet clinical needs; the mutation status of the cancer tissue must be established at the gene level to provide a molecular basis for the clinical use of targeted drugs and for predicting their efficacy. The clinical demand for accurate detection of endometrial cancer gene mutations is therefore increasing, and a method for detecting multiple endometrial cancer-related gene mutations needs to be developed.

Common methods for detecting gene mutations currently include fluorescent quantitative PCR, gene chips, Sanger sequencing, and high-throughput sequencing. Fluorescent quantitative PCR and gene chips can detect only one or a few known mutation sites, and Sanger sequencing can sequence only one region of one sample at a time. Their common shortcomings are that fluorescent quantitative PCR and gene chips cannot detect unknown mutation sites, detection efficiency is low, results are difficult to interpret, repeatability is poor, and false positives and false negatives are frequent.

Disclosure of Invention

According to a first aspect, in one embodiment there is provided a primer combination for targeted amplification of a site in at least one of the following genes: PTEN, TP53, PIK3CA, PIK3R1, KRAS, CTNNB1, FGFR2, RNF43, POLE, PPP2R1A, FBXW7, AKT1, APC, BRAF, CDKN2A, EGFR, NRAS, MAPK1, GNAS, HRAS.

According to a second aspect, in one embodiment there is provided a kit comprising the primer combination of the first aspect.

According to a third aspect, there is provided in one embodiment a library construction method comprising:

an adaptor ligation step comprising ligating an adaptor with a molecular tag to a nucleic acid sample to obtain an adaptor-ligated sample;

a first-round nested PCR amplification step comprising amplifying the adaptor-ligated sample using an outer primer to obtain a first-round amplification product;

and a second-round nested PCR amplification step comprising amplifying the adaptor-ligated sample using an inner primer to obtain a second-round amplification product.

According to the primer combination, kit and library construction method for detecting endometrial cancer of the above embodiments, mutation sites in multiple regions of 20 endometrial cancer oncogenes can be detected simultaneously, and the specific mutation sites of the relevant genes can be reported directly.

In one embodiment, the molecular-tag nested multiplex PCR used in the application effectively eliminates false positives introduced during PCR, improves the detection limit for samples with low-frequency mutations, and can detect low-frequency variants at 0.1%. It detects mutations in endometrial cancer oncogenes efficiently and sensitively while reducing cost and simplifying the procedure. It can be used for endometrial cancer screening, early diagnosis, auxiliary diagnosis, post-treatment monitoring, targeted drug selection, and the like.

Drawings

FIG. 1 is a flow chart of library construction in one embodiment.

Detailed Description

The application is described in further detail below through specific embodiments with reference to the drawings. In the following embodiments, numerous specific details are set forth to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of these features may be omitted in different situations or replaced by other materials or methods. In some instances, certain operations related to the present application are not shown or described in the specification, to avoid obscuring its core. A detailed description of these operations is unnecessary for persons skilled in the art, who can fully understand them from the description and from general knowledge in the field.

Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.

The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning.

High-throughput sequencing (next-generation sequencing, NGS) is an internationally advanced sequencing technology that can sequence hundreds of thousands to millions of DNA molecules in parallel in a single run, greatly increasing sequencing throughput. With the rapid development of bioinformatics, integrated multi-gene, multi-site detection and analysis allows gene loci closely associated with tumors to be detected and analyzed in a single assay, yielding the patient's gene mutation information.

There are two mainstream high-throughput sequencing approaches: large, comprehensive whole-genome/whole-transcriptome library sequencing, and small, focused targeted library sequencing. Although whole-genome and whole-exome sequencing can meet clinical needs in terms of throughput, their high price prevents widespread clinical use. Targeted library sequencing therefore plays a positive role in clinically assisted diagnosis and treatment of disease and in tumor gene testing, thanks to its low cost, high data utilization, and flexible combination of targets.

Library construction for targeted sequencing is further divided into hybridization capture and multiplex PCR. Targeted multiplex PCR library construction does not require fragmenting the sample DNA; the target sequences are amplified simultaneously simply by adding multiple primers, which shortens the workflow and greatly reduces library construction time. Multiplex PCR is a PCR technique in which multiple primer pairs are added to one reaction system so that different gene fragments of the same DNA sample are amplified simultaneously. Analyzing the sequences of these fragments reveals whether a gene carries a deletion, insertion, or point mutation.

According to a first aspect, in an embodiment, a primer combination is provided for targeted amplification of a site in at least one of the following genes: PTEN, TP53, PIK3CA, PIK3R1, KRAS, CTNNB1, FGFR2, RNF43, POLE, PPP2R1A, FBXW7, AKT1, APC, BRAF, CDKN2A, EGFR, NRAS, MAPK1, GNAS, HRAS.

In one embodiment, the primer combination comprises outer primers and inner primers.

In one embodiment, the outer primer comprises at least one of the nucleotide sequences shown in SEQ ID Nos. 1 to 308, wherein the sequence number is 2n, and n is an integer not less than 0.

In one embodiment, the inner primer comprises at least one of the nucleotide sequences shown in SEQ ID Nos. 1 to 308, wherein the sequence number is 2n+1, and n is an integer not less than 0.

In one embodiment, the primer combination is used to detect endometrial cancer.

According to a second aspect, in an embodiment, a kit is provided comprising the primer combination of the first aspect.

In one embodiment, the kit further comprises at least one of: an adaptor with a molecular tag, a library pretreatment reagent, a targeted amplification reagent, a library amplification reagent, an I5 tag, an I7 tag, and a purification reagent.

In one embodiment, the library pretreatment reagent comprises at least one of an end "A"-addition reagent and an adaptor ligation reagent.

In one embodiment, the purification reagent comprises magnetic beads, including but not limited to IGT™ Pure Beads or Agencourt AMPure XP magnetic beads.

According to a third aspect, in an embodiment, there is provided a library construction method comprising:

an adaptor ligation step comprising ligating an adaptor with a molecular tag to the nucleic acid sample to obtain an adaptor-ligated sample;

a first-round nested PCR amplification step comprising amplifying the adaptor-ligated sample using an outer primer to obtain a first-round amplification product.

In one embodiment, in the first-round nested PCR amplification step, the reaction system comprises a primer that is specifically reverse-complementary to at least part of the sequence of the adaptor with a molecular tag.

In one embodiment, in the first amplification step of the nested PCR, the reaction system further comprises a targeted amplification reagent.

In one embodiment, in the second-round nested PCR amplification step, the reaction system comprises a primer that is specifically reverse-complementary to at least part of the sequence of the adaptor with a molecular tag.

In one embodiment, the method further comprises a purification step comprising subjecting the second round amplification product to magnetic bead purification to obtain a purified product.

In one embodiment, the method further comprises a library amplification step comprising amplifying the purified product to obtain a library amplified product.

In one embodiment, in the library amplification step, the reaction system contains at least one of library amplification reagent, I5 tag, I7 tag.

In one embodiment, the method further comprises a second purification step comprising subjecting the library amplification product to magnetic bead purification to obtain a library that can be used for on-machine sequencing.

In one embodiment, in the adaptor ligation step, the nucleic acid sample is a sample that has undergone end repair and "A" addition.

According to a fourth aspect, in an embodiment, there is provided a library prepared by the library construction method of the third aspect.

In one embodiment, a molecular tag (UMI, unique molecular identifier) is added during the multiplex PCR workflow to help correct errors generated during amplification and sequencing. During data analysis, sequencing reads carrying the same UMI are placed in one group; the reads within each group are corrected against one another, duplicates are removed according to the UMI sequence and the position on the genome, and a single-strand consensus sequence (SSCS) is formed. This yields the true mutation result and reduces background noise.
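By way of illustration only, the grouping and correction of UMI-labeled reads may be sketched as follows. This is a simplified sketch and not the analysis pipeline of the application: the read layout `(umi, chrom, start, sequence)`, the function name `build_consensus`, the `min_family_size` cutoff, and the per-position majority vote are all assumptions introduced for the example.

```python
from collections import Counter, defaultdict


def build_consensus(reads, min_family_size=2):
    """Collapse reads that share a UMI and genomic start position into one consensus.

    reads: iterable of (umi, chrom, start, sequence) tuples (hypothetical layout).
    min_family_size: hypothetical cutoff; smaller families are skipped because
    a single read cannot be corrected against others.
    """
    families = defaultdict(list)
    for umi, chrom, start, seq in reads:
        families[(umi, chrom, start)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)
        # Majority vote at each position; ties go to the base seen first.
        consensus[key] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


reads = [
    ("ACGT", "chr10", 100, "TTGACC"),
    ("ACGT", "chr10", 100, "TTGACC"),
    ("ACGT", "chr10", 100, "TTGTCC"),  # carries one amplification/sequencing error
    ("GGCA", "chr10", 100, "TTGACC"),  # single-read family, skipped
]
sscs = build_consensus(reads)
print(sscs)
```

In this toy input the erroneous base in the third read is outvoted by the other two reads of the same family, so only the corrected consensus remains for that UMI.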

In one embodiment, the application provides a multiplex PCR specific primer, a kit and a detection method for detecting endometrial cancer polygene mutation based on a targeted high-throughput sequencing technology, which can detect a plurality of genes and a plurality of site mutations simultaneously, effectively improve detection efficiency and accuracy, reduce cost, simplify operation steps and the like.

In one embodiment, the application provides a PCR primer pair, a kit and a method for detecting endometrial cancer polygene mutation based on targeted high-throughput sequencing of molecular tag nested multiplex PCR. The adopted molecular tag nested multiplex PCR technology can effectively eliminate false positive brought in the PCR process, and improves the detection sensitivity and accuracy.

In one embodiment, the application provides primer pairs for detecting multiple endometrial cancer genes by high-throughput targeted sequencing based on molecular-tag nested multiplex PCR. The nucleotide sequences of the primers for the 2 rounds of nested multiplex PCR are shown in SEQ ID No.1 to SEQ ID No.308, and the start positions of the amplified target fragments on the chromosomes are listed in Table 1. The outer primers are used for the first round of nested PCR amplification, and the inner primers are used for the second round. The oncogenes are PTEN, TP53, PIK3CA, PIK3R1, KRAS, CTNNB1, FGFR2, RNF43, POLE, PPP2R1A, FBXW7, AKT1, APC, BRAF, CDKN2A, EGFR, NRAS, MAPK1, GNAS, and HRAS. The mutation may be a substitution, insertion and/or deletion of one or more bases.

The outer primer is specifically a sequence with the sequence number 2n among the nucleotide sequences shown in SEQ ID No. 1-308, where n is an integer greater than or equal to 0. Examples include the nucleotide sequences shown in SEQ ID No.2, SEQ ID No.4, SEQ ID No.6, and SEQ ID No.8, and those shown in SEQ ID No.156, SEQ ID No.158, SEQ ID No.160, SEQ ID No.162, etc.

The upstream primer in the outer primer is the nucleotide sequence shown as SEQ ID No. 1-154, the sequence number is 2n, n is an integer which is more than or equal to 1, such as the nucleotide sequences shown as SEQ ID No.2, SEQ ID No.4, SEQ ID No.6, SEQ ID No.8 and the like.

The downstream primer in the outer primer is the nucleotide sequence shown as SEQ ID No. 155-308, the sequence number is 2n, n is an integer which is more than or equal to 1, such as the nucleotide sequences shown as SEQ ID No.156, SEQ ID No.158, SEQ ID No.160, SEQ ID No.162 and the like.

The inner primer is specifically a sequence with the sequence number of 2n+1 in the nucleotide sequences shown in SEQ ID No. 1-308, and n is an integer more than or equal to 0. Such as the nucleotide sequence shown as SEQ ID No.1, SEQ ID No.3, SEQ ID No.5, SEQ ID No.7, and further such as the nucleotide sequence shown as SEQ ID No.155, SEQ ID No.157, SEQ ID No.159, SEQ ID No.161, etc.

The upstream primer in the inner primer is the nucleotide sequence shown as SEQ ID No. 1-154, the sequence number is 2n+1, n is an integer more than or equal to 0, such as the nucleotide sequences shown as SEQ ID No.1, SEQ ID No.3, SEQ ID No.5, SEQ ID No.7, and the like.

The downstream primer in the inner primer is the nucleotide sequence shown as SEQ ID No. 155-308, the sequence number is 2n+1, n is an integer which is more than or equal to 0, such as the nucleotide sequences shown as SEQ ID No.155, SEQ ID No.157, SEQ ID No.159, SEQ ID No.161 and the like.
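By way of illustration only, the numbering rules stated above (even sequence numbers for outer primers, odd for inner primers; SEQ ID No. 1-154 upstream, SEQ ID No. 155-308 downstream) may be expressed as a small lookup. The function name `classify_seq_id` is hypothetical, and the sketch does not assume which upstream primer pairs with which downstream primer, since that pairing is given by Table 1.

```python
def classify_seq_id(seq_id):
    """Return (role, direction) for a SEQ ID number between 1 and 308.

    role: "outer" for even numbers (first round), "inner" for odd numbers (second round).
    direction: "upstream" for 1-154, "downstream" for 155-308.
    """
    if not 1 <= seq_id <= 308:
        raise ValueError("SEQ ID No. must be between 1 and 308")
    role = "outer" if seq_id % 2 == 0 else "inner"
    direction = "upstream" if seq_id <= 154 else "downstream"
    return role, direction


# Collect the sequence numbers used in each round of nested PCR.
outer_ids = [i for i in range(1, 309) if classify_seq_id(i)[0] == "outer"]
inner_ids = [i for i in range(1, 309) if classify_seq_id(i)[0] == "inner"]

print(classify_seq_id(2))    # outer, upstream
print(classify_seq_id(155))  # inner, downstream
print(len(outer_ids), len(inner_ids))
```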

In one embodiment, the specific primers shown in SEQ ID No.1 to SEQ ID No.308 are sequences designed specifically for 20 endometrial cancer oncogenes. They amplify the mutation site regions of the 20 oncogenes PTEN, TP53, PIK3CA, PIK3R1, KRAS, CTNNB1, FGFR2, RNF43, POLE, PPP2R1A, FBXW7, AKT1, APC, BRAF, CDKN2A, EGFR, NRAS, MAPK1, GNAS, and HRAS; the amplified products are subjected to high-throughput sequencing to obtain sequence information, from which the mutation status of these genes in a sample is determined. Because only target-region fragments of the genes are detected, 2 rounds of nested multiplex amplification are used, building on the principle of multiplex PCR library construction, to ensure the detection specificity of each region and to increase stability. With the 2-round nested multiplex PCR technique, the primers shown in SEQ ID No.1 to SEQ ID No.308 amplify the target sequences effectively and simultaneously in a single tube: the first-round multiplex PCR amplifies the target sequences in one tube, and the second-round multiplex PCR again amplifies them in one tube. The PCR product is then purified, subjected to a further PCR reaction, and purified again, and the resulting product is used as the sequencing library for on-machine high-throughput sequencing. After passing quality control with Qubit 3.0, the constructed sequencing library is suitable for various high-throughput sequencing platforms, such as those of BGI (Huada) and Illumina.

The primers are specifically as follows:

TABLE 1

In one embodiment, the primer pair sequences for detecting multiple endometrial cancer genes by high-throughput targeted sequencing based on molecular-tag nested multiplex PCR, and the chromosomal regions of the amplified target fragments, fall within the scope of protection of the present application.

In one embodiment, the application provides a kit for detecting endometrial cancer polygene by high-throughput targeted sequencing based on molecular tag nested multiplex PCR, which comprises 2 rounds of multiplex PCR primers (an outer primer is used for nested PCR first round amplification, an inner primer is used for nested PCR second round amplification), library pretreatment reagents, targeted amplification reagents, library amplification reagents, I5 tags, I7 tags and purified magnetic beads.

In one embodiment, the end "A"-addition reagent comprises an end "A"-addition buffer and an end "A"-addition enzyme.

In one embodiment, the adaptor ligation reagent comprises an adaptor with a UMI, a DNA ligase, and a DNA ligation buffer.

In one embodiment, the targeted amplification reagent comprises a high fidelity DNA polymerase, PCR buffer, dNTP mix.

In one embodiment, the kit further comprises a primer that is specifically complementary to the UMI-containing adaptor, AnchorSeq™ Anchor Primer (for Illumina), as well as the outer primers used for the first round of nested PCR amplification and the inner primers used for the second round of nested PCR amplification.

In one embodiment, the library amplification reagents include high-fidelity DNA polymerase, PCR buffer, dNTP mixture, and AnchorSeq™ UDI Primer.

In one embodiment, the purification beads are used to purify DNA of the target size range from the targeted amplification products and the library amplification products.

Example 1

The library building flow is shown in fig. 1, and the specific steps are as follows:

1. A genomic DNA standard with known mutations was tested in 4 replicates, 25 ng (2.5 ng/μL) each. After fragmentation of the genomic DNA, end repair and addition of "A" to the 3' end were performed using a commercial kit.

Preparing a reaction system:

TABLE 2

Reagent	Volume
Fragmented DNA	10 μL
End Repair & A-Tailing Buffer	7 μL
End Repair & A-Tailing Enzyme Mix	3 μL
Nuclease-Free Water	40 μL
Total volume	60 μL
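By way of illustration only, the Table 2 reaction system can be checked against its stated total and scaled to the 4 replicate reactions of step 1. The helper `scale_master_mix`, the 10% pipetting overage, and the choice to keep the DNA out of the shared mix are assumptions of this sketch and are not specified in the application.

```python
# Per-reaction volumes from Table 2, in microliters.
END_REPAIR_MIX_UL = {
    "Fragmented DNA": 10,
    "End Repair & A-Tailing Buffer": 7,
    "End Repair & A-Tailing Enzyme Mix": 3,
    "Nuclease-Free Water": 40,
}


def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10,
                     per_sample=("Fragmented DNA",)):
    """Scale the shared reagents for n reactions.

    overage: hypothetical extra fraction to cover pipetting loss.
    per_sample: components added to each tube individually, left out of the mix.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 1)
        for name, volume in per_reaction_ul.items()
        if name not in per_sample
    }


total_ul = sum(END_REPAIR_MIX_UL.values())  # should match the 60 uL total in Table 2
mix = scale_master_mix(END_REPAIR_MIX_UL, n_reactions=4)
print(total_ul)
print(mix)
```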

The PCR instrument parameters were set as follows:

TABLE 3

2. The AnchorSeq™ UMI Adapter (15 μM, for Illumina, iGeneTech) was diluted 10-fold to 1.5 μM in advance using Nuclease-Free Water. The sample from the previous step, carrying an "A" at the 3' end, was then ligated to the UMI-containing adaptor.

Preparing a reaction system:

TABLE 4

Reagent	Volume
Sample after completion of the reaction in step 1	60 μL
AnchorSeq™ UMI Adapter diluent	5 μL
Nuclease-Free Water	10 μL
Ligation Buffer	30 μL
DNA Ligase	5 μL
Total volume	110 μL

The PCR instrument parameters were set as follows:

TABLE 5

3. The ligated DNA sample from step 2 was purified using IGT™ Pure Beads to give 13 μL of eluate.

4. First-round nested PCR amplification with the outer primer pairs: the outer mixed primer pairs, the targeted amplification reagent, and the primer specifically complementary to the UMI-containing adaptor, AnchorSeq™ Anchor Primer (for Illumina), were added to the reaction system, and the target regions were amplified in the first round of nested PCR.

The outer primer pair is specifically a sequence with the sequence number of 2n in the nucleotide sequences shown in SEQ ID No. 1-308, and n is an integer more than or equal to 0.

The following reaction system is prepared:

TABLE 6

Reagent	Volume
Step 2 sample after reaction	11 μL
AnchorSeq™ PCR Master Mix	15 μL
AnchorSeq™ Anchored Primer	2 μL
Outer mixed primer pair	2 μL
Total volume	30 μL

The PCR instrument parameters were set as follows:

TABLE 7

5. The amplicon solution from step 4 was purified using IGT™ Pure Beads or Agencourt AMPure XP Beads to give 14 μL of eluate.

6. Second-round nested PCR amplification with the inner primer pairs: the inner mixed primer pairs, the targeted amplification reagent, and the primer specifically complementary to the UMI-containing adaptor, AnchorSeq™ Anchor Primer (for Illumina), were added to the reaction system, and the target regions were amplified in the second round of nested PCR.

The inner primer pair is specifically a sequence with the sequence number of 2n+1 in the nucleotide sequences shown in SEQ ID No. 1-308, and n is an integer more than or equal to 0.

The following reaction system is prepared:

TABLE 8

Reagent	Volume
Step 5 sample after completion of the reaction	11 μL
AnchorSeq™ PCR Master Mix	15 μL
AnchorSeq™ Anchored Primer	2 μL
Inner mixed primer pair	2 μL
Total volume	30 μL

The PCR instrument parameters were set as follows:

TABLE 9

7. The amplicon solution from step 6 was purified using IGT™ Pure Beads to give 15 μL of eluate.

8. Library amplification buffer, I5 tag, I7 tag, and double-distilled water were added to the amplicon solution obtained in step 7, and a PCR reaction was performed so that both ends of the amplicons carry sequencing adapter sequences.

The following reaction system is prepared:

TABLE 10

Reagent	Volume
Step 7 sample after the reaction	13 μL
AnchorSeq™ PCR Master Mix	15 μL
AnchorSeq™ UDI Primer	2 μL
Total volume	30 μL

The PCR instrument parameters were set as follows:

TABLE 11

9. The amplicon solution from step 8 was purified using IGT™ Pure Beads to give 52 μL of eluate. The supernatant was pipetted into a new PCR tube and labeled for sequencing. 1 μL of the library was used to measure and record the library concentration with the Qubit dsDNA HS Assay Kit. 1 μL of the library was used to measure fragment length on a fragment analyzer.

10. After passing quality control, the library was sequenced on a BGI (Huada) MGISEQ-2000.

The detection data of the known mutation standards show that the coverage (coverage rate,%) of each primer is 100%.

Three gene mutation sites are known (see Table 13).

The test results are shown in Table 12.

TABLE 12

TABLE 12 (continued)

The continued table contains the right-hand columns of Table 12.

The meanings of the indices in table 12 are as follows:

Sequencing data statistics:

(1) Raw reads (M): total number of reads in the raw data, in millions (M);

(2) Raw bases (Mb): total number of bases in the raw data, in Mb;

(3) UMI reads (M): total number of reads after molecular tag extraction, in M;

(4) UMI bases (Mb): total number of bases after molecular tag extraction, in Mb;

(5) Filtering rate (%): filtering ratio, 1 - UMI bases (Mb) / Raw bases (Mb);

(6) Clean reads (M): total number of reads remaining after filtering the molecular-tag-extracted data, in M;

(7) Clean bases (Mb): total number of bases remaining after filtering the molecular-tag-extracted data, in Mb;

(8) QC rate (%): clean ratio, Clean bases (Mb) / UMI bases (Mb).
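By way of illustration only, the two ratios defined in items (5) and (8) above may be computed as follows. The function names and the input values are hypothetical and are not data from Table 12.

```python
def filtering_rate(raw_bases_mb, umi_bases_mb):
    """Item (5): 1 - UMI bases / Raw bases, returned as a fraction."""
    return 1 - umi_bases_mb / raw_bases_mb


def qc_rate(clean_bases_mb, umi_bases_mb):
    """Item (8): Clean bases / UMI bases, returned as a fraction."""
    return clean_bases_mb / umi_bases_mb


# Hypothetical example values in Mb.
raw_mb, umi_mb, clean_mb = 1000.0, 900.0, 855.0
print(f"filtering rate: {filtering_rate(raw_mb, umi_mb):.1%}")
print(f"QC rate: {qc_rate(clean_mb, umi_mb):.1%}")
```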

Library quality statistics:

(1) Sample: sample names, commonly filled in company numbers;

(2) Average read length: sequencing average read length (average read length after quality control is slightly shorter than original data read length);

(3) Average base quality: sequencing base average homogeneity value, typically > =30;

(4) Average insert size: the average library construction length is more than 200 when the reading length is 150;

(5) Alignment rate: alignment rate, read/clean reads aligned to the reference genome

(6) Target size: the size of the target area to be captured is measured in base units;

(7) Target covered size: the size of the target area actually covered;

(8) Coverage rate (%): coverage rate, revealing whether the Target area covers good indicators, target covered size/Target size;

(9) Total effffective bases (Mb): total number of effective bases;

(10) Target effffective bases (Mb): total number of bases covered by the target region;

(11) Bases capture rate (%): base capture rate, revealing an index of the sample data utilization rate, target effffectivebases/Accurate mapped bases;

(12) Target average depth: average depth of Target region Target effffective bases/Target size;

(13) 1000X coverage rate (%): depth > = target area ratio of 1000X, 1000X coverage;

(14) 2000X coverage rate (%): depth > = target area ratio of 2000X, 2000X coverage;

(15) 5000X coverage rate (%): depth > = target area ratio of 5000X, 5000X coverage;

(16) 10000X coverage rate (%): depth > = target area ratio of 10000X, 10000X coverage;

(17) 5%X mean depth coverage rate (%): depth > = target area ratio of average depth 5%X (assuming average depth 20000 layers, ratio of 1000 layers or more);

(18) 10%X mean depth coverage rate (%): depth > = target area ratio of average depth 10% x;

(19) 20%X mean depth coverage rate (%): depth > = target area ratio of average depth 20% x;

(20) 30%X mean depth coverage rate (%): depth > = target area ratio of average depth 30% x;

(21) 40%X mean depth coverage rate (%): depth > = target area ratio of 40% x of average depth;

(22) 50%X mean depth coverage rate (%): depth > = target area ratio of average depth 50% x;

(23) Flank 10%X mean depth coverage rate (%): target area ratio of 10% depth on both sides of the average depth (assuming average depth 10000 layers, i.e., 9000-11000 layers);

(24) Average tag family size: average size of a tag family; reads whose fragment start position, end position and tag are all identical are treated as one family;

(25) SSCS effective reads (M): number of valid reads before single-strand (SSCS) correction;

(26) SSCS after reads (M): number of valid reads after single-strand (SSCS) correction;

(27) SSCS effective rate (%): single-strand (SSCS) correction ratio;

(28) SSCS Target average depth: average depth of the target region after single-strand (SSCS) correction;

(29) Again alignment rate (%): re-alignment rate after double-strand (DCS) correction;

(30) DCS Target average depth: average depth of the target region after double-strand (DCS) correction;

(31) Panel name: target area name.
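The depth-based indices in items (11) to (23) can be illustrated with a minimal sketch, assuming a per-base depth list for the target region is available. The function name `coverage_metrics`, its arguments and the dictionary keys are hypothetical and are not part of the analysis pipeline used in the application; the sum of the per-base depths is assumed to stand in for Target effective bases.

```python
def coverage_metrics(depths, accurate_mapped_bases=None):
    """Compute depth-based QC indices for a target region.

    depths: per-base sequencing depths across the target region
            (hypothetical input, one number per target base).
    accurate_mapped_bases: optional total of accurately mapped bases,
            used only for the bases capture rate.
    """
    target_size = len(depths)
    if target_size == 0:
        raise ValueError("target region is empty")

    # Assumption: the sum of per-base depths equals Target effective bases.
    target_effective_bases = sum(depths)
    # Item (12): Target effective bases / Target size.
    average_depth = target_effective_bases / target_size

    def percent_at_least(threshold):
        # Percentage of target bases whose depth is >= threshold.
        return 100.0 * sum(d >= threshold for d in depths) / target_size

    metrics = {"target_average_depth": average_depth}

    # Items (13)-(16): fixed depth thresholds.
    for x in (1000, 2000, 5000, 10000):
        metrics[f"{x}X_coverage_rate"] = percent_at_least(x)

    # Items (17)-(22): thresholds expressed as a fraction of the average depth.
    for pct in (5, 10, 20, 30, 40, 50):
        key = f"{pct}%X_mean_depth_coverage_rate"
        metrics[key] = percent_at_least(average_depth * pct / 100)

    # Item (23): bases whose depth lies within 10% either side of the average.
    low, high = 0.9 * average_depth, 1.1 * average_depth
    in_flank = sum(low <= d <= high for d in depths)
    metrics["flank_10%X_mean_depth_coverage_rate"] = 100.0 * in_flank / target_size

    # Item (11): Target effective bases / Accurate mapped bases.
    if accurate_mapped_bases:
        metrics["bases_capture_rate"] = (
            100.0 * target_effective_bases / accurate_mapped_bases
        )
    return metrics


if __name__ == "__main__":
    example = [10000, 10000, 9500, 10500, 400, 19600]  # made-up depths
    for name, value in coverage_metrics(example, accurate_mapped_bases=80000).items():
        print(f"{name}: {value:.2f}")
```

The made-up depths in the usage example are for illustration only and are not data from the application.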

The data for known mutation sites are as follows:

TABLE 13


Example 2

Endometrial exfoliated cells were collected from three women from Hunan. Genomic DNA was extracted using the Tiangen blood/cell/tissue genomic DNA kit (catalog number DP304), the genomic DNA concentration was measured with Qubit, and sample processing and on-machine detection were performed according to the procedure of Example 1.

The test data of the samples are shown in Table 14.

TABLE 14

TABLE 14 (continued)


Table 14 (continued) holds the right-hand columns of Table 14.

As can be seen, the alignment rate, defined as the reads aligned to the reference genome divided by the clean reads, exceeds 98%, that is, more than 98% of the short sequences can be aligned to the reference genome. The coverage rate is 100%, showing good coverage of the target region and demonstrating the high specificity of multiplex PCR sequence enrichment. The average sequencing depth of the target region reaches about 10,000X.

Example 3

This example illustrates, for at least 3 genes, the specific chromosomal locations of the gene fragments detected by the inner and outer primers of nested PCR.

To increase the accuracy of mutation detection, 2 pairs of nested PCR primers were designed based on the start site of each amplified fragment; 4 fragments are shown below as examples.

TABLE 15


Example 4

The application is optimized methodologically. The endometrial cancer mutations covered span 9473 sites in 20 genes. For each gene fragment amplified in the 2-round nested scheme, 3 primer pairs were designed for each round, giving 3 pairing schemes. The F-end and R-end primers were matched in the multiplex PCR step, and the primers whose coverage rate reached the optimum (100%) were selected as the optimal primer pairs. The optimization scheme is illustrated below for 3 genes.

TABLE 16


In one embodiment, the application provides an efficient and reliable sequence enrichment method for high-throughput sequencing. For the 9473 mutation sites of the 20 endometrial cancer genes, the method offers high detection throughput, strong specificity, a wide range of sample sources, low risk of contamination and high safety, and the detection results show good repeatability and accuracy.

In one embodiment, the application provides a primer pair, a kit and a method for detecting multi-gene mutations in endometrial cancer based on molecular-tag nested multiplex PCR (polymerase chain reaction) and high-throughput targeted sequencing. The application specifically discloses the multiplex PCR primers shown in SEQ ID No. 1 to SEQ ID No. 308, a kit containing the primers, and a method for detecting gene mutation sites in which the primers or the kit are used in 2 rounds of nested PCR amplification to achieve targeted enrichment of endometrial cancer oncogene fragments.
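The numbering convention stated in claim 1 (even-numbered sequences are outer primers, odd-numbered sequences are inner primers, SEQ ID No. 1-154 are upstream primers and SEQ ID No. 155-308 are downstream primers) can be sketched as follows. The function names `classify_primer` and `primer_sets` are hypothetical. How individual upstream and downstream primers pair with one another is not stated in this section, so the sketch does not attempt any pairing.

```python
def classify_primer(seq_id):
    """Return (role, orientation) for a SEQ ID number between 1 and 308."""
    if not 1 <= seq_id <= 308:
        raise ValueError("SEQ ID No. must be between 1 and 308")
    # Even numbers (2n, n >= 1) are outer primers, used in the first round;
    # odd numbers (2n+1, n >= 0) are inner primers, used in the second round.
    role = "outer" if seq_id % 2 == 0 else "inner"
    # SEQ ID No. 1-154 are upstream primers; 155-308 are downstream primers.
    orientation = "upstream" if seq_id <= 154 else "downstream"
    return role, orientation


def primer_sets():
    """Group all 308 SEQ ID numbers by (role, orientation)."""
    sets = {}
    for seq_id in range(1, 309):
        sets.setdefault(classify_primer(seq_id), []).append(seq_id)
    return sets


if __name__ == "__main__":
    for key, ids in sorted(primer_sets().items()):
        print(key, len(ids), "primers, first:", ids[0], "last:", ids[-1])
```

Under this convention each of the four groups contains the same number of sequences, which is consistent with a symmetric two-round design.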

In one embodiment, the method of the application is an efficient and reliable method for enriching mutation-bearing sequences. The multiplex PCR primers, the kit and the detection method can detect gene mutation sites in multiple regions of 20 endometrial cancer oncogenes at the same time and directly reflect the specific mutation sites of the related genes. By adopting molecular-tag nested multiplex PCR, false positives introduced during PCR are effectively eliminated and the detection limit for low-frequency mutations is improved, so that mutations at a frequency as low as 0.1% can be detected. The mutation status of endometrial cancer oncogenes can thus be detected efficiently and sensitively, while the cost is reduced and the operation steps are simplified. The method can be used for endometrial cancer screening, early diagnosis, auxiliary diagnosis, post-treatment monitoring, targeted drug screening and the like.

In one embodiment, the application provides a primer pair, a kit and a method for detecting multiple endometrial cancer genes based on molecular-tag nested multiplex PCR (polymerase chain reaction) and high-throughput targeted sequencing.
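How molecular tags help to suppress PCR-derived false positives can be shown with a minimal sketch. It groups reads into tag families using the rule given in the index definitions (same tag, same start position, same end position) and then takes a per-position majority vote within each family. The function names, the read tuple layout and the `min_fraction` cutoff of 0.6 are assumptions made for illustration; the application does not state the consensus rule it uses.

```python
from collections import Counter, defaultdict


def build_families(reads):
    """Group reads into tag families keyed by (tag, start, end).

    reads: iterable of (tag, start, end, sequence) tuples (hypothetical layout).
    """
    families = defaultdict(list)
    for tag, start, end, sequence in reads:
        families[(tag, start, end)].append(sequence)
    return dict(families)


def family_consensus(sequences, min_fraction=0.6):
    """Per-position majority vote; emits 'N' where agreement is too low.

    min_fraction is an assumed cutoff, not a value from the application.
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all reads in a family must have the same length")
    consensus = []
    for column in zip(*sequences):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def average_family_size(families):
    """Mean number of reads per tag family."""
    return sum(len(seqs) for seqs in families.values()) / len(families)


if __name__ == "__main__":
    reads = [  # made-up reads; the third carries a simulated PCR error
        ("AAT", 100, 103, "ACGT"),
        ("AAT", 100, 103, "ACGT"),
        ("AAT", 100, 103, "ACTT"),
        ("CCG", 100, 103, "ACTT"),
    ]
    families = build_families(reads)
    for key, seqs in families.items():
        print(key, len(seqs), family_consensus(seqs))
    print("average tag family size:", average_family_size(families))
```

In the made-up example, the single discordant read in the first family is outvoted by the other two, whereas the second family, which carries a different tag, keeps its own sequence.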

The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims

1. A primer combination for targeted amplification of a site in a gene comprising: PTEN, TP53, PIK3CA, PIK3R1, KRAS, CTNNB1, FGFR2, RNF43, tele, PPP2R1A, FBXW, AKT1, APC, BRAF, CDKN2A, EGFR, NRAS, MAPK1, GNAS, HRAS;

the primer combination comprises an outer primer and an inner primer;

the outer primers are the sequences numbered 2n among the nucleotide sequences shown in SEQ ID No. 1-308, wherein the upstream primers are the sequences numbered 2n among the nucleotide sequences shown in SEQ ID No. 1-154, the downstream primers are the sequences numbered 2n among the nucleotide sequences shown in SEQ ID No. 155-308, and n is an integer greater than or equal to 1;

the inner primers are the sequences numbered 2n+1 among the nucleotide sequences shown in SEQ ID No. 1-308, wherein the upstream primers are the sequences numbered 2n+1 among the nucleotide sequences shown in SEQ ID No. 1-154, the downstream primers are the sequences numbered 2n+1 among the nucleotide sequences shown in SEQ ID No. 155-308, and n is an integer greater than or equal to 0;

the outer primers are used for nested PCR first round amplification and the inner primers are used for nested PCR second round amplification.

2. The primer combination of claim 1 wherein the primer combination is used to detect endometrial cancer.

3. A kit comprising the primer combination of claim 1 or 2.

4. The kit of claim 3, further comprising at least one of a molecular tagged linker, library pretreatment reagent, targeted amplification reagent, library amplification reagent, I5 tag, I7 tag, purification reagent.

5. The kit of claim 4, wherein the library pretreatment reagent comprises at least one of an end "A"-addition reagent and a linker ligation reagent.

6. The kit of claim 4, wherein the purification reagents comprise magnetic beads.

7. A method of library construction comprising:

a second round of nested PCR amplification step, comprising amplifying the adaptor-ligated sample using inner primers to obtain a second round of amplification product;

the inner primers are the sequences numbered 2n+1 among the nucleotide sequences shown in SEQ ID No. 1-308, wherein the upstream primers are the sequences numbered 2n+1 among the nucleotide sequences shown in SEQ ID No. 1-154, the downstream primers are the sequences numbered 2n+1 among the nucleotide sequences shown in SEQ ID No. 155-308, and n is an integer greater than or equal to 0.

8. The method of library construction according to claim 7, wherein in the first round of amplification step of nested PCR, the reaction system comprises a primer that is complementary to at least a portion of the sequence-specific reverse complement of the molecular tagged linker.

9. The method of claim 8, wherein the reaction system further comprises a target amplification reagent in the first amplification step of nested PCR.

10. The method of library construction according to claim 8, wherein in the second round of nested PCR amplification step, the reaction system comprises a primer that is complementary to at least a portion of the sequence-specific reverse complement of the molecular tagged linker.

11. The method of claim 10, wherein in the second amplification step of nested PCR, the reaction system further comprises a targeted amplification reagent.

12. The library construction method of claim 7, further comprising a purification step comprising subjecting the second round of amplification products to magnetic bead purification to obtain purified products.

13. The method of claim 12, further comprising a library amplification step comprising amplifying the purified product to obtain a library amplification product.

14. The method of claim 13, wherein in the library amplification step, the reaction system contains at least one of a library amplification reagent, an I5 tag, and an I7 tag.

15. The method of library construction of claim 13, further comprising a re-purification step comprising bead purification of the library amplification product to obtain a library useful for on-machine sequencing.

16. The method of constructing a library according to claim 7, wherein in the step of ligating the adaptor, the nucleic acid sample is a sample subjected to an end repair and an "A" addition reaction.

17. A library prepared by the library construction method according to any one of claims 7 to 16.