CN113160889A

CN113160889A - Cancer noninvasive early screening method based on cfDNA omics characteristics

Info

Publication number: CN113160889A
Application number: CN202110118814.5A
Authority: CN
Inventors: 蓝勋; 季加孚; 布召德; 李�杰; 陈佳辉; 孙克用; 孙欣
Original assignee: Tsinghua University; Beijing Cancer Hospital
Current assignee: Renke Beijing Biotechnology Co ltd
Priority date: 2021-01-28
Filing date: 2021-01-28
Publication date: 2021-07-23
Anticipated expiration: 2041-01-28
Also published as: CN113160889B

Abstract

The invention relates to a noninvasive early cancer screening system based on cfDNA omics characteristics. The system comprises a cfDNA omics characteristic model and a machine learning training model, and the screening method comprises: establishing the cfDNA omics characteristic model; collecting blood; extracting cfDNA; performing library construction and sequencing on the extracted cfDNA; and extracting cfDNA omics characteristics for comparison. Using low-depth whole-genome sequencing of cfDNA, the method comprehensively characterizes the cfDNA of gastric cancer patients by combining cfDNA length distribution characteristics, copy number variation density distribution characteristics, and openness characteristics around promoters, so that early gastric cancer patients can be accurately identified.

Description

Cancer noninvasive early screening method based on cfDNA omics characteristics

Technical Field

The invention relates to the field of cancer early screening, in particular to a cancer noninvasive early screening method based on cfDNA omics characteristics.

Background

Liquid biopsy is the clinical application of analyzing cancer-derived components in blood for early screening, molecular typing, prognosis, medication guidance, and recurrence monitoring of cancer. As a new precision-medicine technology, liquid biopsy can qualitatively and quantitatively detect tumor cells and tumor-related DNA; it is noninvasive, convenient to sample, and suited to real-time monitoring, and it plays an increasingly important role in tumor diagnosis and treatment.

Currently, the conventional approach to liquid biopsy for early cancer screening is to identify cfDNA released by tumors through mutation detection in oncogenes or tumor suppressor genes. Document: Razavi, P., Li, B.T., Brown, D.N. et al. High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants. Nat Med 25, 1928-. Identifying cancer patients through mutations in specific target genes can lead to a significant misjudgment rate. Moreover, because cancer patients are highly heterogeneous, some patients may be missed once a specific set of target genes has been defined. In addition, high-depth targeted sequencing is extremely expensive and cannot be used widely in the clinic. Finally, the above study was directed only at patients with advanced metastatic cancer.

Document: Chen Xiaoji, Chang Ching-Wei, Spoerke Jill M, et al. Low-pass Whole-genome Sequencing of Circulating Cell-free DNA Demonstrates Dynamic Changes in Genomic Copy Number in a Squamous Lung Cancer Clinical Cohort. Clin Cancer Res 2019, 25(7): 2254-. This approach is useful only for cancer types that carry a large number of copy number variations. Low-depth whole-genome sequencing is also prone to missing the copy number variation regions of some cancer genes. Moreover, the above study was directed only at patients with advanced metastatic cancer.

Matthew et al. isolated cfDNA from circulating plasma and mapped nucleosome occupancy across the genome. They found that the distribution pattern of cfDNA is closely related to its tissue of origin, and that the nucleosome distribution pattern inferred from cfDNA can be used to determine the specific origin of the cfDNA, which could be used for noninvasive detection of clinical conditions. However, the work remains at the theoretical level: it does not address specific applications, lacks a comprehensive evaluation of the patient's genome, and lacks an evaluation of cfDNA abundance.

Therefore, those skilled in the art need a noninvasive early cancer screening method that can identify early gastric cancer patients using low-depth whole-genome sequencing alone, greatly reducing the cost of early cancer screening while improving screening accuracy.

Disclosure of Invention

In view of the above, the present application aims to provide a noninvasive early cancer screening method based on cfDNA omics characteristics. Using low-depth whole-genome sequencing of cfDNA, the method comprehensively characterizes the cfDNA of gastric cancer patients by combining cfDNA length distribution characteristics, copy number variation density distribution characteristics, and openness characteristics around promoters, thereby accurately identifying early gastric cancer patients.

In order to achieve the above object, the present application provides the following technical solutions.

A cancer noninvasive early screening system based on cfDNA omics characteristics comprises a cfDNA omics characteristic model and a machine learning training model, and is characterized in that the cancer noninvasive screening method comprises the following steps:

S101, establishing a cfDNA omics characteristic model;

S102, blood collection;

S103, extracting cfDNA;

S104, performing library construction and sequencing on the extracted cfDNA;

and S105, extracting cfDNA omics characteristics and comparing the cfDNA omics characteristics.

Preferably, establishing the cfDNA omics characteristic model in step S101 comprises the following steps:

S201, blood collection;

S202, extracting cfDNA;

S203, performing library construction and sequencing on the extracted cfDNA;

S204, extracting cfDNA omics characteristics;

and S205, machine learning and training the model.

Preferably, the blood collection in step S102 and step S201 is performed by whole blood extraction using a blood collection tube. The blood collection tube contains a preservative which can stabilize nucleated blood cells, prevent the release of cell genome DNA, inhibit cfDNA nuclease-mediated degradation and contribute to the overall stability of cfDNA.

Preferably, the extraction of cfDNA in step S103 and step S202 comprises the following steps:

S301, placing the blood collection tube in a centrifuge and centrifuging until the plasma is separated;

S302, adding proteinase K and ACL buffer to a centrifuge tube containing the plasma, mixing thoroughly, and incubating;

S303, suction-filtering the incubated collection tube with a vacuum pump and washing off impurities;

S304, placing the mixture in a centrifuge for centrifugation;

S306, placing the collection tube in a metal bath to evaporate the ethanol, adding AVE, and incubating;

S307, placing the collection tube in a centrifuge for centrifugation, determining the DNA concentration of the filtrate, and detecting the fragment distribution of the cfDNA.

Preferably, the cfDNA omics characteristics essentially comprise one or more of fragment pattern, CNV diversity, and TSS coverage.

Preferably, the cfDNA omics characteristic extraction method in steps S101, S105 and S204 comprises:

S401, aligning the cfDNA sequence file to a reference genome to obtain a BAM file;

S402, removing low-quality sequences and duplicate sequences from the BAM file;

S403, excluding regions of the reference genome with low coverage and the Duke blacklist regions;

S404, dividing the chromosomes into adjacent, non-overlapping segments;

S405, counting the numbers of long and short cfDNA fragments;

S406, correcting the counts for GC content;

S407, quantifying the fragment pattern as a ratio; and performing median normalization against a gold standard and computing the density distribution of copy number variation;

S408, obtaining the coordinates of the transcription start sites of the reference genome and mapping them onto the BAM file to obtain the coverage of sequences near each site;

and S409, obtaining the TSS coverage through coverage calculation.
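The GC correction of step S406 can be illustrated with a minimal sketch. Example 3 names LOESS regression for this step; the code below deliberately substitutes a simpler stratified-median correction (bins are grouped by GC fraction and each count is rescaled so that every GC stratum has the same median), purely to show the idea. The function name `gc_correct`, the stratum width, and the toy numbers are assumptions and are not taken from the application.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, stratum_width=0.05):
    """Rescale per-bin counts so that every GC stratum has the same median.

    Simplified stand-in for LOESS (an assumption of this sketch): bins are
    grouped by GC fraction, and each count is multiplied by
    overall_median / stratum_median.
    """
    if len(counts) != len(gc_fractions):
        raise ValueError("counts and gc_fractions must have equal length")

    # Group the raw counts by GC stratum.
    strata = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        strata[int(gc / stratum_width)].append(count)

    overall = median(counts)
    stratum_median = {key: median(vals) for key, vals in strata.items()}

    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = stratum_median[int(gc / stratum_width)]
        corrected.append(count * overall / m if m > 0 else 0.0)
    return corrected


# Toy data: two low-GC bins and two high-GC bins with inflated counts.
counts = [100, 110, 200, 220]
gc = [0.41, 0.42, 0.56, 0.57]
corrected = gc_correct(counts, gc)
```

After correction the low-GC and high-GC bins share the same median, so the remaining differences between bins no longer track GC content in this toy case.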

Preferably, the method for establishing the machine learning training model in step S205 comprises:

S501, dividing the samples into a training set and a test set;

S502, processing the sample data in the training set;

S503, extracting the omics characteristics of cfDNA in the training set and validating them in the test set;

and S504, evaluating the performance of the model.

Preferably, the test set in step S501 comprises n gastric cancer samples and m healthy samples, and the training set comprises n+1 gastric cancer samples and m healthy samples, wherein n and m are positive integers.

Preferably, the sample data processing method in step S502 comprises an algorithm using ten-fold cross-validation.

Preferably, the evaluation in step S504 comprises calculation and evaluation of sensitivity, specificity, accuracy, recall, ROC, and AUC.
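As an illustration of the evaluation in step S504, the following minimal sketch computes sensitivity, specificity, accuracy, and AUC from labels and scores. It assumes the standard textbook definitions (1 = gastric cancer, 0 = healthy), under which recall equals sensitivity; the application does not state its formulas, and the function names, the 0.5 decision threshold, and the toy data are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = cancer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy under textbook definitions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # identical to recall in this definition
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive sample scores higher than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: three cancer samples and two healthy samples.
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = screening_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise formulation of AUC is quadratic in the number of samples, which is acceptable for cohorts of the size described in Example 4.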

The beneficial technical effects obtained by the invention are as follows:

1) compared with ultra-high-depth or high-depth targeted sequencing, the invention adopts low-depth whole-genome sequencing, which greatly reduces the sequencing cost;

2) through whole-genome sequencing, the invention reflects the overall profile of cfDNA in gastric cancer patients more comprehensively and avoids missing gastric cancer patients with high heterogeneity;

3) through tri-omics analysis, the invention can more comprehensively mine the specific characteristics of cfDNA in gastric cancer patients.

The foregoing description is only an overview of the technical solutions of the present application, so that the technical means of the present application can be more clearly understood and the present application can be implemented according to the content of the description, and in order to make the above and other objects, features and advantages of the present application more clearly understood, the following detailed description is made with reference to the preferred embodiments of the present application and the accompanying drawings.

The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a flowchart of noninvasive screening of early gastric cancer patients based on cfDNA omics characteristics;

FIG. 2 is a schematic diagram of feature extraction and model training for cfDNA fragment patterns;

FIG. 3 is a schematic diagram of feature extraction and model training for cfDNA CNV diversity;

FIG. 4 is a schematic diagram of feature extraction and model training for cfDNA TSS coverage;

FIG. 5 is a schematic diagram of MUC2 as a target gene for early gastric cancer.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. In the following description, specific details such as specific configurations and components are provided only to help the embodiments of the present application be fully understood. Accordingly, it will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. In addition, descriptions of well-known functions and constructions are omitted in the embodiments for clarity and conciseness.

It should be appreciated that reference throughout this specification to "one embodiment" or "the embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrase "one embodiment" or "the present embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Further, the present application may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, B exists alone, or A and B exist at the same time. The term "/and" herein describes another association relationship and indicates that two relationships may exist; for example, "A/and B" may mean: A exists alone, or A and B exist together. In addition, the character "/" herein generally indicates that the preceding and following associated objects are in an "or" relationship.

The term "at least one" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "at least one of A and B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion.

Example 1

S101, establishing a cfDNA omics characteristic model;

S102, blood collection;

S103, extracting cfDNA;

S104, performing library construction and sequencing on the extracted cfDNA;

S201, blood collection;

S202, extracting cfDNA;

S203, performing library construction and sequencing on the extracted cfDNA;

S204, extracting cfDNA omics characteristics;

and S205, machine learning and training the model.

Preferably, the blood collection in step S102 and step S201 is a whole blood extraction using Streck blood collection tubes. The Streck blood collection tube contains a preservative which can stabilize nucleated blood cells, prevent the release of cell genome DNA, inhibit cfDNA nuclease-mediated degradation and contribute to the overall stability of cfDNA.

S301, placing the Streck blood collection tube in a centrifuge and centrifuging until the plasma is separated;

S304, placing the mixture in a centrifuge for centrifugation;

S307, placing the collection tube in a centrifuge for centrifugation, collecting the filtrate in an EP tube, determining the DNA concentration, and detecting the fragment distribution of the cfDNA.

Preferably, the cfDNA omics characteristics essentially comprise one or more of fragment pattern, CNV diversity, and TSS coverage, alone or in combination.

S402, removing low-quality sequences and duplicate sequences from the BAM file;

S404, dividing the chromosomes into adjacent, non-overlapping segments;

S405, counting the numbers of long and short cfDNA fragments;

S406, correcting the counts for GC content;

and S409, obtaining the TSS coverage through coverage calculation.

S501, dividing the samples into a training set and a test set;

S502, processing the sample data in the training set;

and S504, evaluating the performance of the model.

Preferably, the test set in step S501 comprises n gastric cancer samples and m healthy samples, the training set comprises n+1 gastric cancer samples and m healthy samples, and n and m are positive integers.

Preferably, the evaluation in step S504 comprises the calculation of sensitivity, specificity, accuracy, recall, ROC, and AUC.

Example 2

This embodiment is performed based on embodiment 1, and the same points as embodiment 1 are not repeated.

This example describes a method of cfDNA extraction, comprising the following specific steps:

S601, placing a Streck tube in a centrifuge at 4 ℃ and centrifuging at 2000 rpm for 10 min to separate the plasma;

S602, adding 500 µL proteinase K and 4 mL ACL buffer to a 50 mL centrifuge tube containing the plasma, vortexing for 30 s to mix thoroughly, and incubating in a 60 ℃ water bath for 30 min;

S603, suction-filtering the incubated collection tube with a vacuum pump for 10 min, then adding 600 µL ACW1, 750 µL ACW2, and 750 µL 100% ethanol in sequence to wash away impurities;

S604, centrifuging at 12000 rpm for 3 min;

S605, placing the collection tube in a metal bath at 50 ℃ for 10 min to evaporate the ethanol, adding 110 µL AVE to the collection tube, and incubating at room temperature for 3 min;

S606, centrifuging the collection tube at 12000 rpm for 1 min, collecting the filtrate in a 1.5 mL EP tube, determining the DNA concentration with a Qubit 3.0, and detecting the cfDNA fragment distribution on a 2100 instrument.

Example 3

This example describes a method for cfDNA tri-omics feature extraction, comprising the following steps:

Fragment pattern: first, the cfDNA sequence files are aligned to the reference genome hg19, and low-quality and duplicate sequences are removed from the resulting BAM files; next, low-coverage regions of the hg19 reference genome and the Duke blacklist regions are excluded; the hg19 autosomes are then divided into 504 adjacent, non-overlapping segments of 5 Mb each; within each segment, the number of cfDNA fragments longer than 150 bp and the number shorter than 150 bp are counted; the counts are corrected for GC content by LOESS regression, and the GC-corrected counts are mean-normalized; finally, the numbers of long and short cfDNA fragments in each 5 Mb interval are obtained, and the fragment pattern is quantified as their ratio.
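The counting and ratio steps of the fragment pattern can be sketched as follows. This is a minimal illustration only: it assumes fragments have already been aligned and filtered and are available as `(chromosome, start, length)` tuples, it takes the ratio as short over long (the direction used in Example 4), and it omits the GC correction and normalization. The function name and the toy fragments are hypothetical.

```python
from collections import defaultdict

BIN_SIZE = 5_000_000   # 5 Mb bins, as in the fragment pattern description
LENGTH_CUTOFF = 150    # bp; boundary between short and long fragments


def short_long_ratio(fragments, bin_size=BIN_SIZE, cutoff=LENGTH_CUTOFF):
    """Return {(chrom, bin_index): short_count / long_count}.

    `fragments` is an iterable of (chrom, start, length) tuples, one per
    cfDNA fragment. Fragments of exactly `cutoff` bp fall in neither class,
    mirroring the "more than 150 bp" / "less than 150 bp" wording.
    Bins without any long fragment are skipped to avoid division by zero.
    """
    short = defaultdict(int)
    long_ = defaultdict(int)
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        if length < cutoff:
            short[key] += 1
        elif length > cutoff:
            long_[key] += 1
    return {key: short[key] / n_long for key, n_long in long_.items() if n_long > 0}


# Toy input: three fragments in the first 5 Mb bin, three in the second.
fragments = [
    ("chr1", 100, 120),
    ("chr1", 2_000, 167),
    ("chr1", 4_999_999, 140),
    ("chr1", 5_000_000, 180),
    ("chr1", 6_000_000, 170),
    ("chr1", 7_000_000, 130),
]
ratios = short_long_ratio(fragments)
```

In this toy input the first bin holds two short fragments and one long fragment, and the second bin holds one short and two long fragments.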

CNV diversity: low-quality and duplicate sequences are removed from the aligned BAM file, and the chromosomes are divided into 51120 adjacent, non-overlapping segments of 50 kb each; GC content is corrected in the same way as for the fragment pattern; the median GC-corrected cfDNA count of each segment across healthy people is taken as the gold standard, and the cfDNA count in each segment of a cancer patient is median-normalized against this gold standard; segments are classified as amplified or deleted using a threshold of 0.2, the density distribution of copy number variation is computed, and abnormally amplified and deleted intervals are identified; the pathways and biological processes involving the genes in the amplified intervals are then explored.
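A minimal sketch of the normalization and thresholding described for CNV diversity is given below. The application does not state the scale on which the 0.2 threshold is applied; the sketch assumes a log2 ratio of the sample count to the healthy median, which is a common convention and is consistent with healthy values clustering around 0, but this is an assumption. The function names, labels, and toy counts are hypothetical.

```python
import math
from statistics import median


def healthy_reference(healthy_samples):
    """Per-bin median of GC-corrected counts across healthy samples."""
    return [median(bin_counts) for bin_counts in zip(*healthy_samples)]


def log2_ratios(sample, reference):
    """log2(sample / reference) per bin (assumed scale); None where undefined."""
    ratios = []
    for count, ref in zip(sample, reference):
        if count > 0 and ref > 0:
            ratios.append(math.log2(count / ref))
        else:
            ratios.append(None)
    return ratios


def classify_bins(ratios, threshold=0.2):
    """Label bins as 'gain' (> threshold), 'loss' (< -threshold) or 'neutral'."""
    labels = []
    for r in ratios:
        if r is None:
            labels.append("excluded")
        elif r > threshold:
            labels.append("gain")
        elif r < -threshold:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


def abnormal_fraction(labels):
    """Share of usable bins labelled as gain or loss."""
    usable = [lab for lab in labels if lab != "excluded"]
    if not usable:
        return 0.0
    return sum(1 for lab in usable if lab in ("gain", "loss")) / len(usable)


# Toy data: three healthy samples and one patient sample over four bins.
healthy = [[100, 100, 100, 100], [110, 90, 100, 100], [90, 110, 100, 100]]
reference = healthy_reference(healthy)
labels = classify_bins(log2_ratios([100, 130, 80, 105], reference))
fraction = abnormal_fraction(labels)
```

The per-sample `abnormal_fraction` corresponds to the proportion of amplified and deleted intervals that Example 4 compares between gastric cancer patients and healthy people.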

TSS coverage: low-quality sequences are removed from the BAM file, the coordinates of the transcription start sites (TSS) of the hg19 reference genome are downloaded from the ENSEMBL database, and the coordinates are mapped onto the BAM file to obtain the coverage of sequences near each site; first, the coverage of the nucleosome-depleted region (NDR) near the start site is calculated, and then the coverage from 1000 bp upstream to 1000 bp downstream of the start site (the 2k region); to normalize these two coverages, the mean coverage of the segments from 3000 bp to 1000 bp upstream and from 1000 bp to 3000 bp downstream of the start site is calculated as a gold standard; the NDR and 2k-region coverages divided by this gold standard give the final TSS coverage.
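The normalization of TSS coverage can be sketched for a single gene as follows. The sketch assumes a per-base depth profile spanning 3000 bp on each side of the TSS has already been extracted from the BAM file, and it uses the NDR window of 150 bp upstream to 50 bp downstream mentioned in Example 4. The function names, the exact inclusive window boundaries, and the synthetic profile are assumptions.

```python
from statistics import fmean

FLANK = 3000  # bp profiled on each side of the TSS


def window(depth, lo, hi):
    """Depth values for relative positions lo..hi (inclusive).

    Index 0 of `depth` corresponds to position -FLANK relative to the TSS.
    """
    return depth[lo + FLANK: hi + FLANK + 1]


def tss_coverage(depth):
    """Normalized NDR and 2k-region coverage for one TSS.

    The background ("gold standard") is the mean depth of the two flanks,
    -3000..-1001 and +1001..+3000; each region mean is divided by it.
    """
    if len(depth) != 2 * FLANK + 1:
        raise ValueError("depth must cover -3000..+3000 around the TSS")
    background = fmean(window(depth, -3000, -1001) + window(depth, 1001, 3000))
    if background == 0:
        raise ValueError("no coverage in the flanking regions")
    return {
        "ndr": fmean(window(depth, -150, 50)) / background,
        "2k": fmean(window(depth, -1000, 1000)) / background,
    }


# Synthetic profile: depth 10 in the flanks, 6 in the 2k region, 2 in the NDR.
depth = [10.0] * (2 * FLANK + 1)
for pos in range(-1000, 1001):
    depth[pos + FLANK] = 6.0
for pos in range(-150, 51):
    depth[pos + FLANK] = 2.0

result = tss_coverage(depth)
```

Values below 1 indicate lower coverage than the flanking background, which the description interprets as a more open promoter.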

Example 4

This embodiment is performed on the basis of embodiment 1, and the same points as embodiment 1 are not repeated.

This example presents cfDNA combined feature extraction and model training.

In this example, preoperative peripheral blood from 81 gastric cancer patients and peripheral blood from 38 healthy people were collected, and cfDNA extraction, library construction, and sequencing were performed.

The first step: feature extraction and model training for the cfDNA fragment pattern; the results are shown in FIG. 2.

A. The aligned BAM file was divided into 504 adjacent, non-overlapping bins; in each bin, the GC-corrected numbers of long cfDNA fragments (longer than 150 bp) and short cfDNA fragments (shorter than 150 bp) were calculated, together with the ratio of short to long fragments. The ratios of healthy people were relatively concentrated, with a higher share of long fragments in each bin than in gastric cancer patients; the ratios of gastric cancer patients were relatively dispersed, with a higher share of short fragments in each bin than in healthy people.

B. After mean normalization of the ratio distributions of gastric cancer patients and healthy people, the ratios of healthy people were found to be stable, whereas those of gastric cancer patients were highly variable.

C. The median over healthy people of each of the 504 bins was used as the gold standard, and the similarity between each sample and the gold standard was computed. The healthy people were highly similar to one another; a healthy person from a Nature article was selected for comparison and was likewise found to be similar to the gold standard, whereas the similarity between gastric cancer patients and the gold standard was markedly lower. The p value of the difference was 0.0003313 relative to the healthy people and 3.686e-08 relative to the healthy data from Nature (rank-sum test).

D. The model was trained on the training set with a stochastic gradient descent algorithm, features were extracted using ten-fold cross-validation, and the performance of the model was finally evaluated on the test set. On the test set, the AUC was 0.96447, the sensitivity was 0.975, the specificity was 0.842, the accuracy was 0.929, and the recall was 0.941.
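The combination of stochastic gradient descent and ten-fold cross-validation named in item D can be illustrated with a small, dependency-free sketch. The application does not specify the loss function, learning rate, number of epochs, or how cross-validation was used for feature extraction; the sketch assumes a logistic-regression loss and uses cross-validation only to estimate held-out accuracy. All names and the toy one-feature data are hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_sgd(X, y, epochs=50, lr=0.1, seed=0):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y[i]  # gradient of the log-loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (cancer) if the linear score is positive, else 0 (healthy)."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def cross_validate(X, y, k=10):
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w, b = train_sgd([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, b, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy, clearly separable data: one feature, 10 "cancer" and 10 "healthy" samples.
X = [[1.0 + 0.05 * i] for i in range(10)] + [[-1.0 - 0.05 * i] for i in range(10)]
y = [1] * 10 + [0] * 10
cv_accuracy = cross_validate(X, y)
```

In practice the feature vectors would be the per-bin ratios (or CNV and TSS features) of each sample, and a separate test set would be held back for the final evaluation, as item D describes.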

The second step: feature extraction and model training for cfDNA CNV diversity; the results are shown in FIG. 3.

A. After the CNV of the cfDNA was calculated, its density was evaluated. The CNV density of healthy people was concentrated around 0, whereas that of gastric cancer patients was more dispersed, which was a common feature among them. The figure shows the CNV density of one healthy person and one gastric cancer patient.

B. Intervals with values greater than 0.2 were defined as amplified intervals and intervals with values less than -0.2 as deleted intervals, and the proportion of amplified and deleted intervals was computed for every sample. The proportion of CNV-abnormal intervals in gastric cancer patients was far higher than in healthy people; the p value of the difference was 6.499e-12 (rank-sum test).

C. The model was trained on the training set with a stochastic gradient descent algorithm, features were extracted using ten-fold cross-validation, and the performance of the model was finally evaluated on the test set. On the test set, the AUC was 0.98947, the sensitivity was 1, the specificity was 0.895, the accuracy was 0.952, and the recall was 1.

The third step: feature extraction and model training for cfDNA TSS coverage; the results are shown in FIG. 4.

A. The figure shows the cfDNA coverage from 1 Mb upstream to 1 Mb downstream of the transcription start site of a gene in one gastric cancer patient. The red dashed line marks the transcription start site; cfDNA coverage drops sharply near it, indicating a promoter region that can be recognized by transcription factors.

B. The figure shows the cfDNA coverage from 1 kb upstream to 1 kb downstream of the transcription start site. The red dashed line marks the transcription start site; as in panel A, cfDNA coverage drops sharply around it, indicating a promoter region that can be recognized by transcription factors.

C. The figure shows the cfDNA coverage of the nucleosome-depleted region (NDR), from 150 bp upstream to 50 bp downstream of the transcription start site. The red dashed line marks the transcription start site; as in panel A, cfDNA coverage drops sharply around it, indicating a promoter region that can be recognized by transcription factors.

D. The figure shows twenty thousand protein-coding genes ranked by the mean cfDNA coverage of the 81 gastric cancer samples in the 2k region. Lower mean coverage indicates stronger openness at the transcription start site of the gene; after ranking by mean coverage, the higher a gene sits in the ranking, the more open its promoter.

E. The figure shows twenty thousand protein-coding genes ranked by the mean cfDNA coverage of the 81 gastric cancer samples in the nucleosome-depleted region (NDR). Lower mean coverage indicates stronger openness at the transcription start site of the gene; after ranking by mean coverage, the higher a gene sits in the ranking, the more open its promoter.

F. For the 2k region and the nucleosome-depleted region (NDR), if the mean coverage at the transcription start site of a gene is less than 1 in more than 80% of the gastric cancer samples, the promoter of that gene is judged to be an open region. KEGG pathway enrichment analysis of these genes showed that the enriched pathways relate to cell proliferation, autophagy, and migration and are significantly associated with the onset and progression of cancer.

G. The model was trained on the training set with a stochastic gradient descent algorithm, features were extracted using ten-fold cross-validation, and the performance of the model was finally evaluated on the test set. On the test set, the AUC was 0.98947, the sensitivity was 1, the specificity was 0.895, the accuracy was 0.952, and the recall was 1.
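The open-promoter criterion of item F (normalized coverage below 1 in more than 80% of samples) can be written as a short filter. This is a sketch under the assumption that normalized TSS coverage has already been computed for each gene in each sample; the function name and the gene labels are hypothetical placeholders.

```python
def open_promoters(coverage_by_gene, coverage_cutoff=1.0, sample_fraction=0.8):
    """Return genes whose normalized TSS coverage is below `coverage_cutoff`
    in strictly more than `sample_fraction` of the samples."""
    selected = []
    for gene, values in coverage_by_gene.items():
        if not values:
            continue  # no samples for this gene, nothing to decide
        n_low = sum(1 for v in values if v < coverage_cutoff)
        if n_low / len(values) > sample_fraction:
            selected.append(gene)
    return sorted(selected)


# Toy data: normalized coverage of three placeholder genes in five samples.
coverage_by_gene = {
    "GENE_A": [0.5, 0.6, 0.7, 0.8, 0.9],   # below 1 in 5 of 5 samples
    "GENE_B": [0.5, 0.6, 1.2, 1.1, 0.9],   # below 1 in 3 of 5 samples
    "GENE_C": [0.5, 0.5, 0.5, 0.5, 1.5],   # below 1 in exactly 80% of samples
}
open_genes = open_promoters(coverage_by_gene)
```

Because the criterion is "more than 80%", a gene that is below the cutoff in exactly 80% of samples is not selected; the resulting gene list is what would be passed on to pathway enrichment analysis.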

The fourth step: cfDNA TSS coverage combined with single-cell analysis identified MUC2 as a target gene for early-stage gastric cancer; the results are shown in FIG. 5.

After the cfDNA TSS coverage features were analyzed in combination with gastric cancer single-cell transcriptome data, the MUC2 promoter region was found to be more open in early-stage gastric cancer patients than in healthy people. Lower normalized coverage represents stronger openness; the stronger the openness in the 2k region (panel A) and the NDR (panel B), the stronger the expression of MUC2. HE staining and IF staining showed that the tumor region of early-stage gastric cancer patients was stained by MUC2 fluorescent protein (panel C), whereas the tumor region of late-stage gastric cancer patients was only partially stained or not stained at all (panel D).

The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention; those skilled in the art may make various modifications and changes. Variations, modifications, substitutions, integrations, and parameter changes of the embodiments that are made by conventional substitution or that achieve the same function, without departing from the principle and spirit of the invention, fall within the scope of the invention.

Claims

1. A cancer noninvasive early screening system based on cfDNA omics characteristics comprises a cfDNA omics characteristic model and a machine learning training model, and is characterized in that the cancer noninvasive screening method comprises the following steps:

S101, establishing a cfDNA omics characteristic model;

S102, blood collection;

S103, extracting cfDNA;

S104, performing library construction and sequencing on the extracted cfDNA;

2. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 1, wherein in step S101 of the noninvasive cancer screening method, establishing the cfDNA omics characteristic model specifically comprises the following steps:

S201, blood collection;

S202, extracting cfDNA;

S203, performing library construction and sequencing on the extracted cfDNA;

S204, extracting cfDNA omics characteristics;

and S205, machine learning and training the model.

3. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 1 or 2, wherein in steps S102 and S201 of the noninvasive cancer screening method, the blood collection is performed by whole blood extraction with a blood collection tube; the blood collection tube contains a preservative which can stabilize nucleated blood cells, prevent the release of cell genome DNA, inhibit cfDNA nuclease-mediated degradation and contribute to the overall stability of cfDNA.

4. The cancer noninvasive early screening system based on cfDNA omics characteristics as claimed in claim 1 or 2, wherein the noninvasive screening method for cancer comprises the following steps of extracting cfDNA in step S103 and step S202:

S304, placing the mixture in a centrifuge for centrifugation;

5. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 1 or 2, wherein the cfDNA omics characteristics comprise one or more of fragment pattern, CNV density and TSS coverage.

6. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 1 or 2, wherein the cfDNA omics characteristic extraction method in steps S101, S105 and S204 of the noninvasive cancer screening method comprises:

S402, removing low-quality sequences and duplicate sequences from the BAM file;

S404, dividing the chromosomes into adjacent, non-overlapping segments;

S405, counting the numbers of long and short cfDNA fragments;

S406, correcting the counts for GC content;

and S409, obtaining the TSS coverage through coverage calculation.

7. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 2, wherein the method for establishing the machine learning training model in step S205 of the noninvasive cancer screening method comprises the following steps:

S501, dividing the samples into a training set and a test set;

S502, processing the sample data in the training set;

and S504, evaluating the performance of the model.

8. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 7, wherein in step S501 of the noninvasive cancer screening method, the test set comprises n gastric cancer samples and m healthy samples, and the training set comprises n+1 gastric cancer samples and m healthy samples, wherein n and m are positive integers.

9. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 7, wherein the sample data processing method in step S502 of the noninvasive cancer screening method comprises an algorithm using ten-fold cross-validation.

10. The cancer noninvasive early screening system based on cfDNA omics characteristics according to claim 7, wherein the evaluation in step S504 of the noninvasive cancer screening method comprises the calculation and evaluation of sensitivity, specificity, accuracy, recall, ROC and AUC.