WO2019128233A1 - Method and system for judging cervical cancer - Google Patents

Method and system for judging cervical cancer

Info

Publication number
WO2019128233A1
WO2019128233A1 PCT/CN2018/098557 CN2018098557W WO2019128233A1 WO 2019128233 A1 WO2019128233 A1 WO 2019128233A1 CN 2018098557 W CN2018098557 W CN 2018098557W WO 2019128233 A1 WO2019128233 A1 WO 2019128233A1
Authority
WO
WIPO (PCT)
Prior art keywords
chromosome
chromosomes
chri
cervical cancer
long arm
Prior art date
Application number
PCT/CN2018/098557
Other languages
English (en)
French (fr)
Inventor
魏国鹏
Original Assignee
南京格致基因生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京格致基因生物科技有限公司
Publication of WO2019128233A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 - Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 - ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q - MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 - Oligonucleotides characterized by their use
    • C12Q2600/156 - Polymorphic or mutational markers

Definitions

  • the invention relates to a method and a system for judging cervical cancer.
  • Cervical cancer is one of the most common gynecological tumors, and its incidence is increasing year by year. Among the many known cancers, cervical cancer is the only malignant tumor whose cause has been clearly identified: persistent infection with high-risk human papillomavirus (HPV) is the leading cause of cervical cancer. Cervical cancer can be diagnosed and prevented early by conventional means. At present, the main methods for diagnosing cervical cancer are high-risk HPV detection and cytological examination of cervical exfoliated cells.
  • the detection methods for high-risk HPV include real-time fluorescent quantitative PCR, second-generation hybrid capture, and enzyme-cleavage signal amplification.
  • the cytological examination of cervical exfoliated cells mainly uses liquid-based thin-layer cytology (TCT), which has the advantages of being non-invasive and relatively accurate for some cervical cancers; its disadvantage is low sensitivity.
  • TCT: liquid-based thin-layer cytology
  • ASC-US: atypical squamous cells of undetermined significance
  • AGC: atypical glandular cells
  • Chromosomal imbalance is one of the characteristics of malignant tumors. It refers to genomic structural changes relative to the normal diploid genome, including changes in the number of chromosomes, such as polyploidy or haploidy, and local changes within chromosomes, such as copy number gain or loss. There is currently no method for diagnosing cervical cancer using chromosomal imbalance.
  • the present inventors have found that high-throughput sequencing can conveniently and quickly determine whether a chromosome has a chromosomal imbalance, thereby screening, diagnosing or grading cervical cancer.
  • the present invention provides a method of determining whether a chromosome has a chromosomal imbalance, a computer readable medium storing instructions for performing the method, a computing device including the computer readable medium, and a system including the computing device; and a method of screening, diagnosing, or risk stratifying cervical cancer, a computer readable medium storing instructions for performing the method, a computing device including the computer readable medium, and a system including the computing device.
  • the invention also provides a set of chromosomes for screening, diagnosing or risk grading cervical cancer, the use of an agent for detecting chromosomal imbalance in the set of chromosomes in the preparation of an agent for screening, diagnosing or risk grading cervical cancer, and the use of a device for detecting chromosomal imbalance in the set of chromosomes in the preparation of a device for screening, diagnosing or risk stratifying cervical cancer.
  • the invention provides a method of determining whether at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 of a sample from a subject (e.g., a human) has a chromosomal imbalance (e.g., whether the difference between the long arm copy number and the short arm copy number of the chromosome is higher than or equal to a threshold, or whether the difference between the long arm coverage and the short arm coverage is higher than or equal to a threshold).
  • chromosomal structural information of at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 of the sample from the subject (e.g., the structural information required to determine the chromosomal imbalance, the difference between the long arm copy number and the short arm copy number, or the difference between the long arm coverage and the short arm coverage) can be compared with the chromosomal structure information of the corresponding chromosome from a healthy individual to determine whether that chromosome has a chromosomal imbalance in the sample from the subject.
  • the present invention also provides a method for determining chromosomal imbalance, which may include: determining the difference between the long arm copy number and the short arm copy number of a chromosome; in the case where this difference is higher than or equal to a threshold, it is judged that there is a chromosomal imbalance.
  • the present invention also provides a method for determining chromosomal imbalance, which may include: determining the difference between the long arm coverage and the short arm coverage of a chromosome; in the case where this difference is higher than or equal to a threshold, it is judged that there is a chromosomal imbalance.
  • the chromosomal imbalance can be determined by:
  • a genome-wide data sequence of a subject (e.g., a human) (e.g., a genome-wide data sequence obtained by high-throughput sequencing technology) is aligned to a reference genome of the same species (e.g., human reference genome Hg19) and equally divided into a plurality of segments (e.g., bins) of, for example, 10 to 1000 kb/segment (preferably 50 to 800 kb/segment, more preferably 100 to 500 kb/segment, still more preferably 150 to 300 kb/segment, most preferably 200 kb/segment);
  • p represents a long arm
  • q represents a short arm
  • Chr is an abbreviation for chromosome
  • i is selected from 2, 3, 5, 8, 11, 17, and 18.
  • the Z score (Z Chri) of human chromosome i (Chri) can be further calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
  • μ R Chri is the mean of the R values of the healthy population
  • σ R Chri is the standard deviation of the R values of the healthy population.
  • i is selected from 2, 3, 5 and 8; from 2, 3, 5, 8 and 18; from 3, 5 and 11; or from 3, 5, 11, 17 and 18.
  • in the case where the absolute value of the Z score is ≥ 3, chromosome i has a chromosomal imbalance; and in the case where the absolute value of the Z score is < 3, chromosome i has no chromosomal imbalance.
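Formula 2 itself is not reproduced in this text. Assuming it is the standard Z score built from the quantities defined above (an assumption based on the definitions, not a quotation of the patent), it would take the following form, where the mean and standard deviation are those of the R values of the healthy population:

```latex
Z_{\mathrm{Chr}i} \;=\; \frac{R_{\mathrm{Chr}i} - \mu_{R_{\mathrm{Chr}i}}}{\sigma_{R_{\mathrm{Chr}i}}}
```

As a purely illustrative calculation with made-up numbers: if the healthy population has a mean R of 1.00 with standard deviation 0.02 and a sample has R = 1.10, then Z = (1.10 - 1.00) / 0.02 = 5, whose absolute value exceeds 3.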
  • the method may include the following steps:
  • (c) aligning the reads to the reference genome and dividing it into multiple segments (e.g., bins) of, for example, 10 to 1000 kb/segment (preferably 50 to 800 kb/segment, more preferably 100 to 500 kb/segment, still more preferably 150 to 300 kb/segment, most preferably 200 kb/segment);
  • p represents a long arm
  • q represents a short arm
  • Chr is an abbreviation for chromosome
  • i is selected from 2, 3, 5, 8, 11, 17, and 18;
  • the Z score (Z Chri) of human chromosome i (Chri) is calculated according to the following formula 2:
  • μ R Chri is the mean of the R values of the healthy population
  • σ R Chri is the standard deviation of the R values of the healthy population
  • i is selected from 2, 3, 5 and 8; from 2, 3, 5, 8 and 18; from 3, 5 and 11; or from 3, 5, 11, 17 and 18.
  • when the absolute value of the Z score is ≥ 3, it is determined that chromosome i has a chromosomal imbalance; and when the absolute value of the Z score is < 3, it is determined that chromosome i has no chromosomal imbalance.
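The Z-score step and the cutoff of 3 can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it takes the R value as an already computed input (formula 1 is not reproduced here), assumes the usual Z-score form, assumes the cutoff is inclusive (|Z| ≥ 3), and uses the sample standard deviation, which the text does not specify. The function names and the reference values are hypothetical.

```python
from statistics import mean, stdev


def z_score(r_sample, healthy_r_values):
    """Z score of one chromosome's R value against healthy reference R values."""
    mu = mean(healthy_r_values)
    sigma = stdev(healthy_r_values)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("healthy reference R values have zero variance")
    return (r_sample - mu) / sigma


def is_imbalanced(z, cutoff=3.0):
    """Call a chromosomal imbalance when |Z| meets the cutoff (assumed inclusive)."""
    return abs(z) >= cutoff


# Made-up reference R values for one chromosome in a healthy population.
healthy = [0.98, 1.00, 1.02, 1.00, 0.99, 1.01]
z = z_score(1.10, healthy)
print(round(z, 2), is_imbalanced(z))
```

In practice the healthy reference set would need to be much larger than six samples for the mean and standard deviation to be stable.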
  • the invention provides a method of screening, diagnosing or risk stratifying cervical cancer, the method comprising determining whether at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 of a sample from a subject (e.g., a human) has a chromosomal imbalance (e.g., whether the difference between the long arm copy number and the short arm copy number is higher than or equal to a threshold, or whether the difference between the long arm coverage and the short arm coverage is higher than or equal to a threshold).
  • chromosomal structural information of at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 of the sample from the subject (e.g., the structural information required to determine the chromosomal imbalance, the difference between the long arm copy number and the short arm copy number, or the difference between the long arm coverage and the short arm coverage) is compared with the chromosomal structure information of the corresponding chromosome from a healthy individual to determine whether that chromosome has a chromosomal imbalance in the sample from the subject; in the case where the chromosome is imbalanced (e.g., the difference between the long arm copy number and the short arm copy number is higher than or equal to the threshold, or the difference between the long arm coverage and the short arm coverage is higher than or equal to the threshold), the subject is judged to have cervical cancer or to be at risk of developing cervical cancer.
  • chromosomal imbalance can be determined by:
  • a genome-wide data sequence of a subject (eg, a human) (eg, a genome-wide data sequence obtained by high-throughput sequencing technology) is aligned to a reference genome (eg, human reference genome Hg19), for example, in a range of 10 to 1000 kb/segment (preferably, it is 50 to 800 kb/segment, more preferably 100 to 500 kb/segment, more preferably 150 to 300 kb/segment, most preferably 200 kb/segment), and is equally divided into a plurality of segments (for example, bin);
  • p represents a long arm
  • q represents a short arm
  • Chr is an abbreviation for chromosome
  • i is selected from 2, 3, 5, 8, 11, 17, and 18.
  • the Z score (Z Chri) of human chromosome i (Chri) can be further calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
  • μ R Chri is the mean of the R values of the healthy population (individuals without cervical disease (other than cervicitis) and without other cancers);
  • σ R Chri is the standard deviation of the R values of the healthy population (individuals without cervical disease (other than cervicitis) and without other cancers), and
  • the C score is calculated according to the following formula 3:
  • the method may comprise the following steps:
  • (c) aligning the reads to the reference genome and dividing it into multiple segments (e.g., bins) of, for example, 10 to 1000 kb/segment (preferably 50 to 800 kb/segment, more preferably 100 to 500 kb/segment, still more preferably 150 to 300 kb/segment, most preferably 200 kb/segment);
  • p represents a long arm
  • q represents a short arm
  • Chr is an abbreviation for chromosome
  • i is selected from 2, 3, 5, 8, 11, 17, and 18;
  • the Z score (Z Chri) of human chromosome i (Chri) is calculated according to the following formula 2:
  • μ R Chri is the mean of the R values of the healthy population
  • σ R Chri is the standard deviation of the R values of the healthy population
  • the C score is calculated according to the following formula 3:
  • i is selected from the group consisting of 2, 3, 5 and 8, selected from 2, 3, 5, 8 and 18, selected from 3, 5 and 11, or selected from 3, 5, 11, 17 and 18.
  • when the absolute value of the Z score is ≥ 3, it is determined that chromosome i has a chromosomal imbalance; and when the absolute value of the Z score is < 3, it is determined that chromosome i has no chromosomal imbalance.
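The text states that an imbalance in at least one of the selected chromosomes indicates cervical cancer or a risk of it. A minimal Python sketch of that panel-level rule follows; it assumes per-chromosome Z scores are already available and an inclusive cutoff of 3, and it does not reproduce the C score of formula 3, which is not given in this text. The function names and the Z values are hypothetical.

```python
def imbalanced_chromosomes(z_scores, panel, cutoff=3.0):
    """Return the panel chromosomes whose |Z| meets the cutoff, in sorted order."""
    missing = [c for c in panel if c not in z_scores]
    if missing:
        raise KeyError(f"no Z score for chromosomes: {missing}")
    return sorted(c for c in panel if abs(z_scores[c]) >= cutoff)


def screen_positive(z_scores, panel, cutoff=3.0):
    """Positive when at least one panel chromosome is called imbalanced."""
    return bool(imbalanced_chromosomes(z_scores, panel, cutoff))


PANEL = (2, 3, 5, 8)  # one of the chromosome combinations listed in the text
z = {2: 0.4, 3: 4.2, 5: -3.5, 8: 1.1}  # made-up Z scores for one sample
print(imbalanced_chromosomes(z, PANEL), screen_positive(z, PANEL))
```

Negative Z scores count as well as positive ones because the rule is stated on the absolute value.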
  • the present invention provides a computer readable medium having stored thereon instructions, wherein when the instructions are executed by a processor, causing a computer to:
  • determine whether at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 of a sample from a subject (e.g., a human) has a chromosomal imbalance (e.g., whether the difference between the long arm copy number and the short arm copy number is higher than or equal to a threshold, or whether the difference between the long arm coverage and the short arm coverage is higher than or equal to the threshold).
  • chromosomal structural information of at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 of a sample from a subject (e.g., the structural information required to determine the chromosomal imbalance, the difference between the long arm copy number and the short arm copy number, or the difference between the long arm coverage and the short arm coverage) is compared with the chromosomal structure information of the corresponding chromosome from a healthy individual to determine whether that chromosome has a chromosomal imbalance in the sample from the subject; in the case where the chromosome is imbalanced (e.g., the difference between the long arm copy number and the short arm copy number is higher than or equal to the threshold, or the difference between the long arm coverage and the short arm coverage is higher than or equal to the threshold), the subject has cervical cancer or is at risk of developing cervical cancer.
  • the instructions stored in the computer readable medium described above that are to be executed by the processor determine the chromosomal imbalance by:
  • a genome-wide data sequence of a subject (eg, a human) (eg, a genome-wide data sequence obtained by high-throughput sequencing technology) is aligned to a reference genome (eg, human reference genome Hg19), for example, in a range of 10 to 1000 kb/segment (preferably, it is 50 to 800 kb/segment, more preferably 100 to 500 kb/segment, more preferably 150 to 300 kb/segment, most preferably 200 kb/segment), and is equally divided into a plurality of segments (for example, bin);
  • p represents a long arm
  • q represents a short arm
  • Chr is an abbreviation for chromosome
  • i is selected from 2, 3, 5, 8, 11, 17, and 18.
  • the Z score (Z Chri) of human chromosome i (Chri) may be further calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
  • μ R Chri is the mean of the R values of the healthy population
  • σ R Chri is the standard deviation of the R values of the healthy population
  • the C score is calculated according to the following formula 3:
  • i is selected from 2, 3, 5 and 8; from 2, 3, 5, 8 and 18; from 3, 5 and 11; or from 3, 5, 11, 17 and 18.
  • the present invention provides a computing device, which can include the computer readable medium and processor described above.
  • the present invention provides a system that can include:
  • a sequencing device for receiving nucleic acid from a test sample to provide nucleic acid sequence information from the sample (eg, nucleic acid sequence information obtained by high throughput sequencing techniques).
  • the sequencing device is a high throughput sequencer.
  • the invention provides a set of chromosomes for screening, diagnosing, or risk stratifying cervical cancer, the set of chromosomes comprising at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18.
  • the set of chromosomes is a combination of chromosomes 2, 3, 5 and 8, a combination of chromosomes 2, 3, 5, 8 and 18, a combination of chromosomes 3, 5 and 11, or a combination of chromosomes 3, 5, 11, 17 and 18.
  • the invention provides the use of an agent for detecting a chromosomal imbalance of at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 (preferably a difference between the long arm copy number and the short arm copy number, more preferably a difference between the long arm coverage and the short arm coverage) in the preparation of an agent for screening, diagnosing or risk stratifying cervical cancer.
  • the invention relates to the use of a device for detecting a chromosomal imbalance of at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 (preferably a difference between the long arm copy number and the short arm copy number, more preferably a difference between the long arm coverage and the short arm coverage) in the preparation of a device for screening, diagnosing or risk stratifying cervical cancer.
  • the invention can conveniently and quickly determine, by high-throughput sequencing, whether a chromosome has a chromosomal imbalance, and thereby screen, diagnose or risk-grade cervical cancer with high sensitivity, specificity and accuracy and with low missed diagnosis and misdiagnosis rates.
  • cervical cancer may include any type of cervical cancer.
  • Types of cervical cancer that are common in the field may include three types: squamous cell carcinoma (divided into three grades: grade I is highly differentiated squamous cell carcinoma, grade II is moderately differentiated squamous cell carcinoma (non-keratinized large cell type), and grade III is poorly differentiated squamous cell carcinoma (small cell type)), adenocarcinoma, and adenosquamous carcinoma (the cancer tissue contains both adenocarcinoma and squamous cell carcinoma components).
  • cervical cancer can also include cervical cancer in any individual. In one embodiment, the individual is selected from the group consisting of a human and a non-human mammal.
  • cervical cells may include cells located anywhere in the cervix or the inner wall of the cervical canal and cells that are detached from any part of the cervix where lesions may occur.
  • the cervical cells are cells that are manually detached from the cervix or the inner wall of the cervical canal, also referred to as “cervical exfoliated cells.”
  • chromosome refers to a substance carrying genetic information in the nucleus, which is cylindrical or rod-shaped under a microscope and is mainly composed of DNA and protein.
  • the portion from the centromere to the ends of the chromosome is called the chromosome arm. If the centromere is not in the center of the chromosome, it can be divided into a long arm (p) and a short arm (q). The length of both arms is important for identifying chromosomes.
  • chromosomal imbalance refers to genomic structural variation relative to the normal diploid genome. It may include changes in the number of chromosomes, such as polyploidy or haploidy, and also local changes within chromosomes, for example amplification, deletion, insertion or translocation of a chromosome fragment. In the narrow sense, chromosomal imbalance refers to aneuploidy, which includes:
  • (1) nullisomy: loss of a pair of homologous chromosomes, i.e., the number of chromosomes in the cell is 2n-2;
  • (2) monosomy: loss of a single chromosome, i.e., the number of chromosomes in the cell is 2n-1;
  • (3) trisomy: gain of one extra chromosome, so that one chromosome in the genome has three copies, i.e., the number of chromosomes in the cell is 2n+1;
  • (4) tetrasomy: gain of a pair of extra chromosomes, so that one chromosome in the genome has four copies, i.e., the number of chromosomes in the cell is 2n+2.
  • the chromosomal structure information is structural information that reflects chromosome copy number variation.
  • DNA (deoxyribonucleic acid) is a major component of chromosomes and the main genetic material.
  • DNA fragment library refers to the double-stranded DNA obtained by end-repairing the sample DNA fragments, adding a phosphate group at the 5' end and an adenine nucleotide (A) at the 3' end, and then ligating an Adapter to both ends.
  • Adapter refers to a fixed sequence attached to both ends of a sample DNA fragment, which contains a sequence portion complementary to the sequencing chip, a sequencing primer sequence, a sample barcode, and the like.
  • sample barcode refers to a tag sequence of about 5 to 15 bp, preferably about 6 to 12 bp, more preferably about 7 to 10 bp, and most preferably about 8 bp, located in the above Adapter and used to distinguish different samples.
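To illustrate how a sample barcode distinguishes pooled samples, here is a small Python sketch. It makes a simplifying assumption that is not in the text: the 8 bp barcode is taken to be the first 8 bases of each read and must match exactly. Real sequencers often report the barcode in a separate index read. The barcode sequences, sample names and function name are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=8):
    """Group reads by their leading barcode and strip the barcode from each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode are kept apart rather than guessed.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGTACGT": "sample_1", "TTGCAATC": "sample_2"}
reads = ["ACGTACGTGGGCCCAAAT", "TTGCAATCCCTTGGAA", "NNNNNNNNACGT"]
result = demultiplex(reads, barcodes)
print(result)
```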
  • High-throughput sequencing, also known as next-generation sequencing, refers to a sequencing technique capable of determining the sequences of hundreds of thousands to millions of DNA molecules in parallel at a time.
  • reads refers to the sequence, and its length, of a sample DNA fragment in a DNA fragment library (after subtracting the sequences attached during the library preparation stage), as measured by high-throughput sequencing.
  • sequence alignment refers to aligning reads on a reference genome (eg, a human reference genome) by a sequence identity principle.
  • a "reference genome" is a whole genome sequence, available from a public database, of the same species as the sample DNA.
  • the reference genome is a reference genome of a human or non-human mammal.
  • the public database is not particularly limited. In a preferred embodiment, the public database is GenBank of NCBI.
  • chromosome long arm/short arm coverage refers to the average number of reads over all segments (e.g., bins) of the long arm/short arm of the chromosome.
  • an individual with benign cervical disease refers to an individual suffering from a benign cervical disease, wherein the benign cervical disease includes cervical intraepithelial neoplasia, benign cervical tumor, cervical cyst, and the like.
  • the healthy population refers to a population without cervical disease (other than cervicitis) and without other cancers.
  • the healthy population can include a cervicitis population.
  • the population is a population of human or non-human mammals.
  • the non-human mammal can include cattle, horses, pigs, sheep, dogs, cats, monkeys, rats, and the like.
  • sensitivity refers to the percentage of positive samples detected by the method of the present invention to the number of samples that are pathologically diagnosed as cervical cancer.
  • sensitivity can be expressed by the following formula, reflecting the correct rate of patient judgment:
  • Sensitivity = number of true positives / (number of true positives + number of false negatives) × 100%.
  • if true positive, false positive, false negative and true negative are represented by a, b, c and d, respectively, the relationship between sensitivity, specificity, missed diagnosis rate, misdiagnosis rate and accuracy can be expressed as follows.
  • "true positive" indicates the number of cases in which the pathological diagnosis is diseased (e.g., cervical cancer) and the result of the method is positive;
  • "false positive" indicates the number of cases in which the pathological diagnosis is disease-free (e.g., non-cervical cancer) and the result of the method is positive;
  • "false negative" indicates the number of cases in which the pathological diagnosis is diseased (e.g., cervical cancer) and the result of the method is negative;
  • "true negative" indicates the number of cases in which the pathological diagnosis is disease-free (e.g., non-cervical cancer) and the result of the method is negative.
  • Sensitivity (sen) = a / (a + c);
  • Missed diagnosis rate = c / (a + c);
  • specificity refers to the percentage of samples with a negative result by the method among the samples diagnosed as non-cervical cancer by pathological examination. In medical diagnosis, specificity can be expressed by the following formula, reflecting the rate at which non-patients are correctly identified:
  • the "missed diagnosis rate", also known as the false negative rate, refers to the percentage of tested individuals who actually have the disease (e.g., cervical cancer) but are determined to be non-patients according to the diagnostic method and criteria, when a test population is screened or diagnosed for the disease.
  • the rate of missed diagnosis can be expressed by the following formula:
  • Missed diagnosis rate = number of false negatives / (number of true positives + number of false negatives) × 100%.
  • the "misdiagnosis rate", also known as the false positive rate, refers to the percentage of tested individuals who actually do not have the disease (e.g., cervical cancer) but are determined to be patients according to the diagnostic method and criteria, when a test population is screened or diagnosed for the disease.
  • the rate of misdiagnosis can be expressed by the following formula:
  • Misdiagnosis rate = number of false positives / (number of true negatives + number of false positives) × 100%.
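The performance measures defined above can be computed together from the four case counts. The Python sketch below follows the formulas given in the text for sensitivity, missed diagnosis rate and misdiagnosis rate; the specificity and accuracy formulas are not reproduced in this text, so the standard definitions, TN / (TN + FP) and (TP + TN) / total, are assumed. The function name and the counts are made up for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return screening performance measures as fractions (multiply by 100 for %)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one disease-free case")
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "missed_diagnosis_rate": fn / (tp + fn),  # false negative rate
        "specificity": tn / (tn + fp),            # standard definition (assumed)
        "misdiagnosis_rate": fp / (tn + fp),      # false positive rate
        "accuracy": (tp + tn) / total,            # standard definition (assumed)
    }


# Made-up counts: 50 pathologically confirmed cancers, 100 non-cancers.
m = diagnostic_metrics(tp=45, fp=10, fn=5, tn=90)
print(m)
```

Sensitivity and the missed diagnosis rate always sum to 1, as do specificity and the misdiagnosis rate, which is a quick consistency check on reported figures.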
  • the process of determining whether a chromosome has a chromosome imbalance or not, and screening, diagnosing, or risk grading cervical cancer may include:
  • (D) Sequence alignment: the sequences of the sample DNA fragments measured by high-throughput sequencing (valid reads) are aligned to the human reference genome, which is divided into multiple segments (e.g., bins) of, for example, 10 to 1000 kb/segment (preferably 50 to 800 kb/segment, more preferably 100 to 500 kb/segment, still more preferably 150 to 300 kb/segment, most preferably 200 kb/segment), and the average number of reads per segment (e.g., bin) covered by the long arm of chromosome i (cov Chrip) and the average number of reads per segment (e.g., bin) covered by the short arm of chromosome i (cov Chriq) are calculated respectively;
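The arm coverage described in step (D), the average number of reads per bin over one chromosome arm, can be sketched as follows. This is a minimal illustration under stated assumptions: read positions are 0-based start coordinates on a single chromosome, each arm is passed in as a half-open interval supplied by the caller (arm boundaries are not given in the text), and a final partial bin is counted as a full bin. The function names and the toy coordinates are hypothetical.

```python
def bin_counts(read_starts, region_start, region_end, bin_size=200_000):
    """Count reads per fixed-size bin over the half-open region [start, end)."""
    if region_end <= region_start:
        raise ValueError("region_end must be greater than region_start")
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // bin_size] += 1
    return counts


def arm_coverage(read_starts, arm_start, arm_end, bin_size=200_000):
    """Average number of reads per bin over one chromosome arm."""
    counts = bin_counts(read_starts, arm_start, arm_end, bin_size)
    return sum(counts) / len(counts)


# Toy example with 100 bp bins so the numbers are easy to check by hand.
reads = [5, 50, 120, 250, 260, 299, 310, 450]
cov_long = arm_coverage(reads, 0, 300, bin_size=100)     # bins hold 2, 1, 3 reads
cov_short = arm_coverage(reads, 300, 500, bin_size=100)  # bins hold 1, 1 reads
print(cov_long, cov_short)
```

With real data the default of 200 kb per bin matches the most preferred segment size named in the text.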
  • cervical exfoliated cells of a subject can be collected by methods commonly used in the art.
  • the method of collecting cervical exfoliated cells may include using a cervical sampler to brush cells from the cervix and the inner wall of the cervical canal, and immersing the brush head of the cervical sampler in the cell preservation solution, so that the cervical exfoliated cells adhering to the brush head are released into the cell preservation solution to form a cell mixture. Cervical exfoliated cells are then isolated by conventional centrifugation of the cell mixture.
  • the type and pattern of the cervical sampler are not particularly limited as long as the required amount of cervical exfoliated cells can be collected.
  • any commercially available cervical sampler can be employed.
  • Hologic's ThinPrep disposable cervical sampler can be used.
  • the composition of the cell preservation solution is not particularly limited as long as the cervical exfoliated cells can be temporarily stored.
  • any commercially available cell preservation solution for cervical exfoliated cells may be employed, or the cell preservation solution for cervical exfoliated cells may be formulated according to a conventional method.
  • Hologic's ThinPrep cell preservation solution can be used as a cell preservation solution for cervical exfoliated cells.
  • the strength and the number of times of centrifugation of the cervical exfoliated cell mixture are not particularly limited as long as the separation of cervical exfoliated cells can be achieved.
  • the cervical exfoliated cell mixture is centrifuged 1 to 5 times, preferably twice, with a centrifugal force of 1200 to 2000 g, more preferably 1400 to 1800 g, and most preferably 1600 g.
  • genomic DNA can be extracted from cervical exfoliated cells by any conventional method in the art.
  • genomic DNA can be fragmented and a DNA fragment library can be constructed by any conventional method in the art.
  • genomic DNA is fragmented and a library of DNA fragments is constructed using any commercially available kit.
  • genomic DNA is fragmented and a library of DNA fragments is constructed using Kapa's HyperPlus kit.
  • the process of fragmenting genomic DNA and constructing a library of DNA fragments using the kit can include:
  • fragment size is preferably 200 to 800 bp, more preferably 200 to 700 bp, still more preferably 200 to 600 bp, still more preferably 200 to 500 bp, and more
  • a band of 220-350 bp, more preferably 280-320 bp, is excised for gel extraction, and the DNA fragments carrying the correct Adapter and sample barcode (i.e., the DNA fragment library) are recovered using any commercially available kit;
  • the sequencing method and apparatus employed are not particularly limited.
  • the DNA fragment library is subjected to high throughput sequencing using a commercially available sequencer.
  • a DNA fragment library can be subjected to high-throughput sequencing using an Illumina sequencer, an Applied Biosystems (ABI) sequencer, a Roche sequencer, a Helicos sequencer, or a Complete Genomics sequencer.
  • the DNA fragment library is subjected to high throughput sequencing using an Illumina sequencing machine.
  • the Adapter and the sample barcode are subtracted from the measured sequence, and noise (such as low-quality regions) is removed to obtain the sequence of the sample DNA fragment, that is, a valid read.
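A simplified Python sketch of this clean-up step is shown below. It is only an illustration and rests on several assumptions not stated in the text: the adapter is found by exact string match, everything from the adapter onward is discarded, base qualities are Phred+33 encoded as in FASTQ files, and low-quality bases are trimmed only from the 3' end with a cutoff of 20. The function name, the example read and the adapter sequence used here are hypothetical; dedicated trimming tools handle mismatches and partial adapters.

```python
def clean_read(seq, qual, adapter, min_phred=20):
    """Cut a read at the first adapter occurrence, then trim low-quality 3' bases."""
    cut = seq.find(adapter)
    if cut != -1:
        seq, qual = seq[:cut], qual[:cut]
    end = len(seq)
    # Phred+33: quality score = ASCII code of the character minus 33.
    while end > 0 and ord(qual[end - 1]) - 33 < min_phred:
        end -= 1
    return seq[:end], qual[:end]


adapter = "AGATCGGAAGAGC"            # illustrative adapter sequence
seq = "ACGTACGT" + adapter           # 8 bp insert followed by adapter read-through
qual = "IIIIII##" + "I" * len(adapter)  # 'I' = Phred 40, '#' = Phred 2
trimmed = clean_read(seq, qual, adapter)
print(trimmed)
```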
  • the means for comparing the effective read reads to the human reference genome is not particularly limited, and the sequence alignment can be carried out by any conventional means in the art.
  • the sequence alignment can be performed using BWA-MEM software ( http://bio-bwa.sourceforge.net ).
  • the sequence alignment results can be written to any suitable file format at any suitable size per bin.
  • the sequence alignment results are written to a plurality of files, for example, *.bin (or *.bam) format, in a size of 10 to 1000 kb, 50 to 500 kb, preferably 100 to 300 kb, and more preferably 200 kb per segment.
  • from the saved alignment result files (for example, the above *.bin (or *.bam) files), the files covering a specified position of human chromosome i (Chri) are selected, and the average number of effective reads aligned to the human reference genome in the selected files (cov Chri) is calculated.
  • the specified position of human chromosome i (Chri) is its long arm and its short arm; the average number of effective reads aligned to the long arm of chromosome i (Chri) of the human reference genome is denoted cov Chrip, and the average number of effective reads aligned to the short arm is denoted cov Chriq.
  • i is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all of 2, 3, 5, 8, 11, 17, and 18.
  • i is selected from the group consisting of 2, 3, 5, and 8, selected from 2, 3, 5, 8, and 18, selected from 3, 5, and 11, or selected from the group consisting of 3, 5, 11, 17, and 18.
  • means for calculating the average of the number of effective read reads on the human reference genome is not particularly limited.
  • based on the average number of reads aligned to the specified position of chromosome i of the human reference genome, the value computed by the exemplary algorithm described below is used to determine whether the chromosome has a chromosomal imbalance and to screen, diagnose, or risk-stratify cervical cancer.
  • the specified position of human chromosome i (Chri) is its long arm and its short arm; the average number of effective reads aligned to the long arm of chromosome i (Chri) of the human reference genome is denoted cov Chrip, and the average number of effective reads aligned to the short arm is denoted cov Chriq.
  • i is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all of 2, 3, 5, 8, 11, 17, and 18.
  • i is selected from the group consisting of 2, 3, 5, and 8, selected from 2, 3, 5, 8, and 18, selected from 3, 5, and 11, or selected from the group consisting of 3, 5, 11, 17, and 18.
  • the R value (R Chri ) of human chromosome i (Chri) is calculated according to the following formula:
  • p represents a long arm
  • q represents a short arm
  • Chr is an abbreviation for chromosome
  • i is selected from 2, 3, 5, 8, 11, 17, and 18.
  • the Z score (Z Chri ) of the human chromosome i (Chri) is calculated according to the following formula 2:
  • μR Chri is the mean of the R values of a healthy population (individuals without cervical disease (cervicitis excepted) and without other cancers);
  • σR Chri is the standard deviation of the R values of a healthy population (individuals without cervical disease (cervicitis excepted) and without other cancers).
  • the C score (CScore) is calculated according to the following formula 3:
  • i is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all of 2, 3, 5, 8, 11, 17, and 18. In one embodiment, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
  • in the context of the present invention, the above "R value", "Z score", and/or "C score" are sometimes referred to as the "chromosome balance state score", and the "C score" is sometimes referred to as the "cancer score".
  • when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.
  • the methods of the invention can also be combined with other methods of determining whether a chromosome has a chromosomal imbalance.
  • the method of screening, diagnosing, or risk stratifying cervical cancer of the present invention may also be combined with other methods of diagnosing cervical cancer.
  • the other methods of diagnosing cervical cancer can include high risk HPV detection and cytological examination of cervical exfoliated cells.
  • the method for detecting high-risk HPV may include: morphological observation, immunohistochemistry, dot hybridization, blotting in situ hybridization, PCR/RFLP, PCR/Southern, and the like.
  • the cytological examination of cervical exfoliated cells can include a Thin-Cytologic Test (TCT).
  • in the first round of the study, a total of 107 subjects were enrolled: 40 patients diagnosed with cervical cancer by histopathology, 41 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.), and 26 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
  • in the second round of the study, a total of 167 subjects were enrolled: 44 patients diagnosed with cervical cancer by histopathology, 69 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.), and 54 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
  • in the third round of the study, a total of 167 subjects were enrolled: 42 patients diagnosed with cervical cancer by histopathology, 68 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.), and 57 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
  • the sampling brush of a ThinPrep disposable cervical sampler (Hologic) was rotated clockwise 10 times against the inner wall of the cervix of each subject, and the brush head was then immersed in ThinPrep cell preservation solution (Hologic) so that the exfoliated tissue from the inner wall of the cervix adhering to the brush head was released into the solution to form a tissue mixture.
  • the cervical exfoliated cells were isolated by centrifuging the tissue mixture twice at 1600 g.
  • Genomic DNA was extracted from the cervical exfoliated cells collected as above using a DNA extraction kit (Qiagen) according to the protocol of the kit.
  • the amplified DNA fragment library obtained in Example 2 was sequenced from one or both ends of the fragments; the adapter and the sample barcode were removed from the raw sequences, and noise (such as low-quality regions) was removed to obtain the sequences of the sample DNA fragments, i.e., the effective reads.
  • the effective reads obtained in Example 3 were aligned to the human reference genome using BWA-MEM software ( http://bio-bwa.sourceforge.net ), and the alignment results were written, at 200 kb per segment, to multiple files in *.bin (or *.bam) format.
  • the algorithm used in this embodiment is as follows.
  • R value (R Chri ) of the human chromosome i (Chri) is calculated according to the following formula:
  • Chr is an abbreviation for chromosome, wherein i is selected from 2, 3, 5, 8, 11, 17, and 18.
  • the Z score (Z Chri ) of the human chromosome i (Chri) is calculated according to the following formula 2:
  • μR Chri is the mean of the R values of the 26 healthy individuals (controls);
  • σR Chri is the standard deviation of the R values of the 26 healthy individuals (controls).
  • i is selected from 2, 3, 5, 8, 11, 17, and 18.
  • i is selected from 2, 3, 5 and 8, selected from 2, 3, 5, 8 and 18, selected from 3, 5 and 11, or selected from 3, 5, 11, 17 and 18.
  • when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.
  • as described in Example 1, in the first round of the study a total of 107 subjects were enrolled: 40 patients diagnosed with cervical cancer by histopathology, 41 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.) (shaded in Table 2 below), and 26 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
  • the results of detection by the method of the present invention against the above-mentioned 107 subjects are shown in Table 2 below.
  • Table 2: Z scores and C scores calculated for each sample when i is 2, 3, 5, and 8, or 2, 3, 5, 8, and 18, together with sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy
  • as described in Example 1, in the second round of the study a total of 167 subjects were enrolled: 44 patients diagnosed with cervical cancer by histopathology, 69 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.) (shaded in Table 3 below), and 54 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
  • the results of detection by the method of the present invention for the above 167 subjects are shown in Table 3 below.
  • Table 3: Z scores and C scores calculated for each sample when i is 2, 3, 5, and 8, or 2, 3, 5, 8, and 18, together with sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy
  • as described in Example 1, in the third round of the study a total of 167 subjects were enrolled: 42 patients diagnosed with cervical cancer by histopathology, 68 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.) (shaded in Table 4 below), and 57 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
  • the results of detection by the method of the present invention for the above 167 subjects are shown in Table 4 below.
  • Table 4: Z scores and C scores calculated for each sample when i is 3, 5, and 11, or 3, 5, 11, 17, and 18, together with sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy
  • by calculating the R value of DNA extracted from the subject's cervical exfoliated cells for one or more chromosomes selected from chromosomes 2, 3, 5, and 8; from chromosomes 2, 3, 5, 8, and 18; from chromosomes 3, 5, and 11; or from chromosomes 3, 5, 11, 17, and 18, and then calculating the Z score and the C score from that R value, it is possible to determine simply and quickly whether a chromosome of the subject has a chromosomal imbalance and, in turn, to screen, diagnose, or risk-stratify cervical cancer with high sensitivity, specificity, and accuracy and with low missed diagnosis and misdiagnosis rates.

Abstract

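The present invention relates to a method for determining by high-throughput sequencing whether a chromosome has a chromosomal imbalance, a computer-readable medium storing instructions for performing the method, a computing device including the computer-readable medium, and a system including the computing device. It further relates to a method for screening, diagnosing, or risk-stratifying cervical cancer by high-throughput sequencing, a computer-readable medium storing instructions for performing that method, a computing device including that computer-readable medium, and a system including that computing device. The invention also relates to a set of chromosomes for screening, diagnosing, or risk-stratifying cervical cancer, to the use of a reagent for detecting chromosomal imbalance of that set of chromosomes in the preparation of a diagnostic agent for screening, diagnosing, or risk-stratifying cervical cancer, and to the use of an apparatus for detecting chromosomal imbalance of that set of chromosomes in the preparation of equipment for screening, diagnosing, or risk-stratifying cervical cancer.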

Description

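Method and System for Determining Cervical Cancer
【Technical Field】
The present invention relates to a method and system for determining cervical cancer.
【Background Art】
Cervical cancer is one of the most common gynecological tumors, and its incidence is rising year by year. Among the many known cancers, cervical cancer is the only malignant tumor with an established cause: persistent infection with high-risk types of human papillomavirus (HPV) is the main cause of cervical cancer. Cervical cancer can be diagnosed early and prevented by routine means. At present, the main methods for diagnosing cervical cancer are high-risk HPV testing and cytological examination of cervical exfoliated cells.
The main methods for detecting high-risk HPV are real-time fluorescent quantitative PCR, second-generation hybrid capture, and enzymatic cleavage signal amplification.
Cytological examination of cervical exfoliated cells mainly relies on the liquid-based thin-layer cytology test (Thin-Cytologic Test, TCT). Its advantages are that it is non-invasive and can judge some cervical cancers fairly accurately; its disadvantages are low sensitivity, high subjectivity, and a still-large number of diagnoses of atypical squamous cells of undetermined significance (ASC-US) and atypical glandular cells (AGC).
The art currently lacks a simple and rapid method, based on high-throughput sequencing, for determining the risk of cervical cancer; the present invention fills this gap.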
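【Summary of the Invention】
Chromosomal imbalance is one of the hallmarks of malignant tumors. It refers to structural variation of the genome relative to the usual diploid genome and may include changes in chromosome number, such as polyploidy or haploidy, as well as local chromosomal changes, such as copy number gain or loss. There is currently no method that uses chromosomal imbalance to diagnose cervical cancer conveniently and quickly.
The inventors found that high-throughput sequencing makes it possible to determine conveniently and quickly whether a chromosome has a chromosomal imbalance, and thereby to screen, diagnose, or risk-stratify cervical cancer.
In general, the present invention provides a method for determining whether a chromosome has a chromosomal imbalance, a computer-readable medium storing instructions for performing the method, a computing device including the computer-readable medium, and a system including the computing device. It also provides a method for screening, diagnosing, or risk-stratifying cervical cancer, a computer-readable medium storing instructions for performing that method, a computing device including that computer-readable medium, and a system including that computing device. The invention further provides a set of chromosomes for screening, diagnosing, or risk-stratifying cervical cancer, the use of a reagent for detecting chromosomal imbalance of that set of chromosomes in the preparation of a diagnostic agent for screening, diagnosing, or risk-stratifying cervical cancer, and the use of an apparatus for detecting chromosomal imbalance of that set of chromosomes in the preparation of equipment for screening, diagnosing, or risk-stratifying cervical cancer.
In one aspect, the present invention provides a method for determining whether at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in a sample from a subject (e.g., a human) has a chromosomal imbalance (e.g., whether the difference between the long-arm copy number and the short-arm copy number of the chromosome is greater than or equal to a threshold, or whether the difference between the long-arm coverage and the short-arm coverage is greater than or equal to a threshold). In one specific embodiment of this method, for example, the chromosome structure information of at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in the sample from the subject (e.g., the structure information needed to determine chromosomal imbalance, the difference between long-arm and short-arm copy number, or the difference between long-arm and short-arm coverage) can be compared with the chromosome structure information of the corresponding chromosome from healthy individuals to determine whether that chromosome in the sample has a chromosomal imbalance.
The present invention also provides a method for determining chromosomal imbalance, which may include: measuring the difference between the long-arm copy number and the short-arm copy number of a chromosome, and determining that a chromosomal imbalance is present when that difference is greater than or equal to a threshold.
The present invention also provides a method for determining chromosomal imbalance, which may include: measuring the difference between the long-arm coverage and the short-arm coverage of a chromosome, and determining that a chromosomal imbalance is present when that difference is greater than or equal to a threshold.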
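In one specific embodiment of the present invention, chromosomal imbalance can be determined as follows:
Whole-genome sequence data of the subject (e.g., a human), for example obtained by high-throughput sequencing, are aligned to a reference genome of the same species (e.g., the human reference genome Hg19) and divided evenly into multiple segments (e.g., bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb, more preferably 100 to 500 kb, still more preferably 150 to 300 kb, most preferably 200 kb per segment);
the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq) are calculated separately;
the R value is calculated according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000001, Figure PCTCN2018098557-appb-000002)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18.
In another embodiment of the above method, the Z score (Z Chri) of human chromosome i (Chri) can further be calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000003)]
where
μR Chri is the mean of the R values of the healthy population;
σR Chri is the standard deviation of the R values of the healthy population.
In one specific embodiment of the above method, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
In one specific embodiment of the above method, when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.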
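In one specific embodiment of the above method, the method may include the following steps:
(a) extracting genomic DNA from cervical cells, fragmenting the genomic DNA, and constructing a DNA fragment library;
(b) performing high-throughput sequencing on the constructed DNA fragment library to obtain reads;
(c) aligning the reads to a reference genome and dividing the result evenly into multiple segments (e.g., bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb, more preferably 100 to 500 kb, still more preferably 150 to 300 kb, most preferably 200 kb per segment);
(d) separately calculating the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq); and
(e) calculating the chromosome structure information of human chromosome i (Chri), namely the R value (R Chri), according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000004, Figure PCTCN2018098557-appb-000005)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18;
the Z score (Z Chri) of human chromosome i (Chri) is further calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000006)]
where
μR Chri is the mean of the R values of the healthy population;
σR Chri is the standard deviation of the R values of the healthy population,
where i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18,
and where, when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.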
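In another aspect, the present invention provides a method for screening, diagnosing, or risk-stratifying cervical cancer. The method may include determining whether at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in a sample from a subject (e.g., a human) has a chromosomal imbalance (e.g., whether the difference between the long-arm copy number and the short-arm copy number of the chromosome is greater than or equal to a threshold, or whether the difference between the long-arm coverage and the short-arm coverage is greater than or equal to a threshold), for example by comparing the chromosome structure information of at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in the sample from the subject (e.g., the structure information needed to determine chromosomal imbalance, the difference between long-arm and short-arm copy number, or the difference between long-arm and short-arm coverage) with the chromosome structure information of the corresponding chromosome from healthy individuals to determine whether that chromosome in the sample has a chromosomal imbalance. Where a chromosomal imbalance is present (e.g., the difference between long-arm and short-arm copy number, or between long-arm and short-arm coverage, is greater than or equal to a threshold), the subject is determined to have cervical cancer or to be at risk of cervical cancer.
In one specific embodiment of the above method for screening, diagnosing, or risk-stratifying cervical cancer, chromosomal imbalance can be determined as follows:
Whole-genome sequence data of the subject (e.g., a human), for example obtained by high-throughput sequencing, are aligned to a reference genome (e.g., the human reference genome Hg19) and divided evenly into multiple segments (e.g., bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb, more preferably 100 to 500 kb, still more preferably 150 to 300 kb, most preferably 200 kb per segment);
the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq) are calculated separately;
the R value is calculated according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000007, Figure PCTCN2018098557-appb-000008)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18.
In another embodiment of the above method for screening, diagnosing, or risk-stratifying cervical cancer, the Z score (Z Chri) of human chromosome i (Chri) can further be calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000009)]
where
μR Chri is the mean of the R values of the healthy population (individuals without cervical disease (cervicitis excepted) and without other cancers);
σR Chri is the standard deviation of the R values of the healthy population (individuals without cervical disease (cervicitis excepted) and without other cancers), and
optionally, the C score (CScore) is further calculated from the above Z score (Z Chri) according to the following formula 3:
[Formula 3: image not reproduced (Figure PCTCN2018098557-appb-000010)]
In one specific embodiment of the above method for screening, diagnosing, or risk-stratifying cervical cancer, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
In one specific embodiment of the above method for screening, diagnosing, or risk-stratifying cervical cancer, when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.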
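In one specific embodiment of the above method for screening, diagnosing, or risk-stratifying cervical cancer, the method may include the following steps:
(a) extracting genomic DNA from cervical cells, fragmenting the genomic DNA, and constructing a DNA fragment library;
(b) performing high-throughput sequencing on the constructed DNA fragment library to obtain reads;
(c) aligning the reads to a reference genome and dividing the result evenly into multiple segments (e.g., bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb, more preferably 100 to 500 kb, still more preferably 150 to 300 kb, most preferably 200 kb per segment);
(d) separately calculating the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq); and
(e) calculating the chromosome structure information of human chromosome i (Chri), namely the R value (R Chri), according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000011, Figure PCTCN2018098557-appb-000012)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18;
the Z score (Z Chri) of human chromosome i (Chri) is further calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000013)]
where
μR Chri is the mean of the R values of the healthy population;
σR Chri is the standard deviation of the R values of the healthy population; and
optionally, the C score (CScore) is further calculated from the above Z score (Z Chri) according to the following formula 3:
[Formula 3: image not reproduced (Figure PCTCN2018098557-appb-000014)]
where i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18,
and where, when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.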
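In a further aspect, the present invention provides a computer-readable medium storing instructions that, when executed by a processor, cause a computer to perform the following operations:
determining whether at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in a sample from a subject (e.g., a human) has a chromosomal imbalance (e.g., whether the difference between the long-arm copy number and the short-arm copy number of the chromosome is greater than or equal to a threshold, or whether the difference between the long-arm coverage and the short-arm coverage is greater than or equal to a threshold), and optionally screening, diagnosing, or risk-stratifying cervical cancer based on that determination;
for example, comparing the chromosome structure information of at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in the sample from the subject (e.g., the structure information needed to determine chromosomal imbalance, the difference between long-arm and short-arm copy number, or the difference between long-arm and short-arm coverage) with the chromosome structure information of the corresponding chromosome from healthy individuals to determine whether that chromosome in the sample has a chromosomal imbalance, where, if a chromosomal imbalance is present (e.g., the difference between long-arm and short-arm copy number, or between long-arm and short-arm coverage, is greater than or equal to a threshold), the subject is determined to have cervical cancer or to be at risk of cervical cancer.
The instructions stored on the above computer-readable medium and executed by the processor determine chromosomal imbalance as follows:
Whole-genome sequence data of the subject (e.g., a human), for example obtained by high-throughput sequencing, are aligned to a reference genome (e.g., the human reference genome Hg19) and divided evenly into multiple segments (e.g., bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb, more preferably 100 to 500 kb, still more preferably 150 to 300 kb, most preferably 200 kb per segment);
the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq) are calculated separately;
the R value is calculated according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000015, Figure PCTCN2018098557-appb-000016)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18.
In another embodiment of the above computer-readable medium, the Z score (Z Chri) of human chromosome i (Chri) can further be calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000017)]
where
μR Chri is the mean of the R values of the healthy population;
σR Chri is the standard deviation of the R values of the healthy population, and
optionally, the C score (CScore) is further calculated from the above Z score (Z Chri) according to the following formula 3:
[Formula 3: image not reproduced (Figure PCTCN2018098557-appb-000018)]
In one specific embodiment of the above computer-readable medium, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
In one specific embodiment of the above computer-readable medium, when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance; and
the subject is determined to be at high risk of cervical cancer when either of the following conditions is met:
the absolute value of the Z score is ≥3; or
the C score is >0;
the subject is determined to be at low risk of cervical cancer when either of the following conditions is met:
the absolute value of the Z score is <3; or
the C score is 0.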
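In a further aspect, the present invention provides a computing device, which may include the above computer-readable medium and a processor.
In a further aspect, the present invention provides a system, which may include:
the above computing device, and
a sequencing apparatus for receiving nucleic acid from a test sample and providing nucleic acid sequence information from that sample (e.g., nucleic acid sequence information obtained by high-throughput sequencing).
In one specific embodiment of the above system, the sequencing apparatus is a high-throughput sequencer.
In yet another aspect, the present invention provides a set of chromosomes for screening, diagnosing, or risk-stratifying cervical cancer, the set including at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18.
In one specific embodiment of the above set of chromosomes, the chromosomes are the combination of chromosomes 2, 3, 5, and 8; the combination of chromosomes 2, 3, 5, 8, and 18; the combination of chromosomes 3, 5, and 11; or the combination of chromosomes 3, 5, 11, 17, and 18.
In yet another aspect, the present invention provides the use of a reagent for detecting chromosomal imbalance (preferably the difference between long-arm and short-arm copy number, more preferably the difference between long-arm and short-arm coverage) of at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in the preparation of a diagnostic agent for screening, diagnosing, or risk-stratifying cervical cancer.
In yet another aspect, the present invention relates to the use of an apparatus for detecting chromosomal imbalance (preferably the difference between long-arm and short-arm copy number, more preferably the difference between long-arm and short-arm coverage) of at least one of chromosomes 2, 3, 5, 8, 11, 17, and 18 in the preparation of equipment for screening, diagnosing, or risk-stratifying cervical cancer.
【Effect of the Invention】
Through high-throughput sequencing, the present invention makes it possible to determine conveniently and quickly whether a chromosome has a chromosomal imbalance, and thereby to screen, diagnose, or risk-stratify cervical cancer with high sensitivity, specificity, and accuracy and with low missed diagnosis and misdiagnosis rates.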
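【Detailed Description】
【Definitions】
In the context of the present invention, "cervical cancer" may include any type of cervical cancer. The types common in the art fall into three categories: squamous cell carcinoma (divided into three grades: grade I, well differentiated; grade II, moderately differentiated (non-keratinizing large-cell type); grade III, poorly differentiated (small-cell type)), adenocarcinoma, and adenosquamous carcinoma (tumor tissue containing both adenocarcinoma and squamous cell carcinoma). In the context of the present invention, cervical cancer may also include cervical cancer of any individual. In one embodiment, the individual is selected from humans and non-human mammals.
In the context of the present invention, "cervical cells" may include cells located at any site of the cervical os or the inner wall of the cervical canal, and cells shed from any site of a cervix that may have undergone pathological change. In one embodiment, the cervical cells are cells removed manually from the cervical os or the inner wall of the cervical canal, also called "cervical exfoliated cells".
In the context of the present invention, "chromosome" refers to the material in the cell nucleus that carries genetic information; under the microscope it appears cylindrical or rod-shaped, and it consists mainly of DNA and protein. The part between the centromere and either end of the chromosome is called a chromosome arm; if the centromere is not at the center of the chromosome, a long arm (p) and a short arm (q) can be distinguished. The lengths of the two arms are important for identifying chromosomes.
In the context of the present invention, "chromosomal imbalance" refers to structural variation of the genome relative to the usual diploid genome. It may include changes in chromosome number, such as polyploidy or haploidy, as well as local chromosomal changes, such as amplification, deletion, insertion, or translocation of part of a chromosome. In the narrow sense, chromosomal imbalance refers to aneuploidy. In diploids there are four main types of aneuploid variation: (1) nullisomy: loss of a pair of homologous chromosomes, so that the chromosome number of the cell is 2n-2; (2) monosomy: loss of a single chromosome, so that the chromosome number of the cell is 2n-1; (3) trisomy: gain of one extra chromosome, so that one chromosome of the set is present in three copies and the chromosome number of the cell is 2n+1; (4) tetrasomy: gain of an extra pair of chromosomes, so that one chromosome of the set is present in four copies and the chromosome number of the cell is 2n+2.
In one embodiment, the chromosome structure information is structure information reflecting chromosome copy number variation.
In the context of the present invention, "DNA", i.e., deoxyribonucleic acid, is the main component of chromosomes and also the main genetic material.
In the context of the present invention, "DNA fragment library" refers to the double-stranded DNA obtained when sample DNA fragments are end-filled, given a phosphate group at the 5' end and an adenine nucleotide (A) at the 3' end, and then ligated to adapters at both ends.
In the context of the present invention, "adapter" refers to a fixed sequence ligated to both ends of a sample DNA fragment, containing a portion complementary to the sequencing chip, the sequencing primer sequence, the sample barcode, and so on.
In the context of the present invention, "sample barcode" refers to a tag sequence within the above adapter, of about 5 to 15 bp, preferably about 6 to 12 bp, more preferably about 7 to 10 bp, and most preferably about 8 bp, used to distinguish different samples.
In the context of the present invention, "high-throughput sequencing" (also called next-generation sequencing) refers to sequencing technology that can determine the sequences of hundreds of thousands to millions of DNA molecules in parallel in a single run.
In the context of the present invention, "reads" refers to the sequences, and their lengths, of the sample DNA fragments in the DNA fragment library (the fragments after removal of the sequences attached during library preparation) as determined by high-throughput sequencing.
In the context of the present invention, "sequence alignment" refers to aligning reads to a reference genome (e.g., the human reference genome) on the basis of sequence identity.
In the context of the present invention, "reference genome" is the whole-genome sequence, obtainable from a public database, of the same species as the sample DNA. In one embodiment, the reference genome is that of a human or a non-human mammal. In one embodiment, the public database is not particularly limited. In a preferred embodiment, the public database is NCBI GenBank.
In the context of the present invention, "chromosome coverage" refers to the mean number of reads over all segments (bins) of the long or short arm of a chromosome.
In the context of the present invention, "individual with benign cervical disease" refers to an individual suffering from a benign cervical disease, where benign cervical diseases include cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, and the like.
In the context of the present invention, "healthy population" refers to a population of individuals without cervical disease (cervicitis excepted) and without other cancers. In one embodiment, the healthy population may include individuals with cervicitis. In one embodiment, the population is a human or non-human mammal population. In one embodiment, the non-human mammals may include cattle, horses, pigs, sheep, dogs, cats, monkeys, mice, and the like.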
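In the context of the present invention, "sensitivity" refers to the percentage of samples pathologically diagnosed as cervical cancer that are detected as positive by the method of the present invention. In medical diagnosis, sensitivity reflects the rate at which patients are correctly identified and can be expressed by the following formula:
Sensitivity = true positives / (true positives + false negatives) × 100%.
In short, if the numbers of true positives, false positives, false negatives, and true negatives are denoted a, b, c, and d, respectively, the relationships among sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy are as shown below.
Table 1
[Table 1: image not reproduced (Figure PCTCN2018098557-appb-000019)]
For cases screened with the present method: true positives (a) are cases pathologically diagnosed as diseased (e.g., cervical cancer) for which the method is also positive; false positives (b) are cases pathologically diagnosed as disease-free (e.g., not cervical cancer) for which the method is positive; false negatives (c) are cases pathologically diagnosed as diseased (e.g., cervical cancer) for which the method is negative; true negatives (d) are cases pathologically diagnosed as disease-free (e.g., not cervical cancer) for which the method is also negative.
Sensitivity (sen) = a/(a+c);
Specificity (spe) = d/(b+d);
Missed diagnosis rate = c/(a+c);
Misdiagnosis rate = b/(b+d);
Accuracy = (a+d)/(a+b+c+d).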
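The five rates can be computed directly from the four counts. The following is a minimal sketch, not part of the original disclosure; the function name and the example counts are hypothetical, and the letters follow the formulas above (c is the false-negative count, d the true-negative count).

```python
def diagnostic_metrics(a, b, c, d):
    """Compute the five rates from confusion-matrix counts.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives
    (the letter assignment used by the formulas in the text).
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "missed_diagnosis_rate": c / (a + c),
        "misdiagnosis_rate": b / (b + d),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(a=36, b=5, c=4, d=55)
```

Sensitivity and the missed diagnosis rate always sum to 1, as do specificity and the misdiagnosis rate, which is a quick sanity check on any reported table.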
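As those skilled in the art know, the higher the sensitivity and specificity the better, and the lower the missed diagnosis and misdiagnosis rates the better.
In the context of the present invention, "specificity" refers to the percentage of samples pathologically diagnosed as not cervical cancer that are found negative by the present method. In medical diagnosis, specificity reflects the rate at which non-patients are correctly identified and can be expressed by the following formula:
Specificity = true negatives / (true negatives + false positives) × 100%.
In the context of the present invention, "missed diagnosis rate", also called the false negative rate, refers to the percentage of subjects who actually have a disease (e.g., cervical cancer) but are classified as non-patients by the present diagnostic method and criteria when the tested population is screened or diagnosed for that disease. In medical diagnosis, the missed diagnosis rate can be expressed by the following formula:
Missed diagnosis rate = false negatives / (true positives + false negatives) × 100%.
In the context of the present invention, "misdiagnosis rate", also called the false positive rate, refers to the percentage of subjects who actually do not have a disease (e.g., cervical cancer) but are classified as patients by the present diagnostic method and criteria when the tested population is screened or diagnosed for that disease. In medical diagnosis, the misdiagnosis rate can be expressed by the following formula:
Misdiagnosis rate = false positives / (true negatives + false positives) × 100%.
In the context of the present invention, "about" means a deviation of no more than plus or minus 10% of the stated value or range.
In the context of the present invention, unless expressly defined otherwise, the singular forms "a", "an", and "the" include plural referents. Similarly, unless expressly defined otherwise, the word "or" is intended to include "and".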
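【A Determination Workflow of the Present Invention】
In one embodiment of the present invention, the workflow for determining whether a chromosome has a chromosomal imbalance and for screening, diagnosing, or risk-stratifying cervical cancer may include:
(A) collection of cervical exfoliated cells;
(B) DNA extraction, fragmentation, and library construction, that is, extracting genomic DNA from the collected cervical exfoliated cells, fragmenting the genomic DNA, and constructing a DNA fragment library;
(C) high-throughput sequencing, that is, performing high-throughput sequencing on the constructed DNA fragment library;
(D) sequence alignment, that is, aligning the sequences of the sample DNA fragments obtained by high-throughput sequencing (the effective reads) to the human reference genome, dividing the result evenly into multiple segments (e.g., bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb, more preferably 100 to 500 kb, still more preferably 150 to 300 kb, most preferably 200 kb per segment), and separately calculating the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq); and
(E) data analysis, that is, using the value computed by the algorithm from cov Chrip and cov Chriq to determine whether the chromosome has a chromosomal imbalance and to screen, diagnose, or risk-stratify cervical cancer.
Steps (A) to (E) are described in turn below.
【A. Collection of Cervical Exfoliated Cells】
In the present invention, cervical exfoliated cells of the subject can be collected by methods commonly used in the art. In one embodiment, the collection method may include brushing cells from the inner wall of the cervix and the cervical os with a cervical sampler and immersing the sampler brush in a cell preservation solution, so that the cervical exfoliated cells adhering to the brush head are released into the solution to form a cell mixture. The cervical exfoliated cells are then isolated from the cell mixture by conventional centrifugation.
In the present invention, the model and style of the cervical sampler are not particularly limited as long as the required amount of cervical exfoliated cells can be collected. In one embodiment, any commercially available cervical sampler can be used. In one embodiment, the ThinPrep disposable cervical sampler from Hologic can be used.
In the present invention, the composition of the cell preservation solution is not particularly limited as long as it can temporarily preserve cervical exfoliated cells. In one embodiment, any commercially available cell preservation solution for cervical exfoliated cells can be used, or such a solution can be prepared by conventional methods. In one embodiment, ThinPrep cell preservation solution from Hologic can be used.
In the present invention, the force and number of centrifugations of the cervical exfoliated cell mixture are not particularly limited as long as the cervical exfoliated cells can be separated. In one embodiment, the mixture is centrifuged 1 to 5 times, preferably twice, at a centrifugal force of 1200 to 2000 g, more preferably 1400 to 1800 g, and most preferably 1600 g.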
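【B. DNA Extraction, Fragmentation, and Library Construction】
In the present invention, genomic DNA can be extracted from cervical exfoliated cells by any method conventional in the art.
In the present invention, genomic DNA can be fragmented and a DNA fragment library constructed by any method conventional in the art. In a preferred embodiment, any commercially available kit is used to fragment the genomic DNA and construct the DNA fragment library. In one embodiment, the HyperPlus kit from Kapa is used. In one embodiment, the process of fragmenting genomic DNA and constructing a DNA fragment library with a kit may include:
(i) fragmenting the genomic DNA (Fragmentation) to obtain DNA fragments of less than 800 bp, preferably 100 to 600 bp, more preferably 100 to 500 bp, more preferably 100 to 400 bp, more preferably 100 to 300 bp, more preferably 120 to 200 bp, and more preferably 150 to 180 bp;
(ii) modifying the ends of the resulting DNA fragments:
● repairing sticky ends to blunt ends (End Repair),
● adding a phosphate group to the 5' end of the repaired DNA fragments, and
● adding an adenine nucleotide (A) to the 3' end of the repaired DNA fragments (A-tailing);
(iii) ligating an adapter and a sample barcode to the ends of the modified DNA fragments, where the adapter is 100 to 200 bp, preferably 100 to 150 bp, and more preferably 120 bp in size;
(iv) fragment size selection (Fragment Selection): running the ligation products on an agarose gel, excising a band with a fragment size of preferably 200 to 800 bp, more preferably 200 to 700 bp, more preferably 200 to 600 bp, more preferably 200 to 500 bp, more preferably 220 to 350 bp, and more preferably 280 to 320 bp, and recovering the DNA fragments correctly ligated to the adapter and sample barcode (i.e., the DNA fragment library) with any commercially available kit; and
(v) library amplification (Library Amplification): amplifying the DNA fragments correctly ligated to the adapter and sample barcode by polymerase chain reaction (PCR).
【C. High-Throughput Sequencing】
In the present invention, the sequencing method and instrument are not particularly limited as long as high-throughput sequencing of the DNA fragment library can be achieved. In one embodiment, a commercially available sequencer is used. In one embodiment, a sequencer from Illumina, Applied Biosystems (ABI), Roche, Helicos, or Complete Genomics can be used. In a preferred embodiment, an Illumina sequencer is used to perform high-throughput sequencing of the DNA fragment library.
In the present invention, after sequencing is complete, the adapter and the sample barcode are removed from the raw sequences, and noise (such as low-quality regions) is removed to obtain the sequences of the sample DNA fragments, i.e., the effective reads.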
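【D. Sequence Alignment】
In the present invention, the means of aligning the effective reads to the human reference genome is not particularly limited, and any means conventional in the art can be used. In one embodiment, the alignment can be performed with BWA-MEM software ( http://bio-bwa.sourceforge.net).
In the present invention, the alignment results can be written to any suitable file format at any suitable size per segment (bin). In one embodiment, the alignment results are written to multiple files, for example in *.bin (or *.bam) format, at 10 to 1000 kb, 50 to 500 kb, preferably 100 to 300 kb, and more preferably 200 kb per segment.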
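The text does not specify how reads are assigned to segments or what the *.bin files contain. As a minimal sketch under the assumption that each read is assigned to the fixed-width bin containing its start position, per-bin read counts could be tallied as follows; the function name, the tuple layout, and the example coordinates are hypothetical.

```python
from collections import Counter

BIN_SIZE = 200_000  # 200 kb per bin, the most preferred size in the text


def count_reads_per_bin(read_starts, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin.

    read_starts: iterable of (chromosome, 0-based start position) tuples,
    a simplified stand-in for parsed alignment records (an assumption).
    Returns a Counter keyed by (chromosome, bin_index).
    """
    counts = Counter()
    for chrom, pos in read_starts:
        counts[(chrom, pos // bin_size)] += 1
    return counts


# Hypothetical read start coordinates.
reads = [("chr3", 10), ("chr3", 199_999), ("chr3", 200_000), ("chr5", 450_000)]
bin_counts = count_reads_per_bin(reads)
```

Integer division by the bin size keeps the assignment independent of read length; a production pipeline would more likely read positions from BAM records and might filter on mapping quality first.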
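In one embodiment, from the saved alignment result files (for example, the above *.bin (or *.bam) files), the files covering a specified position of human chromosome i (Chri) are selected, and the mean number of effective reads aligned to the human reference genome in the selected files (cov Chri) is calculated. In one embodiment, the specified position of human chromosome i (Chri) is its long arm and its short arm; the mean number of effective reads aligned to the long arm of chromosome i (Chri) of the human reference genome is denoted cov Chrip, and the mean number of effective reads aligned to the short arm is denoted cov Chriq. In one embodiment, i is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all of 2, 3, 5, 8, 11, 17, and 18. In one embodiment, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18. In the present invention, the means of calculating the mean number of effective reads aligned to the human reference genome is not particularly limited.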
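【E. Data Analysis】
In one embodiment, based on the mean number of reads aligned to the specified position of chromosome i of the human reference genome, the value computed by the exemplary algorithm described below is used to determine whether the chromosome has a chromosomal imbalance and to screen, diagnose, or risk-stratify cervical cancer. In one embodiment, the specified position of human chromosome i (Chri) is its long arm and its short arm; the mean number of effective reads aligned to the long arm of chromosome i (Chri) of the human reference genome is denoted cov Chrip, and the mean number of effective reads aligned to the short arm is denoted cov Chriq. In one embodiment, i is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all of 2, 3, 5, 8, 11, 17, and 18. In one embodiment, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
(1) Exemplary algorithm
In one embodiment, the R value (R Chri) of human chromosome i (Chri) is calculated according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000020, Figure PCTCN2018098557-appb-000021)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18.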
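Because the R-value formula is available here only as an image reference, its exact form cannot be confirmed from this text. The sketch below assumes one plausible form, the ratio of long-arm to short-arm mean coverage; the actual formula may differ (for example a difference or a log ratio), and the function name and example values are hypothetical. Note also that the document labels the long arm p and the short arm q, whereas standard cytogenetic nomenclature uses p for the short arm and q for the long arm; the code uses descriptive names to sidestep the ambiguity.

```python
def r_value(cov_long_arm, cov_short_arm):
    """ASSUMED form of the R value: long-arm / short-arm mean coverage.

    The source formula is not reproduced in the text, so this ratio is an
    illustrative stand-in, not the patented definition.
    """
    if cov_short_arm <= 0:
        raise ValueError("short-arm coverage must be positive")
    return cov_long_arm / cov_short_arm


# Hypothetical mean reads per bin for the two arms of one chromosome.
r_chr3 = r_value(cov_long_arm=120.0, cov_short_arm=100.0)
```

Whatever its exact form, an arm-versus-arm statistic of this kind is internally normalized: sample-wide effects such as sequencing depth act on both arms alike and largely cancel.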
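In one embodiment, the Z score (Z Chri) of human chromosome i (Chri) is calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000022)]
where
μR Chri is the mean of the R values of the healthy population (individuals without cervical disease (cervicitis excepted) and without other cancers);
σR Chri is the standard deviation of the R values of the healthy population (individuals without cervical disease (cervicitis excepted) and without other cancers).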
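Formula 2 survives in this text only as an image reference. Given that μR Chri and σR Chri are defined as the mean and standard deviation of the healthy-population R values, the formula is presumably the standard z-score. The following is a reconstruction under that assumption, not a transcription of the figure:

```latex
Z_{\mathrm{Chr}i} \;=\; \frac{R_{\mathrm{Chr}i} - \mu_{R_{\mathrm{Chr}i}}}{\sigma_{R_{\mathrm{Chr}i}}}
```

Under this reading, a Z score of 3 means the sample's R value lies three healthy-population standard deviations above the healthy mean, which matches the |Z| ≥ 3 cutoff used in the determination criteria.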
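In another embodiment, the C score (CScore) can further be calculated from the above Z score (Z Chri) according to the following formula 3:
[Formula 3: image not reproduced (Figure PCTCN2018098557-appb-000023)]
In one embodiment, i is at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all of 2, 3, 5, 8, 11, 17, and 18. In one embodiment, i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
In the context of the present invention, the above "R value", "Z score", and/or "C score" are sometimes referred to as the "chromosome balance state score", and the "C score" is sometimes referred to as the "cancer score".
(2) Determination criteria
In one embodiment, according to the above exemplary algorithm, when the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.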
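The per-chromosome criterion is a simple threshold on the absolute Z score. A minimal sketch of applying it across a chromosome set follows; the function name and the example Z scores are hypothetical.

```python
Z_THRESHOLD = 3.0  # cutoff on |Z| stated in the text


def imbalanced_chromosomes(z_by_chromosome, threshold=Z_THRESHOLD):
    """Return, in ascending order, the chromosomes whose |Z| >= threshold."""
    return sorted(
        chrom for chrom, z in z_by_chromosome.items() if abs(z) >= threshold
    )


# Hypothetical Z scores for the chromosome set 2, 3, 5, 8.
example_z = {2: 0.4, 3: 4.1, 5: -3.0, 8: 2.9}
flagged = imbalanced_chromosomes(example_z)
```

Taking the absolute value means that both gains and losses are flagged; chromosome 5 in the example sits exactly on the boundary and is counted as imbalanced because the criterion is "greater than or equal to".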
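In one embodiment, according to the above exemplary algorithm,
the subject is determined to be at high risk of cervical cancer when either of the following conditions is met:
the absolute value of the Z score is ≥3; or
the C score is >0;
the subject is determined to be at low risk of cervical cancer when either of the following conditions is met:
the absolute value of the Z score is <3; or
the C score is 0.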
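The criteria above can be sketched as a small classifier. Because formula 3 is not reproduced in this text, the C score is taken as an input rather than computed. Reading "the absolute value of the Z score is ≥3" as "at least one analysed chromosome has |Z| ≥ 3" is an assumption of this sketch, as is giving the high-risk conditions precedence when the two lists overlap; the function name is hypothetical.

```python
def cervical_cancer_risk(z_by_chromosome, c_score, z_threshold=3.0):
    """Return "high" or "low" following the criteria in the text.

    c_score is supplied by the caller because formula 3 is not available.
    Treating any chromosome with |Z| >= z_threshold as sufficient for a
    high-risk call is an assumption of this sketch.
    """
    any_imbalance = any(
        abs(z) >= z_threshold for z in z_by_chromosome.values()
    )
    if any_imbalance or c_score > 0:
        return "high"
    return "low"
```

For example, `cervical_cancer_risk({3: 3.2, 5: 0.1}, c_score=0)` yields a high-risk call on the strength of chromosome 3 alone.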
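【Further Description of the Method of the Present Invention】
In one embodiment, the method of the present invention can also be combined with other methods of determining whether a chromosome has a chromosomal imbalance. In one embodiment, the method of the present invention for screening, diagnosing, or risk-stratifying cervical cancer can also be combined with other methods of diagnosing cervical cancer. In one embodiment, the other methods of diagnosing cervical cancer may include high-risk HPV testing and cytological examination of cervical exfoliated cells. In one embodiment, the methods for detecting high-risk HPV may include morphological observation, immunohistochemistry, dot hybridization, blot in situ hybridization, PCR/RFLP, PCR/Southern, and the like. In one embodiment, the cytological examination of cervical exfoliated cells may include the thin-layer liquid-based cytology test (Thin-Cytologic Test, TCT).
【Examples】
The present invention is further illustrated by the following examples, but is not limited to them.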
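【Example 1: Collection of Cervical Exfoliated Cells】
In the first round of the study, a total of 107 subjects were enrolled: 40 patients diagnosed with cervical cancer by histopathology, 41 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.), and 26 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
In the second round of the study, a total of 167 subjects were enrolled: 44 patients diagnosed with cervical cancer by histopathology, 69 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.), and 54 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
In the third round of the study, a total of 167 subjects were enrolled: 42 patients diagnosed with cervical cancer by histopathology, 68 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.), and 57 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls.
The sampling brush of a ThinPrep disposable cervical sampler (Hologic) was rotated clockwise 10 times against the inner wall of the cervix of each subject. The brush head was then immersed in ThinPrep cell preservation solution (Hologic) so that the exfoliated tissue from the inner wall of the cervix adhering to the brush head was released into the solution to form a tissue mixture. The cervical exfoliated cells were isolated by centrifuging the tissue mixture twice at 1600 g.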
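【Example 2: DNA Extraction, Fragmentation, and Library Construction】
Genomic DNA was extracted from the cervical exfoliated cells collected above using a DNA extraction kit (Qiagen) according to the kit's protocol.
The extracted DNA was fragmented and a library was constructed using the HyperPlus kit (Kapa) according to the kit's protocol. The specific process was:
(i) fragmenting the genomic DNA (Fragmentation) to obtain DNA fragments of 150 to 180 bp;
(ii) modifying the ends of the resulting DNA fragments:
● repairing sticky ends to blunt ends (End Repair),
● adding a phosphate group to the 5' end of the repaired DNA fragments, and
● adding an adenine nucleotide (A) to the 3' end of the repaired DNA fragments (A-tailing);
(iii) ligating an adapter and a sample barcode to the ends of the modified DNA fragments;
(iv) fragment size selection (Fragment Selection): running the ligation products on an agarose gel, excising the band with a fragment size of 280 to 320 bp (the sample DNA fragments being 150 to 180 bp and the adapter plus sample barcode being 120 bp), and recovering the DNA fragments correctly ligated to the adapter and sample barcode (i.e., the DNA fragment library) with the QIAquick Gel Extraction Kit (QIAGEN, 28706); and
(v) library amplification (Library Amplification): amplifying the DNA fragments correctly ligated to the adapter and sample barcode by polymerase chain reaction (PCR).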
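【Example 3: High-Throughput Sequencing】
Using an Illumina sequencer, the amplified DNA fragment library obtained in Example 2 was sequenced from one or both ends of the fragments. The adapter and the sample barcode were removed from the raw sequences, and noise (such as low-quality regions) was removed to obtain the sequences of the sample DNA fragments, i.e., the effective reads.
【Example 4: Sequence Alignment】
(1) Alignment of the effective reads to the human reference genome
Using BWA-MEM software ( http://bio-bwa.sourceforge.net), the effective reads obtained in Example 3 were aligned to the human reference genome, and the alignment results were written, at 200 kb per segment, to multiple files in *.bin (or *.bam) format.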
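The exact command line used in the example is not given. As a hedged sketch, an alignment of this kind could be driven from a script as follows, assuming the `bwa` executable is installed and on the PATH and that the reference FASTA has already been indexed with `bwa index`; all file names and the helper function names are hypothetical.

```python
import subprocess


def bwa_mem_command(reference_fasta, fastq_1, fastq_2=None, threads=4):
    """Build a `bwa mem` command line for single- or paired-end reads."""
    cmd = ["bwa", "mem", "-t", str(threads), reference_fasta, fastq_1]
    if fastq_2 is not None:
        cmd.append(fastq_2)  # second mate file for paired-end data
    return cmd


def run_alignment(cmd, sam_path):
    """Run the command and write its SAM output (stdout) to sam_path.

    Requires bwa to be installed; raises CalledProcessError on failure.
    """
    with open(sam_path, "w") as sam_file:
        subprocess.run(cmd, stdout=sam_file, check=True)


# Hypothetical file names, shown for the single-end case.
single_end_cmd = bwa_mem_command("hg19.fa", "sample.fastq")
```

Separating command construction from execution keeps the first function testable without bwa installed. The resulting SAM output would still need to be sorted, filtered, and binned in later steps.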
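(2) Calculation of the number of reads aligned to the human reference genome
From the *.bin (or *.bam) files obtained in (1), the files covering the long arm and the short arm of human chromosome i (Chri) were selected, and the mean number of reads in the segments (e.g., bins) covering the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g., bins) covering the short arm (cov Chriq) were calculated separately.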
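How the bins are split between the two arms is not spelled out. One way to sketch it, assuming the per-bin read counts of a chromosome are available in positional order and the index of the first bin past the centromere is known, is shown below; the function name, the centromere index, and the counts are hypothetical, and the arm with more bins is treated as the long arm.

```python
from statistics import mean


def arm_coverage(bin_counts, centromere_bin):
    """Mean reads per bin for the long and the short arm of one chromosome.

    bin_counts: read counts per bin, ordered by genomic position.
    centromere_bin: index of the first bin past the centromere (a
    hypothetical input; real coordinates depend on the reference build).
    The arm spanning more bins is reported as the long arm.
    """
    first_arm = bin_counts[:centromere_bin]
    second_arm = bin_counts[centromere_bin:]
    if not first_arm or not second_arm:
        raise ValueError("both arms need at least one bin")
    if len(first_arm) >= len(second_arm):
        longer, shorter = first_arm, second_arm
    else:
        longer, shorter = second_arm, first_arm
    return {"long_arm": mean(longer), "short_arm": mean(shorter)}


# Hypothetical counts: two bins on one arm, four on the other.
coverage = arm_coverage([10, 12, 20, 22, 24, 26], centromere_bin=2)
```

Real data would also need bins overlapping the centromere or other unmappable regions to be excluded before averaging, a detail the text does not address.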
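【Example 5: Data Analysis】
(1) Algorithm
The algorithm used in this example is as follows.
Specifically, the R value (R Chri) of human chromosome i (Chri) is calculated according to the following formula:
[R-value formula: image not reproduced (Figure PCTCN2018098557-appb-000024, Figure PCTCN2018098557-appb-000025)]
where p denotes the long arm, q denotes the short arm, Chr is the abbreviation for chromosome, and i is selected from 2, 3, 5, 8, 11, 17, and 18.
The Z score (Z Chri) of human chromosome i (Chri) is further calculated from the R value (R Chri) of human chromosome i (Chri) according to the following formula 2:
[Formula 2: image not reproduced (Figure PCTCN2018098557-appb-000026)]
where
μR Chri is the mean of the R values of the 26 healthy individuals (controls);
σR Chri is the standard deviation of the R values of the 26 healthy individuals (controls),
and where
i is selected from 2, 3, 5, 8, 11, 17, and 18.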
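The reference statistics μR Chri and σR Chri come from the control group's R values for each chromosome. A minimal sketch of that step follows. The text does not say whether the sample (n − 1) or population (n) standard deviation is used, so the use of `statistics.stdev` (sample form) here is an assumption, and the control values are invented for illustration.

```python
from statistics import mean, stdev


def reference_stats(control_r_values):
    """Mean and SAMPLE standard deviation of control R values.

    Using the n-1 (sample) form is an assumption; the source does not
    state which form was used.
    """
    if len(control_r_values) < 2:
        raise ValueError("need at least two control values")
    return mean(control_r_values), stdev(control_r_values)


# Hypothetical R values for one chromosome across five controls.
control_r = [1.00, 1.02, 0.98, 1.01, 0.99]
mu, sigma = reference_stats(control_r)
```

With only 26 controls, as in the first round of the study, the two forms of the standard deviation differ by about 2%, which can matter for samples close to the |Z| = 3 cutoff.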
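The C score (CScore) is further calculated from the above Z score (Z Chri) according to the following formula 3:
[Formula 3: image not reproduced (Figure PCTCN2018098557-appb-000027)]
where
i is selected from 2, 3, 5, and 8; from 2, 3, 5, 8, and 18; from 3, 5, and 11; or from 3, 5, 11, 17, and 18.
(2) Determination criteria
When the absolute value of the Z score is ≥3, chromosome i is determined to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is determined to have no chromosomal imbalance.
The subject is determined to be at high risk of cervical cancer when either of the following conditions is met:
the absolute value of the Z score is ≥3; or
the C score is >0;
the subject is determined to be at low risk of cervical cancer when either of the following conditions is met:
the absolute value of the Z score is <3; or
the C score is 0.
(3) Results
As described in Example 1, in the first round of the study a total of 107 subjects were enrolled: 40 patients diagnosed with cervical cancer by histopathology, 41 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.) (shaded in Table 2 below), and 26 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls. The results of detection by the method of the present invention for these 107 subjects are shown in Table 2 below.
Table 2: Z scores and C scores calculated for each sample when i is 2, 3, 5, and 8, or 2, 3, 5, 8, and 18, together with sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy
[Table 2: images not reproduced (Figure PCTCN2018098557-appb-000028 to Figure PCTCN2018098557-appb-000031)]
As described in Example 1, in the second round of the study a total of 167 subjects were enrolled: 44 patients diagnosed with cervical cancer by histopathology, 69 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.) (shaded in Table 3 below), and 54 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls. The results of detection by the method of the present invention for these 167 subjects are shown in Table 3 below.
Table 3: Z scores and C scores calculated for each sample when i is 2, 3, 5, and 8, or 2, 3, 5, 8, and 18, together with sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy
[Table 3: images not reproduced (Figure PCTCN2018098557-appb-000032 to Figure PCTCN2018098557-appb-000037)]
As described in Example 1, in the third round of the study a total of 167 subjects were enrolled: 42 patients diagnosed with cervical cancer by histopathology, 68 individuals diagnosed with benign cervical disease (including cervical intraepithelial neoplasia, benign cervical tumors, cervical cysts, etc.) (shaded in Table 4 below), and 57 healthy individuals (without cervical disease (cervicitis excepted) and without other cancers) as controls. The results of detection by the method of the present invention for these 167 subjects are shown in Table 4 below.
Table 4: Z scores and C scores calculated for each sample when i is 3, 5, and 11, or 3, 5, 11, 17, and 18, together with sensitivity, specificity, missed diagnosis rate, misdiagnosis rate, and accuracy
[Table 4: images not reproduced (Figure PCTCN2018098557-appb-000038 to Figure PCTCN2018098557-appb-000043)]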
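【Conclusion】
By calculating the R value of DNA extracted from the subject's cervical exfoliated cells for one or more chromosomes selected from chromosomes 2, 3, 5, and 8; from chromosomes 2, 3, 5, 8, and 18; from chromosomes 3, 5, and 11; or from chromosomes 3, 5, 11, 17, and 18, and then calculating the Z score and the C score from that R value, it is possible to determine simply and quickly whether a chromosome of the subject has a chromosomal imbalance and, in turn, to screen, diagnose, or risk-stratify cervical cancer with high sensitivity, specificity, and accuracy and with low missed diagnosis and misdiagnosis rates.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and changes can be made to the details, and that all such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.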

Claims (12)

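  1. A set of chromosomes for screening, diagnosing or risk-stratifying cervical cancer, the set comprising at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18.
  2. The set of chromosomes of claim 1, which is a combination of chromosomes 2, 3, 5 and 8; a combination of chromosomes 2, 3, 5, 8 and 18; a combination of chromosomes 3, 5 and 11; or a combination of chromosomes 3, 5, 11, 17 and 18.
  3. A computer-readable medium having instructions stored thereon, wherein the instructions, when executed by a processor, cause a computer to perform the following operations:
    determining whether at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 in a sample from a subject (e.g. a human) exhibits a chromosomal imbalance (e.g. whether the difference between the long-arm copy number and the short-arm copy number of the chromosome is greater than or equal to a threshold, or whether the difference between the long-arm coverage and the short-arm coverage of the chromosome is greater than or equal to a threshold);
    for example, comparing chromosome structure information for at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 in the sample from the subject (e.g. the structure information needed to determine chromosomal imbalance, the difference between long-arm and short-arm copy number, or the difference between long-arm and short-arm coverage) with the chromosome structure information for the corresponding chromosome from healthy individuals, so as to determine whether said chromosome in the sample from said individual exhibits a chromosomal imbalance (where a chromosomal imbalance is present (e.g. the difference between long-arm and short-arm copy number is greater than or equal to a threshold, or the difference between long-arm and short-arm coverage is greater than or equal to a threshold), the subject is judged to have cervical cancer or to be at risk of cervical cancer).
  4. The computer-readable medium of claim 3, wherein chromosomal imbalance is determined as follows:
    aligning the whole-genome sequence data of the subject (e.g. a human), for example whole-genome sequence data obtained by high-throughput sequencing, to a reference genome (e.g. the human reference genome Hg19), and dividing it evenly into a plurality of segments (e.g. bins), for example at 10 to 1000 kb per segment (preferably 50 to 800 kb per segment, more preferably 100 to 500 kb per segment, more preferably 150 to 300 kb per segment, most preferably 200 kb per segment);
    calculating, respectively, the mean number of reads in the segments (e.g. bins) covered by the long arm of chromosome i (cov Chrip) and the mean number of reads in the segments (e.g. bins) covered by the short arm of the chromosome (cov Chriq);
    calculating the R value according to the following formula: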
    Figure PCTCN2018098557-appb-100001
    Figure PCTCN2018098557-appb-100002
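    where p denotes the long arm, q denotes the short arm, Chr is the abbreviation of chromosome, and i is selected from 2, 3, 5, 8, 11, 17 and 18.
  5. The computer-readable medium of claim 4, wherein
    the Z score (Z Chri) of human chromosome i (Chri) is further calculated from the above R value (R Chri) of human chromosome i (Chri) according to the following formula 2: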
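The binning and per-arm averaging steps of claim 4 can be sketched as follows. This is an illustration under several assumptions, not the claimed method. Read start positions are taken as already aligned, the arm boundaries are supplied by the caller, and the average is taken over every bin that lies in the arm, including bins with no reads. The claim assigns p to the long arm and q to the short arm, which is the reverse of the usual cytogenetic convention, so the sketch uses the explicit names `long_arm` and `short_arm`. The R value itself is not computed because its formula appears only as an image. All function names and coordinates are hypothetical.

```python
import math
from typing import Iterable

BIN_SIZE = 200_000  # 200 kb per bin, the most preferred value in the claim


def mean_reads_per_bin(read_starts: Iterable[int], arm_start: int, arm_end: int,
                       bin_size: int = BIN_SIZE) -> float:
    """Mean read count over the bins spanning [arm_start, arm_end).

    Coordinates are 0-based; the last bin may be shorter than bin_size.
    """
    if arm_end <= arm_start or bin_size <= 0:
        raise ValueError("arm must be non-empty and bin_size positive")
    n_bins = math.ceil((arm_end - arm_start) / bin_size)
    counts = [0] * n_bins
    for pos in read_starts:
        if arm_start <= pos < arm_end:
            counts[(pos - arm_start) // bin_size] += 1
    return sum(counts) / n_bins


# Toy chromosome of 1 Mb with a hypothetical arm boundary at 600 kb.
reads = [0, 10, 200_000, 450_000, 700_000]
cov_long_arm = mean_reads_per_bin(reads, 0, 600_000)           # 3 bins
cov_short_arm = mean_reads_per_bin(reads, 600_000, 1_000_000)  # 2 bins
print(cov_long_arm, cov_short_arm)
```

In practice the read positions would come from an alignment file and the arm boundaries from centromere annotations for the chosen reference genome; both are outside the scope of this sketch.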
    Figure PCTCN2018098557-appb-100003
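    where
    μR Chri is the mean of the R values of the corresponding healthy population;
    σR Chri is the standard deviation of the R values of the corresponding healthy population; and
    optionally, the C score (CScore) is further calculated from the above Z score (Z Chri) according to the following formula 3: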
    Figure PCTCN2018098557-appb-100004
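Given the definitions of μR Chri and σR Chri in claim 5, formula 2 reads as a standardization of the R value against a healthy reference population. The sketch below assumes the ordinary form Z = (R − μ) / σ, since the formula itself appears only as an image, and it assumes the sample standard deviation because the text does not say which estimator is used. The C score of formula 3 is not sketched because nothing in this passage defines it. The function name and reference values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def z_score(r_value: float, healthy_r_values: Sequence[float]) -> float:
    """Standardize one chromosome's R value against healthy-population R values.

    Assumes Z = (R - mean) / standard deviation; stdev() is the sample
    estimator (n - 1 denominator), which is an assumption of this sketch.
    """
    if len(healthy_r_values) < 2:
        raise ValueError("need at least two healthy reference values")
    mu = mean(healthy_r_values)
    sigma = stdev(healthy_r_values)
    if sigma == 0:
        raise ValueError("healthy reference values have zero spread")
    return (r_value - mu) / sigma


# Hypothetical reference R values for one chromosome (not data from the study).
healthy = [0.98, 1.00, 1.02]
print(round(z_score(1.10, healthy), 2))
```

A separate reference mean and standard deviation would be needed for each chromosome i, since the R value distribution can differ between chromosomes.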
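  6. The computer-readable medium of any one of claims 3 to 5, wherein i is selected from 2, 3, 5 and 8; from 2, 3, 5, 8 and 18; from 3, 5 and 11; or from 3, 5, 11, 17 and 18.
  7. The computer-readable medium of any one of claims 3 to 6, wherein
    when the absolute value of the Z score is ≥3, chromosome i is judged to have a chromosomal imbalance; when the absolute value of the Z score is <3, chromosome i is judged to have no chromosomal imbalance; and
    the subject is judged to be at high risk of cervical cancer when one of the following conditions is met:
    the absolute value of the Z score is ≥3; or
    the C score is >0;
    the subject is judged to be at low risk of cervical cancer when one of the following conditions is met:
    the absolute value of the Z score is <3; or
    the C score is =0.
  8. A computing device comprising:
    the computer-readable medium of any one of claims 3 to 7, and
    a processor.
  9. A system comprising:
    the computing device of claim 8, and
    a sequencing apparatus for receiving nucleic acid from a test sample so as to provide nucleic acid sequence information from that sample (e.g. nucleic acid sequence information obtained by high-throughput sequencing).
  10. The system of claim 9, wherein the sequencing apparatus is a high-throughput sequencer.
  11. Use of a reagent for detecting a chromosomal imbalance (preferably the difference between long-arm and short-arm copy number, more preferably the difference between long-arm and short-arm coverage) in at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 in the preparation of a diagnostic agent for screening, diagnosing or risk-stratifying cervical cancer.
  12. Use of a device for detecting a chromosomal imbalance (preferably the difference between long-arm and short-arm copy number, more preferably the difference between long-arm and short-arm coverage) in at least one of chromosomes 2, 3, 5, 8, 11, 17 and 18 in the preparation of equipment for screening, diagnosing or risk-stratifying cervical cancer.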
PCT/CN2018/098557 2017-12-29 2018-08-03 Method and system for determining cervical cancer WO2019128233A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201711498314 2017-12-29
CN201711498314.9 2017-12-29
CN201810351270.5 2018-04-19
CN201810351270 2018-04-19

Publications (1)

Publication Number Publication Date
WO2019128233A1 true WO2019128233A1 (zh) 2019-07-04

Family

ID=67062975

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/098557 WO2019128233A1 (zh) 2017-12-29 2018-08-03 Method and system for determining cervical cancer

Country Status (2)

Country Link
CN (1) CN109988833A (zh)
WO (1) WO2019128233A1 (zh)

Also Published As

Publication number Publication date
CN109988833A (zh) 2019-07-09

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18897467

Country of ref document: EP

Kind code of ref document: A1