CN111378756A - Marker for screening benign and malignant pulmonary nodules and application thereof - Google Patents

Marker for screening benign and malignant pulmonary nodules and application thereof

Info

Publication number
CN111378756A
Authority
CN
China
Prior art keywords
hnrnpd
slc38a1
mcemp1
traf5
morf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010368425.3A
Other languages
Chinese (zh)
Inventor
周寅
马瑞芹
杜晓利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biomedlab Co ltd
Original Assignee
Shanghai Biomedlab Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biomedlab Co ltd
Priority to CN202010368425.3A
Publication of CN111378756A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/154: Methylation markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/166: Oligonucleotides used as internal standards, controls or normalisation probes

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Organic Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Pathology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • Wood Science & Technology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Microbiology (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Epidemiology (AREA)
  • Oncology (AREA)
  • Biomedical Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a marker for screening benign and malignant pulmonary nodules and application thereof. Markers include, but are not limited to, the genes CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD or SMC1A.

Description

Marker for screening benign and malignant pulmonary nodules and application thereof
Technical Field
The invention relates to the technical field of molecular biology, in particular to a marker for screening benign and malignant pulmonary nodules and application thereof.
Background
Lung cancer is the leading cause of cancer death among China's urban population. According to yearbook data compiled by the Ministry of Health in 2011, lung cancer mortality in China in 2010 was 46.46 per 100,000 people, the highest of any malignant tumor. The prognosis of lung cancer is closely tied to the clinical stage at diagnosis. Treatment is most effective for ultra-early stage Ia peripheral in-situ lung cancer, with a postoperative 5-year survival rate of up to 90%, whereas the overall 5-year survival rate for stages II-IV falls from 34% to below 5%. Early detection is therefore the key to improving the cure rate of lung cancer and reducing mortality.
High-resolution imaging such as low-dose helical CT (LDCT) is currently the main means of early lung cancer screening. LDCT is highly sensitive: in large-scale early lung cancer screening, micro-nodules (nodules smaller than 10 mm) are found in about 20-30% of subjects, but only 3.8% of them are malignant. Clinically judging whether such a pulmonary nodule is benign or malignant is currently very difficult, mainly because nodules smaller than 10 mm are hard to biopsy by fine-needle puncture or similar means for pathological examination, and even PET-CT results have very limited diagnostic value.
Blood is the largest tissue organ of the human body, and blood cells are among the few cell types capable of exchanging information with almost all tissue cells. When tissues or organs in the body are affected by injury, inflammation, or malignant disease such as a tumor, the microenvironment around the cells of the diseased tissue undergoes a series of specific changes. As blood flows through each tissue and organ, the microenvironment of the diseased cells exchanges information with blood cells, and the blood cells respond directly or indirectly with corresponding changes in gene expression, taking part in signal transmission and exchange across systemic systems such as the immune system. These changes in blood cell gene expression occur long before the body shows clear physical signs, and they include expression changes characteristic of the disease. Therefore, by closely monitoring the gene expression profile of blood cells, molecular information about early malignant disease such as tumors can be captured sensitively, and disease-specific gene expression signals (markers) can be screened out, providing reliable evidence for early detection and monitoring. In addition, peripheral blood gene expression testing is simple and noninvasive, so it is readily accepted by those tested, compliance is high, and it has great application value for early screening and diagnosis of malignant tumors.
Serum tumor markers related to lung cancer, such as carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), squamous cell carcinoma antigen (SCC-Ag) and cytokeratin 19 fragment, have some reference value for the auxiliary diagnosis of middle- and late-stage lung cancer, but little value for the diagnosis of stage Ia lung cancer. Serum protein markers also have poor specificity for lung cancer: benign lesions such as pneumonia can cause abnormal concentrations of these proteins and thus false-positive results. Therefore, to accurately distinguish benign from malignant micro-nodules and to detect ultra-early micro-nodular lung cancer, the field needs a blood-based detection technology and product that can make this distinction accurately.
Disclosure of Invention
The prior art lacks a biomarker that accurately distinguishes benign from malignant pulmonary nodules, particularly micro-nodules. To address this, the invention provides a peripheral blood gene marker suitable for screening benign and malignant pulmonary nodules. The marker has good sensitivity and specificity, can accurately identify malignant nodules, and is significant for timely diagnosis, improved prognosis and reduced mortality. With this marker, only 2 ml of peripheral venous blood needs to be collected for detection; the process is simple and convenient, and is particularly suitable for use alongside imaging methods such as CT in large-scale population screening for ultra-early lung cancer.
The invention also provides a screening method of the marker, and the marker obtained by the method has good sensitivity and specificity on pulmonary nodules.
The invention also provides a model for screening the pulmonary nodules and a construction method thereof.
Accordingly, the present invention provides an isolated nucleic acid molecule which is a lung nodule marker, said nucleic acid molecule being 13-15000 bp in length and having a DNA sequence selected from one, any several or all of the following, or a variant thereof having at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99% identity thereto, or the corresponding RNA sequence of said DNA sequence or variant:
(1) the coding sequence of CEACAM4 or a fragment thereof matching SEQ ID NO: 43, or the complement thereof,
(2) the coding sequence of S100A9 or a fragment thereof matching SEQ ID NO: 44, or the complement thereof,
(3) the coding sequence of FCGR1A or a fragment thereof matching SEQ ID NO: 45, or the complement thereof,
(4) the coding sequence of ROGDI or a fragment thereof matching SEQ ID NO: 46, or the complement thereof,
(5) the coding sequence of ZNF266 or a fragment thereof matching SEQ ID NO: 47, or the complement thereof,
(6) the coding sequence of TNFSF10 or a fragment thereof matching SEQ ID NO: 48, or the complement thereof,
(7) the coding sequence of MORF or a fragment thereof matching SEQ ID NO: 49, or the complement thereof,
(8) the coding sequence of MCEMP1 or a fragment thereof matching SEQ ID NO: 50, or the complement thereof,
(9) the coding sequence of TRAF5 or a fragment thereof matching SEQ ID NO: 51, or the complement thereof,
(10) the coding sequence of SLC38A1 or a fragment thereof matching SEQ ID NO: 52, or the complement thereof,
(11) the coding sequence of TARBP1 or a fragment thereof matching SEQ ID NO: 53, or the complement thereof,
(12) the coding sequence of SMA4 or a fragment thereof matching SEQ ID NO: 54, or the complement thereof,
(13) the coding sequence of HNRNPD or a fragment thereof matching SEQ ID NO: 55, or the complement thereof, and
(14) the coding sequence of SMC1A or a fragment thereof matching SEQ ID NO: 56, or the complement thereof.
In one or more embodiments, the nucleic acid molecule is DNA or RNA.
In one or more embodiments, the nucleic acid molecule is 13-50bp, 13-45bp, 13-40bp, 13-35bp, 13-31bp in length.
In one or more embodiments, the nucleic acid molecule has a length of 13-14000bp, 13-13000bp, 13-12000bp, 13-11000bp, 13-10000bp, 13-9710 bp.
In one or more embodiments, the pulmonary nodule is a pulmonary micro-nodule.
In one or more embodiments, the lung nodule is a lung nodule that is less than 10mm in diameter.
In one or more embodiments, the coding sequence of CEACAM4 is set forth in SEQ ID NO: 1.
In one or more embodiments, the coding sequence of S100A9 is set forth in SEQ ID NO: 2.
In one or more embodiments, the coding sequence of FCGR1A is set forth in SEQ ID NO: 3.
In one or more embodiments, the coding sequence of ROGDI is set forth in SEQ ID NO: 4.
In one or more embodiments, the coding sequence of ZNF266 is set forth in SEQ ID NO: 5.
In one or more embodiments, the coding sequence of TNFSF10 is set forth in SEQ ID NO: 6.
In one or more embodiments, the coding sequence of MORF is set forth in SEQ ID NO: 7.
In one or more embodiments, the coding sequence of MCEMP1 is set forth in SEQ ID NO: 8.
In one or more embodiments, the coding sequence of TRAF5 is set forth in SEQ ID NO: 9.
In one or more embodiments, the coding sequence of SLC38A1 is set forth in SEQ ID NO: 10.
In one or more embodiments, the coding sequence of TARBP1 is set forth in SEQ ID NO: 11.
In one or more embodiments, the coding sequence of SMA4 is set forth in SEQ ID NO: 12.
In one or more embodiments, the coding sequence of HNRNPD is set forth in SEQ ID NO: 13.
In one or more embodiments, the coding sequence of SMC1A is set forth in SEQ ID NO: 14.
In one or more embodiments, the nucleic acid molecule has a DNA sequence selected from any of the following, or a variant thereof having at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99% identity thereto, or their corresponding RNA sequence:
a1: (3), (7), (8), (12) and (13),
a2: (1), (7), (8), (10) and (11),
a3: (1), (5), (7), (9) and (11),
a4: (3), (6), (7), (8), (11) and (12),
a5: (1), (3), (6), (8), (11) and (13),
a6: (6), (9), (10), (11), (12) and (14),
a7: (5), (6), (7), (8), (9), (10) and (13),
a8: (4), (8), (9), (10), (11), (13) and (14),
a9: (1), (3), (5), (7), (10), (12) and (13),
a10: (2), (4), (7), (9), (10), (12) and (13),
a11: (5), (6), (7), (8), (9), (10) and (13),
a12: (3), (7), (8), (9), (11), (12), (13) and (14),
a13: (1), (4), (7), (8), (9), (10), (12) and (13),
a14: (6), (7), (8), (9), (10), (11), (12) and (14),
a15: (1), (3), (4), (7), (9), (10), (12), (13) and (14),
a16: (1), (4), (5), (8), (9), (10), (12), (13) and (14),
a17: (2), (3), (8), (9), (10), (11), (12), (13) and (14),
a18: (3), (4), (6), (7), (8), (10), (11), (12), (13) and (14),
a19: (1), (2), (3), (4), (5), (9), (10), (11), (13) and (14),
a20: (1), (2), (4), (5), (6), (8), (9), (10), (11) and (14),
a21: (1), (2), (4), (5), (6), (7), (8), (11), (12), (13) and (14),
a22: (1), (2), (3), (6), (8), (9), (10), (11), (12), (13) and (14),
a23: (1), (2), (3), (4), (5), (6), (7), (8), (12), (13) and (14),
a24: (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13) and (14),
a25: (1), (2), (3), (4), (6), (7), (8), (9), (10), (11), (13) and (14),
a26: (1), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12) and (13),
a27: (1), (2), (3), (4), (5), (7), (8), (9), (10), (11), (12), (13) and (14),
a28: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (12), (13) and (14),
a29: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (13) and (14), or
a30: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13) and (14).
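The numbering used in these combinations can be made concrete with a small lookup. The following is a minimal sketch, not part of the disclosure: it assumes only the item order (1)-(14) and the SEQ ID NO assignments stated in this description (coding sequences 1-14, fragments 43-56). The names `GENES`, `describe_panel` and `A1` are hypothetical.

```python
# Gene order for items (1)-(14), as enumerated in the first aspect.
GENES = [
    "CEACAM4", "S100A9", "FCGR1A", "ROGDI", "ZNF266", "TNFSF10", "MORF",
    "MCEMP1", "TRAF5", "SLC38A1", "TARBP1", "SMA4", "HNRNPD", "SMC1A",
]


def describe_panel(items):
    """Map item numbers (1..14) to gene symbols and SEQ ID NOs.

    Coding sequences are SEQ ID NO: 1-14 and the matching fragments are
    SEQ ID NO: 43-56, so item i maps to i and 42 + i respectively.
    """
    panel = []
    for i in items:
        if not 1 <= i <= len(GENES):
            raise ValueError(f"item number out of range: {i}")
        panel.append({
            "item": i,
            "gene": GENES[i - 1],
            "coding_seq_id": i,
            "fragment_seq_id": 42 + i,
        })
    return panel


# Combination a1 from the list above: (3), (7), (8), (12) and (13).
A1 = (3, 7, 8, 12, 13)

if __name__ == "__main__":
    for entry in describe_panel(A1):
        print(entry)
```

Expanding a combination this way makes it easy to cross-check an index-based group against the gene-symbol groups given later for the third aspect.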
In one or more embodiments, the nucleic acid molecule has the sequence shown in SEQ ID NOs: 1-14 or its corresponding RNA sequence. In one or more embodiments, the nucleic acid molecule has the sequence shown in SEQ ID NOs: 43-56 or its corresponding RNA sequence.
In one or more of the embodiments above, the DNA sequence comprises a sense strand or an antisense strand of genomic DNA.
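The relationship between a DNA sequence, its complementary (antisense) strand and its corresponding RNA sequence can be illustrated as follows. This is a minimal sketch under simple assumptions: sequences contain only the unambiguous bases A, C, G and T, and the example sequence is a made-up toy string, not one of the SEQ ID NOs. The function names are hypothetical.

```python
# Translation table for base pairing: A<->T, C<->G (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna):
    """Return the RNA sequence corresponding to a DNA sequence (T -> U)."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    toy = "ATGCCGTA"  # hypothetical example, not from the sequence listing
    print(reverse_complement(toy))
    print(to_rna(toy))
```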
In one or more of the above embodiments, the nucleic acid molecule is used as an internal standard or control for detecting the transcription level, translation level, amount, methylation of the corresponding DNA sequence or RNA sequence in a sample.
In a second aspect, the invention provides a reagent for detecting, in a sample:
(1) the transcription level, translation level, content or methylation of one or more of the nucleic acid molecules described in the first aspect herein; or
(2) the content or activity of the expression product of (1).
In one or more embodiments, the sample is from a mammal, preferably a human.
In one or more embodiments, the agent is an agent used in one or more methods selected from the group consisting of: PCR, DNA sequencing, RNA sequencing, DNA hybridization, RNA hybridization, Southern, Northern, Western, immunoprecipitation, ELISA, restriction enzyme analysis, high resolution melting curve, mass spectrometry, preferably RT-qPCR, RNA sequencing or chip-based methods.
In one or more embodiments, the agent is selected from one or more of the following: buffer, reverse transcriptase, polymerase, dNTP, primer, probe, restriction endonuclease, fluorescent dye, fluorescent quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard and reference substance.
In one or more embodiments, the agent is a primer. In one or more embodiments, the primer recognizes one or more of the nucleic acid molecules described herein in the first aspect. In one or more embodiments, the primer can be a primer for genome sequencing, such as a whole genome sequencing primer or a sequencing primer for a portion of a genome, and can also be a primer for amplifying a nucleic acid molecule described herein. Preferably, the primer has a sequence shown by one or more or all of SEQ ID NOs: 15-42.
In one or more embodiments, the agent is a nucleic acid probe. In one or more embodiments, the probe hybridizes to a nucleic acid molecule described herein in the first aspect under stringent conditions. Preferably, the two ends of the probe sequence carry a fluorescent reporter group and a quencher group, respectively. Preferably, the nucleic acid probe has one or more or all of the sequences shown in SEQ ID NOs: 43-56 or the corresponding RNA sequences thereof.
The invention also provides a composition comprising an agent as described in the second aspect herein. Preferably, the composition comprises a primer and/or a nucleic acid probe as described in the second aspect herein.
The invention also provides gene chips comprising the nucleic acid probes and/or polypeptide probes described herein.
The invention also provides a kit comprising an agent as described in the second aspect herein and optionally a nucleic acid molecule as described in the first aspect herein. The kit is used for screening benign and malignant pulmonary nodules. In one or more embodiments, the pulmonary nodule is a pulmonary micro-nodule. In one or more embodiments, the lung nodule is a lung nodule that is less than 10 mm in diameter.
In one or more embodiments, the kit further comprises primers and/or probes for detecting the reference sequence. Preferably, the reference sequence is the sequence of GAPDH or a fragment thereof. Preferably, the primers of the reference sequence have the sequences shown in SEQ ID NOs: 57 and/or 58; the probe of the reference sequence has the sequence shown in SEQ ID NO: 59 or its corresponding RNA sequence.
In one or more embodiments, the kit further comprises a fluorescent probe that specifically binds to the PCR amplified fragment or a non-specifically binding dye.
In a third aspect, the invention provides the use of one or more or all of the following genes, or their DNA or RNA sequences in the genome of an animal, in the preparation of a reagent and/or a kit for screening benign and malignant lung nodules: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A.
In one or more embodiments, the reagent detects in a sample (1) the transcription level, translation level, amount, or methylation of the gene; or (2) the content or activity of the expression product of (1); preferably, the reagent is a primer or a probe.
In one or more embodiments, the DNA or RNA sequence of the gene in the genome of the animal is as described herein for the nucleic acid molecule of the first aspect.
In one or more embodiments, the gene is selected from any one of the following groups:
a1: FCGR1A, MCEMP1, MORF, SMA4 and HNRNPD,
a2: CEACAM4, MCEMP1, MORF, SLC38A1 and TARBP1,
a3: CEACAM4, TRAF5, MORF, ZNF266 and TARBP1,
a4: FCGR1A, MCEMP1, MORF, TARBP1, TNFSF10 and SMA4,
a5: CEACAM4, FCGR1A, MCEMP1, TARBP1, TNFSF10 and HNRNPD,
a6: SLC38A1, TARBP1, TNFSF10, TRAF5, SMA4 and SMC1A,
a7: HNRNPD, TNFSF10, SLC38A1, MCEMP1, ZNF266, TRAF5 and MORF,
a8: MCEMP1, ROGDI, SLC38A1, TARBP1, TRAF5, HNRNPD and SMC1A,
a9: CEACAM4, FCGR1A, MORF, SLC38A1, ZNF266, SMA4 and HNRNPD,
a10: MORF, ROGDI, S100A9, SLC38A1, TRAF5, SMA4 and HNRNPD,
a11: TRAF5, MCEMP1, MORF, SLC38A1, TNFSF10, ZNF266 and HNRNPD,
a12: FCGR1A, MCEMP1, MORF, TARBP1, TRAF5, SMA4, HNRNPD and SMC1A,
a13: CEACAM4, MCEMP1, MORF, ROGDI, SLC38A1, TRAF5, SMA4 and HNRNPD,
a14: MCEMP1, MORF, SLC38A1, TARBP1, TNFSF10, TRAF5, SMA4 and SMC1A,
a15: CEACAM4, FCGR1A, MORF, ROGDI, SLC38A1, TRAF5, SMA4, HNRNPD and SMC1A,
a16: CEACAM4, MCEMP1, ROGDI, SLC38A1, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a17: FCGR1A, MCEMP1, S100A9, SLC38A1, TARBP1, TRAF5, SMA4, HNRNPD and SMC1A,
a18: FCGR1A, MCEMP1, MORF, ROGDI, SLC38A1, TARBP1, TNFSF10, SMA4, HNRNPD and SMC1A,
a19: CEACAM4, FCGR1A, ROGDI, S100A9, SLC38A1, TARBP1, TRAF5, ZNF266, HNRNPD and SMC1A,
a20: CEACAM4, MCEMP1, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266 and SMC1A,
a21: CEACAM4, MCEMP1, MORF, ROGDI, S100A9, TARBP1, TNFSF10, ZNF266, SMA4, HNRNPD and SMC1A,
a22: CEACAM4, FCGR1A, MCEMP1, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, SMA4, HNRNPD and SMC1A,
a23: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, TNFSF10, ZNF266, SMA4, HNRNPD and SMC1A,
a24: FCGR1A, MCEMP1, MORF, ROGDI, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a25: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, HNRNPD and SMC1A,
a26: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, SMA4 and HNRNPD,
a27: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a28: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TNFSF10, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a29: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, HNRNPD and SMC1A, or
a30: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A.
In one or more embodiments, the animal is a mammal, preferably a human.
In one or more embodiments, the genome is the human genome hg19.
In one or more embodiments, the pulmonary nodule is a pulmonary micro-nodule. In one or more embodiments, the lung nodule is a lung nodule that is less than 10 mm in diameter.
The invention also provides the use of the nucleic acid molecule according to claim 1 for the production of a reagent and/or a kit for screening benign and malignant lung nodules.
The invention also provides the use of an agent as described herein, which detects in a sample (1) the transcription level, translation level, content or methylation of one or more of the nucleic acid molecules of the first aspect of the invention; or (2) the content or activity of the expression product of (1).
In one or more embodiments, the pulmonary nodule is a pulmonary micro-nodule. In one or more embodiments, the lung nodule is a lung nodule that is less than 10 mm in diameter.
In one or more embodiments of use, the sample is from a mammal, preferably a human. In one or more embodiments, the sample is from a tissue, cell, or bodily fluid, such as non-tissue or blood. In one or more embodiments, the sample comprises plasma, serum, or blood.
In one or more embodiments, the agent is an agent used in one or more methods selected from the group consisting of: PCR, DNA sequencing, RNA sequencing, DNA hybridization, RNA hybridization, Southern, Western, immunoprecipitation, ELISA, restriction enzyme analysis, high resolution melting curve method, mass spectrometry, preferably RT-qPCR, RNA sequencing or chip-based methods.
In one or more embodiments, the agent is selected from one or more of the following: buffer, polymerase, dNTP, primer, probe, restriction enzyme, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard and control.
In one or more embodiments, the agent is a primer. In one or more embodiments, the primer recognizes one or more of the nucleic acid molecules described herein in the first aspect. In one or more embodiments, the primer can be a primer for genome sequencing, such as a whole genome sequencing primer or a sequencing primer for a partial genome, and can also be a PCR primer for amplifying a nucleic acid molecule described herein.
In one or more embodiments, the agent is a nucleic acid probe. In one or more embodiments, the probe hybridizes to a nucleic acid molecule described herein in the first aspect under stringent conditions. Preferably, both ends of the sequence of the probe respectively comprise a fluorescent reporter group and a quencher group.
In one or more embodiments, the agent is a polypeptide probe. In one or more embodiments, the polypeptide probe specifically binds to an expression product of a nucleic acid molecule as described herein in the first aspect.
In one or more embodiments, the kit is a real-time quantitative PCR kit, an RNA sequencing kit, or a kit containing a gene chip.
In one or more embodiments, the real-time quantitative PCR kit comprises primers for specifically amplifying one or more of the sequences shown in SEQ ID NOs: 1-14. Preferably, the primer has a sequence shown by one or more or all of SEQ ID NOs: 15-42.
In one or more embodiments, the gene chip-containing kit comprises a nucleic acid probe that hybridizes to one or more sequences shown in SEQ ID NOs: 1-14 under stringent conditions. Preferably, the nucleic acid probe has one or more or all of the sequences shown in SEQ ID NOs: 43-56 or the corresponding RNA sequences thereof.
The invention also provides a method for screening benign and malignant pulmonary nodules, which comprises the following steps:
(1) obtaining expression measurements of one or more or all of the genes in the sample selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
(2) obtaining a score using the expression detection result by constructing a model,
(3) and screening whether the lung nodules are benign or malignant according to the scores.
In one or more embodiments, the expression detection result is an RNA content, an expression product content, or an RT-PCR result of the gene.
In one or more embodiments, the expression detection result is a detection result of a nucleic acid molecule or an expression product thereof as described in the first aspect herein, preferably the amount of RNA, the amount of an expression product of a nucleic acid molecule, or the RT-PCR result of a nucleic acid molecule.
In one or more embodiments, step (1) comprises performing an assay on the sample to obtain the expression assay result.
In one or more embodiments, the gene of step (1) is selected from any one of the groups a1-a30 described in the third aspect herein.
In one or more embodiments, the sample is a lung tissue sample or a lung nodule sample.
In one or more embodiments, the sample is blood, serum, or plasma. Preferably blood, serum or plasma of peripheral blood.
In one or more embodiments, the pulmonary nodule is a pulmonary micro-nodule. In one or more embodiments, the lung nodule is a lung nodule that is less than 10 mm in diameter.
In one or more embodiments, the model in step (2) is a logistic regression model.
In one or more embodiments, the model is as follows:
$$z = b_0 + b_1\,\Delta Ct_1 + b_2\,\Delta Ct_2 + \cdots + b_n\,\Delta Ct_n$$
then
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}$$
wherein $\Delta Ct_1$ to $\Delta Ct_n$ are, respectively, the differences between the quantitative PCR cycle number (Ct) values of the 1st to nth nucleic acid molecules and that of the reference sequence; $b_0$ to $b_n$ are the corresponding logistic regression model parameters; and $\sigma(z)$, i.e., P, is the score.
In one or more embodiments, step (3) comprises: when the score meets a threshold, the lung nodule is identified as benign or malignant. In one or more embodiments, a score greater than or equal to a threshold identifies a lung nodule as malignant and a score less than the threshold identifies a lung nodule as benign.
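The scoring and thresholding steps can be sketched as follows. This is a minimal illustration, assuming the score is the standard logistic function of a linear combination of the ΔCt values; the function names, the example coefficients and the default threshold of 0.5 are hypothetical and are not taken from the specification.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nodule_score(delta_cts, coefficients, intercept):
    """Return P = sigma(b0 + sum(b_i * dCt_i)) for one sample."""
    if len(delta_cts) != len(coefficients):
        raise ValueError("one coefficient is needed per delta-Ct value")
    z = intercept + sum(b * d for b, d in zip(coefficients, delta_cts))
    return sigmoid(z)


def classify_nodule(score, threshold=0.5):
    """Score >= threshold is called malignant, otherwise benign."""
    return "malignant" if score >= threshold else "benign"


# Hypothetical delta-Ct values and parameters, for illustration only.
example_score = nodule_score([2.0, -1.0], [0.5, 1.0], 0.0)
example_call = classify_nodule(example_score)
```

In practice the parameters b0 to bn and the threshold would come from a model fitted on labelled samples.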
The present invention also provides an apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of:
(1) obtaining expression measurements of one or more or all of the genes in the sample selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
(2) obtaining a score using the expression detection result by constructing a model,
(3) and screening whether the lung nodules are benign or malignant according to the scores.
In one or more embodiments, the expression detection result is an RNA content, an expression product content, or an RT-PCR result of the gene.
In one or more embodiments, the expression detection result is a detection result of a nucleic acid molecule or an expression product thereof as described in the first aspect herein, preferably the amount of RNA, the amount of an expression product of a nucleic acid molecule, or the RT-PCR result of a nucleic acid molecule.
In one or more embodiments, step (1) comprises performing an assay on the sample to obtain the assay result.
In one or more embodiments, the sample is a lung tissue sample or a lung nodule sample.
In one or more embodiments, the sample is blood, serum, or plasma. Preferably blood, serum or plasma of peripheral blood.
In one or more embodiments, the model in step (2) is a logistic regression model.
In one or more embodiments, the pulmonary nodule is a pulmonary micronodule. In one or more embodiments, the pulmonary nodule is less than 10 mm in diameter.
In one or more embodiments, the gene of step (1) is selected from any one of the groups a1-a30 described in the third aspect herein.
The present invention also provides a computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of:
(1) obtaining expression measurements of one or more or all of the genes in the sample selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
(2) obtaining a score using the expression detection result by constructing a model,
(3) and screening whether the lung nodules are benign or malignant according to the scores.
In one or more embodiments, the expression detection result is an RNA content, an expression product content, or an RT-PCR result of the gene.
In one or more embodiments, the expression detection result is a detection result of a nucleic acid molecule or an expression product thereof as described in the first aspect herein, preferably the amount of RNA, the amount of an expression product of a nucleic acid molecule, or the RT-PCR result of a nucleic acid molecule.
In one or more embodiments, step (1) comprises performing an assay on the sample to obtain the assay result.
In one or more embodiments, the sample is a lung tissue sample or a lung nodule sample.
In one or more embodiments, the sample is blood, serum, or plasma. Preferably blood, serum or plasma of peripheral blood.
In one or more embodiments, the pulmonary nodule is a pulmonary micronodule. In one or more embodiments, the pulmonary nodule is less than 10 mm in diameter.
In one or more embodiments, the model in step (2) is a logistic regression model.
In one or more embodiments, the gene of step (1) is selected from any one of the groups a1-a30 described in the third aspect herein.
The invention also provides a system for screening benign and malignant pulmonary nodules, which is characterized by comprising the following components:
a collection device for obtaining expression detection results of one or more or all of the genes selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A, preferably the gene is selected from any of a1-a30 as described in the third aspect herein,
data processing means for obtaining a score using the detection result by constructing a model,
and the judging device is used for screening the benign and malignant pulmonary nodules according to the scores.
In one or more embodiments, the collection device is a device for inputting the detection result.
In one or more embodiments, the collection device is a detection device. The detection device comprises a sample processing device and a measuring device.
In one or more embodiments, the expression detection result is an RNA content, an expression product content, or an RT-PCR result of the gene.
In one or more embodiments, the expression detection result is a detection result of a nucleic acid molecule or an expression product thereof as described in the first aspect herein, preferably the amount of RNA, the amount of an expression product of a nucleic acid molecule, or the RT-PCR result of a nucleic acid molecule.
In one or more embodiments, step (1) comprises performing an assay on the sample to obtain the assay result.
In one or more embodiments, the sample is a lung tissue sample or a lung nodule sample.
In one or more embodiments, the sample is blood, serum, or plasma. Preferably blood, serum or plasma of peripheral blood.
In one or more embodiments, the pulmonary nodule is a pulmonary micronodule. In one or more embodiments, the pulmonary nodule is less than 10 mm in diameter.
In one or more embodiments, the model in step (2) is a logistic regression model.
The invention also provides a method for screening genes that discriminate benign from malignant pulmonary nodules, which comprises the following steps:
(1) acquiring gene expression data of malignant lung nodule samples and benign lung nodule samples,
(2) selecting genes expressed in all samples for t-test analysis,
(3) comparing the gene expression difference between the malignant lung nodule sample and the benign lung nodule sample, taking the gene with the statistical P value less than 0.05 and the gene expression change more than 1.1 times as a candidate gene,
optionally (4) selecting genes having a high correlation with malignant lung nodules as candidate genes in the ranking of the correlation between the gene expression data and the benign or malignant nature of the sample,
(5) screening candidate genes by using a logistic regression statistical analysis method,
optionally (6) screening genes with consistent expression changes of the real-time fluorescence quantitative PCR method and the gene chip method in the candidate genes obtained in the step (5),
and (7) optionally, utilizing a logistic regression statistical analysis method to construct a model to evaluate the screening effect of the genes obtained in the step (5) on the benign and malignant pulmonary nodules.
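The candidate filter of step (3) can be sketched as below. This is an illustrative sketch only: it assumes the t-test P values have already been computed elsewhere, and it reads "expression change more than 1.1 times" as a fold change in either direction. The function name and the gene labels are hypothetical.

```python
def select_candidate_genes(gene_stats, p_cutoff=0.05, fold_cutoff=1.1):
    """Keep genes with P < p_cutoff and fold change > fold_cutoff.

    gene_stats maps gene -> (p_value, mean_malignant, mean_benign);
    both means are assumed to be positive expression signals.
    """
    selected = []
    for gene, (p_value, mean_malignant, mean_benign) in gene_stats.items():
        ratio = mean_malignant / mean_benign
        # Treat up- and down-regulation symmetrically (an assumption).
        fold_change = max(ratio, 1.0 / ratio)
        if p_value < p_cutoff and fold_change > fold_cutoff:
            selected.append(gene)
    return selected


# Hypothetical summary statistics, for illustration only.
example_stats = {
    "GENE_A": (0.01, 1200.0, 1000.0),  # significant, 1.2-fold up
    "GENE_B": (0.01, 1050.0, 1000.0),  # significant, but only 1.05-fold
    "GENE_C": (0.20, 2000.0, 1000.0),  # 2-fold, but not significant
    "GENE_D": (0.03, 500.0, 1000.0),   # significant, 2-fold down
}
example_candidates = select_candidate_genes(example_stats)
```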
In one or more embodiments, the gene expression data is the amount of RNA or the amount of a nucleic acid molecule expression product.
In one or more embodiments, the lung nodule benign and malignant screening gene is selected from one or more or all of the following: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A. In one or more embodiments, the lung nodule benign and malignant screening gene is selected from any one of the groups a1-a30 described in the third aspect herein.
In one or more embodiments, the correlation in step (4) is expressed as a Pearson correlation coefficient.
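A correlation ranking of this kind might look like the sketch below. It assumes that benign/malignant status is encoded as 0/1 and that genes are ranked by the absolute value of the coefficient; both choices, and all names, are illustrative assumptions rather than details given in the specification.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(expression, labels):
    """Order genes by |r| between expression and the 0/1 malignancy label."""
    r_values = {gene: pearson(values, labels) for gene, values in expression.items()}
    return sorted(r_values, key=lambda gene: abs(r_values[gene]), reverse=True)


# Hypothetical data: four samples, 1 = malignant, 0 = benign.
example_labels = [1, 1, 0, 0]
example_expression = {
    "GENE_B": [1.0, 2.0, 1.0, 2.0],  # unrelated to the label
    "GENE_A": [4.0, 4.0, 2.0, 2.0],  # tracks the label exactly
}
example_ranking = rank_by_correlation(example_expression, example_labels)
```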
In one or more embodiments, step (5) comprises modeling the candidate genes using logistic regression statistical analysis and screening the candidate genes according to the AUC value.
In one or more embodiments, the model of step (5) and/or (7) is as follows:
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1\,\Delta Ct_1 + \cdots + b_n\,\Delta Ct_n$$
wherein $\Delta Ct_1$ to $\Delta Ct_n$ are, respectively, the differences between the quantitative PCR cycle number (Ct) values of the n markers and that of the reference sequence; $b_0$ to $b_n$ are the corresponding logistic regression model parameters; and $\sigma(z)$, i.e., P, is the score.
In one or more embodiments, the sample is a lung tissue sample or a lung nodule sample.
In one or more embodiments, the pulmonary nodule is a pulmonary micronodule. In one or more embodiments, the pulmonary nodule is less than 10 mm in diameter.
In one or more embodiments, the sample is blood, serum, or plasma. Preferably blood, serum or plasma of peripheral blood.
The invention also provides a method for constructing a model for screening benign and malignant pulmonary nodules, which comprises the following steps:
(1) obtaining expression measurements of one or more or all of the genes in the sample selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
(2) and calculating the ROC curve and the area value under the curve of each candidate gene combination, and establishing a model by using a logistic regression statistical analysis method.
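The specification does not state how the logistic regression parameters are fitted. As one possible illustration, the sketch below fits b0 to bn by plain batch gradient descent on the mean log-loss; the fitting procedure, the learning rate, the iteration count and all names are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, row):
    """P = sigma(b0 + sum(b_i * x_i)); weights[0] is the intercept b0."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def fit_logistic_regression(features, labels, learning_rate=0.1, iterations=2000):
    """Fit [b0, b1, ..., bn] by batch gradient descent on the mean log-loss."""
    n_samples = len(features)
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for row, label in zip(features, labels):
            error = predict(weights, row) - label
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, gradient)]
    return weights


# Hypothetical one-marker training set: delta-Ct values and 0/1 labels.
example_weights = fit_logistic_regression([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])
```

A statistics package would normally be used instead; the loop is spelled out only to show what "establishing a model" involves.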
In one or more embodiments, the pulmonary nodule is a pulmonary micronodule. In one or more embodiments, the pulmonary nodule is less than 10 mm in diameter.
In one or more embodiments, the detection result is the amount of RNA, the amount of an expression product of a nucleic acid molecule, or the difference in the Ct values of quantitative PCR cycles of a nucleic acid molecule and an internal reference gene.
In one or more embodiments, step (1) comprises performing an assay on the sample to obtain the assay result.
In one or more embodiments, the model is as follows:
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1\,\Delta Ct_1 + \cdots + b_n\,\Delta Ct_n$$
wherein $\Delta Ct_1$ to $\Delta Ct_n$ are, respectively, the differences between the quantitative PCR cycle number (Ct) values of the n markers and that of the reference sequence; $b_0$ to $b_n$ are the corresponding logistic regression model parameters; and $\sigma(z)$, i.e., P, is the score.
In one or more embodiments, the gene of step (1) is selected from any one of the groups a1-a30 described in the third aspect herein.
The invention also provides a model for screening benign and malignant pulmonary nodules, which is constructed according to the method for constructing the model for screening benign and malignant pulmonary nodules. In one or more embodiments, the pulmonary nodule is a pulmonary micronodule. In one or more embodiments, the pulmonary nodule is less than 10 mm in diameter.
In one or more embodiments, the model is as follows:
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1\,\Delta Ct_1 + \cdots + b_n\,\Delta Ct_n$$
wherein $\Delta Ct_1$ to $\Delta Ct_n$ are, respectively, the differences between the quantitative PCR cycle number (Ct) values of the n markers and that of the reference sequence; $b_0$ to $b_n$ are the corresponding logistic regression model parameters; and $\sigma(z)$, i.e., P, is the score.
Drawings
FIG. 1 shows the expression of different genes in benign and malignant samples. The gene is used as the abscissa, and the expression value of RT-PCR is used as the ordinate.
FIG. 2 is a ROC curve of an exemplary embodiment with false positive rate (1-specificity) as the abscissa and true positive rate (sensitivity) as the ordinate. The closer the AUC is to 1, the better the diagnostic effect.
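For readers unfamiliar with the construction, a ROC curve of this kind can be traced from per-sample scores as sketched below. The data are hypothetical and the trapezoidal rule is one common way to obtain the AUC; neither is taken from the specification.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    One point is produced per distinct score used as a threshold,
    starting from (0, 0); labels are 1 (malignant) or 0 (benign).
    """
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical scores for two malignant (1) and two benign (0) samples.
example_points = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
example_auc = area_under_curve(example_points)
```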
Detailed Description
After repeated attempts, the inventors determined that differences in the expression of the following 14 genes are most likely to reflect the difference between benign and malignant lung nodules. These 14 genes are: CEACAM4 (SEQ ID NO:1), S100A9 (SEQ ID NO:2), FCGR1A (SEQ ID NO:3), ROGDI (SEQ ID NO:4), ZNF266 (SEQ ID NO:5), TNFSF10 (SEQ ID NO:6), MORF (SEQ ID NO:7), MCEMP1 (SEQ ID NO:8), TRAF5 (SEQ ID NO:9), SLC38A1 (SEQ ID NO:10), TARBP1 (SEQ ID NO:11), SMA4 (SEQ ID NO:12), HNRNPD (SEQ ID NO:13) and SMC1A (SEQ ID NO:14). Herein, the sequences shown, including those in the sequence listing, are considered sense strands.
Nodular disease is a granulomatous disease of unknown etiology that affects multiple systems and organs, often involving the lungs, bilateral hilar lymph nodes, eyes, skin and other organs. A pulmonary micro-nodule is a pulmonary nodule less than 10 mm in size. Lung nodules may be malignant or benign, and malignant lung nodules are also referred to as lung cancer.
The "marker" as described herein may be either a nucleic acid molecule or an expression product thereof. The nucleic acid molecule can be isolated from animals, or artificially synthesized nucleic acid molecule with animal genome fragment sequence.
Herein, methods for detecting gene expression levels are well known in the art, such as PCR, DNA sequencing, RNA sequencing, DNA hybridization, RNA hybridization, Southern blotting, Western blotting, immunoprecipitation, ELISA, restriction enzyme analysis, high-resolution melting curve analysis and mass spectrometry; RT real-time quantitative PCR, RNA sequencing or chip-based methods are preferred.
Accordingly, the present invention relates to an agent for detecting the expression level of a gene. Reagents used in the above-described methods for detecting the expression level of a gene are well known in the art. Illustratively, the agent for detecting the level of gene expression may comprise one or more of: buffer, polymerase, dNTPs, primers, probes, restriction enzyme, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard and control. In detection methods involving DNA amplification, the detection reagents include primers. The agent for detecting the expression level of a gene may also be a polypeptide probe such as an antibody. The polypeptide probe specifically binds to the expression product of a nucleic acid molecule as described herein; for example, the polypeptide probe is an antibody that specifically binds to the expression product of a nucleic acid molecule as described in the first aspect herein. The agent for detecting the expression level of a gene may also be a nucleic acid probe. Typically, the nucleic acid probe is labeled at the 5' end with a fluorescent reporter group and at the 3' end with a quencher group. The nucleic acid probes described herein can hybridize under stringent conditions to a genomic sequence, a coding sequence, an RNA sequence, or a cDNA sequence of a gene described herein. Stringent conditions for nucleic acid hybridization are known to those skilled in the art. Preferably, the conditions are such that sequences at least about 65%, 70%, 75%, 85%, 90%, 95%, 98% or 99% homologous to each other typically remain hybridized to each other. A non-limiting example of stringent hybridization conditions is hybridization at 65 ℃ in a high-salt buffer containing 6×SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA and 500 mg/ml denatured salmon sperm DNA, optionally followed by one or two washes in 0.2×SSC, 0.01% BSA at 50 ℃.
In an exemplary embodiment, the present invention uses RT-qPCR to detect gene expression. The steps and required reagents for conventional RT-qPCR are known in the art. Typically, the steps of RT-qPCR include: reverse transcription, denaturation, annealing and extension. Reagents used for RT-qPCR include: buffer solution, reverse transcriptase, primer, probe, dNTP, DNA polymerase and reference substance.
Herein, the sample is from a mammal, preferably a human. The sample may be from any organ (e.g., lung), tissue (e.g., epithelial tissue, connective tissue, muscle tissue, and neural tissue), cell (e.g., lung nodule cells), or bodily fluid (e.g., blood, plasma, serum, interstitial fluid, urine). In an exemplary embodiment, the sample is peripheral blood.
The invention also relates to a kit for identifying a pulmonary nodule comprising an agent as described herein, in particular in the second aspect herein. The kit may further comprise a nucleic acid molecule as described herein, in particular according to the first aspect, as an internal standard or positive control. In addition to the reagents and nucleic acid molecules, the kit may also contain other reagents required for detecting expression. Illustratively, the kit may comprise one or more of: means and/or reagents for collecting a sample from a subject, means and/or reagents suitable for long-term storage of a sample and/or control, means and/or reagents for pre-treating a sample and/or control.
As used herein, a "primer" refers to a nucleic acid molecule with a specific nucleotide sequence that directs synthesis at the initiation of nucleotide polymerization. Primers are typically two synthetic oligonucleotide sequences: one primer is complementary to one DNA template strand at one end of the target region, and the other is complementary to the other DNA template strand at the other end of the target region; each serves as the initiation point for nucleotide polymerization. Primers designed in vitro are widely used in polymerase chain reaction (PCR), qPCR, sequencing, probe synthesis, and the like. Generally, the primers are designed such that the amplified products are 50-150 bp, 60-140 bp, 70-130 bp, or 80-120 bp in length. The primers contained in the reagents herein may be primers for genome sequencing, such as whole genome sequencing primers or sequencing primers directed to a region of the genome, and may also be PCR primers for amplifying a gene or a specific DNA region or fragment thereof. Methods for designing whole genome sequencing primers or PCR primers for a specific region or site in a region are known in the art.
The term "variant" or "mutant" as used herein refers to a polynucleotide whose nucleic acid sequence is altered by insertion, deletion or substitution of one or more nucleotides compared to a reference sequence, while retaining its ability to hybridize to other nucleic acids. A mutant according to any of the embodiments herein comprises a nucleotide sequence having at least 70%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity to a reference sequence and retaining the biological activity of the reference sequence. Sequence identity between two aligned sequences can be calculated using, for example, BLASTn from NCBI. Mutants also include nucleotide sequences that have one or more mutations (insertions, deletions, or substitutions) relative to the reference sequence while still retaining the biological activity of the reference sequence. "More" mutations typically means 1-10, such as 1-8, 1-5, or 1-3. The substitution may be a substitution between purine nucleotides and pyrimidine nucleotides, or a substitution between purine nucleotides or between pyrimidine nucleotides. The substitution is preferably a conservative substitution. For example, it is known in the art that conservative substitutions with nucleotides of similar properties do not typically alter the stability and function of the polynucleotide. Conservative substitutions are, for example, exchanges between purine nucleotides (A and G) and exchanges between pyrimidine nucleotides (T or U and C). Thus, substitution of one or more sites in the polynucleotides of the invention with residues of the same class will not substantially affect their activity.
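The distinction drawn above between conservative and non-conservative substitutions can be made concrete with a small sketch. The function name is hypothetical; the classification simply follows the purine/pyrimidine grouping given in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def substitution_type(reference_base, variant_base):
    """Classify a single-nucleotide substitution.

    'conservative' means purine-to-purine or pyrimidine-to-pyrimidine;
    'non-conservative' means an exchange between the two classes.
    """
    ref = reference_base.upper()
    alt = variant_base.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a nucleotide: {base!r}")
    if ref == alt:
        raise ValueError("reference and variant bases are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "conservative" if same_class else "non-conservative"
```

For example, `substitution_type("A", "G")` falls in the conservative class, whereas `substitution_type("A", "C")` does not.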
In certain embodiments, the markers of the invention are screened by:
1) collecting peripheral blood samples from patients with malignant and benign micro-nodules; all micro-nodule patients are confirmed by CT examination to have a nodule size of less than or equal to 10 mm, and 2-3 ml of peripheral blood is collected per sample;
2) obtaining peripheral blood gene expression profile data of the samples using a gene chip, and normalizing the expression profile data;
3) selecting genes with suitable expression (signal value between 100 and 10000) in all samples for t-test analysis, comparing peripheral blood gene expression between micro-nodule lung cancer and benign nodules, and taking genes with a statistical P value less than 0.05 and an expression fold change greater than 1.1 as candidate genes for subsequent analysis;
4) analyzing the correlation between the candidate genes and micro-nodule lung cancer, ranking the genes by their correlation coefficient with micro-nodule lung cancer, and selecting a gene cohort highly correlated with micro-nodule lung cancer (gene cohort I); in addition, analyzing the correlation of the remaining genes with the genes in gene cohort I, and selecting another gene cohort highly correlated with gene cohort I (gene cohort II); pairing the genes in gene cohort I and gene cohort II to form a series of candidate gene combinations;
5) evaluating the screening effect of each candidate gene combination on micro-nodule lung cancer by logistic regression statistical analysis, calculating the receiver operating characteristic curve (ROC curve) and the area under the curve (AUC) of each candidate gene combination, and selecting a series of gene combinations with good screening capability for micro-nodule lung cancer;
6) verifying the selected gene combinations by real-time fluorescent quantitative PCR, and retaining genes whose expression changes are consistent between quantitative PCR and gene expression profile chip detection as peripheral blood characteristic genes of micro-nodule lung cancer, the significance of the 14 lung cancer characteristic genes being shown in FIG. 1;
7) evaluating the diagnostic effect of any gene combination among the 14 genes on micro-nodule lung cancer by logistic regression statistical analysis, calculating the ROC AUC of the gene combination, and establishing the following micro-nodule lung cancer discrimination model:
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1\,\Delta Ct_1 + \cdots + b_n\,\Delta Ct_n$$
wherein $\sigma(z)$, i.e., P, is the risk value, i.e., the score, for micro-nodule lung cancer (malignant lung nodules); $\Delta Ct_1$ to $\Delta Ct_n$ are, respectively, the differences between the quantitative PCR cycle number (Ct) values of the n markers and that of the reference sequence; $b_0$ to $b_n$ are the corresponding logistic regression model parameters; and z is the logistic regression log-likelihood ratio.
In one or more embodiments, the expression level in a subject sample is increased or decreased compared with a control sample. The expression of the genes to be tested is mathematically analyzed to obtain a score. For a tested sample, when the score is greater than the threshold, the result is judged positive, i.e., a malignant nodule; otherwise, the result is judged negative, i.e., a benign nodule. Conventional methods of mathematical analysis and of determining thresholds are known in the art; exemplary methods are mathematical models such as logistic regression models, support vector machines, and random forest models. For example, for differentially expressed markers, a logistic regression model, a support vector machine model, or a random forest model is constructed from two groups of samples; the model is used to calculate prediction scores for the samples in the test set and to compute the accuracy, sensitivity and specificity of the detection results and the area under the receiver operating characteristic (ROC) curve (AUC).
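The accuracy, sensitivity and specificity mentioned above follow directly from the counts of true and false calls. A minimal sketch, with hypothetical names and data, taking malignant as the positive class:

```python
def screening_metrics(predicted, actual):
    """Compute sensitivity, specificity and accuracy.

    Both inputs are sequences of 1 (malignant, positive) and 0 (benign, negative).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both malignant and benign samples are required")
    return {
        "sensitivity": tp / (tp + fn),  # malignant samples called malignant
        "specificity": tn / (tn + fp),  # benign samples called benign
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical calls for five samples.
example_metrics = screening_metrics([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
```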
The person skilled in the art is aware of conventional methods for constructing logistic regression models. An exemplary model is as follows:
$$z = b_0 + b_1\,\Delta Ct_1 + \cdots + b_n\,\Delta Ct_n$$
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}$$
Furthermore, a computer-readable storage medium storing a computer program is disclosed; when run, the computer program stored on the storage medium performs the method for screening benign and malignant pulmonary nodules described herein. The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. If implemented in software as a computer program product, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
Examples
The present invention will be described in further detail with reference to the following drawings and specific examples. In the following examples, experimental methods for which specific conditions are not specified were generally carried out under conventional conditions.
Example 1 screening of peripheral blood characteristic Gene markers for Lung cancer
The screening of the lung cancer characteristic gene comprises the following steps:
1) Peripheral blood samples were collected from 48 patients with malignant micro-nodules, diagnosed with lung cancer by surgery and pathological examination, and from 38 patients with benign micro-nodules (benign lung lesions). All micro-nodule patients were confirmed by CT examination to have a nodule size of less than or equal to 10 mm, and 2-3 ml of peripheral blood was collected per sample.
2) Total RNA was extracted from each sample using the PAXgene Blood RNA Kit; the fragment integrity (RIN) of the RNA was measured with an Agilent 2100 Bioanalyzer, and the purity with a Nano1000 micro ultraviolet spectrophotometer. All RNA samples must meet the following quality control conditions: RNA yield greater than 2 micrograms, 28S/18S peak ratio greater than 1, RIN value greater than 7, and 260 nm/280 nm absorbance ratio greater than 1.8.
3) The peripheral blood total RNA samples were analyzed with an Affymetrix Gene Profiling Array U133Plus2 chip (human whole-genome expression profiling chip) to obtain peripheral blood gene expression profile data for each sample. The data were then normalized with the MAS5 method in Affymetrix Expression Console software to eliminate systematic errors that may arise during chip detection and to obtain uniformly comparable peripheral blood gene expression profile data.
4) Excessively high and low gene expression signals were removed from the peripheral blood gene expression profile; genes with suitable expression (signal value between 100 and 10000) in all samples were selected for t-test analysis to compare peripheral blood gene expression between micro-nodule lung cancer and benign nodules, and genes with a statistical P value less than 0.05 and an expression fold change greater than 1.1 were taken as candidate genes for subsequent analysis.
5) The correlation between the candidate genes and micro-nodule lung cancer was analyzed, the genes were ranked by their correlation coefficient with micro-nodule lung cancer, and a gene cohort highly correlated with micro-nodule lung cancer (gene cohort I) was selected; in addition, the correlation of the remaining genes with the genes in gene cohort I was analyzed, and another gene cohort highly correlated with gene cohort I (gene cohort II) was selected. The genes in gene cohort I and gene cohort II were then paired to form a series of candidate gene combinations.
6) And evaluating the screening effect of each candidate gene combination on the micro-nodular lung cancer by using a logistic regression statistical analysis method, calculating a receiver operator characteristic Curve (ROC Curve) and an Area Under the Curve (Area Under Curve) of each candidate gene combination, and screening a series of gene combinations with good screening capability on the micro-nodular lung cancer.
7) The selected gene combinations were verified by real-time fluorescent quantitative PCR, and genes whose expression changes were consistent between quantitative PCR and gene expression profile chip detection were retained as peripheral blood characteristic genes of micro-nodule lung cancer. Fourteen lung cancer characteristic genes were thereby obtained: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A (gene sequences shown in SEQ ID NO. 1 to SEQ ID NO. 14); their significance is shown in FIG. 1.
8) The diagnostic effect on micro-nodule lung cancer of any combination of 7 genes selected from the 14 genes was evaluated by logistic regression statistical analysis, the ROC AUC of each gene combination was calculated, and the following micro-nodule lung cancer discrimination model was established:
$$P = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1\,\Delta Ct_1 + \cdots + b_7\,\Delta Ct_7$$
wherein $\sigma(z)$, i.e., P, is the risk value for micro-nodule lung cancer (malignant lung nodules); $b_0$ to $b_7$ are the corresponding logistic regression model parameters; $\Delta Ct_1$ to $\Delta Ct_7$ are, respectively, the differences between the quantitative PCR cycle number (Ct) values of the 7 micro-nodule lung cancer gene markers and that of the internal reference gene; and z is the logistic regression log-likelihood ratio.
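Several of the steps above are simple filters and enumerations, sketched below for illustration. The function names and sample gene labels are hypothetical, the signal range is treated as inclusive (an assumption), and the sketch covers only the RNA quality control of step 2), the signal range pre-filter of step 4), the cohort pairing of step 5) and the enumeration of 7-gene panels in step 8).

```python
from itertools import combinations, product


def passes_rna_qc(yield_ug, ratio_28s_18s, rin, a260_a280):
    """Step 2): every quality control limit is a strict 'greater than'."""
    return yield_ug > 2.0 and ratio_28s_18s > 1.0 and rin > 7.0 and a260_a280 > 1.8


def genes_in_signal_range(expression, low=100.0, high=10000.0):
    """Step 4): keep genes whose signal lies within [low, high] in all samples."""
    return [gene for gene, signals in expression.items()
            if all(low <= s <= high for s in signals)]


def pair_cohorts(cohort_one, cohort_two):
    """Step 5): pair each gene of cohort I with each gene of cohort II."""
    return [(a, b) for a, b in product(cohort_one, cohort_two) if a != b]


CHARACTERISTIC_GENES = [
    "CEACAM4", "S100A9", "FCGR1A", "ROGDI", "ZNF266", "TNFSF10", "MORF",
    "MCEMP1", "TRAF5", "SLC38A1", "TARBP1", "SMA4", "HNRNPD", "SMC1A",
]

# Step 8): every 7-gene panel that can be drawn from the 14 genes.
seven_gene_panels = list(combinations(CHARACTERISTIC_GENES, 7))
print(len(seven_gene_panels))  # 3432
```

There are 3432 possible 7-gene panels, so evaluating every one by logistic regression and AUC is computationally modest.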
Example 2 detection of micro-Small node Lung cancer Using the selected Lung cancer characteristic Gene markers
This example uses the following 7-gene combination for diagnosis: HNRNPD, TNFSF10, SLC38A1, MCEMP1, ZNF266, TRAF5 and MORF.
The method comprises the following steps:
1) collecting peripheral blood samples of the sample to be detected: collecting peripheral blood samples of patients using BD PAXgene RNA blood collection tubes from QIAGEN;
2) extracting and purifying total RNA from the peripheral blood sample to be tested: total RNA was extracted and purified from peripheral blood using the PAXgene Blood RNA Kit from QIAGEN, and the fragment integrity and yield of the extracted total RNA were determined using an Agilent 2100 Bioanalyzer micro-electrophoresis analyzer. The purity of the RNA sample was measured with a Nano1000 micro ultraviolet spectrophotometer;
3) reverse transcription reaction: cDNA was synthesized by reverse transcription using the High-Capacity cDNA Reverse Transcription Kit from Life Technologies, with total RNA as the template and oligo(dT) as the reverse transcription primer.
4) Fluorescent quantitative RT-PCR detection: in this example, specific primers and/or probes were designed from the sequences of the 7-gene panel (HNRNPD, TNFSF10, SLC38A1, MCEMP1, ZNF266, TRAF5 and MORF) and of the internal reference gene GAPDH; alternatively, SYBR Green dye, which binds non-specifically to PCR amplification fragments, was used. Real-time fluorescent quantitative PCR was performed with the cDNA obtained by reverse transcription as the amplification template and GAPDH as the internal reference gene (the amplification primer sequences of GAPDH are shown in SEQ ID NO. 57 to SEQ ID NO. 58, and the probe sequence in SEQ ID NO. 59) to obtain the relative mRNA content of the 7 gene markers in the peripheral blood samples. Table 1 below shows the fluorescent quantitative PCR reaction system. The primer and probe sequences designed for the lung cancer signature genes are listed in Table 2 below.
TABLE 1 Fluorescent quantitative PCR reaction system
Reagent | Concentration | Volume
Characteristic gene primer | 800 nM | 2 μL
Characteristic gene fluorescent probe | 200 nM | 0.5 μL
Internal reference gene GAPDH primer | 800 nM | 2 μL
Internal reference gene GAPDH fluorescent probe | 200 nM | 0.5 μL
2× PCR MasterMix | - | 12.5 μL
cDNA template | 2.67 ng/μL | 7.5 μL
Total | - | 25 μL
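The volumes in Table 1 can be checked and scaled to several reactions with a short calculation. The following is only a minimal sketch: the per-reaction volumes are taken from Table 1, while the function names, the idea of preparing a shared mix without the cDNA template, and the 10% pipetting overage are illustrative assumptions that are not part of the original method.

```python
# Per-reaction volumes in microliters, copied from Table 1.
PER_REACTION_UL = {
    "characteristic gene primer": 2.0,
    "characteristic gene fluorescent probe": 0.5,
    "GAPDH primer": 2.0,
    "GAPDH fluorescent probe": 0.5,
    "2x PCR MasterMix": 12.5,
    "cDNA template": 7.5,
}


def reaction_total_ul(volumes=PER_REACTION_UL):
    """Sum of all component volumes for a single reaction."""
    return sum(volumes.values())


def master_mix_ul(n_reactions, overage=0.10, volumes=PER_REACTION_UL):
    """Volumes for a shared mix covering n_reactions.

    Assumption (not from the source): the cDNA template is added to each
    well separately, and `overage` is extra volume for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in volumes.items()
        if name != "cDNA template"
    }


if __name__ == "__main__":
    print("Total per reaction (uL):", reaction_total_ul())
    for name, volume in master_mix_ul(10).items():
        print(f"{name}: {volume} uL")
```

The component volumes add up to the 25 μL total stated in the table, which is a quick way to confirm that no row was lost when the table was transcribed.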
TABLE 2 Primer and probe sequences specific for lung cancer signature genes
Figure BDA0002477295440000231
Figure BDA0002477295440000241
Figure BDA0002477295440000251
5) Diagnosing the sample to be tested: based on the relative mRNA content of the 7 gene markers HNRNPD, TNFSF10, SLC38A1, MCEMP1, ZNF266, TRAF5 and MORF in the peripheral blood sample, as detected by real-time fluorescent quantitative PCR, the logistic regression risk value (P) of the sample is calculated with the corresponding micro-nodule lung cancer discrimination model. A sample whose risk value (P) exceeds the threshold is judged positive, i.e. lung cancer; a sample whose risk value (P) is below the threshold is judged negative, i.e. non-lung cancer.
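The decision rule in step 5 can be sketched in a few lines of code. This is a hedged illustration rather than the discrimination model of the patent: the text does not give the model coefficients, the threshold, or the formula for relative mRNA content, so the 2^-ΔCt normalisation against GAPDH, the function names, and every numeric value below are assumptions chosen only to make the example run.

```python
import math

# The seven panel genes named in this example.
GENES = ["HNRNPD", "TNFSF10", "SLC38A1", "MCEMP1", "ZNF266", "TRAF5", "MORF"]


def relative_expression(ct_gene, ct_gapdh):
    """Relative mRNA content versus GAPDH.

    Assumption: the common 2^-(delta Ct) normalisation is used; the source
    only says that GAPDH serves as the internal reference gene.
    """
    return 2.0 ** -(ct_gene - ct_gapdh)


def risk_value(expression, coefficients, intercept):
    """Logistic regression risk value P from per-gene expression values."""
    x = intercept + sum(coefficients[g] * expression[g] for g in GENES)
    return 1.0 / (1.0 + math.exp(-x))


def diagnose(p, threshold):
    """Positive (lung cancer) if P exceeds the threshold, otherwise negative."""
    return "lung cancer" if p > threshold else "non-lung cancer"


if __name__ == "__main__":
    # Hypothetical Ct values and coefficients, for illustration only.
    ct_gapdh = 20.0
    ct_values = {g: 24.0 + i for i, g in enumerate(GENES)}
    expression = {g: relative_expression(ct, ct_gapdh) for g, ct in ct_values.items()}
    coefficients = {g: 1.0 for g in GENES}
    p = risk_value(expression, coefficients, intercept=-0.05)
    print(round(p, 3), diagnose(p, threshold=0.5))
```

In practice the coefficients, intercept and threshold would come from fitting the model on the training samples; how a value exactly equal to the threshold is handled is also an assumption here.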
2. Results
Peripheral blood samples were collected from a total of 114 cases of malignant micro-nodule lung cancer (Lung Cancer) and 82 cases of benign micro-nodules (Benign). The relative expression levels of the 7 micro-nodule lung cancer gene markers and the internal reference gene GAPDH in peripheral blood were detected by fluorescent quantitative PCR, and the logistic regression log-likelihood ratio X value of each sample was calculated; an X value greater than or equal to 0 was regarded as a positive result, and an X value less than 0 as a negative result. When compared with the pathological results, the lung cancer characteristic gene markers discriminated well between malignant small lung nodules (micro-nodule lung cancer) and benign micro-nodules, with both sensitivity and specificity for micro-nodule lung cancer detection above 75%. The detailed results are shown in Figure 2 and Table 3.
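Comparing the X-value calls with pathology amounts to building a confusion matrix. The sketch below shows how sensitivity, specificity and accuracy could be derived under the rule that X greater than or equal to 0 is positive and any other value is negative; the function name and the five toy samples are invented for illustration and are not data from the study.

```python
def evaluate(x_values, is_malignant):
    """Sensitivity, specificity and accuracy for the rule 'X >= 0 is positive'.

    x_values: logistic regression X value per sample.
    is_malignant: pathology result per sample (True = lung cancer).
    """
    tp = fp = tn = fn = 0
    for x, malignant in zip(x_values, is_malignant):
        positive = x >= 0
        if positive and malignant:
            tp += 1
        elif positive and not malignant:
            fp += 1
        elif not positive and malignant:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Toy values only; not taken from the patent.
    x_values = [1.2, 0.0, -0.4, -2.0, 0.3]
    pathology = [True, True, True, False, False]
    print(evaluate(x_values, pathology))
```

Sensitivity is computed over the malignant samples only and specificity over the benign samples only, so the two figures are independent of the 114:82 class ratio, whereas accuracy is not.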
TABLE 3
Figure BDA0002477295440000252
Based on the data from 196 samples, the model constructed from the 7 gene markers achieved sensitivity, specificity and accuracy all above 75% in the training set, with an AUC of 0.88; in the test set, sensitivity, specificity and accuracy were likewise all above 75%, with an AUC of 0.7.
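The AUC reported for the training and test sets can be understood as the probability that a randomly chosen malignant sample receives a higher model score than a randomly chosen benign one. The following minimal sketch computes it by direct pairwise comparison; the function name and the toy scores are assumptions for illustration, and the patent does not state which software was used to obtain its AUC values.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    scores: model output per sample (risk value P or X value).
    labels: truthy for malignant, falsy for benign.
    Ties between a malignant and a benign score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy scores only; not taken from the patent.
    print(auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))
```

Unlike sensitivity and specificity, this quantity does not depend on the chosen threshold, which is why it is reported alongside them.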
Example 3 Detection of micro-nodule lung cancer using various combinations of the selected lung cancer characteristic gene markers
Diagnosis was performed with various gene combinations in the same manner as described in Example 2, and the results are shown in Tables 4 and 5. Table 4 shows the various combinations of gene markers that can be constructed from the 14 gene markers. Table 5 shows the sensitivity, specificity, accuracy and AUC of these combinations in the test set and the training set. Except for groups 1 and 2, the sensitivity and specificity in the test set were greater than 60% for every group.
Figure BDA0002477295440000271
Figure BDA0002477295440000281
Figure BDA0002477295440000291
Figure BDA0002477295440000301
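To give a sense of how many marker panels can be built from 14 gene markers, the combinations can be enumerated programmatically. This is only a sketch: the seven gene names are those of Example 2 and are assumed here to belong to the 14 markers, the remaining seven entries are hypothetical placeholders because the other markers are not named in this section, and the function name is likewise illustrative.

```python
from itertools import combinations
from math import comb

# Seven genes named in Example 2 (assumed to be among the 14 markers).
KNOWN = ["HNRNPD", "TNFSF10", "SLC38A1", "MCEMP1", "ZNF266", "TRAF5", "MORF"]
# Hypothetical placeholders for the markers not named in this section.
PLACEHOLDERS = [f"MARKER_{i:02d}" for i in range(8, 15)]
MARKERS = KNOWN + PLACEHOLDERS


def panels(markers, size):
    """All unordered gene panels of the given size."""
    return list(combinations(markers, size))


if __name__ == "__main__":
    print("7-gene panels:", comb(len(MARKERS), 7))
    print("All non-empty panels:", sum(comb(len(MARKERS), k) for k in range(1, 15)))
    print("First 2-gene panel:", panels(MARKERS, 2)[0])
```

Each enumerated panel could then be evaluated in the same way as the 7-gene panel of Example 2, which is presumably how a table of sensitivity, specificity, accuracy and AUC per group would be filled in.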
The embodiments described above merely illustrate implementations of the present invention; although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.
Sequence listing
<110> Shanghai Biomedlab Co., Ltd.
<120> marker for screening benign and malignant pulmonary nodules and application thereof
<130>201735
<160>59
<170>SIPOSequenceListing 1.0
<210>1
<211>1205
<212>DNA
<213>Artificial Sequence
<400>1
acagaaggag gaaggtcagc agccccgaca gccgacagtc acagcagctc tgacaagagc 60
gttcctggag cccagctcct ctccacagag gacaagcagg cagcagagac catgggcccc 120
ccctcagccg ctccccgtgg agggcacagg ccctggcagg ggctcctgat cacagcctca 180
cttttaacct tctggcaccc gcccaccact gtccagttca ctattgaagc cctgccgtcc 240
agtgctgcag agggaaagga tgttcttcta ctggcctgca atatttcaga aactattcaa 300
gcctattatt ggcacaaggg gaaaacggca gaagggagcc ctctcattgc tggttatata 360
acagacattc aagcaaatat cccaggggcc gcatacagtg gtcgagagac agtatacccc 420
aatggatccc tgctgttcca aaacatcacc ctggaggacg caggatccta caccctacga 480
accataaatg ccagttacga ctctgaccaa gcaactggcc agctccacgt acaccaaaac 540
aacgtcccag gccttcctgt gggggccgtc gctggcatcg tgactggggt cctggttggg 600
gtggctctgg tggccgccct ggtgtgtttt ctgcttctct ccaggactgg aagggccagc 660
atccagcgtg acctcaggga gcagccgccc ccagcctcca cccctggcca tggtccctct 720
cacagatcca ccttctcggc ccctctaccc agccccagaa cagccactcc catctatgag 780
gaattgctat actctgatgc aaacatttac tgccagatcg accacaaagc agatgtggtc 840
tcttaggttc ctctgggagc tgctcttgtg ggttgatgga gcgtccccga agctcccagc 900
cctggggacg gggaaggaca tggagcctga gccagagaac cagctctgag tcctgaggag 960
acacaggcct ggggacaggg agggatggga gtccctgctg aatatctgga gaccctgaca 1020
ggttgccctg ggctctgggt gggccgggac aaaggcctct catcaccaca ggaagcgggg 1080
gcttgcaagg aaagtgaatg ggcctgtggc ccacccgggg tcaccaggaa aggatctgaa 1140
taaagaggac ccttcctctc attggctctt tttctgctca cgggaactta gcagaaactc 1200
acctg 1205
<210>2
<211>573
<212>DNA
<213>Artificial Sequence
<400>2
aaacactctg tgtggctcct cggctttgac agagtgcaag acgatgactt gcaaaatgtc 60
gcagctggaa cgcaacatag agaccatcat caacaccttc caccaatact ctgtgaagct 120
ggggcaccca gacaccctga accaggggga attcaaagag ctggtgcgaa aagatctgca 180
aaattttctc aagaaggaga ataagaatga aaaggtcata gaacacatca tggaggacct 240
ggacacaaat gcagacaagc agctgagctt cgaggagttc atcatgctga tggcgaggct 300
aacctgggcc tcccacgaga agatgcacga gggtgacgag ggccctggcc accaccataa 360
gccaggcctc ggggagggca ccccctaaga ccacagtggc caagatcaca gtggccacgg 420
ccacggccac agtcatggtg gccacggcca cagccactaa tcaggaggcc aggccaccct 480
gcctctaccc aaccagggcc ccggggcctg ttatgtcaaa ctgtcttggc tgtggggcta 540
ggggctgggg ccaaataaag tctcttcctc caa 573
<210>3
<211>2268
<212>DNA
<213>Artificial Sequence
<400>3
aatatcttgc atgttacaga tttcactgct cccaccagct tggagacaac atgtggttct 60
tgacaactct gctcctttgg gttccagttg atgggcaagt ggacaccaca aaggcagtga 120
tcactttgca gcctccatgg gtcagcgtgt tccaagagga aaccgtaacc ttgcactgtg 180
aggtgctcca tctgcctggg agcagctcta cacagtggtt tctcaatggc acagccactc 240
agacctcgac ccccagctac agaatcacct ctgccagtgt caatgacagt ggtgaataca 300
ggtgccagag aggtctctca gggcgaagtg accccataca gctggaaatc cacagaggct 360
ggctactact gcaggtctcc agcagagtct tcacggaagg agaacctctg gccttgaggt 420
gtcatgcgtg gaaggataag ctggtgtaca atgtgcttta ctatcgaaat ggcaaagcct 480
ttaagttttt ccactggaat tctaacctca ccattctgaa aaccaacata agtcacaatg 540
gcacctacca ttgctcaggc atgggaaagc atcgctacac atcagcagga atatctgtca 600
ctgtgaaaga gctatttcca gctccagtgc tgaatgcatc tgtgacatcc ccactcctgg 660
aggggaatct ggtcaccctg agctgtgaaa caaagttgct cttgcagagg cctggtttgc 720
agctttactt ctccttctac atgggcagca agaccctgcg aggcaggaac acatcctctg 780
aataccaaat actaactgct agaagagaag actctgggtt atactggtgc gaggctgcca 840
cagaggatgg aaatgtcctt aagcgcagcc ctgagttgga gcttcaagtg cttggcctcc 900
agttaccaac tcctgtctgg tttcatgtcc ttttctatct ggcagtggga ataatgtttt 960
tagtgaacac tgttctctgg gtgacaatac gtaaagaact gaaaagaaag aaaaagtggg 1020
atttagaaat ctctttggat tctggtcatg agaagaaggt aatttccagc cttcaagaag 1080
acagacattt agaagaagag ctgaaatgtc aggaacaaaa agaagaacag ctgcaggaag 1140
gggtgcaccg gaaggagccc cagggggcca cgtagcagcg gctcagtggg tggccatcga 1200
tctggaccgt cccctgccca cttgctcccc gtgagcactg cgtacaaaca tccaaaagtt 1260
caacaacacc agaactgtgt gtctcatggt atgtaactct taaagcaaat aaatgaactg 1320
acttcaactg ggatacattt ggaaatgtgg tcatcaaaga tgacttgaaa tgaggcctac 1380
tctaaagaat tcttgaaaaa cttacaagtc aagcctagcc tgataatcct attacatagt 1440
ttgaaaaata gtattttatt tctcagaaca aggtaaaaag gtgagtgggt gcatatgtac 1500
agaagattaa gacagagaaa cagacagaaa gagacacaca cacagccagg agtgggtaga 1560
tttcagggag acaagaggga atagtataga caataaggaa ggaaatagta cttacaaatg 1620
actcctaagg gactgtgaga ctgagagggc tcacgcctct gtgttcagga tacttagttc 1680
atggcttttc tctttgactt tactaaaaga gaatgtctcc atacgcgttc taggcataca 1740
agggggtaac tcatgatgag aaatggatgt gttattcttg ccctctcttt tgaggctctc 1800
tcataacccc tctatttcta gagacaacaa aaatgctgcc agtcctaggc ccctgccctg 1860
taggaaggca gaatgtaact gttctgtttg tttaacgatt aagtccaaat ctccaagtgc 1920
ggcactgcaa agagacgctt caagtgggga gaagcggcga taccatagag tccagatctt 1980
gcctccagag atttgcttta ccttcctgat tttctggtta ctaattagct tcaggatacg 2040
ctgctctcat acttgggctg tagtttggag acaaaatatt ttcctgccac tgtgtaacat 2100
agctgaggta aaaactgaac tatgtaaatg actctactaa aagtttaggg aaaaaaaaca 2160
ggaggagtat gacacaaaaaaaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2220
aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaa 2268
<210>4
<211>1418
<212>DNA
<213>Artificial Sequence
<400>4
agaaagattg gtggacagga gcagcggccg gtggggaggg cgctcggcgg cggcctgcgg 60
ccatggccac cgtgatggca gcgacggcgg cggagcgggc ggtgctggag gaggagttcc 120
gctggctgct gcacgacgag gtgcacgctg tgttgaagca gctgcaggac atcctcaagg 180
aggcctctct gcgcttcact ctgccgggct ccggcactga ggggcccgcc aagcaagaga 240
acttcatcct aggcagctgt ggcacagacc aggtgaaggg tgtgctgact ctgcaggggg 300
atgccctcag ccaggcggat gtgaacctga agatgccccg gaacaaccag ctgctgcact 360
tcgccttccg ggaggacaag cagtggaagc tgcagcagat ccaggatgcc agaaaccatg 420
tgagccaagc catttacctg cttaccagcc gggaccagag ctaccagttc aagacgggcg 480
ctgaggtcct caagctgatg gacgcagtga tgctgcagct gaccagagcc cgaaaccggc 540
tcaccacccc cgccaccctc accctccccg agatcgccgc cagcggcctc acgcggatgt 600
tcgcccctgc cctgccgtcc gacctgctgg tcaacgtcta catcaacctc aacaagctct 660
gcctcacggt gtaccagctg catgccctgc agcccaactc caccaagaac ttccgcccag 720
ctgggggcgc ggtgctgcat agccctgggg ccatgttcga gtggggctct cagcgcctgg 780
aggtgagcca cgtgcacaaa gtggagtgcg tgatcccctg gctcaacgac gccctggtct 840
acttcaccgt ctccctgcag ctctgccagc agctcaagga caagatctcc gtgttctcca 900
gctactggag ctacagaccc ttctgatcac agcacccagg agcttgtctc caggaaggcg 960
gccccgtccc ctactcatac ccaccacaga gcaccagcca gtgccaacgc caggctgcta 1020
tttatctccc tatcccaccc cctaccccac ctaacacatt tgcactgccg ggaatggaca 1080
ctggaagtgc caggaggaag gaaggctggt ttggtggggt agtggggagg tcagggaggc 1140
ggggccaagg gtgtcccaca ttcccaacac cgccctctga tcaccatggg aatctttgga 1200
ctcaggacag ggccaggcgc agggctctcc ctcctctccc cttcgctgtc ccctccccct 1260
ggagggcatg gtgtcggggg gtggcactga gctatgagtc ccggggatgg tgaggaacgc 1320
cacagacaga gccaccctag gagtgagtat agtgctggtg actgtgtttc atagccccag 1380
tccagggctg tctaagaaat aaagatcatc agactcca 1418
<210>5
<211>3992
<212>DNA
<213>Artificial Sequence
<400>5
gctgtgtttt ccggcgcttt ctccgcgccg ccaaacccgt tttcttcctg gtggcgtttg 60
ggcttaatac agctttggcg aggtcggatg acgggtggga gccagcggtg gaaggggtgg 120
cgaaagtacc ggtttgcccc aggccgccga ggggcctcct tagagagacc ttgcctgctc 180
cgctcgcgtc cgccggggcc gcgcgggtcc tcctggcgcc gccaggttca aaaagccact 240
cgagttgtca ctgcgacggc cctgggccag gagccgtttc gggatctgtc aaacaacgag 300
ttttcgtcgt tcgaatcagg ttgactggtc cttcatcccc ccaatctccc gtacctggcg 360
agtccagctc gtcgcggcct gtacctcccc acagagcttc aaaacgggct tttaggaaga 420
aggaatggag gctttgaagc aatgctaaga aaagagtgat atgcaagctg agaccaaaaa 480
tatggtatga tttagccata ctgaagggga aggaaataag agctgggcaa agcattctgt 540
gaattggctg actccacttc tatggtgaga gagaggagtg catcaaagat tactcccaga 600
gatggtttca gcatgttggc cagtctggtc tcagactcct gacctcaagt gatccaccca 660
cctcggcctc ccaaaatgct gggattacag gtataagcca ctgtgcctgg ccaaagatac 720
cgttaaccct ggataaagag aatggaggtt acctctgtcc gtgtagattc ctaagctgtc 780
ctggagtgat ccttggagta aaggaaaggt gctttgaagc acattcagcc atcagccctg 840
tgggatggca gccactgatt tgtcctatgg tctttacagg gacccagtct gccttcaaga 900
aaagacagaa gtagaaaggg tggtggctga ctgtctgaca aattgttatc aggtatgcag 960
gaagtatatc cttctccaaa atatcatact tgcatcacca ggtagacaca tttccttcta 1020
cacagaatta tcttcagagc ttcttaaagc aaataaagcc tccttcaagg actgagtccc 1080
tagtcgaatt cccggaagga gtggagcctg tcatattggt gagggtttgc cttgaatgtc 1140
atcccagtat ttcaatattg attaattagt cttccctcat ggtcccaact gcatagtttt 1200
tattttgttg agtgttctga cacatggtaa gggacatgaa agtatccttt gagataatct 1260
ttccattcat cagtgtttat ctagcatctg ctcaagagtg tgctgcagtg gagggaaatc 1320
agatgacctc ccagtctggt tgtgttacat acaatcatgt gtaagaagtg ccattcaagc 1380
cgtgtcactg gaggggactg acagattcag tgacttttga tgatctggct gtggacttca 1440
ccccagaaga atggacttta ctggacccaa ctcagagaaa cctctacaga gatgtgatgc 1500
tggagaacta caagaatttg gccacagtag gatatcagct cttcaaaccc agtctgatct 1560
cttggctgga acaagaagag tctaggacag tgcagagagg tgatttccaa gcttcagaat 1620
ggaaagtgca acttaaaacc aaagagttag cccttcagca ggatgttttg ggggagccaa 1680
cctccagtgg gattcaaatg ataggaagcc acaacggagg ggaggtcagt gatgttaagc 1740
aatgtggaga tgtctccagt gaacactcat gccttaagac acatgtgaga actcaaaata 1800
gtgagaacac atttgagtgt tatctgtatg gagtagactt ccttactctg cacaagaaaa 1860
cctctactgg agagcaacgt tctgtattta gtcagtgtgg aaaagccttc agcctgaacc 1920
cagatgttgt ttgccagaga acgtgcacag gagagaaagc ttttgattgc agtgactctg 1980
ggaaatcctt cattaatcat tcacaccttc agggacattt aagaactcac aatggagaaa 2040
gtctccatga atggaaggaa tgtgggagag gctttattca ctccacagac cttgctgtgc 2100
gtatacaaac tcacaggtca gaaaaaccct acaaatgtaa ggaatgtgga aaaggattta 2160
gatattctgc ataccttaat attcacatgg gaacccacac tggagacaat ccctatgagt 2220
gtaaggagtg tgggaaagcc ttcaccaggt cttgtcaact tactcagcac agaaaaactc 2280
acactggaga gaaaccttat aaatgtaagg attgtgggag agccttcact gtttcctctt 2340
gcttaagtca acatatgaaa atccatgtgg gtgagaagcc ttatgaatgc aaggaatgtg 2400
ggatagcctt cactagatct tctcaactta ctgaacattt aaaaactcac actgcaaagg 2460
atccctttga atgtaagata tgtggaaaat cctttagaaa ttcctcatgc ctcagtgatc 2520
actttcgaat tcacactgga ataaaaccct ataaatgtaa ggattgtggg aaagccttca 2580
ctcagaactc agaccttact aagcatgcac gaactcacag tggagagagg ccctatgaat 2640
gtaaggaatg tggaaaggcc tttgccagat cctctcgcct tagtgaacat acaagaactc 2700
acactggaga gaagcctttt gaatgtgtca aatgtgggaa agcctttgct atttcttcaa 2760
atcttagtgg acatttgaga attcacactg gagagaagcc ctttgagtgc ctggaatgtg 2820
gtaaagcatt tacgcattcc tccagtctta ataatcacat gcggacccac agcgccaaaa 2880
aaccattcac gtgtatggaa tgtggcaaag cttttaagtt tcccacgtgt gttaaccttc 2940
acatgcggat ccacactgga gaaaaaccct acaaatgtaa acagtgtggg aaatccttca 3000
gttactccaa ttcgtttcag ttacatgaac gaactcacac tggagagaaa ccctatgaat 3060
gtaaggagtg cgggaaagcc ttcagttctt ccagttcctt tcgaaatcat gaaagaaggc 3120
atgcggatga gagactgtca gcataaggaa tgtgggaaaa cctaaaggtg tccctgttct 3180
ctctgaagac atgaaaactc actggggaga aaccctatga atgtaaaaat gtggaagcaa 3240
ctttgtatct caggtcttaa tgaacacata tgaattcaca gtggagaaga ccctgcatca 3300
gggaatgtgg aaatgacttt gctgaattct caagccttac caaacacatc agaaatctca 3360
ctggagagaa actgtatgaa tgtagagaat ctgggaatac ctttctgaat cccacaaacc 3420
ttaatgtgtg tatgtgaact cacattggag agaaaccctg caatttaaat ggtatggtct 3480
ggatgatgcc ccactccata tttgtaagcc ctaagtccta gttccttaca ctataactgt 3540
atttggacat agggttttca aacaggtgag taacttcaaa tgaggttgtt gggttcgatc 3600
cctaatctga catcactggt gtccctataa gggaaactga aggaaggata cacatggaga 3660
agactgtgtg gatccaccag aagatggcca tctacaagcc aaggacagag acctggaaca 3720
gatgctttca ttatggcctc cagaggaaac caaccctgtc tccaccttga tattgcactt 3780
ccaggctcca gaactgtgag gcaataaatt tctcttggtt aaatcattca gtctgttatt 3840
ttgtacagca accctaggaa actaatactg tgaggaactt gggaaaagct ttagatcaag 3900
cttgtccaac ccgcaggcca ggatggcttt gaatgcagac caacacaaat ttttaagctt 3960
tcttcaaaca taataaaatt tttttgtgat ta 3992
<210>6
<211>1876
<212>DNA
<213>Artificial Sequence
<400>6
gaccggctgc ctggctgact tacagcagtc agactctgac aggatcatgg ctatgatgga 60
ggtccagggg ggacccagcc tgggacagac ctgcgtgctg atcgtgatct tcacagtgct 120
cctgcagtct ctctgtgtgg ctgtaactta cgtgtacttt accaacgagc tgaagcagat 180
gcaggacaag tactccaaaa gtggcattgc ttgtttctta aaagaagatg acagttattg 240
ggaccccaat gacgaagaga gtatgaacag cccctgctgg caagtcaagt ggcaactccg 300
tcagctcgtt agaaagatga ttttgagaac ctctgaggaa accatttcta cagttcaaga 360
aaagcaacaa aatatttctc ccctagtgag agaaagaggt cctcagagag tagcagctca 420
cataactggg accagaggaa gaagcaacac attgtcttct ccaaactcca agaatgaaaa 480
ggctctgggc cgcaaaataa actcctggga atcatcaagg agtgggcatt cattcctgag 540
caacttgcac ttgaggaatg gtgaactggt catccatgaa aaagggtttt actacatcta 600
ttcccaaaca tactttcgat ttcaggagga aataaaagaa aacacaaaga acgacaaaca 660
aatggtccaa tatatttaca aatacacaag ttatcctgac cctatattgt tgatgaaaag 720
tgctagaaat agttgttggt ctaaagatgc agaatatgga ctctattcca tctatcaagg 780
gggaatattt gagcttaagg aaaatgacag aatttttgtt tctgtaacaa atgagcactt 840
gatagacatg gaccatgaag ccagtttttt tggggccttt ttagttggct aactgacctg 900
gaaagaaaaa gcaataacct caaagtgact attcagtttt caggatgata cactatgaag 960
atgtttcaaa aaatctgacc aaaacaaaca aacagaaaac agaaaacaaa aaaacctcta 1020
tgcaatctga gtagagcagc cacaaccaaa aaattctaca acacacactg ttctgaaagt 1080
gactcactta tcccaagaga atgaaattgc tgaaagatct ttcaggactc tacctcatat 1140
cagtttgcta gcagaaatct agaagactgt cagcttccaa acattaatgc aatggttaac 1200
atcttctgtc tttataatct actccttgta aagactgtag aagaaagagc aacaatccat 1260
ctctcaagta gtgtatcaca gtagtagcct ccaggtttcc ttaagggaca acatccttaa 1320
gtcaaaagag agaagaggca ccactaaaag atcgcagttt gcctggtgca gtggctcaca 1380
cctgtaatcc caacattttg ggaacccaag gtgggtagat cacgagatca agagatcaag 1440
accatagtga ccaacatagt gaaaccccat ctctactgaa agtacaaaaa ttagctgggt 1500
gtgttggcac atgcctgtag tcccagctac ttgagaggct gaggcaagag aattgtttga 1560
acccgggagg cagaggttgc agtgtggtga gatcatgcca ctacactcca gcctggcgac 1620
agagcgagac ttggtttcaa aaaaaaaaaa aaaaaaaact tcagtaagta cgtgttattt 1680
ttttcaataa aattctatta cagtatgtca tgtttgctgt agtgctcata tttattgttg 1740
tttttgtttt agtactcact tgtttcataa tatcaagatt actaaaaatg ggggaaaaga 1800
cttctaatct ttttttcata atatctttga cacatattac agaagaaata aatttcttac 1860
ttttaattta atatga 1876
<210>7
<211>7633
<212>DNA
<213>Artificial Sequence
<400>7
gtgtttatgt ggaagcgaga tgaccggcag gaacctgccc caatgggctg cagagtggtt 60
agtgagtggg tgacagacag acccgtaggc caacgggtgg ccttaagtgt ctttggtctc 120
ctccaatgga gcagcggcgg ggcgggaccg cgactcgggt ttaatgagac tccattgggc 180
tgtaatcagt gtcatgtcgg attcatgtca acgacaacaa cagggggaca caaaatggcg 240
gcggcttagc tcctacccct ggcggcggcg gcagcggtgg cggaggcgac ggcacctcct 300
ccaggcggca gccgcagttt ctcaggcagc ggcagcgccc ccggcaggcg cggtggcggt 360
ggcgcgcagc cagatttgcc tgaagacctg gataatctcc atttttgtca tggactgtta 420
aaacgtttga agttccaatt ctggtcttga tttcccagtt aaagatgttc ttcacccgaa 480
tgcagtcttt cctgttggta aaataagaca accatcaaca ttgcctgttt gtctgctttt 540
gaatctctta aggatggatg tttgtaagat gttgcttaat acagtctgga atactctgtc 600
catttgttga attgtaaatg actttcaaat gtgcaagttc tgttaaatac aaagagaacc 660
tctatgggta acttttgtgt tgaagaagtc atttgtcaac catggtaaaa cttgcaaacc 720
cactttatac agagtggatt cttgaagcta tacagaaaat aaaaaagcaa aagcaaaggc 780
cctctgaaga gagaatctgc catgcggtca gtacttccca tgggttggat aagaagacag 840
tctctgaaca gctggaactc agtgttcagg atggctcagt tctcaaagtc accaacaaag 900
gccttgcctc ctataaggac ccagacaacc ctgggcgctt ttcatcagtt aaaccaggca 960
cttttcctaa gtcagccaag gggtctagag gatcatgtaa tgatctccgc aatgtggatt 1020
ggaataaact tttaaggaga gcaattgaag gacttgagga gccgaatggc tcctccctga 1080
agaacataga gaagtatctc agaagtcaaa gtgatctcac aagcaccacc aacaacccag 1140
cctttcagca gcggctgcga ctgggggcca aacgcgctgt gaataatggg aggttactga 1200
aagacggacc gcagtacagg gtcaattatg ggagcttaga tggcaaaggg gcacctcagt 1260
atcccagtgc attcccatcc tcgctcccac ctgtcagcct tctaccccat gagaaagacc 1320
agccccgtgc tgatcccatt ccaatatgta gcttctgttt ggggactaaa gaatcaaatc 1380
gtgaaaagaa accagaagaa ctcctctctt gtgcagattg tggcagtagt ggacacccat 1440
cctgtttgaa attttgtcct gaattaacaa caaatgtaaa ggccttaagg tggcagtgca 1500
tcgaatgcaa gacatgcagt gcctgtagag tccaaggcag aaatgctgat aatatgcttt 1560
tttgtgattc ctgtgataga ggatttcata tggaatgctg tgacccacca ctttccagaa 1620
tgccaaaagg gatgtggatt tgccaagtct gcagaccaaa gaaaaaggga agaaaactac 1680
ttcatgagaa agctgcacaa ataaaacgac gatatgcaaa acccattgga cgaccgaaaa 1740
ataaattaaa gcaacgattg ttgtctgtaa ccagtgatga aggatccatg aatgcattca 1800
caggaagggg gtcacctgat actgaaataa aaataaacat caaacaagaa agtgcagatg 1860
taaatgtgat tggaaacaag gatgtcgtta ctgaagagga tttggatgtt tttaagcagg 1920
cccaggaact ttcttgggag aaaatagagt gtgagagtgg ggtggaagac tgtggccggt 1980
acccttctgt gattgaattt ggtaaatatg aaatccaaac ctggtactcc tcgccttacc 2040
cacaggaata tgcaagatta ccaaagcttt acctgtgtga attctgtctt aaatatatga 2100
aaagtaaaaa tattttgcta agacactcca agaagtgtgg atggtttcat cctccagcaa 2160
atgaaattta ccgaaggaaa gacctttcag tatttgaggt tgatgggaat atgagcaaaa 2220
tttattgcca aaacctttgc ttgttagcca agctcttcct ggaccacaaa acgttgtatt 2280
atgatgtcga gccattcctt ttttatgtcc ttacaaaaaa tgatgaaaag ggctgtcatc 2340
tggttggata cttctctaag gaaaagcttt gccagcagaa gtataatgtc tcctgcataa 2400
tgatcatgcc ccagcaccaa aggcaaggat ttggacggtt tctcattgat ttcagctatt 2460
tgctttctag aagagaaggc caagcagggt ctcctgaaaa gcctctctcc gatctgggcc 2520
gtctctccta cctggcatat tggaagagcg tcatcttgga gtatctctac caccaccatg 2580
agaggcacat cagcatcaag gcaattagca gagcgacggg catgtgccca catgacattg 2640
ccaccactct gcagcacctc cacatgatcg acaagagaga tggcagattt gtcatcatta 2700
gacgggaaaa gttgatattg agccacatgg aaaagctgaa aacctgttcc agagccaatg 2760
aacttgatcc agacagtctg aggtggaccc caattttaat ttctaatgct gcagtgtctg 2820
aagaagagcg agaagctgag aaagaggctg agcggctaat ggaacaagct agctgctggg 2880
agaaggagga acaagaaatc ctgtcaacta gagctaacag taggcaatca cctgcaaaag 2940
tacaatcgaa aaataaatat ttgcattccc cggagagccg gccagtcaca ggggagcgag 3000
ggcagctgct ggagctgtct aaagagagca gtgaagaaga agaggaggag gaggacgagg 3060
aggaggaaga agaggaggaa gaagaggaag aggatgaaga ggaggaagaa gaggaagaag 3120
aagaagaaga agaagaaaat attcaaagct ctcccccaag attgacgaaa ccacagtcag 3180
ttgccataaa gagaaagagg ccttttgtac taaagaagaa aaggggtcgt aaacgcagga 3240
ggatcaacag cagtgtaaca acagagacca tttcagagac gacagaagta ctgaatgagc 3300
cctttgacaa ctcagatgaa gagaggccaa tgccacagct ggagcctacc tgtgagattg 3360
aagtggagga agatggcagg aagccagtcc tgagaaaagc attccagcat cagcctggga 3420
agaaaagaca aacagaggaa gaggaaggaa aagacaatca ttgcttcaag aatgctgacc 3480
cttgtagaaa caatatgaat gatgattcaa gtaacttgaa agaaggcagt aaagacaatc 3540
ccgaacctct aaagtgcaaa caagtgtggc caaaaggaac aaagcgcggt ctatctaagt 3600
ggaggcaaaa caaagagagg aagaccggat ttaaactgaa tttgtacacc ccgccagaaa 3660
cacccatgga gcctgacgag caggtaacag tggaagaaca gaaggagact tcagaaggaa 3720
aaaccagccc cagtcccatc aggattgagg aggaggtcaa ggaaactggg gaagccctgt 3780
tgcctcaaga ggaaaacaga agggaagaaa catgtgcccc tgtaagtcca aacacatcac 3840
caggtgaaaa accagaagat gatctcatca aacctgagga agaggaagag gaggaggagg 3900
aggaagagga agaagaggaa gaagaggaag gggaagaaga agaaggagga ggaaatgtag 3960
aaaaagatcc agatggtgct aaaagccaag aaaaagagga accagaaatc tccacggaaa 4020
aagaagactc tgcacgtttg gatgatcacg aagaggagga ggaagaggat gaagagccat 4080
cccacaacga ggaccatgat gccgatgacg aggatgacag ccacatggag tctgccgaag 4140
tggagaagga agagctgccc agagaaagct tcaaagaagt actggaaaac caggagactt 4200
ttttagacct taatgtgcag cctggtcact cgaacccaga ggtcttaatg gactgtggcg 4260
tcgacctgac agcttcttgt aacagtgagc ccaaggagct tgctggggac cctgaagctg 4320
tacccgaatc tgacgaggag ccacccccag gagaacaggc acagaagcag gaccaaaaga 4380
acagcaagga agtcgataca gagttcaaag agggaaaccc agcaaccatg gaaatcgact 4440
ctgagactgt ccaggccgtt cagtctttga cccaggagag cagcgaacag gacgacacct 4500
ttcaggattg tgccgagact caagaggcct gtagaagcct acagaactac acccgtgcag 4560
accaaagtcc acagattgcc accacgctcg acgattgcca acagtcggac cacagtagcc 4620
cagtttcatc cgtccactcc catcctggcc agtccgtacg ttctgtcaac agcccaagtg 4680
tccctgctct ggaaaacagc tacgcccaaa tcagcccaga tcaaagtgcc atctcagtgc 4740
catctctgca gaacatggaa accagtccca tgatggatgt cccatcagtt tcagatcatt 4800
cacagcaagt cgtagacagt ggatttagtg acctgggcag tatcgagagc acaactgaga 4860
actacgaaaa cccaagcagc tacgattcta ctatgggagg cagcatctgt ggaaacggct 4920
cttcacagaa cagctgctcc tatagcaacc tcacctccag cagtctgaca cagagcagct 4980
gtgctgtcac ccagcagatg tccaacatca gcgggagctg cagcatgctg cagcaaacca 5040
gcatcagctc ccctccgacc tgcagcgtca agtctcctca aggctgtgtg gtggagaggc 5100
ctccgagcag cagccagcag ctggctcagt gcagcatggc tgctaacttc accccaccca 5160
tgcagctggc tgaaatcccc gagacgagca acgccaacat tggcttatac gagcgaatgg 5220
gtcagagtga ttttggggct gggcattacc cgcagccgtc agccaccttc agccttgcca 5280
aactgcagca gttaactaat acacttattg atcattcatt gccttacagc cattccgctg 5340
ctgtgacttc ctatgcaaac agtgcctctt tgtccacacc attaagtaac acagggcttg 5400
ttcaactttc tcagtctcca cactccgtcc ctgggggacc ccaagcacaa gctaccatga 5460
ccccaccccc caacctgact cctcctccaa tgaatctgcc gccgcctctt ttgcaacgga 5520
acatggctgc atcaaatatt ggcatctctc acagccaaag actgcaaacc cagattgcca 5580
gcaagggcca catctccatg agaaccaagt cagcgtctct gtcaccagcc gctgccaccc 5640
atcagtcaca aatctatggg cgctcccaga ctgtagccat gcagggtcct gcacggactt 5700
taacgatgca aagaggcatg aacatgagtg tgaacctgat gccagcgcca gcctacaatg 5760
tcaactctgt gaacatgaac atgaacactc tcaacgccat gaatgggtac agcatgtccc 5820
agccaatgat gaacagtggc taccacagca atcatggcta tatgaatcaa acgccccaat 5880
accctatgca gatgcagatg ggcatgatgg gcacccagcc atatgcccag cagccaatgc 5940
agaccccacc ccacggtaac atgatgtaca cggcccccgg acatcacggc tacatgaaca 6000
caggcatgtc caaacagtct ctcaatggct cctacatgag aaggtagaca acgtgggcag 6060
tccacaaaac ctacggggca tcactattgg attgatctgc acaaatacct ttgaagagta 6120
cgatttcaaa accagcaatt ggtgtgaatg caaaaacatt tgttggcacc atttatttaa 6180
aaaaaaaaaa agctgtatgc agcagaaagc cttatacaag ttgtttttct ttttttcctt 6240
tttctttttt ttggtacctt catttctgtt acttttatat aaaattctct gcaaaggaag 6300
gcctctcttt ggactacaat ttggaggcag ccacttgttg tgcctgcttc tgttaaacaa 6360
tgtggatatc aagccccccc aaattatctg ttttaatatt gaacctagag cttttttttt 6420
cccttccctg tccactccat gtaaatgcct ttagcatttc agttattgta tattttgttt 6480
aaggtgacac ttcagcatgc cgctaatgtc tttgttagtg acagtgcatt ttgtagtact 6540
gtacaagtgt tgtgctaaca gtaagccatt tcttaagttt tttgccttga ttagggtgcc 6600
ctaatttgag ggttttaaaa aaaactatat ttttgttaat tataaaactg taaagagcta 6660
taaaagctat tcccatttgg ttagtcaaaa gggttttatt gctaaatgtt tggtgtaaag 6720
ttgagaccct tttccatttt ggtgacagat ttctttgggg aaaaaaggca gctttctgtt 6780
ttataaatgc agacttctgt ttattgaatg aagcatatct cagtgtttat ctgtcaggtt 6840
ttgaaacatt tcatatatgt ccaaatactt ggcaggattt aaaaaaaaat agtgaatttg 6900
gtgtaaagtt gctattttat ggaaatgcct ctaactttac attttcattc catctgtaga 6960
tttttctatc tttataaaat attggagtta ttttttaagg aaaaatagaa aagtagcttg 7020
tgaatagctc aaactaagct tacaaatcgc atgtaaaaaa gcaaaaaagt tatttgtgtc 7080
tgtttatatt gcttcctttt ttgtagcctt tgtacctgta cagggtgaca gtaagggcca 7140
agcaggagag gcgtaatcct tgtataaaat aggatccagc gacactcttg tatttatctg 7200
ttctcttttt agtcagtcac ttcaaaaaaa caaaaaacaa acaaaaaaaa gctgtacatt 7260
ttaacataaa ataaattatg atgagccatt tttagcctct tgtgtcctgt catattatga 7320
ttgatagaga atgaccaatg gaactgtatc atgtgtcacg cctcagaaca catacacatt 7380
ttgggaaaat aaattattta gtgtaaattg gagttatggg attttctgat ttgttttgac 7440
tttgggggag gggttggcaa taaataagag taatatctaa taaaaccatc acatatacca 7500
aatacctatt taataaatta atttataatg gattttaatg cttttcatga aagtttattt 7560
tatgcgagtg cataccttct gtatgccaat cattgtcttt aaaataaagt gaaattgttt 7620
tttaaaaaaa aaa 7633
<210>8
<211>1326
<212>DNA
<213>Artificial Sequence
<400>8
tggacaaatt tgcgggctgg ggaccatgga agtggaggaa atctacaagc accaggaagt 60
caagatgcaa gcaccagcct tcagggacaa gaaacagggg gtctcagcca agaatcaagg 120
tgcccatgac ccagactatg agaatatcac cttggccttc aaaaatcagg accatgcaaa 180
gggtggtcat tcacgaccca cgagccaagt cccagcccag tgcaggccgc cctcagactc 240
cacccaggtc ccctgctggt tgtacagagc catcctgagc ctgtacatcc tcctggccct 300
ggcctttgtc ctctgcatca tcctgtcagc cttcatcatg gtgaagaatg ctgagatgtc 360
caaggagctg ctgggcttta aaagggagct ttggaatgtc tcaaactccg tacaagcatg 420
cgaagagaga cagaagagag gctgggattc cgttcagcag agcatcacca tggtcaggag 480
caagattgat agattagaga cgacattagc aggcataaaa aacattgaca caaaggtaca 540
gaaaatcttg gaggtgctgc agaaaatgcc acagtcctca cctcaataaa tgagaggaca 600
ttgtggcagc caaagccaca acttggaaga tggggctgca cctgccaacg aagacgggaa 660
atgacccccc ccccccagcc tagtgtgaac ctgcccctcg tcccacgtat agaaaaacct 720
cgagtcatgg tgaatgagtg tctcggagtt gctcgtgtgt gtgtacacct gcgtgcgtgt 780
gtgtgcgtgt gtgcgcgtgt gttcgtgtat gtgcgtgtgt gcgtgcgcgt gtgtgtgcat 840
tttgcaaagg gtggacattt cagtgtatct cccagaaagg tgatgaatga ataggactga 900
gagtcacagt gaatgtggca tgcatgcctg tgtcatgtga catatgtgag tctcggcatg 960
tcacggtggg tggctgtgtc tgagcacctc cagcagatgt cactctgagt gtgggtgttg 1020
gtgacatgca ttgcacgggc ctgtctccct gtttgtgtaa acatactaga gtatactgcg 1080
gcgtgttttc tgtctaccca tgtcatggtg ggggagattt atctccgtac atgtgggtgt 1140
cgccatgtgt gccctgtcac tatctgtggc tgggtgaacg gctgtgtcat tatgagtgtg 1200
ccgagttatg ccaccctgtg tgctcagggc acatgcacac agacatttat ctctgcactc 1260
acattttgtg acttatgaag ataaataaag tcaagggaaa acagcgtcaa aaaaaaaaaa 1320
aaaaaa 1326
<210>9
<211>3988
<212>DNA
<213>Artificial Sequence
<400>9
agacgcacgt gagggaaatc agatgactgg acttgtagat actaacggtt ctgaagcgga 60
atggcttatt cagaagagca taaaggtatg ccctgtggtt tcatccgcca gaattccggc 120
aactccattt ccttggactt tgagcccagt atagagtacc agtttgtgga gcggttggaa 180
gagcgctaca aatgtgcctt ctgccactcg gtgcttcaca acccccacca gacaggatgt 240
gggcaccgct tctgccagca ctgcatcctg tccctgagag aattaaacac agtgccaatc 300
tgccctgtag ataaagaggt catcaaatct caggaggttt ttaaagacaa ttgttgcaaa 360
agagaagtcc tcaacttata tgtatattgc agcaatgctc ctggatgtaa tgccaaggtt 420
attctgggcc ggtaccagga tcaccttcag cagtgcttat ttcaacctgt gcagtgttct 480
aatgagaagt gccgggagcc agtcctacgg aaagacctga aagagcattt gagtgcatcc 540
tgtcagtttc gaaaggaaaa atgcctttat tgcaaaaagg atgtggtagt catcaatcta 600
cagaatcatg aggaaaactt gtgtcctgaa tacccagtat tttgtcccaa caattgtgcg 660
aagattattc taaaaactga ggtagatgaa cacctggctg tatgtcctga agctgagcaa 720
gactgtcctt ttaagcacta tggctgtgct gtaacggata aacggaggaa cctgcagcaa 780
catgagcatt cagccttacg ggagcacatg cgtttggttt tagaaaagaa tgtccaatta 840
gaagaacaga tttctgactt acacaagagc ctagaacaga aagaaagtaa aatccagcag 900
ctagcagaaa ctataaagaa acttgaaaag gagttcaagc agtttgcaca gttgtttggc 960
aaaaatggaa gcttcctccc aaacatccag gtttttgcca gtcacattga caagtcagct 1020
tggctagaag ctcaagtgca tcaattatta caaatggtta accagcaaca aaataaattt 1080
gacctgagac ctttgatgga agcagttgat acagtgaaac agaaaattac cctgctagaa 1140
aacaatgatc aaagattagc cgttttagaa gaggaaacta acaaacatga tacccacatt 1200
aatattcata aagcacagct gagtaaaaat gaagagcgat ttaaactgct ggagggtact 1260
tgctataatg gaaagctcat ttggaaggtg acagattaca agatgaagaa gagagaggcg 1320
gtggatgggc acacagtgtc catcttcagc cagtccttct acaccagccg ctgtggctac 1380
cggctctgtg ctagagcata cctgaatggg gatgggtcag ggagggggtc acacctgtcc 1440
ctatactttg tggtcatgcg aggagagttt gactcactgt tgcagtggcc attcaggcag 1500
agggtgaccc tgatgcttct ggaccagagt ggcaaaaaga acattatgga gaccttcaaa 1560
cctgacccca atagcagcag ctttaaaaga cctgatgggg agatgaacat tgcatctggc 1620
tgtccccgct ttgtggctca ttctgttttg gagaatgcca agaacgccta cattaaagat 1680
gacactctgt tcttgaaagt ggccgtggac ttaactgacc tggaggatct ctagtcactg 1740
ttatggggtg ataagaggac ttcttggggc cagaactgtg gaggagagca catttgatta 1800
tcatattgac ctggatttag actcaaagca catttgtatt tgcctttttc cttaacgttt 1860
gaagtcagtt taaaacttct gaagtgctgt ctttttacat tttactctgt cccagtttga 1920
aacttaaaac tcttagaata ttctcttatt atttatattt ttatatttct tgaaagatgg 1980
taagtttctt gaagtttttg gggcgtttct cttttactgg tgcttagcgc agtgtctcgg 2040
gcactctaaa tattgagtgt tatggaggac acagaggtag cagaatccca gttgaaaatg 2100
ttttgatatt ttattgtttg gcctattgat tctagacctg gccttaagtc tgcaaaagcc 2160
atctttataa ggtaggctgt tccagttaag tagtgggtga tgtagttaca aagataatat 2220
gctcagtttg gacctttttt tcagttaaat gctaaatata tgaaaattac tatacctcta 2280
agtattttca tgaaattcac cagcagtttg caagcacagt tttgcaaggc tgcataagaa 2340
ctggtgaatg gggtaagcat tttcattctt cctgctgaag taaagcagaa agtactgcat 2400
agtatatgag atatagccag ctagctaaag ttcagatttt gttaggttca accctatgaa 2460
aaaaactatt ttcataggtc aaaaatggta aaaaattagc agtttcataa gattcaacca 2520
aataaatata tatatacaca cacacataca tatacaccta tatatgtgtg tatacaaaca 2580
gttcgaatgt attttggtga cagtaataaa tcaatgtgag gatggataga atttagtata 2640
tgatagagaa aatgtcataa atggataaaa ggaatttaca acttgaggag aaaaccttta 2700
caatttccta tgggtgtcag aagtactctc agcgaaaact gatggctaaa acagtatcta 2760
ctattctctg ataacttttt ttttgagaca gagtttcatt gtcacccagg ctggagtaca 2820
gtggcatgat ctcagctcac tgcaaactct gcctcccgaa ttcaagtgat tctcctgcct 2880
cagcctcctg agtagctggg attacaggcg cccgtcacca cacccaggta atttttgtat 2940
ttttagtaga gacggagttt tgccatgttg gccaagctga tctcaaactc ctgacctcaa 3000
gtgatctgcc cgcctcggcc tcccaaagtg ctgagattac aggcatgacc caccgcgtca 3060
agcctctgac aactattgaa tttgtaagct gctatgcaaa tgggcattta tataaacttg 3120
tgatgtttct tgtcagaatt ctgagtactc tgtgaagaac agaaatgatc atattcttat 3180
gcatctatct gtatgggtct gaaggtgtat atacaaactg agatgagtcc ttatgactct 3240
tgataagcct gagtttaaca acaacaaaaa tgccaagttg tcctgagccc ttctgcgttg 3300
ttatgccact tccctactgc tcatatgcac gctggctccc ctgggcacgc aaggatgagt 3360
atgggccatg ggcccctgta gagctgctta cctggtgatg accatgcacc ttacaatttc 3420
tgaacagtta accctataga agcatgcttt atatgagtgt cttctgggaa gaggaacctt 3480
cttaatctct tctgtgggat tttcaaaatg ctaaagactc acactgcagc aatcatccca 3540
gatgattaaa ttcaaagaaa taggttcaca acaggaatat actgaagaac tagagtgtca 3600
ctgctggtga actgtggcac ggttgctcaa cacatcacct cggacaaatt caggaagcat 3660
ttctttagcc cacaagtcca gacccaggtg ctctgtatgt ttgtttttaa tattcatcat 3720
atccaagttc actctgtctt cctgagcagt ggaagatcat attgctgtaa cttcttttaa 3780
gtagttgatg tggaaaacat tttaaagtga atttgtcaaa atgctggttt tgtgttttat 3840
ccaacttttg tgcatatata taaagtatgt catggcatgg tttgcttagg agttcagagt 3900
tccttcatca tcgaaatagt gattaagtga tcccagaaca aggaatacta gagtaaaaag 3960
cacctctttt tcacaaaaaa aaaaaaaa 3988
<210>10
<211>7758
<212>DNA
<213>Artificial Sequence
<400>10
actgacacgc agctttggtt aaagagcggg cgcacaggag gggaggagac cgcgcgcggg 60
acggggagga atggcctgtc cgcgttaaac catcacaagc catggttgcg gaagggccac 120
gcgtccccca gtaggagaat gactccgatt cgtgaccctc agcgccggtg catgtcgata 180
tatttattga gtgtctactg tgtgccaggc actatatcta tgtgcataga aaaaccctgg 240
aaggccatac aacaatatat atagagtgat cgtctctgct tgctgagcta acaggggtgt 300
caagcttcca ttttggtatc tacttctaaa tacactcaga acaggagaaa tttggactaa 360
ttttcaaact acagacactt tctaatcatg atgcatttca aaagtggact cgaattaact 420
gagttgcaaa acatgacagt gcccgaggat gataacatta gcaatgactc caatgatttc 480
accgaagtag aaaatggtca gataaatagc aagtttattt ctgatcgtga aagtagaaga 540
agtctcacaa acagccattt ggaaaaaaag aagtgtgatg agtatattcc aggtacaacc 600
tccttaggca tgtctgtttt taacctaagc aacgccatta tgggcagtgg gattttggga 660
ctcgcctttg ccctggcaaa cactggaatc ctactttttc tggtactttt gacttcagtg 720
acattgctgt ctatatattc aataaacctc ctattgatct gttcaaaaga aacaggctgc 780
atggtgtatg aaaagctggg ggaacaagtc tttggcacca cagggaagtt cgtaatcttt 840
ggagccacct ctctacagaa cactggagca atgctgagct acctcttcat cgtaaaaaat 900
gaactaccct ctgccataaa gtttctaatg ggaaaggaag agacattttc agcctggtac 960
gtggatggcc gcgttctggt ggtgatagtt acctttggca taattctccc tctgtgtctc 1020
ttgaagaact tagggtatct tggctatact agtggatttt ccttgagctg tatggttttt 1080
ttcctaattg tggttattta caagaaattt caaattccct gcattgttcc agagctaaat 1140
tcaacaataa gtgctaattc aacaaatgct gacacgtgta cgccaaaata tgttaccttc 1200
aattcaaaga ccgtgtatgc tttacccacc attgcatttg catttgtttg ccacccgtca 1260
gtcctgccaa tttacagtga gcttaaagac cgatcacaga aaaaaatgca gatggtttca 1320
aacatctcct ttttcgccat gtttgttatg tacttcttga ctgccatttt tggctacttg 1380
acattctatg acaacgtgca gtccgacctc cttcacaaat atcagagtaa agatgacatt 1440
ctcatcctga cagtgcggct ggctgtcatt gttgctgtga tcctcacagt gccggtgtta 1500
tttttcacgg ttcgttcatc tttatttgaa ctggctaaga aaacaaagtt taatttatgt 1560
cgtcataccg tggttacctg catactcttg gttgttatca acttgttggt gatcttcata 1620
ccctccatga aggatatttt tggagtcgta ggagttacat ctgctaacat gcttattttc 1680
attcttcctt catctcttta tttaaaaatc acagaccagg atggagataa aggaactcaa 1740
agaatttggg ctgccctttt cttgggcctg ggggtgttgt tctccttggt cagcattccc 1800
ttggtcatct atgactgggc ctgctcatcg agtagtgacg aaggccactg aaacccgccg 1860
agaaaaagaa acatccctgt tgtctgctca gtcaagtccc cacacatcag caatctctca 1920
ccacttcttt tgcaagttta cagaagcaaa cagaaatgta caggatactt aaaatggaat 1980
aactttttgg ttgcaaaaca gagacatggt tctataatgc ttcatgtccc tccaagattt 2040
gagatcaatt tagggattgt gaaatttttt tttcaaattt catacaatca tatttcccag 2100
tacttttcac aatcattttt tacccatcta actctatgtt ttgtggcttc ccggtctctt 2160
agaactttga aaacatgata tacaataatg tttatttatt atacatccag attctgaaat 2220
aattttccta ctgatgttca gctcacacta tctgtacctt tttagaagag aaaagaatct 2280
tgaattgtat atatttattt tgctttacag aaaaaaatgg tttcgtaaat aatttgccta 2340
ttttggttaa catagcacat ggagataatc atctgaaagt tatagggcac tgccactgct 2400
gaatcagagc atgcccaata tttgaggtgg ctctgatttc ctggcagctg aactcgggta 2460
gtccagtggc ctagctggta ccacatctat tcccatccag agacattctc tggcaagtgt 2520
tctcagctga aaagtggttg gggatgattc ttaccttggt aattaaatga agctacacat 2580
ttgggtaatc tagcaaatga agtatttttt ccctcttggc aacttgtgtc agagttactc 2640
tggtctgagt caactttcgc tggggaaaac ctatggaacc tactgcaaaa agattgtcca 2700
aaatgcctaa gaaaatactc ctctgatgca tttagccttc aaccctacct gtcttgctga 2760
agggagaaaa atgttttagt acattatagg cccagcagct tttattcatg tccaccagct 2820
agttgcacag agaatcatgt gtacctaact aaggatgatc taggataagt aactcctgtt 2880
ttatattgag tattttaggg aagtctttaa aagacttgtt ttatatctat aaatctaggt 2940
tattacaaat acaagaattt tgtaccttaa ataagcctca tttctatttc ttcttcatta 3000
attctccatc tagtcttgtg aaaaaaaaaa aaaaaaaacc ctcagagata gtctttgtga 3060
agagcttctg acagaatcac tgagtacctt ccttccccca gatgaggaag acaagggggt 3120
ctcagtgtct gtgctgtctc ctcttctctt ccccaaccaa ggactgtgcc attactgccc 3180
gtctcaactg tccatgcagg aggacagagt tgcctggtac tcttaccctt gtccctctcc 3240
taaagggagc acaaggaaac tgaagagact gaaaaagaag agagtttgta gctgaaaaag 3300
aatagggata gcaaggaaac ccagaactgc attcccctaa gtggggccat cccatgtgat 3360
tgaattgtcc atagcttgcc tatggtgaga aatgtgcatg ctccgtgagc tggtctcttg 3420
aaacaggact tatgcttcct ctatattctg gttaaatttt ccaaacacat aagttcactg 3480
agcacagatt tcttatccag agacaagtag aatctaaccg cagactgttg gcagagtttc 3540
caggcactta gccatgttcc cttcctgact caaatcccca aaggccttca ctctcactga 3600
gaatcacact actgtcccat agataaggca ggcattgaag cacctgtcgt gatcctctag 3660
gggggagaat gaaaggttat ttcctgcatt gcatcatcat agcttttaat ataatgctac 3720
agaatcatat ccacattagg ttagagttca gatatttgga tatgaatacc taacctagcc 3780
atatccatgg ccatctctgt tcttttcagc aatgttttcc atattatatt agcaatgaca 3840
gaaacagaac aagccaagat ccagtcagtt cttgggagct tgtctagagc accaagtaat 3900
gaaatagcca ggtagtggga tgactgtacc tttaaaaata cataatttag tttgcaagct 3960
atattatgct actttctatt ttccttgtta ctttatagca attcatttta ccctcacaaa 4020
gtcaatttag aaccttatca ttaactggga tgtgtagtga tatttttggg cctctgggtt 4080
tcatgtgtca atacaaggaa tatttattta aaatagattt atttagagga ggcacagtgt 4140
tgttgatctg tgtgacacca cccatatttt taaaaacctt tgtatgtttc tctaaatttg 4200
ttgttgactg aatataatag accctaccat aattcgtcaa atatcactga ttagttacat 4260
cctttgtgtg agattagctg taaagtatac tgctcttatt cttattcaga atagttaatt 4320
ggtagccaaa aatacatgta tcacagatgt taggtcgaat ttaaacagca cagtcaagtg 4380
ctatggaagt ttttctgcta aattagtaga ttaaagaata ctatacccta ggcatgggga 4440
gcagcacgtt ttcctttggt aggtaggatc tctatactag tgaacagtgc cagttccaca 4500
ctttggactt agaactgttc tctagttatt gtaacacaga atactgtcaa tccctaattt 4560
acttaatgtt acttattgga agtggggctg atgaaatacg cacaggaggg aaatctactg 4620
tgtttaggca caggcagccc cagtgtataa ggagatcata ttccaaaagg ttgtcagttg 4680
gttgtttgca acctggaatg tattttcctt tagagaccag gttatccatg gtggttaggc 4740
ccctagagca gctggaaaag atgatcaaac caataggtta gctgacatcg aataatgtaa 4800
taggtttgct aaagaatcta accatcaaat ataatattgt ttccagggag ggtgtttgtt 4860
cagagttgcc tgttagtaga atctggactg tccatcccag ccacatccca cctactggac 4920
agtaggggta gagatgccac cagactacag gtgaccagat tggcctgaaa catggtgtcc 4980
agtaaaaacg gggaggaata agtccatgac tgctggagag cttggtggta ggaggagggt 5040
gaaggagggt gagggggcct ctcatgggtc agaaacctcc agggacatcc ctcagctggt 5100
aactctgctg ttgtccggag ggttcaggct tggtgtgcct tcttctgttt gtctgacttt 5160
tggctcctat ggttcatctc ctgccctgcc caccacagaa aaggatgctg ctgcatagct 5220
tctccaatgt accagtcatt gaggccagcg cgcagcactt ttaatgtttt tagtgctagt 5280
aactgtgttt taactctcca ggaataagct gggaggttag aaaacaagaa aaaaggggga 5340
aaaaaaaacc tctccaccca ctttcctttt ttacagttac agaccctgtt gagaaaaaga 5400
agcacccaga tgggtgatgg tgatagggct tgtggtggac acagggttgt ggtaagattc 5460
ctcagtgctt cggcgagcac ctaagactat taccctggaa ctgactgctt tctagagctt 5520
gattcatgcc tctaagccag tggttctcaa agtgtggccc ctagaccagt agcatcagca 5580
tcatcgggga cctcattaga atgcagattc tcagccccct cccagacatg gtgaagtcag 5640
aagctctcag ggagggactc agcaaatgat tctaaagcgc atcaagtttg agaaccaccc 5700
taagccaatg agtctccctt ttacttccca ttagtcctcc ctcaccagat gattcctagg 5760
actcatggtg caacatacag gagccaggct ttagggaggg gaggaaagga gcctggcggc 5820
aggggtctct gcaatggcaa gaaagtgagt gaatccagcg ctgccacctg gttatcccct 5880
tctcgttgaa aatagcagcg caggtgcaaa ttcctaaaca caggctgcct gcactcggtg 5940
ttacctcggg ctgaggcatg ataatatttt attttttaaa gctgtgagga aatgaaagtg 6000
aggctttggt tggggcgagg cagttgtgaa agaaattaaa ataaaagacc ctttgtaaat 6060
gcagacttag aagaaatact gaatttgtgt cagaagttct ccagtgtttg tgttaatcgt 6120
gtggtgataa tcctgtcctc cttttaaagc gaattctcta ctgaaaggtc tgctctgctt 6180
aaggagctac aaactgctct caaaagaatg aaatactgag ttccaattca gtgaggcaca 6240
gtgttggact atggcacatt tagttggagt cggggggagg tcaggaatat gatcagataa 6300
tggattttat accttagagc aaaatctatt agtctctctc agtttatcaa tttaaatggc 6360
tttaggctta tagggggtgt aaactttaag aatataattc tcccattcaa gtttacagca 6420
aacatctagc caccttcaaa acaaagaata tacagaccat catttagcaa tactaataca 6480
tgattttcct tggggatggc aggtttgaga atcctttagc aacaggacat acttccccta 6540
aattacagtg aattatttat aacgagataa agctttcagg tacaagctga aggtggggtg 6600
tctaacaact aaaaactatc actaaatctc aaagagaaag ttcttgcaaa atatgtaaag 6660
ttcacaaggt gcagacattt tccttcttta ggcttttatc taaggaaggg ctatgaaact 6720
gggcccatct gtatacaggc tcaaatttac gtttttaaag gaagaatcta tgcagctgag 6780
gcttattgca ggaaatactt catttgtatg taaatattat tgtaaataaa taggaggctg 6840
tatatttttg aagttgttga tctgcactga aatagaagtc tctaggatct gcatataaac 6900
aataaatgtt tcctagaaat tagtggtttt gtttgggaat tagaaaaatt tacattctct 6960
cccaggtaac atagttctct caatgtaaac ttggaacctg aaaccctact taatttagag 7020
aaagaaatgt ctgagaaata gttcccttga tttcttatgc tggcattaag atcaattatt 7080
taacagatag tcctctgaga tacaagtaag gaatatttag acacaaatcc ttgtcctata 7140
gaaacagata tacccaaatg acaaattgtt tcttcataat gatcttccat ttttaacact 7200
gataccacta cacagtacat atgaaaacag aagctgggga gaagaatgtt tttttcacaa 7260
ttgaattgct gcatttctca aactttggga tctatgaaag cggggagagg gaacctgaat 7320
attaattatg agcacaaatt tgaaggaaag aaaacaaaga accattatct aatcaagctt 7380
tgaaagtcct gcatgtttgc cttttatttt agtgttgacg ccaacataga ctgtctaagg 7440
tatttttttc cccaaacact tgaatcttgg tcgttggtat gtaatccact ctctagagtc 7500
cagtgtactt tagacttcat ctgagtccaa tacatgtacc acactactgt tttattaatg 7560
taaaaacctt gtaaatgaat ttcagatggg tgatttaagt gagtcacaag tcacaaaact 7620
ttgctattca tagttaatca aatagaactg ggtttttttt ttcagagtgt ggtgtaaata 7680
aagaaatata agaagttctg ttctataact gctctgttaa catagttttt aaacattaaa 7740
aaatgtgaac taaaagta 7758
<210>11
<211>5206
<212>DNA
<213>Artificial Sequence
<400>11
cggtcctgtg cgcacgcatc gcacacgccg gcgccttcct ttgggagccc gggccggtgg 60
cgcgggcgct cggcaaatgg agtgggtgct cgcggaagcg ctgctctcgc agagccggga 120
cccccgggcc ctgcttgggg cgctgtgcca aggggaggca tccgcggagc gcgtggagac 180
gctgcgcttc cttctgcagc ggctcgagga cgaggaggcg cgcggcagcg ggggcgcagg 240
cgcgctcccg gaggcggcgc gcgaggtggc tgcagggtac ctcgtgccac tgctgcggag 300
cctgcgcgga cgccccgcgg gcggcccgga ccccagtctg cagcctcgcc accgccggcg 360
cgtgctgagg gcggcgggcg cggccctgcg ctcgtgcgtc cgcctggccg ggcgtccgca 420
gctggcggcc gcgctggctg aggaggcgct gcgcgatctg ctcgccgggt ggcgcgcgcc 480
tggcgccgag gctgccgtgg aagtgctagc agccgtcggg ccatgtttgc ggccccgcga 540
ggacgggccg ctactggagc gcgtggcggg gaccgccgtc gccctggcgc tgggcggggg 600
cggggacggg gatgaggccg ggcctgccga ggacgcggcg gcgctggtgg ccgggcgact 660
gctgccagtg ctggtccaat gtggcggggc ggcgctgcgg gccgtgtggg gcgggctggc 720
cgcgcctggg gcgtccctgg ggtccggccg cgtagaggag aagctgctgg tcctgagcgc 780
cctggccgag aagctgttgc ccgagcccgg cggcgaccgc gcccgcggcg cgcgcgaggc 840
gggcccggac gcccggcgct gctggcgctt ctggaggacg gtgcaggcgg ggctgggcca 900
ggcggacgcc ctgacgcgca agcgagcgcg ctacctgctg cagagggcgg tggaggtgtc 960
ggcggagctg ggggccgact gcacctgcgg gccccaggaa ggaaacggcc caagtctgtt 1020
ttggtggtct gagaggaaaa aagatgagct tctaaagttt tgggaaaatt atattttaat 1080
tatggagact ttagaaggaa atcagataca tgttataaag ccagttttac caaagctaaa 1140
caatctgttt gaatatgcgg tgtcagagga aaatggatgt tggctctttc acccatcctg 1200
gcatatgtgt atttataaaa gaatgtttga aagtgaaaac aaaatcctgt ccaaagaagg 1260
tgttatccat tttttggagc tgtatgaaac aaagattctt ccattttcac cagaattttc 1320
tgagtttatt attggaccat taatggatgc gctttcagag agctctctgt atagcaggtc 1380
cccaggccag ccaataggaa gctgttctcc attgggactg aaattacaga agtttttagt 1440
cacttatatt tctcttcttc cagaagaaat aaagagtagc ttcctattga agtttattcg 1500
gaagatgaca agtaggcatt ggtgtgctgt tcccattttg tttctatcta aggctttggc 1560
aaatgtccca agacataagg ccctgggtat agatgggctt cttgctctca gggatgttat 1620
tcattgcact atgatcacac atcagattct cctgagaggg gcagcccaat gctaccttct 1680
tcaaacagct atgaatttgc tagatgtgga gaaagtgtca ctttctgatg tctcaacttt 1740
tctcatgtct ctgagacaag aggaatcctt aggacgagga acttcattgt ggacagagct 1800
gtgtgactgg ctacgtgtta atgaaagcta ttttaagcca tcccctacgt gtagctccat 1860
tggacttcac aagacatctt taaatgctta tgtaaagagc attgttcaag agtatgttaa 1920
gtcatctgct tgggaaacag gagaaaactg ctttatgcct gattggtttg aagccaagct 1980
tgtttctctg atggtcttgc tggctgtgga tgtggaagga atgaagactc agtatagcgg 2040
aaagcagaga acagagaatg tattgcggat attcttagac cctcttctgg atgtgcttat 2100
gaagtttagt accaatgcct acatgccctt gctgaagact gacagatgcc tccagctgct 2160
gttgaagctg ttgaacacat gcaggttgaa aggttccagt gcccaagatg atgaggtgtc 2220
tactgttctt cagaactttt tcatgtctac tacagagagc atttctgaat ttattctcag 2280
aagacttact atgaatgagc taaatagtgt ttcagatctg gatcgttgcc atttatacct 2340
gatggtgtta actgagctta taaatctgca tttgaaggtt gggtggaaaa ggggtaaccc 2400
tatctggaga gttatttctc ttttgaaaaa tgcatccatt cagcatcttc aagagatgga 2460
cagtggacag gagccaacag ttggaagtca gattcagaga gtagtgagca tggctgcctt 2520
ggccatggtg tgtgaggcca tagaccagaa gcctgagctg cagctggact ctctccatgc 2580
tgggcccctg gaaagcttcc tttcctctct tcagctcaat cagacgctgc agaagcccca 2640
cgcagaggag cagagcagtt atgctcaccc cttggagtgc agcagtgttt tggaagaatc 2700
gtcatcttcc caaggatggg gaaaaatagt tgcacaatat attcatgatc aatgggtgtg 2760
cctctctttc ctgttgaaaa aatatcacac ccttatacca accacaggga gtgaaattct 2820
ggaaccgttt ctacctgccg ttcagatgcc aataaggact ttgcagtctg cactagaagc 2880
cctcacagtt ctttcttctg atcaagtttt accagtgttc cattgcttga aagtgttggt 2940
tcccaagctt ctgacttcct ctgaatcact ctgcatagag tcttttgaca tggcgtggaa 3000
aattatatct tctttaagca acactcagct gatattctgg gctaatttaa aagcttttgt 3060
tcagtttgtt tttgataaca aagttcttac cattgctgcc aaaatcaagg gccaggcata 3120
tttcaaaata aaagagatta tgtacaagat aattgaaatg tctgctataa agactggagt 3180
cttcaataca ctgataagtt actgctgtca gtcttggata gtgtctgctt caaatgtgtc 3240
ccaaggatct ttatcaagtg ctaaaaatta tagcgaactt atccttgagg cttgtatatt 3300
tggaactgtg tttaggcgtg atcaaagact tgttcaggat gtacagacct tcatagaaaa 3360
ccttggacat gactgtgcgg caaatattgt tatggaaaat actaagagag aagaccatta 3420
tgtgagaatt tgtgctgtca aattcctgtg tttattagat ggctccaata tgtcccacaa 3480
gttgtttatt gaggatcttg caatcaagct attagataaa gatgaattag tgtccaagtc 3540
caaaaaacgc tactatgtga attctctaca gcacagagtg aaaaaccgag tgtggcagac 3600
tctgctggta cttttcccta gacttgacca gaatttcttg aatggaatta ttgacaggat 3660
tttccaggct ggtttcacca acaatcaagc atccataaaa tattttatag aatggattat 3720
tatattgatt cttcataaat tccctcaatt tcttccaaag ttctgggatt gtttttctta 3780
tggtgaagaa aatcttaaaa caagcatttg tacgttttta gcagttttat cacatttaga 3840
cattattact caaaatattc cagaaaagaa actaattctg aagcaagccc ttatagttgt 3900
gctgcagtgg tgtttcaatc acaattttag tgttcgactg tatgctttag ttgctcttaa 3960
gaaactctgg actgtgtgta aagtgttaag tgttgaagaa tttgatgccc tgactcctgt 4020
gattgaatcc agcctccatc aagtggaaag catgcacgga gcagggaatg ccaagaagaa 4080
ttggcaacgc attcaggagc atttcttttt tgcaacattt cacccactca aggattattg 4140
tctagagacc atattttaca tccttccacg cctttcaggc cttattgaag atgaatggat 4200
caccattgat aaatttacca gattcactga tgttccttta gctgcgggat ttcagtggta 4260
cctttctcaa actcaactta gtaaactaaa accaggtgac tggtctcagc aagacatagg 4320
tactaatttg gtcgaagcag ataaccaagc ggagtggacc gacgttcaga agaagattat 4380
cccgtggaac agtcgtgttt ccgacttaga cctggagctc ctgtttcagg atcgtgctgc 4440
cagacttgga aagtcaatta gtagactcat cgttgtggcc tcgctcatcg acaaaccgac 4500
caatttagga ggactgtgca ggacctgtga ggtatttggg gcttcagtgc tcgttgttgg 4560
cagccttcag tgtatcagcg acaaacagtt tcagcacctc agtgtctctg cagaacagtg 4620
gcttcctcta gtggaggtaa aaccacctca gctaattgat tatctgcagc agaagaaaac 4680
agaaggttat accatcattg gagtggaaca aactgccaaa agtttagacc taacccaata 4740
ttgctttcct gagaaatctc tgctcttgtt gggaaatgaa cgtgagggaa ttccagcaaa 4800
tctgatccaa cagttggacg tttgtgtgga aattcctcaa cagggcatta tccgctccct 4860
gaatgtccat gtgagtggag ccctgctgat ctgggagtac accaggcagc agctgctctc 4920
gcacggagat accaagccat gatgtgcctt ccttagtgaa ctgctgctgc tgttcagact 4980
tttttaaaaa aaactatttg gactaaagaa acagattctg aaatttattg tgataatttg 5040
tatttctttt ttcttgcaat ttaatgccaa aagtttgcca tgtgccttaa acatattact 5100
atatattttc ccctttaata aacacttttt gttaaattgt attcttcctt taataaaata 5160
ttttaagcaa ttgtggaaat aaaacaatgg attttttaag agagaa 5206
<210>12
<211>1551
<212>DNA
<213>Artificial Sequence
<400>12
gtcaaccgcc caacggctga ggagtgcggc agcgcgccag agactggcgg gcagctccgc 60
ccgcggccgg gatgcactag gcaaagccag ctgggctcct gagtccggtg ggtacttgga 120
gaacttacta cgtctagctg gaggattgta aatgcaccaa tcagcatgct gtgtctagct 180
caagaactca agctccatga ggagatgttt cattgtcgag agcagtcatg atggcctgca 240
ctccacacaa tgcaacagag tgaaagagca ggttctgctt ctttggtgta gtcctgaagc 300
ttcctaagaa acttcacatc aggtgatgga taggagcaac cctgtaaaac cagccttaga 360
ctatttttca aacaggctgg tgaattacca gatctccgtc aagtgcagta accagttcaa 420
gttggaagtg tgtcttttga atgcagaaaa caaagtcgtg gacaaccagg ctgggaccca 480
gggccagctg aaggtgctgg gtgccaacct ctggtggccg tacctgatgc acgaacaccc 540
cgcctacctg tactcgtggg aggatggtga ttgctcacac caaagccttg gacccctccc 600
agcctgtgac ctttgtgacc aactccacct acgcagcaga caagggggct ctgtatgtgg 660
atgtgatccg tgtgaacagc tactactctt ggtatcgcaa ctacgggcac ctggagttga 720
ttcagctgca gctggccgcc cagtttgaga attggtgtaa gacatcacaa tcccattatt 780
cagagcgcgt atggagtgga aacgcttgta gggcttcacc aggtaagcgg tgttgaactt 840
tctgcttgtg tattctctct gggcagagat gccacttgcc tcccccacca tgccatctct 900
gaagaatatt acagaccatt ttggagcatg gtgaataaga aattttcacc ttaggagttc 960
agttgaatag tcatttttat atttgtgact gcaagtcact cttaggggct gtacttcctt 1020
agtactggta gcattattat ccaatggact tttatagctt tcattaggtt ttcttttgtt 1080
tttgttcttt aaagaacgtt ttacttatct tagtatttca tttttcatct atattatgag 1140
gcagtaagag tcttctgttt ttccaaagtt gagactgctt tatatttatt tcgtattgtc 1200
tacagctgta gtgttcaata cattagccac tagccacatg tggttattta aataagataa 1260
aataaaaatt ggccgggcgt ggtggctcac gccggtaatc ccagcacttt gggaggccga 1320
ggcgggcaga tcattaggtc aggagatcga gaccatcctt actaagacgg tgaaccccca 1380
tctctattaa aaatacaaaa aattagccgg gcgtggtggc gggcgcctgc agtcccagct 1440
actcaggagg ctgaggcagg agaatggcgt gaacctggga ggcagagttt gcagtgagcc 1500
gagatggcgc cactgcactc cagcctgggg gacagagcga gactccatct c 1551
<210>13
<211>3068
<212>DNA
<213>Artificial Sequence
<400>13
gtcggccatt ttaggtggtc cgcggcggcg ccattaaagc gaggaggagg cgagagcggc 60
cgccgctggt gcttattctt ttttagtgca gcgggagaga gcgggagtgt gcgccgcgcg 120
agagtgggag gcgaaggggg caggccaggg agaggcgcag gagcctttgc agccacgcgc 180
gcgccttccc tgtcttgtgt gcttcgcgag gtagagcggg cgcgcggcag cggcggggat 240
tactttgctg ctagtttcgg ttcgcggcag cggcgggtgt agtctcggcg gcagcggcgg 300
agacactagc actatgtcgg aggagcagtt cggcggggac ggggcggcgg cagcggcaac 360
ggcggcggta ggcggctcgg cgggcgagca ggagggagcc atggtggcgg cgacacaggg 420
ggcagcggcg gcggcgggaa gcggagccgg gaccgggggc ggaaccgcgt ctggaggcac 480
cgaagggggc agcgccgagt cggagggggc gaagattgac gccagtaaga acgaggagga 540
tgaaggccat tcaaactcct ccccacgaca ctctgaagca gcgacggcac agcgggaaga 600
atggaaaatg tttataggag gccttagctg ggacactaca aagaaagatc tgaaggacta 660
cttttccaaa tttggtgaag ttgtagactg cactctgaag ttagatccta tcacagggcg 720
atcaaggggt tttggctttg tgctatttaa agaatcggag agtgtagata aggtcatgga 780
tcaaaaagaa cataaattga atgggaaggt gattgatcct aaaagggcca aagccatgaa 840
aacaaaagag ccggttaaaa aaatttttgt tggtggcctt tctccagata cacctgaaga 900
gaaaataagg gagtactttg gtggttttgg tgaggtggaa tccatagagc tccccatgga 960
caacaagacc aataagaggc gtgggttctg ctttattacc tttaaggaag aagaaccagt 1020
gaagaagata atggaaaaga aataccacaa tgttggtctt agtaaatgtg aaataaaagt 1080
agccatgtcg aaggaacaat atcagcaaca gcaacagtgg ggatctagag gaggatttgc 1140
aggaagagct cgtggaagag gtggtggccc cagtcaaaac tggaaccagg gatatagtaa 1200
ctattggaat caaggctatg gcaactatgg atataacagc caaggttacg gtggttatgg 1260
aggatatgac tacactggtt acaacaacta ctatggatat ggtgattata gcaaccagca 1320
gagtggttat gggaaggtat ccaggcgagg tggtcatcaa aatagctaca aaccatacta 1380
aattattcca tttgcaactt atccccaaca ggtggtgaag cagtattttc caatttgaag 1440
attcatttga aggtggctcc tgccacctgc taatagcagt tcaaactaaa ttttttgtat 1500
caagtccctg aatggaagta tgacgttggg tccctctgaa gtttaattct gagttctcat 1560
taaaagaaat ttgctttcat tgttttattt cttaattgct atgcttcaga atcaatttgt 1620
gttttatgcc ctttccccca gtattgtaga gcaagtcttg tgttaaaagc ccagtgtgac 1680
agtgtcatga tgtagtagtg tcttactggt tttttaataa atccttttgt ataaaaatgt 1740
attggctctt ttatcatcag aataggaaaa attgtcatgg attcaagtta ttaaaagcat 1800
aagtttggaa gacaggcttg ccgaaattga ggacatgatt aaaattgcag tgaagtttga 1860
aatgttttta gcaaaatcta atttttgcca taatgtgtcc tccctgtcca aattgggaat 1920
gacttaatgt caatttgttt gttggttgtt ttaataatac ttccttatgt agccattaag 1980
atttatatga atattttccc aaatgcccag tttttgctta atatgtattg tgctttttag 2040
aacaaatctg gataaatgtg caaaagtacc cctttgcaca gatagttaat gttttatgct 2100
tccattaaat aaaaaggact taaaatctgt taattataat agaaatgcgg ctagttcaga 2160
gagattttta gagctgtggt ggacttcata gatgaattca agtgttgagg gaggattaaa 2220
gaaatatata ccgtgtttat gtgtgtgtgc ttatttgttt gaatgatttt attttccatt 2280
tctcaaaggt tttatttttt tggttagggc cttaaaattt caggactgtg attattagta 2340
tgtgtgccta aggaactttt tgagtcactc ttaagaaagt gaaactgaag agtctaagtg 2400
ataactatag gattaagtca gaattgtttt tcctgtcatt tgttggaagc ttcttgagtt 2460
ctgttattag cattcaggga attgataccc atcaacttga atggaaaatc gtttgtaggt 2520
attacttaag tgaatgttaa gagttccacc ctgagtggta atctaaggct gtgcagtcag 2580
ttacttcaga ctgctcagaa tagttcatta gaaaggtaac aaatgagaaa tgtattatta 2640
tacagttcta tagtagtgaa gtgatggaat acctttctta cttttgtgga gttacatctg 2700
atgctaagaa tttgacctcc aactaagcaa acattttaat gagcaaaagt tagtgttatt 2760
aaagtttttt tatgatagat ccaaattgag gacctgtgtc ctgtttttat aagattgcaa 2820
cccagctatg ctcatttgtt tatgttttgt atatggctgc ttttgtgtta cagtggtaga 2880
gtttagtagt taggacagag acctgcaaag caaaataatt tacagtctgg ccctttacag 2940
aaaagtttgc tgactcatgg tcaaaataaa tgaaaatttt ttgtgttagg gttgttaagc 3000
tagggttctt tttggtatca tatgcttatt ttatgtaaat ctctcaataa aaaattattt 3060
ttaagaga 3068
<210>14
<211>9710
<212>DNA
<213>Artificial Sequence
<400>14
agttctcggg cgtacggcgc ggcctgtcct actgccgccg gcgccgcggc cgtcatgggg 60
ttcctgaaac tgattgagat tgagaacttt aagtcgtaca agggtcgaca gattatcgga 120
ccatttcaga ggttcaccgc catcattgga cccaatggct ctggtaagtc aaatctcatg 180
gatgccatca gctttgtgct aggtgaaaaa accagcaacc tgcgggtaaa gaccctgcgg 240
gacctgatcc atggagctcc tgtgggcaag ccagctgcca accgggcctt tgtcagcatg 300
gtctactctg aggagggtgc tgaggaccgt acctttgccc gtgtcattgt aggaggttct 360
tctgagtaca agatcaacaa caaagtggtc caactacatg agtacagtga ggaattagag 420
aagttgggca ttctcatcaa agctcgtaac ttcctcgttt tccagggtgc tgtggaatct 480
attgccatga agaaccccaa agagaggaca gctctatttg aagagattag tcgttctggg 540
gagctggcgc aggagtatga caagcgaaag aaggaaatgg tgaaggctga agaggacaca 600
cagtttaatt accatcgcaa gaaaaatatt gcggctgaac gcaaggaagc aaagcaggag 660
aaagaagagg ctgaccggta ccagcgcctg aaggatgagg tagtacgggc tcaggtacag 720
ctgcagctct ttaagcttta ccataatgaa gtggaaattg agaagctcaa caaggaactg 780
gcctcaaaga acaaggagat cgagaaggac aagaagcgta tggacaaggt ggaggatgaa 840
ctgaaggaga agaagaagga gctgggcaaa atgatgcggg agcagcagca gattgagaag 900
gagatcaagg agaaggactc agaattgaac cagaagcggc ctcagtacat caaagccaag 960
gagaacacct cccacaaaat caagaagctg gaagcagcca agaagtctct gcagaatgct 1020
cagaagcact acaagaagcg taaaggtgac atggatgagc tggagaagga gatgctgtca 1080
gtggagaagg ctcggcagga gtttgaagaa cggatggaag aagagagtca gagtcagggc 1140
agagatttga cgttggagga gaatcaggtg aagaaatacc accggttgaa agaagaagcc 1200
agcaagagag cagctaccct ggcccaggag ctggagaaat tcaatcgaga ccagaaagct 1260
gaccaggacc gtctggatct ggaagaacgg aagaaagtag agacagaggc caagatcaag 1320
caaaagctgc gggaaattga agagaatcag aagcggattg agaaactgga ggaatacatc 1380
accactagca agcagtccct agaagagcag aagaagctag agggggagct gacagaggag 1440
gtggagatgg ccaagcggcg tattgatgaa atcaataagg agctgaacca ggtgatggag 1500
cagctagggg atgcccgcat cgaccgccag gagagcagcc gccagcagcg aaaggcagag 1560
ataatggaaa gcatcaagcg cctttaccct ggctctgtgt acggccgcct cattgaccta 1620
tgccagccca cacaaaagaa gtatcagatt gctgtaacca aggttttggg caagaacatg 1680
gatgccatta ttgtggactc ggagaagaca ggccgggact gtattcagta tatcaaggag 1740
cagcgtgggg agcctgagac cttcttgcct cttgactacc tggaggtgaa gcctacagat 1800
gagaaactcc gggagctgaa gggggccaag ctagtgattg atgtgattcg ctatgagcca 1860
cctcatatca aaaaggccct gcagtatgct tgtggcaatg cccttgtctg tgacaacgtg 1920
gaagatgccc gccgcattgc ctttggaggc caccagcgcc acaagacagt ggcactggat 1980
ggaaccctat tccagaagtc aggagtgatc tctggtgggg ccagtgacct gaaggccaag 2040
gcacggcgct gggatgagaa agcagtagac aagttgaaag agaagaagga gcgcttgaca 2100
gaggagctga aagagcagat gaaggcaaaa cggaaagagg cagagctgcg tcaggtgcag 2160
tctcaggccc atggactgca gatgcggctc aagtactccc agagtgacct agaacagacc 2220
aagacacgac atctagccct gaatctgcag gaaaaatcca agctggagag tgagctagcc 2280
aactttgggc ctcgcattaa tgatatcaag aggatcattc agagccgaga gagggaaatg 2340
aaagacttga aggagaagat gaaccaggta gaggatgagg tgtttgaaga gttttgtcgg 2400
gagattggtg tgcgcaacat ccgggagttt gaggaagaaa aggtgaaacg gcagaatgaa 2460
atcgccaaga agcgtttgga gtttgagaat cagaagactc gcttgggcat tcagttggat 2520
tttgaaaaga accaactgaa ggaggaccaa gataaagtac acatgtggga gcagacagtg 2580
aaaaaagatg aaaatgagat agaaaagctc aaaaaggagg aacaaagaca catgaagatc 2640
atagatgaga ccatggctca gctacaagac ctgaagaatc agcatctggc caagaagtcg 2700
gaagtgaatg acaagaatca tgagatggag gagattcgta agaaactcgg gggcgccaac 2760
aaggaaatga cccatttaca gaaggaggtg acagccattg agaccaagct tgaacagaag 2820
cgcagtgacc gtcacaactt gctacaggcc tgtaagatgc aggacattaa gttgccactg 2880
tcaaaaggca ccatggatga tattagtcag gaagagggta gctcccaggg ggaggactca 2940
gtgagtggtt cacagagaat ttccagtatc tatgcacgag aggccctcat tgagattgac 3000
tacggtgatc tgtgtgagga tctgaaggat gcccaggctg aggaagagat caagcaagag 3060
atgaacacac tgcagcagaa gctgaatgag cagcagagtg tgcttcagcg tattgccgcc 3120
cccaacatga aggccatgga aaagctggaa agtgtccgag acaagttcca ggagacctca 3180
gatgagtttg aagcagcccg aaagcgagca aagaaggcca agcaggcatt cgaacagatc 3240
aagaaggagc gctttgaccg cttcaatgct tgttttgaat ctgtggctac caacattgat 3300
gagatctata aggccctgtc ccgcaatagc agtgcccagg cattcctggg ccctgagaac 3360
cctgaagagc cctacttgga tggcatcaac tacaactgtg tggctcctgg gaaacgcttc 3420
cggcctatgg acaacttgtc aggcggggag aagacagtgg cagctctggc cctgctcttt 3480
gccatccaca gctacaagcc agcccccttc ttcgtcctgg atgagattga tgctgccttg 3540
gataacacca acattggcaa ggtggcaaat tacatcaagg agcagtcgac ttgcaacttc 3600
caggccatcg tcatctctct caaggaggag ttctacacca aggccgagag cctcattgga 3660
gtctatcctg agcaagggga ctgtgtgatc agcaaagtcc tgaccttcga cctcaccaag 3720
tacccagatg ccaaccccaa ccccaatgag cagtagcagt atttttgccc tcccgccctg 3780
tctggatccc taagctgtcc ctctcccaat ctctggatat ttgactccca accttccccc 3840
tacctcctgg ccctttttgg tgtagtcatg ggatttaggc actgctaatc aagcatgaag 3900
aggaacagag gtgatgttag gtctggagca aaaattcctg aacgacaggg agtattctgg 3960
cctctgaaag gaggtgctga gctgaacagg gccatctgtt catcacacac acccccttcc 4020
tccccctcat cacccataat cgtgggcccc ttgggcctct tgcccactgt gtgtgtgggt 4080
atgtatgtgt gtatgtatgt atccgcatgt gtgcatgtga gtatgtttgc aaaataataa 4140
aggatattgg agacctgttt tagaaggagc ctaggctgaa tttgattcca agagagctta 4200
ggatgacagc acccctgagc tgggcaaagg tactcaggac ctcataggag tcttaggcag 4260
ttacctgaaa ctgccttcat tcactcattt gtgtattcat tcatttatgt attcatcaga 4320
cacataccga acaccctcta tttgtcaggc tctgtgcttg gaatacagag ttgaatcaga 4380
catgatctct accctcctag taaggagata cagtgggttc atgaatgact atagttagct 4440
gaatgtcata tgtactttga atttgagaag tgggtgatcc cctctaggct tcctggaggt 4500
cacatttaag ctagaccttg acaaattggt aggatttggt caggcactag gagtggagca 4560
tgagctctgg ggacagacag ttatgggttc tggtcccact ttttatcact tactagttgt 4620
ttgaccttgg gcaagtcatt tgaccttctg tgcctcagtt tcctcatctg taaaatgggg 4680
ctaacaatat tacctacctc ataggattta atgatgtcaa gctcctcact ggaggcctta 4740
tcccttcgtg gagcccacta ggtgccgacc cctcagaata taaccctcat gcctggaccc 4800
ctgagagctt ctgatcccag ctattaggga cagaagaagc ctccaaatct ggaaggtgct 4860
gaatgccctg ctgactggga aagtttcagg gcactgatgg ggtctacctg gtaagcggag 4920
ggcctgagga aacctgtagc ttcaatcatg tctggtaacc gggtgcctga gccccaatct 4980
gggttgtgag gaaatagggg agaggtatcc tgggccacat cccagcctaa cacctgtgag 5040
gttcatttta ggaactaacc tcattagcta taaggatcat gcagaggcag caaagccggg 5100
tgcgatgagc tcagccttta ctcattcaca tacaccatca cactttaatt ccaatctgta 5160
tattgctttt taaaagttaa gtccattcta attacccaaa tatgcatgaa ttcattctcc 5220
ttttgagaag ttagattgtt aaagatagtc tcattcagct accaaccact ccttgatcct 5280
tcccttctta gtggctgttg tttgttgtac ttccgtttag actttgtttt aatgcttgta 5340
cgtacatatg tgaactcatt ggaaatattg tgtgtttaat gcaaatgata tattgaattg 5400
tttagcaatt tgttttcttt gcttaacgat gtttttgaga tctgtgcatg ttacttaatg 5460
tagctcaatc catcttctgt aattgctgta tagattgtca tcatatgatt accacatttt 5520
acttacgcat ttcttttgtg atggacatta agactgtttt taggttttgc tattacaaaa 5580
tactacacag gagcatcact atgcctgtgt gaaagtatat gtatgaaagt ttacctaggg 5640
ttgattccta gaagtggaat tgcaaagtca taggatattt atatattggt ttttaataat 5700
acttccaaat tgccctcctg tactatttac tcagtatttt tcttgaggtt gatctgaggt 5760
ctaacattgt tatcctatat cattttcatc ccaagtagtg atatctgtga aatcacaggt 5820
ttgatgtgtg ctaattatgt attcttctaa tacatattaa aagacataac tatcaaaaca 5880
aaataaattt gtctgttttc aaccaaagaa gtcacgtacc actggtggta ctgtgtgcca 5940
taatttggca aatgctggcc tttatggacg agcacaattc gggggtcaga cctggttcaa 6000
attctagctg tagaaacttg tgcaagttac ttcacctctg agcctaagtt tccacatctg 6060
taaaaggaga taataaacac ctaccttgca gtagtgaagc aaagagaaaa ttaaatatat 6120
atgaagcaat ttggctggca tctagatcat tcacagccct ttaaaggtca cctttgctgt 6180
tctccccact ttacagataa ggaaactgag gcccaaaaag gtttgaaccc aggtcttcca 6240
agtcattcaa gtgctttctc cactgtacag gtggttatca accttggctg cgcatcagaa 6300
tcgtttgtaa agctttttct ttttcctttt taaaaagtaa agcaatatat acacaggtaa 6360
aaaaataaaa tagtacagaa gggcttataa tgagaagcag cagttccctg cttgcacccc 6420
cacatccaaa ggatgtggag ctctttaaaa ataaattgct ctggtcccac ctctggaaat 6480
ctgattcagc cagcatggat aataacccag ataactaacc cctacctcac aggataaaaa 6540
ggattacatg agatgcctta ggctaaggcc ctggcacaca ggaacacatg tgctacaaag 6600
gagctttggg gacttaagtc ctgaggatcc aggaggtgag gtgacttgtc caagattcca 6660
ctggtttagt ggcagagcct agacttccac tcggatctat ttagtgcttg ccccctgctc 6720
tctcctgtcg tgccccacca cctcctggca tcacagggca accgttgtca aggctatgct 6780
cacgggaggc tgggcaccac agtgtttcca agagcaagct ggatccgagt agattcccta 6840
gggcttgttg gaggaactag tttgactccc ttatactgtg gacgcagtag ccttgctgta 6900
gggagttgaa gagtactcca caacagtatc ttaagtttaa ctgggcactt ccctctggaa 6960
atcacagtgt tgtgcaccag gaacacaaag atgagtcaaa tctttatcct gcctttgagg 7020
agctcactgt ttagttgggg aaaccatttg taaaacagcc attaaccata cagtgtgatc 7080
aacactgaca ggagcacagg aaaaacatct agcttatgtg aagattcaga gaaggcatcc 7140
tgtagtctag gtggtgatac ctgaactgag tcttgaggga cgggtaggaa ttagccagtt 7200
gaggaagtag aaggaatttc cagatattgg aaacagtatg catgaagaca tgaaggcaag 7260
aaacagcaaa acaaatactg aagcatgaag attcctgggg tggggggaaa gcagcaagaa 7320
aaggtagaga ggaaccagat tggaagaggg tcgtaaatgc atggctacag aattcagatt 7380
tgttttgtag gacagtgtgg ttcccaaact ggctgtatac cacaaacagg tacggcattc 7440
tgggccccgg cccctaaaac attcattaag tctggggtga agatttggaa tcttgaatgc 7500
ttataaaggt taccacatga ctagggtaca gccagatttg gaaaccatag cttgaaggca 7560
gtgagggagc catgaaatgg tttttaatag ggggactcca gatcagatgt gaacttaacc 7620
tgtttctggc tggctagcca accagcatgg aaaacagatt aggttagatg ttcatgctgt 7680
atgtgcccgt gcctgtagct tccctgttaa tcagcttctt acactactat atttgcttat 7740
tttgtctctg aataagcttt aggcaccaca agggtgggcc tggggatatt ttgcttacca 7800
gtatagcccc tgcaaaaaag cacagtgcct gacacaaaac aggcacccag taaagttttt 7860
gaatgaatga atgcatgagt gaatccattt gtgagagagc gaatggagat gacaagatta 7920
gctaggagac tggaaaaaga ccaggaggcc tgcactaggg caaaggccag taggaataga 7980
ttggaggtgt taaggtgtga actgttaagg taagatgata acttaatgac tgattattgg 8040
atgtggaggg tgactgagag gatagaatga gtacccatga atagccatga ttcctaccct 8100
gtcccagtca tctctttcct tatccatctc tgaaacaatc tgcttacatc ctcctcagca 8160
actggaattc ctcaagttag ttagacattc tgtgtgctgt gtggtctctc actgcccccc 8220
cactccccac ccctccacaa gccattgatt cattcatcca gttcaataaa tcttggctaa 8280
gcacctccag tgtgcagtaa ggctcttcca agccaggact ctgactccct ctttcctacc 8340
tcaagagatg tttttgaggg ctttcccagg taagagtcac atctcttata caataactta 8400
tagtgagata cccagaatgt cagacttgta agggaagact gcccaaaccc cttctgaggt 8460
cctcagaggg gaattaactt cctaaggtcc gactgctagg aagtgttgga gccagaaatg 8520
gaagctaggt ttcctttcta tgtcatctct ggagtcttga tcttgatcta tcccattgta 8580
gatcaggaca ggcagaggtg gtcagggaga aggtgggact taggttgaac cttgaaggtc 8640
aatgtattgg acaggtcaaa caagatggtt gccaattaca ctgccccctt ctggaaaccc 8700
ttagcaaacc tgccatgctt gcagtccctt ctaaggggtt tccttagcat aagttgccat 8760
gctctgtacc atgtgacctc acaatcctgg ccacagatag ctagatgtgg atagtgtctg 8820
gttcaagggc aaccaatctc taggctggcc agtggcctgt tagctggact ggcataagga 8880
cttcacctta caggggtggc atgtatcaaa tggcaaatgt atgaaacaac cagatctttc 8940
agggaggcag aatgtgagct attcagaaga agtgaacgtt aattagaatt taatgaggca 9000
ttagtggtgg tggatgaggg gtggccagaa actaaacagc aaaagcaaag agaaagctgc 9060
agaaaccata agtaagcaga ggtcatgaga catttgtata atgagatcac ggagccacag 9120
ggtggcagaa gccatgaagc agcaaggcaa caatgggcta gaagccatga agcaatagga 9180
gccacgagga acagaaaccg tgagacaaaa ctgactatga gatccacaaa gcagcagaag 9240
gcttgaatag ataagatcat gagacagtag aagcgatgag actgcaagaa ccacaaggta 9300
gccagaacca tgtggcaaca tggcaacagg aatggaagag gcagcaggag ctacaatgca 9360
gaaaagccat ggattaatag gaactgaagc gccgggagcc atgaagctgc aggacccatg 9420
aggcagaaaa agccatgggc tagcatcgag gggggcagaa agaagttagt cagtagcagt 9480
aggaggagta taaatacagc cagaaaggag ttgagtcacc aatttgggaa gcactagaga 9540
agggagcaac agatgcctgc agctgagggg gtgacaagat aagccaggct ctagagctgc 9600
tttggatcat gaaccatttt caagtttctg ttcttccatg aggctgcctg tgtagctgtt 9660
cttgtcttcc ttatttccct gtgaatgctt taataaatcc ccatcactaa 9710
<210>15
<211>23
<212>DNA
<213>Artificial Sequence
<400>15
cgagagacag tataccccaa tgg 23
<210>16
<211>18
<212>DNA
<213>Artificial Sequence
<400>16
tgcgtcctcc agggtgat 18
<210>17
<211>19
<212>DNA
<213>Artificial Sequence
<400>17
atgctgatgg cgaggctaa 19
<210>18
<211>19
<212>DNA
<213>Artificial Sequence
<400>18
cgtcaccctc gtgcatctt 19
<210>19
<211>25
<212>DNA
<213>Artificial Sequence
<400>19
tcaccattct gaaaaccaac ataag 25
<210>20
<211>19
<212>DNA
<213>Artificial Sequence
<400>20
ctttcccatg cctgagcaa 19
<210>21
<211>19
<212>DNA
<213>Artificial Sequence
<400>21
gccaagggtg tcccacatt 19
<210>22
<211>22
<212>DNA
<213>Artificial Sequence
<400>22
tccaaagatt cccatggtga tc 22
<210>23
<211>21
<212>DNA
<213>Artificial Sequence
<400>23
aaatgaggtt gttgggttcg a 21
<210>24
<211>25
<212>DNA
<213>Artificial Sequence
<400>24
tccttccttc agtttccctt atagg 25
<210>25
<211>20
<212>DNA
<213>Artificial Sequence
<400>25
catcaaggag tgggcattca 20
<210>26
<211>22
<212>DNA
<213>Artificial Sequence
<400>26
tgaccagttc accattcctc aa 22
<210>27
<211>18
<212>DNA
<213>Artificial Sequence
<400>27
ccatgcagct ggctgaaa 18
<210>28
<211>22
<212>DNA
<213>Artificial Sequence
<400>28
tcgctcgtat aagccaatgt tg 22
<210>29
<211>19
<212>DNA
<213>Artificial Sequence
<400>29
ggtgcccatg acccagact 19
<210>30
<211>22
<212>DNA
<213>Artificial Sequence
<400>30
tttgcatggt cctgattttt ga 22
<210>31
<211>18
<212>DNA
<213>Artificial Sequence
<400>31
tggcacggtt gctcaaca 18
<210>32
<211>23
<212>DNA
<213>Artificial Sequence
<400>32
tggacttgtg ggctaaagaa atg 23
<210>33
<211>24
<212>DNA
<213>Artificial Sequence
<400>33
tgaaggaaag aaaacaaaga acca 24
<210>34
<211>26
<212>DNA
<213>Artificial Sequence
<400>34
cgtcaacact aaaataaaag gcaaac 26
<210>35
<211>21
<212>DNA
<213>Artificial Sequence
<400>35
gggaaatgaa cgtgagggaa t 21
<210>36
<211>22
<212>DNA
<213>Artificial Sequence
<400>36
gaatttccac acaaacgtcc aa 22
<210>37
<211>26
<212>DNA
<213>Artificial Sequence
<400>37
attaggtttt cttttgtttt tgttct 26
<210>38
<211>28
<212>DNA
<213>Artificial Sequence
<400>38
tagacaatac gaaataaata taaagcag 28
<210>39
<211>21
<212>DNA
<213>Artificial Sequence
<400>39
cccagctatg ctcatttgtt t 21
<210>40
<211>23
<212>DNA
<213>Artificial Sequence
<400>40
taaattattt tgctttgcag gtc 23
<210>41
<211>23
<212>DNA
<213>Artificial Sequence
<400>41
gttttctttg cttaacgatg ttt 23
<210>42
<211>22
<212>DNA
<213>Artificial Sequence
<400>42
agaaatgcgt aagtaaaatg tg 22
<210>43
<211>16
<212>DNA
<213>Artificial Sequence
<400>43
ccctgctgtt ccaaaa 16
<210>44
<211>14
<212>DNA
<213>Artificial Sequence
<400>44
ctgggcctcc cacg 14
<210>45
<211>16
<212>DNA
<213>Artificial Sequence
<400>45
cacaatggca cctacc 16
<210>46
<211>14
<212>DNA
<213>Artificial Sequence
<400>46
ccaacaccgc cctc 14
<210>47
<211>20
<212>DNA
<213>Artificial Sequence
<400>47
ctaatctgac atcactggtg 20
<210>48
<211>16
<212>DNA
<213>Artificial Sequence
<400>48
tcctgagcaa cttgca 16
<210>49
<211>13
<212>DNA
<213>Artificial Sequence
<400>49
ccccgagacg agc 13
<210>50
<211>19
<212>DNA
<213>Artificial Sequence
<400>50
tgagaatatc accttggcc 19
<210>51
<211>20
<212>DNA
<213>Artificial Sequence
<400>51
acctcggaca aattcaggaa 20
<210>52
<211>21
<212>DNA
<213>Artificial Sequence
<400>52
taatcaagct ttgaaagtcc t 21
<210>53
<211>19
<212>DNA
<213>Artificial Sequence
<400>53
cagcaaatct gatccaaca 19
<210>54
<211>30
<212>DNA
<213>Artificial Sequence
<400>54
tggaaaaaca gaagactctt actgcctcat 30
<210>55
<211>30
<212>DNA
<213>Artificial Sequence
<400>55
tgttttgtat atggctgctt ttgtgttaca 30
<210>56
<211>31
<212>DNA
<213>Artificial Sequence
<400>56
tgagatctgt gcatgttact taatgtagct c 31
<210>57
<211>24
<212>DNA
<213>Artificial Sequence
<400>57
accatgagaa gtatgacaac agcc 24
<210>58
<211>23
<212>DNA
<213>Artificial Sequence
<400>58
cacgatacca aagttgtcat gga 23
<210>59
<211>23
<212>DNA
<213>Artificial Sequence
<400>59
tcagcaatgc ctcctgcacc acc 23
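
Each short entry in the sequence listing above ends with its length in bases, which matches the preceding `<211>` field, and the claims below refer to each sequence "or the complement thereof". The following is a minimal sketch of how one might check a listing line and derive its reverse complement. The helper names `parse_entry` and `reverse_complement` are hypothetical and are not part of the patent; the two entries are copied from SEQ ID NO: 16 and SEQ ID NO: 44.

```python
# Translation table for complementing lowercase DNA bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def parse_entry(line):
    """Split a listing line such as 'tgcgtcctcc agggtgat 18'
    into (sequence_without_spaces, declared_length)."""
    *groups, declared = line.split()
    return "".join(groups), int(declared)


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Two entries copied from the listing (SEQ ID NO: 16 and SEQ ID NO: 44).
entries = {16: "tgcgtcctcc agggtgat 18", 44: "ctgggcctcc cacg 14"}

for seq_id, line in entries.items():
    seq, declared = parse_entry(line)
    status = "ok" if len(seq) == declared else "LENGTH MISMATCH"
    print(seq_id, seq, declared, status, reverse_complement(seq))
```

The same check can be run over every short entry. For the long multi-line sequences, the trailing number on each line is a running position rather than a per-line length, so this helper applies only to single-line entries.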

Claims (10)

1. An isolated nucleic acid molecule which is a lung nodule marker, said nucleic acid molecule being 13-15000 bp in length, said nucleic acid molecule having a DNA sequence selected from any one, several, or all of: (1) the coding sequence of CEACAM4 or a fragment thereof matching SEQ ID NO:43, or the complement thereof, (2) the coding sequence of S100A9 or a fragment thereof matching SEQ ID NO:44, or the complement thereof, (3) the coding sequence of FCGR1A or a fragment thereof matching SEQ ID NO:45, or the complement thereof, (4) the coding sequence of ROGDI or a fragment thereof matching SEQ ID NO:46, or the complement thereof, (5) the coding sequence of ZNF266 or a fragment thereof matching SEQ ID NO:47, or the complement thereof, (6) the coding sequence of TNFSF10 or a fragment thereof matching SEQ ID NO:48, or the complement thereof, (7) the coding sequence of MORF or a fragment thereof matching SEQ ID NO:49, or the complement thereof, (8) the coding sequence of MCEMP1 or a fragment thereof matching SEQ ID NO:50, or the complement thereof, (9) the coding sequence of TRAF5 or a fragment thereof matching SEQ ID NO:51, or the complement thereof, (10) the coding sequence of SLC38A1 or a fragment thereof matching SEQ ID NO:52, or the complement thereof, (11) the coding sequence of TARBP1 or a fragment thereof matching SEQ ID NO:53, or the complement thereof, (12) the coding sequence of SMA4 or a fragment thereof matching SEQ ID NO:54, or the complement thereof, (13) the coding sequence of HNRNPD or a fragment thereof matching SEQ ID NO:55, or the complement thereof, and (14) the coding sequence of SMC1A or a fragment thereof matching SEQ ID NO:56, or the complement thereof.
2. The nucleic acid molecule of claim 1, wherein said nucleic acid molecule has a DNA sequence selected from any one of the following, or a variant thereof having at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, at least 99% identity thereto, or a corresponding RNA sequence thereof:
a 1: (3), (7), (8), (12) and (13),
a 2: (1), (7), (8), (10) and (11),
a 3: (1), (5), (7), (9) and (11),
a 4: (3), (6), (7), (8), (11) and (12),
a 5: (1), (3), (6), (8), (11) and (13),
a 6: (6), (9), (10), (11), (12) and (14),
a 7: (5), (6), (7), (8), (9), (10) and (13),
a 8: (4), (8), (9), (10), (11), (13) and (14),
a 9: (1), (3), (5), (7), (10), (12) and (13),
a 10: (2), (4), (7), (9), (10), (12) and (13),
a 11: (5), (6), (7), (8), (9), (10) and (13),
a 12: (3), (7), (8), (9), (11), (12), (13) and (14),
a 13: (1), (4), (7), (8), (9), (10), (12) and (13),
a 14: (6), (7), (8), (9), (10), (11), (12) and (14),
a 15: (1), (3), (4), (7), (9), (10), (12), (13) and (14),
a 16: (1), (4), (5), (8), (9), (10), (12), (13) and (14),
a 17: (2), (3), (8), (9), (10), (11), (12), (13) and (14),
a 18: (3), (4), (6), (7), (8), (10), (11), (12), (13) and (14),
a 19: (1), (2), (3), (4), (5), (9), (10), (11), (13) and (14),
a 20: (1), (2), (4), (5), (6), (8), (9), (10), (11) and (14),
a 21: (1), (2), (4), (5), (6), (7), (8), (11), (12), (13) and (14),
a 22: (1), (2), (3), (6), (8), (9), (10), (11), (12), (13) and (14),
a 23: (1), (2), (3), (4), (5), (6), (7), (8), (12), (13) and (14),
a 24: (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13) and (14),
a 25: (1), (2), (3), (4), (6), (7), (8), (9), (10), (11), (13) and (14),
a 26: (1), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12) and (13),
a 27: (1), (2), (3), (4), (5), (7), (8), (9), (10), (11), (12), (13) and (14),
a 28: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (12), (13) and (14),
a 29: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (13) and (14), or
a 30: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13) and (14).
3. A reagent for detecting, in a sample:
(1) the transcription level, translation level, or amount of one or more of the nucleic acid molecules of claim 1; or
(2) the content or activity of the expression product of (1),
preferably, the reagent is a reagent used in one or more of the following methods: PCR, DNA sequencing, RNA sequencing, DNA hybridization, RNA hybridization, Southern blotting, Northern blotting, Western blotting, immunoprecipitation, ELISA, restriction enzyme analysis, high-resolution melting curve analysis, and mass spectrometry, preferably RT-qPCR, RNA sequencing or chip-based methods,
more preferably, the reagent is selected from one or more of the following: buffer, reverse transcriptase, polymerase, dNTP, primer, probe, restriction endonuclease, fluorescent dye, fluorescent quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard and reference substance.
4. A kit comprising the reagent of claim 2 and optionally the nucleic acid molecule of claim 1, preferably the kit further comprises primers and/or probes for detecting a reference sequence.
5. Use of one or more or all genes selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
preferably, the reagent detects (1) the transcription level, translation level, amount or methylation of the gene, or (2) the amount or activity of the expression product of (1) in the sample; more preferably, the reagent is a primer or a probe,
preferably, the pulmonary nodule is a pulmonary micronodule,
preferably, the gene is selected from any one of the following groups:
a 1: FCGR1A, MCEMP1, MORF, SMA4 and HNRNPD,
a 2: CEACAM4, MCEMP1, MORF, SLC38A1 and TARBP1,
a 3: CEACAM4, TRAF5, MORF, ZNF266 and TARBP1,
a 4: FCGR1A, MCEMP1, MORF, TARBP1, TNFSF10 and SMA4,
a 5: CEACAM4, FCGR1A, MCEMP1, TARBP1, TNFSF10 and HNRNPD,
a 6: SLC38A1, TARBP1, TNFSF10, TRAF5, SMA4 and SMC1A,
a 7: HNRNPD, TNFSF10, SLC38A1, MCEMP1, ZNF266, TRAF5 and MORF,
a 8: MCEMP1, ROGDI, SLC38A1, TARBP1, TRAF5, HNRNPD and SMC1A,
a 9: CEACAM4, FCGR1A, MORF, SLC38A1, ZNF266, SMA4 and HNRNPD,
a 10: MORF, ROGDI, S100A9, SLC38A1, TRAF5, SMA4 and HNRNPD,
a 11: TRAF5, MCEMP1, MORF, SLC38A1, TNFSF10, ZNF266 and HNRNPD,
a 12: FCGR1A, MCEMP1, MORF, TARBP1, TRAF5, SMA4, HNRNPD and SMC1A,
a 13: CEACAM4, MCEMP1, MORF, ROGDI, SLC38A1, TRAF5, SMA4 and HNRNPD,
a 14: MCEMP1, MORF, SLC38A1, TARBP1, TNFSF10, TRAF5, SMA4 and SMC1A,
a 15: CEACAM4, FCGR1A, MORF, ROGDI, SLC38A1, TRAF5, SMA4, HNRNPD and SMC1A,
a 16: CEACAM4, MCEMP1, ROGDI, SLC38A1, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a 17: FCGR1A, MCEMP1, S100A9, SLC38A1, TARBP1, TRAF5, SMA4, HNRNPD and SMC1A,
a 18: FCGR1A, MCEMP1, MORF, ROGDI, SLC38A1, TARBP1, TNFSF10, SMA4, HNRNPD and SMC1A,
a 19: CEACAM4, FCGR1A, ROGDI, S100A9, SLC38A1, TARBP1, TRAF5, ZNF266, HNRNPD and SMC1A,
a 20: CEACAM4, MCEMP1, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266 and SMC1A,
a 21: CEACAM4, MCEMP1, MORF, ROGDI, S100A9, TARBP1, TNFSF10, ZNF266, SMA4, HNRNPD and SMC1A,
a 22: CEACAM4, FCGR1A, MCEMP1, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, SMA4, HNRNPD and SMC1A,
a 23: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, TNFSF10, ZNF266, SMA4, HNRNPD and SMC1A,
a 24: FCGR1A, MCEMP1, MORF, ROGDI, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a 25: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, HNRNPD and SMC1A,
a 26: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, SMA4 and HNRNPD,
a 27: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a 28: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TNFSF10, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A,
a 29: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, HNRNPD and SMC1A, or
a 30: CEACAM4, FCGR1A, MCEMP1, MORF, ROGDI, S100A9, SLC38A1, TARBP1, TNFSF10, TRAF5, ZNF266, SMA4, HNRNPD and SMC1A.
6. Use of the reagent of claim 3 and optionally the nucleic acid molecule of claim 1 or 2 for the preparation of a kit for screening for benign or malignant lung nodules; preferably,
the pulmonary nodule is a pulmonary micronodule; and/or
the kit is a real-time quantitative PCR kit, an RNA sequencing kit or a kit containing a gene chip.
7. An apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of:
(1) obtaining expression measurements of one or more or all of the genes in the sample selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
(2) obtaining a score from the expression measurements using a constructed model,
(3) screening the lung nodules as benign or malignant according to the score,
preferably, the pulmonary nodule is a pulmonary micronodule.
8. A system for screening lung nodules for benign and malignant status, comprising:
a collection device for obtaining expression detection results of one or more or all of the genes selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
data processing means for obtaining a score using the detection result by constructing a model,
a judging device for screening the benign or malignant pulmonary nodules according to the scores,
preferably, the pulmonary nodule is a pulmonary micronodule.
9. A method for screening genes that distinguish benign from malignant lung nodules, comprising:
(1) acquiring gene expression data of malignant lung nodule samples and benign lung nodule samples,
(2) selecting genes expressed in all samples for t-test analysis,
(3) comparing gene expression between the malignant and benign lung nodule samples, and taking genes with a statistical P value of less than 0.05 and an expression change of more than 1.1-fold as candidate genes,
optionally (4) ranking genes by the correlation between their expression data and the benign or malignant status of the samples, and selecting genes highly correlated with malignant lung nodules as gene group I; taking those of the remaining genes that are highly correlated with gene group I as gene group II; then pairing the genes of gene group I and gene group II pairwise to form candidate gene combinations,
(5) screening the candidate genes by logistic regression analysis,
preferably, the lung nodule is a lung micronodule, and the gene for distinguishing benign from malignant lung nodules is selected from one or more or all of the following genes: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A.
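
Step (3) of claim 9 can be illustrated with a minimal sketch, offered here only as an aid to reading and not as the applicant's implementation. It assumes linear-scale expression values, reads "change of more than 1.1-fold" as a change in either direction, uses Welch's t statistic, and approximates the two-sided P value with a normal distribution, which is only reasonable for larger sample sizes; a real analysis would use Student's t distribution (for example `scipy.stats.ttest_ind`). The gene names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def approx_p_value(t):
    """Two-sided P value from a normal approximation (assumption: large samples)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def is_candidate(malignant, benign, p_cutoff=0.05, fold_cutoff=1.1):
    """Apply the P < 0.05 and > 1.1-fold change filter to one gene."""
    fold = mean(malignant) / mean(benign)
    # Assumption: a change in either direction counts.
    changed = fold > fold_cutoff or fold < 1.0 / fold_cutoff
    return changed and approx_p_value(welch_t(malignant, benign)) < p_cutoff


# Hypothetical expression values: (malignant samples, benign samples).
expression = {
    "GENE_A": ([8.1, 7.9, 8.4, 8.0, 8.3], [5.0, 5.2, 4.9, 5.1, 5.3]),
    "GENE_B": ([5.0, 5.1, 4.9, 5.2, 5.0], [5.1, 5.0, 5.0, 5.1, 4.9]),
}

candidates = [g for g, (m, b) in expression.items() if is_candidate(m, b)]
print(candidates)
```

In this toy data only `GENE_A` passes both thresholds; `GENE_B` fails the fold-change criterion. The sketch does not guard against zero variance or a zero benign mean, which a production filter would need to handle.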
10. A method of constructing a model for screening benign and malignant lung nodules, comprising:
(1) obtaining expression measurements of one or more or all of the genes in the sample selected from the group consisting of: CEACAM4, S100A9, FCGR1A, ROGDI, ZNF266, TNFSF10, MORF, MCEMP1, TRAF5, SLC38A1, TARBP1, SMA4, HNRNPD and SMC1A,
(2) calculating the ROC curve and the area under the curve (AUC) for each candidate gene combination, and establishing the model by logistic regression analysis,
preferably, the pulmonary nodule is a pulmonary micronodule.
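
The scoring and evaluation steps named in claims 7, 8 and 10 (a logistic-regression score per sample, and the area under the ROC curve for a gene combination) can be sketched as follows. This is a hedged illustration only: the coefficients, intercept and expression values are hypothetical, the patent text here does not disclose fitted model parameters, and the AUC is computed with the standard pairwise-comparison formulation rather than any method stated by the applicant.

```python
from math import exp


def logistic_score(expression, weights, intercept):
    """Score in (0, 1) from a logistic model over the genes listed in `weights`."""
    z = intercept + sum(weights[g] * expression[g] for g in weights)
    return 1.0 / (1.0 + exp(-z))


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (malignant, benign) pairs in
    which the malignant sample scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients for a two-gene combination (not from the patent).
weights = {"MORF": 0.8, "MCEMP1": -0.5}
intercept = -1.0

# Hypothetical samples: (expression values, label), label 1 = malignant, 0 = benign.
samples = [
    ({"MORF": 3.0, "MCEMP1": 1.0}, 1),
    ({"MORF": 2.5, "MCEMP1": 2.0}, 1),
    ({"MORF": 1.0, "MCEMP1": 2.0}, 0),
    ({"MORF": 3.0, "MCEMP1": 1.5}, 0),
]

scores = [logistic_score(x, weights, intercept) for x, _ in samples]
labels = [y for _, y in samples]
print(auc(scores, labels))
```

Comparing the AUC values obtained for different gene combinations is one way to rank them, as step (2) of claim 10 suggests; in practice the coefficients would be fitted to training data rather than chosen by hand.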
CN202010368425.3A 2020-04-30 2020-04-30 Marker for screening benign and malignant pulmonary nodules and application thereof Pending CN111378756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010368425.3A CN111378756A (en) 2020-04-30 2020-04-30 Marker for screening benign and malignant pulmonary nodules and application thereof

Publications (1)

Publication Number Publication Date
CN111378756A true CN111378756A (en) 2020-07-07

Family

ID=71216087

Country Status (1)

Country Link
CN (1) CN111378756A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110331209A (en) * 2019-08-27 2019-10-15 北京泱深生物信息技术有限公司 Application of the biomarker in adenocarcinoma of lung diagnosis
CN112301130A (en) * 2020-11-12 2021-02-02 苏州京脉生物科技有限公司 Marker, kit and method for early detection of lung cancer

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
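仲崇明 (Zhong Chongming): "Application of logistic regression in the multi-index combined diagnosis of lung cancer", 中国卫生统计 (Chinese Journal of Health Statistics), vol. 33, no. 6, 31 December 2016 (2016-12-31), pages 1045 - 1046 *
陈国连 (Chen Guolian) et al.: "Construction and clinical predictive value of a logistic regression model of factors influencing venous thromboembolism after lung cancer surgery", 护理实践与研究 (Nursing Practice and Research), vol. 18, no. 24, 31 December 2021 (2021-12-31), pages 3635 - 3639 *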

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112094896A (en) * 2020-08-27 2020-12-18 上海市公共卫生临床中心 Marker and kit for diagnosing active tuberculosis and application of marker and kit
CN112094896B (en) * 2020-08-27 2024-02-09 上海市公共卫生临床中心 Active tuberculosis diagnosis marker, kit and application thereof
WO2022241599A1 (en) * 2021-05-17 2022-11-24 Excellen Medical Technology Co., Ltd. Method of identifying lung cancer with methylation biomarker genes and radiological characteristic
CN113345517A (en) * 2021-07-17 2021-09-03 湖南科技大学 DNA hybridization information storage encryption method based on dual-probe specific separation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination