WO2022205775A1 - Method and device for determining immunity index of individual, electronic device, and machine-readable storage medium - Google Patents
- Publication number
- WO2022205775A1 (international application PCT/CN2021/117149)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- immune
- index
- individual
- sequencing
- sequence
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Definitions
- The present invention relates to the field of biomedicine, and in particular to a method, a device, an electronic device and a machine-readable storage medium for determining an individual's immunity index.
- Immunity is the body's own defense mechanism: the ability to identify and eliminate foreign intruders (viruses, bacteria, etc.).
- It is the physiological response by which the human body identifies and excludes "non-self".
- The body's immunity is maintained by the immune system, which is, in effect, the best doctor the human body is born with.
- The immune system consists of two cooperating subsystems that provide innate and adaptive immunity.
- Innate immunity refers to a non-specific defense mechanism that protects the body from toxins or foreign substances (called antigens).
- The rapid response of the innate immune system also activates the adaptive system, which mounts the body's own antigen-specific response.
- The adaptive immune system consists of two main types of lymphocytes, called B cells and T cells. These lymphocytes carry unique antigen receptors, each of which recognizes only one antigen, and this range of specificity is encoded by a fixed number of gene segments. Through a mechanism called V(D)J recombination, these genetic regions undergo irreversible somatic DNA recombination during cell development, producing mature lymphocytes that each have a single specificity.
- The immune repertoire refers to the full set of unique genetic rearrangements of T cell receptors (TCRs) and B cell receptors (BCRs) within the adaptive immune system.
- Next-generation sequencing (NGS) of the immune repertoire provides technical support for evaluating the body's adaptive immune system in healthy or diseased states.
- Immunoglobulins and complement are the main effector components of humoral immunity. In certain diseases (such as infections, autoimmune diseases and immunodeficiency diseases), their concentrations rise or fall relative to reference values, so measuring them has clinical value for assessing immunity and diagnosing disease.
- However, the five-item immune panel targets humoral immunity and cannot adequately assess cellular immunity.
- For humoral immunity, only the overall levels of IgG, IgA, IgM and complement C3 and C4 can be measured; in-depth analysis at the molecular sequence level is not possible.
- Lymphocyte subset analysis uses flow cytometry or PCR to determine the number and relative proportion of each leukocyte subset in peripheral blood.
- By monitoring the relative and absolute counts of immune cells in peripheral blood, and their changes, with flow cytometry or PCR, the immune status in disease states (such as tumors, infectious diseases and immune diseases) can be evaluated, assisting diagnosis, tracking disease progression and guiding the timing of medication.
- The most commonly detected subsets include T cells (CD3+), B cells (CD19+), NK cells (CD16+CD56+), helper T cells (CD3+CD4+) and suppressor T cells (CD3+CD8+).
- There are many types of lymphocyte subsets; a comprehensive analysis would require an unacceptable volume of peripheral blood, cost and time, yet analyzing only a few subsets makes it difficult to obtain a comprehensive picture of the immune system. In addition, lymphocyte subsets have different normal reference ranges at different ages, and the results are affected by many factors, making clinical interpretation relatively difficult.
- The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
- To this end, an object of the present invention is to perform high-sensitivity detection of an individual's adaptive immune system at the molecular sequence level by immune repertoire sequencing, and to derive from it an immune age (IA) that reflects the individual's health status, enabling early health risk prediction.
- The present invention proposes a method for determining an individual immunity index.
- The method includes: (1) acquiring nucleic acid sequencing data of the individual to be tested; (2) determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with a reference sequence; (3) determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; (4) determining an immune age value of the individual based on the statistical features; and (5) determining an immunity index of the individual based on the immune age value.
- The method of the present invention can be carried out by sequencing a small amount of sample, achieving high-sensitivity detection of the individual's adaptive immune system at the molecular level and enabling non-invasive early diagnosis, evaluation of therapeutic efficacy, condition tracking, relapse prediction and comprehensive immune assessment.
- PCR can be used to amplify the genes contained in lymphocytes in peripheral blood, which requires only a small blood sample; subsequent sample processing is simple, and neither error-prone manual blood cell observation nor sophisticated immunolabeling and flow analysis is required.
- Immune evaluation by immune repertoire sequencing not only improves detection sensitivity but also supports early diagnosis, evaluation of therapeutic efficacy, disease tracking, recurrence prediction and comprehensive immune assessment.
- The present invention provides a device for determining an individual immunity index.
- The device includes: a sequencing data acquisition unit for acquiring nucleic acid sequencing data of an individual to be tested; a sequencing result analysis unit for determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with a reference sequence; a statistical unit for determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; an immune age determination unit for determining an immune age value of the individual based on the statistical features; and an immunity index determination unit for determining an immunity index of the individual based on the immune age value.
- The present invention provides an electronic device which, according to an embodiment of the present invention, comprises a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executing the machine-executable instructions to implement the aforementioned method of determining an individual immunity index.
- The present invention provides a machine-readable storage medium.
- The machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement the method of determining an individual immunity index as described in any of the preceding items.
- FIG. 1 is a schematic flowchart of a method for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 2 is a partial schematic flowchart of a method for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a device for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 4 is a partial schematic structural diagram of a device for determining an individual immunity index according to an embodiment of the present invention.
- FIG. 5 shows the immunity index prediction results for different age groups in Example 2 of the present invention.
- FIG. 6 is a distribution diagram of the relationship between the immunity index and individual age in Example 2 of the present invention.
- Embodiments of the present invention are described in detail below.
- The embodiments described below are exemplary, serve only to explain the present invention, and should not be construed as limiting it. Where no specific technique or condition is indicated in the examples, techniques or conditions described in the literature in the field or in the product specification are used.
- Reagents or instruments whose manufacturer is not indicated are conventional, commercially available products.
- The present invention proposes a method for determining the immunity index of an individual. Referring to FIG. 1, according to an embodiment of the present invention, the method includes the following steps:
- First, nucleic acid sequencing data of the individual to be tested is acquired for subsequent analysis.
- The nucleic acid sequencing data may contain the genetic information of immune cells; for example, according to embodiments of the present invention, blood samples containing immune cells or tissue samples containing immune cells may be used (the term "tissue sample" as used herein should be understood broadly and may include at least part of an organ), such as the non-encapsulated diffuse lymphoid tissue and lymph nodes in the mucosa and submucosa of the intestinal tract, respiratory tract, urogenital tract, etc.
- The nucleic acid sequencing data can be obtained by high-throughput sequencing.
- Second- or third-generation sequencing platforms may be used, including but not limited to high-throughput platforms such as MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960 and MGISP-100.
- The sequencing process includes:
- For blood or tissue samples, DNA or RNA is extracted. For each sample, a starting amount of DNA or RNA is taken, primers for a given chain (TCR or BCR) are added, and multiplex PCR amplification is performed. PCR is carried out in two rounds: the first round uses VJ-specific primers carrying partial sequencing adapters, and the second round is an ordinary library-construction PCR that adds the sequencing adapters. Multiple samples are then pooled and sequenced together, yielding data for each sample. According to an embodiment of the present invention, a tag sequence may also be introduced in the second round of PCR so that samples from different batches can be distinguished.
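Pooling several libraries relies on the tag sequence added in the second PCR round. The sketch below shows one simple way to split pooled reads back into samples. It assumes, hypothetically, that the tag sits at the 5' start of each read, that all tags have the same length, and that one mismatch is tolerated; the tag sequences and sample names are invented for illustration, and real tag placement and tolerance depend on the library design and the sequencer's own demultiplexing software.

```python
from collections import defaultdict

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads, tags, max_mismatch=1):
    """Assign each read to the sample whose tag best matches the read's 5' prefix.

    reads: iterable of nucleotide strings
    tags:  dict mapping sample name -> tag sequence (all tags the same length)
    Returns a dict sample -> list of reads with the tag trimmed off. Reads that
    match no tag closely enough, or tie between two tags, go to "undetermined".
    """
    tag_len = len(next(iter(tags.values())))
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:tag_len]
        scored = sorted((hamming(prefix, tag), sample) for sample, tag in tags.items())
        best_dist, best_sample = scored[0]
        tie = len(scored) > 1 and scored[1][0] == best_dist
        if best_dist <= max_mismatch and not tie:
            bins[best_sample].append(read[tag_len:])
        else:
            bins["undetermined"].append(read)
    return dict(bins)

# Hypothetical tags and reads, for illustration only.
tags = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
reads = ["ACGTACGGATCC", "TGCATTGGATCC", "GGGGGGGGATCC"]
print(demultiplex(reads, tags))
```

The second read carries one mismatch in its tag and is still assigned to `sample_B`; rejecting ties keeps an ambiguous read from being silently attributed to the wrong batch.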
- Acquiring the nucleic acid sequencing data may further include:
- obtaining a nucleic acid sample of the individual to be tested, the nucleic acid sample including at least one of DNA molecules and RNA molecules.
- Those skilled in the art can extract DNA or RNA molecules using commercially available kits according to the manufacturer's instructions, and will understand that, once RNA molecules are obtained, cDNA molecules can readily be produced by reverse transcription.
- A first amplification process can be performed with VJ-specific primers to obtain a first amplification product.
- Targeting the V and J genes, the immune cell-specific sequences contained in the nucleic acid sample obtained in step S110 may be amplified with VJ-specific primers.
- VJ-specific primers are primers that specifically amplify the V and J genes. Notably, for most loci, the V and J genes are grouped into families according to their degree of homology. These VJ-specific primers can be used to analyze the combinatorial diversity of V-J rearrangements at one or more loci selected from TRA, TRB, TRG, TRD, IgH, IgK, IgL and the like.
- The VJ-specific primers used in the present invention have the following nucleotide sequences:
- The VJ-specific primers contain part of the sequencing adapter sequence, which makes it convenient to introduce sequencing adapters into the amplification products in the second amplification process.
- A second amplification process is performed on the first amplification product to obtain a second amplification product that carries a sequencing adapter.
- The second amplification process can use the common sequence in the first amplification product as its priming site, with primers designed to introduce the sequencing adapters.
- The resulting second amplification product constitutes a sequencing library ready for sequencing.
- The second amplification product is sequenced to obtain sequencing results.
- The sequencing library (the second amplification product) can be sequenced on a sequencing platform such as those listed above. Paired-end sequencing is preferred, as it improves the efficiency of subsequent analysis.
- The V/J sequences and CDR sequences contained in the nucleic acid sample are determined by aligning the sequencing results with the reference sequences.
- Software such as SOAPnuke (v1.5.3) can be used to filter adapter-contaminated reads, low-quality bases and low-quality reads from the raw sequencing data.
- The FASTQ files are then converted into FASTA files with an in-house program in preparation for sequence assembly; finally, if paired-end sequencing was used, the read pairs are assembled with COPE (v1.5.3) and an in-house program.
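SOAPnuke's actual filtering rules and the in-house converter are not described, so the following is only a stand-in sketch of the same two steps: drop reads that contain an adapter or have too many low-quality bases, then write the survivors as FASTA. It assumes Phred+33 quality encoding and a hypothetical rule (discard a read if more than 50% of its bases are below Q20); the adapter string used in the demo is a placeholder.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual

def passes_filter(seq, qual, adapter, min_q=20, max_low_frac=0.5, offset=33):
    """Reject reads containing the adapter or with too many low-quality bases (Phred+33 assumed)."""
    if adapter and adapter in seq:
        return False
    low = sum(1 for ch in qual if ord(ch) - offset < min_q)
    return low / max(len(qual), 1) <= max_low_frac

def fastq_to_fasta(fastq_handle, fasta_handle, adapter=""):
    """Write reads that pass the filter as FASTA; return (kept, total) counts."""
    kept = total = 0
    for read_id, seq, qual in parse_fastq(fastq_handle):
        total += 1
        if passes_filter(seq, qual, adapter):
            fasta_handle.write(f">{read_id}\n{seq}\n")
            kept += 1
    return kept, total

fastq = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"      # 'I' = Q40: kept
    "@r2\nACGTACGT\n+\n########\n"      # '#' = Q2: dropped as low quality
    "@r3\nAAGATCGGAA\n+\nIIIIIIIIII\n"  # contains the placeholder adapter: dropped
)
out = io.StringIO()
print(fastq_to_fasta(fastq, out, adapter="AGATCGG"), out.getvalue())
```

Streaming record by record keeps memory flat, which matters for the multi-gigabyte FASTQ files typical of repertoire runs.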
- blastall (v2.2.25) can be used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences; an in-house program then re-aligns the sequences and selects the best alignment. Specifically, the non-CDR3 and CDR3 regions are scored using different methods, the hit with the highest score is selected, and the origin of each sequenced read is assigned by alignment to the CDR, V and J reference sequences, thereby determining the CDR sequence and the V/J sequence.
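The re-alignment step scores the non-CDR3 and CDR3 portions of each candidate hit separately and keeps the highest-scoring hit. The in-house scoring scheme is not given, so this sketch assumes a hypothetical one: percent identity in each region, with the CDR3 identity down-weighted by a hypothetical `cdr3_weight` because CDR3 contains non-templated nucleotides that no germline reference can match. The `Hit` type and its fields are illustrative, not the program's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    gene: str              # reference gene name, e.g. a V or J allele
    non_cdr3_matches: int  # matching bases outside CDR3
    non_cdr3_length: int   # aligned bases outside CDR3
    cdr3_matches: int      # matching bases inside CDR3
    cdr3_length: int       # aligned bases inside CDR3

def hit_score(hit: Hit, cdr3_weight: float = 0.5) -> float:
    """Combine per-region identities; cdr3_weight is a hypothetical tuning parameter."""
    non_cdr3 = hit.non_cdr3_matches / max(hit.non_cdr3_length, 1)
    cdr3 = hit.cdr3_matches / max(hit.cdr3_length, 1)
    return non_cdr3 + cdr3_weight * cdr3

def best_hit(hits):
    """Return the highest-scoring hit, or None if there are no candidates."""
    return max(hits, key=hit_score, default=None)

# Example gene names; the match counts are made up for illustration.
hits = [Hit("TRBV5-1", 96, 100, 12, 15), Hit("TRBV5-5", 97, 100, 6, 15)]
print(best_hit(hits).gene)
```

Here TRBV5-5 has slightly better identity outside CDR3, but TRBV5-1 explains more of the CDR3 end and wins overall, which is the kind of tie-break that region-specific scoring is meant to resolve.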
- Next, the structure of the immune molecules is analyzed. This part has two main functions: error correction and region determination. First, errors introduced during PCR and sequencing are corrected with an in-house program; then the CDR regions are determined from the V/J gene reference sequences, the rules of conserved amino acids, and established computational methods.
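The error-correction program is in-house and not described. A widely used heuristic, shown here only as an assumed stand-in, merges a low-abundance sequence into a much more abundant sequence that differs from it at exactly one position, on the grounds that such near-duplicates are more likely PCR or sequencing errors than genuine clonotypes. The abundance ratio of 10 is a hypothetical threshold.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def merge_likely_errors(counts, min_ratio=10):
    """Fold rare one-mismatch variants into a dominant parent sequence.

    counts: dict sequence -> read count. Sequences are visited from most to
    least abundant; each one is merged into the first already-kept sequence of
    equal length that differs at exactly one position and is at least
    min_ratio times more abundant. Otherwise it is kept as its own clonotype.
    """
    corrected = {}
    for seq, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        parent = next(
            (p for p, m in corrected.items()
             if len(p) == len(seq) and hamming(p, seq) == 1 and m >= min_ratio * n),
            None,
        )
        if parent is None:
            corrected[seq] = n
        else:
            corrected[parent] += n
    return corrected

counts = {"TGTGCCAGC": 100, "TGTGCCAGT": 3, "TGTACCTTC": 5}
print(merge_likely_errors(counts))
```

The one-mismatch variant with 3 reads is absorbed by its 100-read parent, while the sequence differing at three positions survives as a separate clonotype. Requiring a large abundance ratio avoids collapsing two genuine, similarly sized clones.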
- The CDR sequences can be determined by conventional methods.
- The CDR sequence is at least one of the CDR1, CDR2 and CDR3 sequences, preferably the CDR3 sequence, because CDR3 is the most variable region and directly determines the antigen-binding specificity of the TCR.
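One way to locate the CDR3 from conserved amino acids is the common convention that bounds it by the cysteine encoded at the end of the V gene and the phenylalanine or tryptophan of the J-gene F/W-G-X-G motif (many repertoire tools report the CDR3 including both residues, as done here). The sketch assumes an in-frame, correctly oriented nucleotide read and uses a simple heuristic, the nearest upstream cysteine; it is not the document's own computational method, which is not specified.

```python
import re

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(nt: str) -> str:
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[p:p + 3], "X") for p in range(0, len(nt) - 2, 3))

J_MOTIF = re.compile(r"[FW]G.G")  # conserved J-region motif, e.g. FGxG

def find_cdr3(protein: str, min_len: int = 5):
    """Return the CDR3 from the conserved Cys up to and including the J-motif F/W, or None."""
    m = J_MOTIF.search(protein)
    if m is None:
        return None
    end = m.start() + 1                               # include the F/W residue
    start = protein.rfind("C", 0, end - min_len + 1)  # nearest upstream Cys
    return protein[start:end] if start != -1 else None

print(translate("ATGTGTTTTGGC"), find_cdr3("CASSLGQAYEQYFGPGTRLTVT"))
```

The nearest-cysteine rule fails when the CDR3 itself contains a cysteine; production tools anchor the cysteine position from the V-gene alignment instead, which is why the alignment step precedes region determination.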
- The CDR3 of a TCR is encoded by the V, D and J genes. During lymphocyte maturation, rearrangement of the V, D and J genes produces diverse recombined sequence fragments, and these, together with base substitutions (SNPs) and insertions/deletions (indels), generate T cell diversity.
- V/J refers to at least part of the result of a V(D)J rearrangement in a particular cell; it may be a V gene sequence, a J gene sequence, or a combination of a V gene sequence and a J gene sequence, and the combination may also include a D gene sequence between the V and J gene sequences.
- Statistical features are determined based on the V/J sequences and CDR sequences contained in the nucleic acid sample, and include at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index.
- At least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
- The type of an immune cell is determined based on its CDR3 sequence.
- The immune cell homogeneity index is the Gini index.
- The immune repertoire feature data are then tallied; the statistical features mainly include the following:
- V/J gene usage diversity, i.e. Shannon_index(V-J);
- Immune cell homogeneity, i.e. Clone_Gini.
- Shannon_index represents the Shannon index.
- Taking CDR3 as an example, the calculation formula is as follows:
$$\mathrm{Shannon\_index} = -\sum_{i=1}^{S} p(i)\,\log p(i)$$
- S represents the total number of unique CDR3s;
- p(i) represents the frequency of the i-th unique CDR3.
- Uniq_number represents the number of unique sequences.
- Clone_Gini represents the Gini index, and the calculation formula is as follows:
$$\mathrm{Clone\_Gini} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\left|x_i - x_j\right|}{2n\sum_{i=1}^{n} x_i}$$
- x_i refers to the frequency of the i-th immune cell type;
- n refers to the number of immune cell types.
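The statistics above can be computed from per-read V/J and CDR3 assignments. This sketch assumes natural logarithms for the Shannon index (the base is not stated) and computes the Gini index in its mean-absolute-difference form over clonotype frequencies; V-J usage diversity applies the same Shannon function to counts of V-J gene pairs. The input format and the `Shannon_index(CDR3)` key are labels chosen here for illustration.

```python
import math
from collections import Counter

def shannon_index(counts) -> float:
    """H = -sum p(i) * ln p(i) over categories with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_index(counts) -> float:
    """Gini over frequencies x_i: sum_i sum_j |x_i - x_j| / (2 * n * sum_i x_i)."""
    total = sum(counts)
    x = sorted(c / total for c in counts)
    n = len(x)
    # Sorted-array identity for sum_i sum_j |x_i - x_j|, avoiding an O(n^2) double loop.
    pair_sum = 2 * sum((2 * (i + 1) - n - 1) * xi for i, xi in enumerate(x))
    return pair_sum / (2 * n * sum(x))

def repertoire_features(records):
    """records: list of (v_gene, j_gene, cdr3) tuples, one per read (illustrative format)."""
    cdr3_counts = Counter(cdr3 for _, _, cdr3 in records)
    vj_counts = Counter((v, j) for v, j, _ in records)
    return {
        "Shannon_index(V-J)": shannon_index(list(vj_counts.values())),
        "Shannon_index(CDR3)": shannon_index(list(cdr3_counts.values())),
        "Uniq_number": len(cdr3_counts),
        "Clone_Gini": gini_index(list(cdr3_counts.values())),
    }

records = [("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF")] * 3 + [("TRBV6-5", "TRBJ1-1", "CASSYSTEAFF")]
print(repertoire_features(records))
```

A Gini of 0 means all clonotypes are equally abundant; values approaching 1 indicate that a few expanded clones dominate the repertoire.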
- Next, the immune age value of the individual is determined.
- The immune age value is determined from at least one statistical feature using maximum a posteriori (MAP) probability estimation.
- Step S400 further includes: (4-1) using a predetermined distribution of immune age prediction coefficients to determine, for each statistical feature, the corresponding immune age prediction coefficient (the prior distribution of the parameters is chosen mainly according to the characteristics of the selected feature; if the feature is continuous and the amount of data is large, a normal distribution is generally assumed); and (4-2) determining the immune age of the individual according to the formula
$$IA = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$$
where IA represents the immune age of the individual, i indexes the statistical features, n represents the number of statistical features, $\beta_i$ represents the immune age prediction coefficient corresponding to the i-th statistical feature, $x_i$ represents the value of the i-th statistical feature, and $\beta_0$ represents the bias term of the prediction model.
- MAP: maximum a posteriori probability estimation.
- Biochemical indicators mainly include conventional indicators, such as a comprehensive biochemistry panel and routine blood tests.
- For the training data, the prior distribution of the parameters is determined mainly according to the characteristics of the selected features: if a selected feature is continuous and the amount of data is large, a normal distribution is generally assumed; if it is discrete, its coefficient is applied directly as a multiplicative weight in the immune age formula.
- The features selected for the training set mainly include indicators obtained from immune repertoire analysis (V/J gene usage diversity, immune cell diversity, number of immune cell types, immune cell homogeneity) and some biochemical indicators (comprehensive biochemistry panel, routine blood tests, etc.).
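Under the common assumptions of Gaussian noise on the target and an independent zero-mean Gaussian prior on each coefficient $\beta_i$, the MAP estimate of the linear model $IA = \beta_0 + \sum_i \beta_i x_i$ has the closed form of ridge regression, with penalty $\lambda = \sigma^2/\tau^2$ (noise variance over prior variance). This sketch assumes chronological age is the training target and leaves the bias term unpenalized (flat prior); the source specifies neither, and the value of `lam` is arbitrary. In practice the features would normally be standardized first so that one prior width suits all of them.

```python
import numpy as np

def fit_map_linear(X, y, lam=1.0):
    """MAP estimate of (beta0, beta) for y = beta0 + X @ beta + Gaussian noise,
    with an independent zero-mean Gaussian prior on each beta_i (ridge penalty lam)
    and a flat prior on the bias term beta0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering keeps beta0 out of the penalty
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    beta0 = y_mean - x_mean @ beta
    return beta0, beta

def predict_immune_age(beta0, beta, features):
    """IA = beta0 + sum_i beta_i * x_i for one individual's feature vector."""
    return float(beta0 + np.dot(beta, features))

# Synthetic, noise-free data purely to exercise the code (not real cohort data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 40 + 3 * X[:, 0] - 2 * X[:, 1]
beta0, beta = fit_map_linear(X, y, lam=1e-6)
print(beta0, beta, predict_immune_age(beta0, beta, [1.0, 1.0]))
```

A larger `lam` encodes a tighter prior and shrinks the coefficients toward zero, which stabilizes the estimate when the training cohort is small relative to the number of features.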
- the immunity index of the individual is determined based on the immune age value.
- the immunity index is determined by the following formula:
- IA represents the immune age value determined in step S400
- IAmax represents the upper limit of IA in the predetermined population
- IAmin represents the lower limit of IA in the predetermined population.
- after the immunity index of the individual is determined, the technical solution enables high-sensitivity detection of the individual's adaptive immune system at the molecular level, as well as non-invasive early diagnosis, efficacy evaluation, disease tracking, recurrence prediction and comprehensive immunity assessment.
- the method of the present invention can be implemented by sequencing using only a small amount of sample, enabling high-sensitivity detection of the individual's adaptive immune system at the molecular level, as well as non-invasive early diagnosis, efficacy evaluation, disease tracking, relapse prediction and comprehensive immunity assessment.
- PCR can be used to amplify the genes contained in lymphocytes in peripheral blood; this requires only a small blood sample, downstream sample processing is simple, error-prone manual observation of blood cells is unnecessary, and no complex immunolabeling or flow cytometric analysis is required.
- immune evaluation by immune repertoire sequencing can not only improve the sensitivity of detection, but also realize functions such as early diagnosis, evaluation of curative effect, tracking of disease condition, prediction of recurrence, and comprehensive evaluation of immunity.
- the present invention provides a device for determining an individual immunity index.
- the device includes:
- the sequencing data acquisition unit 100 is used to acquire nucleic acid sequencing data of the individual to be tested; the sequencing result analysis unit 200 is used to determine the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing result with a reference sequence; the statistical unit 300 is used to determine statistical features based on the V/J sequences and CDR3 sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types and immune cell homogeneity index; the immune age determination unit 400 is used to determine the immune age value of the individual based on the statistical features; and the immunity index determination unit 500 is used to determine the immunity index of the individual based on the immune age value.
- the sequencing data acquisition unit further includes: a nucleic acid sample acquisition module 110, a first amplification module 120, a second amplification module 130 and a sequencing module 140.
- the nucleic acid sample acquisition module 110 is used to acquire nucleic acid samples of the individual to be tested, and the nucleic acid samples include at least one of DNA molecules and RNA molecules;
- the first amplification module 120 is used to perform a first amplification process on the nucleic acid sample using VJ-specific primers, so as to obtain a first amplification product;
- the second amplification module 130 is used to perform a second amplification process on the first amplification product, so as to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter;
- the sequencing module 140 is used to sequence the second amplification product so as to obtain a sequencing result;
- the nucleic acid sample is obtained from an individual's blood or tissue sample.
- the VJ-specific primer contains a portion of the sequence of the sequencing adapter.
- the CDR sequence is at least one of CDR1, CDR2 and CDR3 sequences, preferably a CDR3 sequence.
- At least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
- the type of immune cells is determined based on the CDR3 sequence.
- the immune cell homogeneity index is the Gini index.
- the immune age determination unit is adapted to determine the immune age value based on the at least one statistical feature using a maximum a posteriori probability estimate.
- the immune age determination unit is configured to: using a predetermined distribution of immune age prediction coefficients, determine, based on each of the statistical features, the immune age prediction coefficient corresponding to each statistical feature; and determine the immune age of the individual according to the formula $IA = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$, where IA represents the immune age of the individual, i indexes the statistical features, n represents the number of statistical features, $\beta_i$ represents the immune age prediction coefficient corresponding to the i-th statistical feature, $x_i$ represents the value of the i-th statistical feature, and $\beta_0$ represents the bias term of the prediction model.
- the immunity index is determined by the following formula:
- IA represents the immune age value determined in the immune age determination unit
- IAmax represents the upper limit of IA in the predetermined population
- IAmin represents the lower limit of IA in the predetermined population.
- the present invention provides an electronic device which, according to an embodiment of the present invention, comprises a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executing the machine-executable instructions to implement the method of determining an individual's immunity index described above.
- the present invention provides a machine-readable storage medium.
- the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement any of the methods of determining an individual's immunity index described above.
- primers with sequencing adapters are used to further amplify and build a library, and the sequencing library is subjected to high-throughput sequencing.
- sequencing data is analyzed as follows:
- SOAPnuke (v1.5.3) was used to filter adapter-contaminated sequences, low-quality bases and low-quality reads from the raw sequencing data (filtering is based on the average base quality of each read and the proportion of N bases in the read; a read is filtered out if it satisfies either or both of the criteria "the base quality value of the read is less than or equal to 20" and "the number of N bases is greater than or equal to 5");
- V/J gene usage diversity i.e. Shannon_index(V-J);
- Immune cell homogeneity i.e. Clone_Gini.
- Shannon_index represents the Shannon index
- the calculation formula is as follows: $H = -\sum_{i=1}^{S} p(i)\log p(i)$
- CDR3 is taken as an example
- S represents the total number of unique CDR3s
- p(i) represents the frequency of the i-th unique CDR3.
- Uniq_number represents the number of unique sequences.
- Clone_Gini represents the Gini index, and the calculation formula is as follows:
- x refers to the frequency of each immune cell type
- n refers to the number of immune cell types.
- MAP: maximum a posteriori probability estimate
- the specific model is as follows:
- IA represents the immune age of the predicted sample
- IA max and IA min represent the upper and lower bounds in the population distribution, respectively.
- the immunity index showed a downward trend with increasing age. Because the sample size of the age group over 50 is small, the decline shown in Figure 6 is not obvious, whereas the decline shown in Figure 5 is more evident. Therefore, the results of this example show that the immunity index can be used as an indicator for evaluating health status.
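The immune age model in the bullets above (a linear predictor $IA = \beta_0 + \sum_i \beta_i x_i$ whose coefficients are obtained by MAP estimation under a normal prior) can be sketched in pure Python. This is a sketch under stated assumptions, not the patent's implementation: it assumes Gaussian noise on the training ages, an independent zero-mean normal prior on each $\beta_i$ ($i \ge 1$) and a flat prior on the bias $\beta_0$. Under those assumptions the MAP estimate coincides with ridge regression with an unpenalized intercept, where `lam` (a hypothetical parameter) is the ratio of noise variance to prior variance. The function names are also hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_map(features: Sequence[Sequence[float]], ages: Sequence[float], lam: float) -> List[float]:
    """MAP estimate of [beta_0, beta_1, ..., beta_n] (assumed Gaussian model).

    Minimises ||y - A beta||^2 + lam * sum_{i>=1} beta_i^2, i.e. ridge
    regression whose bias column is left unpenalised.
    """
    rows = [[1.0, *map(float, x)] for x in features]  # prepend bias column
    k = len(rows[0])
    # Normal equations: (A^T A + P) beta = A^T y with P = diag(0, lam, ..., lam)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    for i in range(1, k):
        ata[i][i] += lam
    aty = [sum(r[i] * y for r, y in zip(rows, ages)) for i in range(k)]
    return solve(ata, aty)


def predict_ia(beta: Sequence[float], x: Sequence[float]) -> float:
    """Immune age IA = beta_0 + sum_i beta_i * x_i."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
```

With `lam = 0` the fit reduces to ordinary least squares; a larger `lam` corresponds to a tighter prior and shrinks the feature coefficients toward zero, which helps when only a few training samples are available per feature.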
Abstract
A method and device for determining the immunity index of an individual, an electronic device, and a machine-readable storage medium. The method comprises: acquiring nucleic acid sequencing data of an individual to be tested (S100); determining a V/J sequence and a CDR sequence contained in a nucleic acid sample by comparing a sequencing result with a reference sequence (S200); determining statistical features on the basis of the V/J sequence and the CDR sequence contained in the nucleic acid sample (S300), the statistical features comprising at least one selected from among the following: the usage diversity index of a V/J gene, the diversity index of immune cells, the number of immune cell types, and the homogeneity index of immune cells; determining an immune age value of the individual on the basis of the statistical features (S400); and determining the immunity index of the individual on the basis of the immune age value (S500). The method for determining the immunity index of an individual can be implemented by sequencing using only a small amount of sample.
Description
The present invention relates to the field of biomedicine and, in particular, to a method, a device, an electronic device and a machine-readable storage medium for determining an individual's immunity index.
Immunity is the body's own defense mechanism: the ability to recognize and eliminate any invading foreign matter (viruses, bacteria, etc.), to dispose of its own senescent, damaged, dead and degenerated cells, and to recognize and dispose of mutated cells and virus-infected cells in the body. It is the physiological response by which the body recognizes and eliminates what is "non-self". Human immunity is maintained by the immune system, which is the best doctor in the world that the human body is born with.
The immune system consists of two cooperating subsystems that provide innate immunity and adaptive immunity. Innate immunity refers to the non-specific defense mechanisms that protect the body from toxins or foreign substances (called antigens). The rapid response of the innate immune system also activates the adaptive system, which mounts the body's antigen-specific responses.
The adaptive immune system consists of two main types of lymphocytes, called B cells and T cells. These lymphocytes carry unique antigen receptors, each of which recognizes only one antigen, and this range of specificities is encoded by a fixed number of gene segments. Through a mechanism called V(D)J recombination, these genetic regions undergo irreversible somatic DNA recombination during cell development, producing mature lymphocytes with a single specificity. The immune repertoire refers to all the unique genetic rearrangements of T cell receptors (TCRs) and B cell receptors (BCRs) within the adaptive immune system.
With the development of precision medicine and immunotherapy, immune repertoires are being applied in an increasingly wide range of scenarios, including biomarker discovery, detection of autoimmune and infectious diseases, assessment of immune rejection and tolerance, tumor immune assessment, immune reconstitution, and drug and vaccine evaluation. NGS-based immune repertoire testing therefore provides technical support for evaluating the body's adaptive immune system in health and disease.
The main methods currently on the market for analyzing immune function are:
1) The five-item immune panel, which measures immunoglobulin and complement levels in the blood: immunoglobulin G (IgG), immunoglobulin A (IgA), immunoglobulin M (IgM), and complement C3 and C4 are measured by methods such as single radial immunodiffusion, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunofixation electrophoresis and immunoturbidimetry. Immunoglobulins and complement are the main effector components of humoral immunity. In certain diseases (such as infections, autoimmune diseases and immunodeficiency diseases), the concentrations of these markers rise or fall relative to the reference values, which gives them clinical value for assessing immunity and diagnosing disease. However, the five-item panel targets humoral immunity and cannot assess cellular immunity well. Even for humoral immunity, it only measures the overall levels of IgG, IgA, IgM and complement C3 and C4, and cannot provide in-depth analysis at the molecular sequence level.
2) Routine blood test, in which the number of leukocytes in peripheral blood is determined by cell counting; an increased leukocyte count indicates an inflammatory response in the body. Leukocytes in peripheral blood are classified and counted by microscopic observation. A total leukocyte count above the upper limit of the reference range is called leukocytosis, and one below the lower limit is called leukopenia. Increases and decreases are mainly driven by the number of neutrophils, although changes in lymphocyte numbers can also change the total leukocyte count. Conditions ranging from physiological changes to malignant tumors can cause an abnormal total leukocyte count, and physicians can make clinical diagnoses in combination with routine blood test results. However, a routine blood test can only roughly indicate the overall level of cellular immunity; it cannot distinguish immunity against specific diseases, nor can it determine the classification and diversity of immune cells at the gene level.
3) Lymphocyte subset analysis, which uses flow cytometry and PCR to analyze the number and relative proportion of each leukocyte subset in peripheral blood. Flow cytometry or PCR is used to monitor the relative and absolute counts of immune cells in peripheral blood and their changes, and to analyze immune status in disease (such as tumors, infectious diseases and immune diseases), thereby assisting diagnosis, tracking disease progression and deciding the timing of medication. The most commonly tested subsets include T cells (CD3), B cells (CD19), NK cells (CD16+56), helper T cells (CD3+CD4+) and suppressor T cells (CD3+CD8+). However, there are many types of lymphocyte subsets: a comprehensive analysis would require an unacceptable volume of peripheral blood, cost and time, whereas analyzing only a few subsets makes it difficult to obtain a comprehensive picture of the immune system. Moreover, lymphocyte subsets have different normal reference ranges at different ages, and the results are affected by many factors, which makes clinical interpretation relatively difficult.
SUMMARY OF THE INVENTION
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the present invention is to perform high-sensitivity detection of an individual's adaptive immune system at the molecular sequence level by immune repertoire sequencing, to assess the individual's immunity by comprehensive analysis of multiple indicators of immunoglobulin and TCR genes (such as diversity and homogeneity), and to assess the individual's health status through the Immune Age (IA), thereby enabling early health risk prediction.
In the first aspect of the present invention, the present invention proposes a method for determining an individual's immunity index. According to an embodiment of the present invention, the method includes: (1) acquiring nucleic acid sequencing data of the individual to be tested; (2) determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing result with a reference sequence; (3) determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types and immune cell homogeneity index; (4) determining an immune age value of the individual based on the statistical features; and (5) determining the immunity index of the individual based on the immune age value.
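The statistical features of step (3) can be illustrated with a minimal Python sketch. It assumes the standard Shannon index $H = -\sum_i p(i)\ln p(i)$ over the unique CDR3s (or over V-J pairs for V/J gene usage diversity), counts unique sequences as the number of immune cell types, and uses the common mean-absolute-difference form of the Gini coefficient over clone frequencies. The natural logarithm, this Gini form and all function names are assumptions for illustration, not the patent's exact definitions.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Hashable, Iterable


def frequencies(items: Iterable[Hashable]) -> list[float]:
    """Relative frequency p(i) of each distinct item, e.g. each unique CDR3."""
    counts = Counter(items)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_index(items: Iterable[Hashable]) -> float:
    """Shannon index H = -sum p(i) * ln p(i) over the S distinct items.

    Pass CDR3 sequences for immune cell diversity, or (V, J) tuples for
    V/J gene usage diversity (Shannon_index(V-J)).
    """
    return -sum(p * math.log(p) for p in frequencies(items))


def uniq_number(items: Iterable[Hashable]) -> int:
    """Number of unique sequences, used here as the number of immune cell types."""
    return len(set(items))


def gini_index(freqs: list[float]) -> float:
    """Gini coefficient of clone frequencies (0 means perfectly even).

    Assumed form: sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x)).
    """
    n = len(freqs)
    mean = sum(freqs) / n
    abs_diff = sum(abs(a - b) for a in freqs for b in freqs)
    return abs_diff / (2 * n * n * mean)
```

For example, `gini_index(frequencies(cdr3_reads))` scores how unevenly reads are spread across clonotypes: a repertoire dominated by a few expanded clones scores higher than an even one.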
According to embodiments of the present invention, the method can be implemented by sequencing using only a small amount of sample, enabling high-sensitivity detection of an individual's adaptive immune system at the molecular level, as well as non-invasive early diagnosis, efficacy evaluation, disease tracking, relapse prediction and comprehensive immunity assessment. For example, according to embodiments of the present invention, PCR can be used to amplify the genes contained in lymphocytes in peripheral blood; this requires only a small blood sample, downstream sample processing is simple, error-prone manual observation of blood cells is unnecessary, and no complex immunolabeling or flow cytometric analysis is required. For myeloma testing, only peripheral blood needs to be collected and no bone marrow aspiration is required, which reduces harm to the patient. In summary, according to embodiments of the present invention, immune assessment by immune repertoire sequencing not only improves detection sensitivity but also supports early diagnosis, efficacy evaluation, disease tracking, relapse prediction and comprehensive immunity assessment.
In the second aspect of the present invention, the present invention provides a device for determining an individual's immunity index. According to an embodiment of the present invention, the device includes: a sequencing data acquisition unit for acquiring nucleic acid sequencing data of the individual to be tested; a sequencing result analysis unit for determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing result with a reference sequence; a statistical unit for determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types and immune cell homogeneity index; an immune age determination unit for determining an immune age value of the individual based on the statistical features; and an immunity index determination unit for determining the immunity index of the individual based on the immune age value.
With the device of embodiments of the present invention, the method for determining an individual's immunity index described above can be effectively implemented. Accordingly, the features and advantages described above also apply to the device and are not repeated here.
In the third aspect of the present invention, the present invention provides an electronic device which, according to an embodiment of the present invention, comprises a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executing the machine-executable instructions to implement the method of determining an individual's immunity index described above.
In the fourth aspect of the present invention, the present invention provides a machine-readable storage medium. According to an embodiment of the present invention, the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement the method of determining an individual's immunity index according to any of the preceding items.
FIG. 1 is a schematic flowchart of a method for determining an individual immunity index according to an embodiment of the present invention;
FIG. 2 is a partial schematic flowchart of a method for determining an individual immunity index according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for determining an individual immunity index according to an embodiment of the present invention;
FIG. 4 is a partial schematic structural diagram of a device for determining an individual immunity index according to an embodiment of the present invention;
FIG. 5 shows the predicted immunity index for different age groups in Example 2 of the present invention;
FIG. 6 is a distribution plot of the immunity index against individual age in Example 2 of the present invention.
Embodiments of the present invention are described in detail below. The embodiments described below are exemplary, serve only to explain the present invention, and should not be construed as limiting it. Where no specific technique or condition is indicated in the examples, the techniques or conditions described in the literature in the field or in the product specifications are followed. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
In the first aspect of the present invention, the present invention proposes a method for determining an individual's immunity index. Referring to FIG. 1, according to an embodiment of the present invention, the method includes:
S100 Obtaining nucleic acid sequencing data
According to an embodiment of the present invention, in this step, nucleic acid sequencing data from the individual to be tested is first acquired for subsequent analysis. Those skilled in the art will understand that these nucleic acid sequencing data may contain the genetic information of immune cells; for example, according to embodiments of the present invention, blood samples containing immune cells or tissue samples containing immune cells may be used (the term tissue sample should be understood broadly and may include at least part of an organ), such as the non-encapsulated diffuse lymphoid tissue and lymphoid nodules in the submucosa of the intestinal tract, respiratory tract, urogenital tract, etc.
According to embodiments of the present invention, nucleic acid sequencing data can be obtained by high-throughput sequencing, for example on second- or third-generation sequencing platforms, including but not limited to MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960 and MGISP-100.
After obtaining the nucleic acid, those skilled in the art can perform sequencing according to the operating manual of the sequencing platform to obtain nucleic acid sequencing data. Briefly, according to one embodiment of the present invention, the sequencing process includes:
For blood or tissue samples, DNA or RNA is extracted. For each sample, a starting amount of DNA or RNA is taken, primers for a given chain (TCR or BCR) are added, and multiplex PCR amplification is performed. Two rounds of PCR are performed in total: the first round is a PCR reaction with VJ-specific primers (carrying a partial sequencing adapter), and the second round is a conventional library-construction PCR with sequencing adapter primers. Multiple samples are then pooled and sequenced together to obtain data for each sample. According to embodiments of the present invention, a tag sequence may also be introduced in the second round of PCR so that sample batches can be distinguished.
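Because multiple samples are pooled and sequenced together, reads must later be assigned back to their samples using the tag introduced in the second PCR round. The sketch below is a minimal illustration only; it assumes (hypothetically) that the tag is a fixed-length sequence at the very start of each read and that only exact matches are accepted. Real pipelines often read the tag from a separate index read and tolerate a mismatch. The function name and tag table are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], tag_to_sample: Dict[str, str], tag_len: int) -> Dict[str, List[str]]:
    """Assign each read to a sample by exact match of its first `tag_len` bases.

    Reads whose prefix matches no known tag are collected under "unassigned".
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for seq in reads:
        sample = tag_to_sample.get(seq[:tag_len])
        if sample is None:
            bins["unassigned"].append(seq)  # keep untrimmed for troubleshooting
        else:
            bins[sample].append(seq[tag_len:])  # trim the tag before analysis
    return dict(bins)
```

Keeping unassigned reads separately, rather than dropping them, makes it easy to check how many reads were lost to tag sequencing errors.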
Referring to FIG. 2, according to a specific embodiment of the present invention, acquiring nucleic acid sequencing data may further include:
S110 Obtaining a nucleic acid sample
In this step, a nucleic acid sample of the individual to be tested is obtained; the nucleic acid sample includes at least one of DNA molecules and RNA molecules. Those skilled in the art can extract DNA or RNA molecules using commercially available kits according to the manufacturer's instructions. Those skilled in the art will also understand that, once RNA molecules are obtained, cDNA molecules can readily be obtained by reverse transcription.
S120 First amplification process
After the nucleic acid sample is obtained, a first amplification process can be performed with VJ-specific primers to obtain a first amplification product.
It should be noted that the VJ-specific primers amplify the immune-cell-specific sequences, namely the V genes and J genes, contained in the nucleic acid sample obtained in step S110.
Herein, VJ-specific primers refer to specific primers capable of amplifying V genes and J genes. It is worth noting that, at most loci, V and J genes are grouped into families according to their degree of homology. These VJ-specific primers can be used to analyze the combinatorial diversity of V-J rearrangements at at least one locus selected from TRA, TRB, TRG, TRD, IgH, IgK, IgL, and the like.
According to an embodiment of the present invention, the VJ-specific primers used in the present invention have the following nucleotide sequences:
In addition, according to an embodiment of the present invention, the VJ-specific primers contain part of the sequencing adapter sequence, which makes it convenient to introduce sequencing adapters into the amplification products in the subsequent second amplification process.
S130第二扩增处理S130 Second Amplification Treatment
对第一扩增产物进行第二扩增处理,以便获得第二扩增产物,其中,第二扩增产物携带测序接头。A second amplification process is performed on the first amplification product to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter.
通过采用第一扩增产物中的共同序列,可以进行第二扩增处理,并且采用的引物可以设置为适于引入测序接头。由此,所得到的第二扩增产物构成了可以用于测序的测序文库。The second amplification process can be performed by using the common sequence in the first amplification product, and the primers used can be set to be suitable for introduction into sequencing adapters. Thus, the obtained second amplification product constitutes a sequencing library that can be used for sequencing.
当然本领域技术人员能够理解的是,为了提高测序效率或者方便分析,还可以对第二扩增产物进行其他常规的处理,例如杂交探针筛选等处理。在此不再赘述。Of course, those skilled in the art can understand that, in order to improve sequencing efficiency or facilitate analysis, other conventional processing, such as hybridization probe screening, may also be performed on the second amplification product. It is not repeated here.
S140 Sequencing
The second amplification product is sequenced to obtain sequencing results.
According to an embodiment of the present invention, once the sequencing library (the second amplification product) has been constructed, it can be sequenced on a sequencing platform. According to embodiments of the present invention, the nucleic acid sequencing data can be obtained by high-throughput sequencing, for example on second- or third-generation platforms, including but not limited to MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960, and MGISP-100. Paired-end sequencing is preferred because it improves the efficiency of the subsequent analysis.
S200 Sequence alignment to determine V/J sequences and CDR sequences
After the sequencing data are obtained, according to an embodiment of the present invention, the V/J sequences and CDR sequences contained in the nucleic acid sample are determined by aligning the sequencing results with reference sequences.
According to an embodiment of the present invention, before alignment, software such as SOAPnuke (v1.5.3) can be used to remove adapter-contaminated reads, low-quality bases, and low-quality reads from the raw sequencing data.
The FASTQ files are converted into FASTA files with an in-house program in preparation for read assembly; if the sequencing mode is paired-end, COPE (v1.5.3) and an in-house program are used to merge the read pairs. Next, blastall (v2.2.25) can be used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, after which an in-house program re-aligns the reads and selects the best alignment: the non-CDR3 and CDR3 regions are scored by different methods, and the best hit with the highest score is chosen. Each sequenced read is assigned by alignment against the CDR, V, and J reference sequences, which determines its CDR sequence and V/J sequence.
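The FASTQ-to-FASTA conversion mentioned above is a simple record-by-record transformation. The in-house program is not described, so the following is only a minimal sketch that assumes standard four-line FASTQ records; the function name `fastq_to_fasta` is hypothetical.

```python
import io
from typing import Iterator, TextIO


def fastq_to_fasta(fastq: TextIO) -> Iterator[str]:
    """Yield FASTA records from a FASTQ stream with four lines per record."""
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        seq = fastq.readline().rstrip("\n")
        plus = fastq.readline()
        fastq.readline()  # quality line; FASTA has no place for it, so it is discarded
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed or truncated FASTQ record")
        # Keep the read identifier (without '@') and the bases.
        yield f">{header[1:].rstrip()}\n{seq}\n"


if __name__ == "__main__":
    demo = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"
    print("".join(fastq_to_fasta(io.StringIO(demo))), end="")
```

Streaming one record at a time keeps memory use flat, which matters for repertoire runs with millions of reads.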
After the V and J gene sequences are obtained, the structure of the immune molecules is analyzed. This step has two functions: error correction and region determination. First, an in-house program corrects errors introduced during PCR and sequencing; then the CDR regions are determined from the V/J gene reference sequences and the positions of conserved amino acids, using an established computational method.
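Region determination relies on conserved amino acids flanking CDR3. A common convention (for example in IMGT) anchors CDR3 at a conserved cysteine at the end of the V region and at the Phe/Trp of the F/W-G-x-G motif in the J region. The sketch below is a simplified heuristic under that assumption, not the in-house program described here: it uses no reference alignment or error correction, and `extract_cdr3` is a hypothetical name.

```python
import re
from typing import Optional

# J-region anchor: Phe or Trp followed by Gly-x-Gly (the F/W-G-X-G motif).
J_MOTIF = re.compile(r"[FW]G.G")


def extract_cdr3(aa_seq: str, min_len: int = 5) -> Optional[str]:
    """Return the stretch from the conserved Cys to the J-motif F/W, inclusive.

    Heuristic only: it takes the last Cys upstream of the first usable J motif,
    so a CDR3 that itself contains a Cys would be truncated.
    """
    for m in J_MOTIF.finditer(aa_seq):
        end = m.start()  # index of the F/W anchor residue
        cys = aa_seq.rfind("C", 0, end)  # nearest Cys upstream of the anchor
        if cys != -1 and end - cys + 1 >= min_len:
            return aa_seq[cys:end + 1]
    return None
```

The returned string includes both anchor residues, which matches what many repertoire tools report as the CDR3 amino-acid sequence (IMGT calls this span the junction).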
According to embodiments of the present invention, the CDR sequence can be determined by commonly used methods. According to an embodiment of the present invention, the CDR sequence is at least one of the CDR1, CDR2, and CDR3 sequences, preferably the CDR3 sequence, because CDR3 is the most variable region and directly determines the antigen-binding specificity of the TCR. The CDR3 of a TCR is encoded by the V, D, and J genes. During lymphocyte maturation, rearrangement of the V, D, and J genes produces a wide range of recombined sequence fragments, and together with base-level SNPs and indels this generates T-cell diversity.
The term "V/J" as used herein refers to at least part of the result of V(D)J rearrangement in a particular cell. It may be a V gene sequence, a J gene sequence, or a combination of a V gene sequence and a J gene sequence, possibly with a D gene sequence between them.
S300 Determining statistical features
Statistical features are determined from the V/J sequences and CDR sequences contained in the nucleic acid sample. The statistical features include at least one of the following: the V/J gene usage diversity index, the immune cell diversity index, the number of immune cell types, and the immune cell homogeneity index.
According to an embodiment of the present invention, at least one of the V/J gene usage diversity index and the immune cell diversity index is the Shannon index. According to an embodiment of the present invention, immune cell types are defined by CDR3 sequence.
According to an embodiment of the present invention, the immune cell homogeneity index is the Gini index.
According to an embodiment of the present invention, statistics are computed on the immune repertoire feature data; the main statistical features are:
V/J gene usage diversity, i.e., Shannon_index(V-J);
Immune diversity, i.e., Shannon_index(CDR3_aa);
Immune cell types, i.e., Uniq_number(CDR3_aa);
Immune cell homogeneity, i.e., Clone_Gini.
Among the above indicators, Shannon_index denotes the Shannon index, calculated as follows:
where, taking CDR3 as an example, S is the total number of unique CDR3 sequences and p(i) is the frequency of the i-th CDR3.
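The Shannon formula itself does not survive in this copy. Assuming the standard form H = -Σ_{i=1}^{S} p(i) ln p(i) (the natural logarithm is an assumption; the source does not state the base), Shannon_index and Uniq_number can be sketched as follows. The same functions accept CDR3 amino-acid strings or (V, J) tuples, so one implementation covers both Shannon_index(CDR3_aa) and Shannon_index(V-J).

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_index(items: Iterable[Hashable]) -> float:
    """H = -sum(p_i * ln p_i) over the S unique items (natural log assumed)."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def uniq_number(items: Iterable[Hashable]) -> int:
    """Number of unique sequences, i.e. S."""
    return len(set(items))
```

For example, `shannon_index(["CASSLF", "CASSLF", "CASRGF"])` weights the duplicated clone by its frequency, while `uniq_number` on the same list returns 2.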
Uniq_number denotes the number of unique sequences.
Clone_Gini denotes the Gini index, calculated as follows:
where x is the frequency of each immune cell type and n is the number of immune cell types.
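The Gini formula is also missing from this copy. One standard form, assumed here, sorts the n frequencies in ascending order and computes G = 2 Σ_i i·x_(i) / (n Σ_i x_i) - (n + 1)/n, which is equivalent to the mean-absolute-difference definition. G is 0 when all clones are equally frequent and grows toward 1 as a single clone dominates. `clone_gini` is a hypothetical name.

```python
from typing import Sequence


def clone_gini(freqs: Sequence[float]) -> float:
    """Gini index of clone frequencies (0 = perfectly even distribution)."""
    n = len(freqs)
    total = sum(freqs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero frequency")
    xs = sorted(freqs)  # this closed form requires ascending order
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n
```

Because the result is normalized by the total, raw clone counts give the same value as frequencies.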
S400 Determining the immune age value
In this step, the immune age value of the individual is determined from the statistical features.
According to an embodiment of the present invention, the immune age value is determined from at least one statistical feature by maximum a posteriori (MAP) estimation.
According to an embodiment of the present invention, step S400 further comprises: (4-1) using a predetermined distribution of immune age prediction coefficients to determine, for each statistical feature, the corresponding immune age prediction coefficient (the prior distribution of the parameters is chosen mainly according to the nature of the selected features; for continuous features with a large amount of data, it is generally taken to be normal); and (4-2) determining the immune age of the individual according to the formula
$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
where IA is the immune age of the individual, i is the index of a statistical feature, n is the number of statistical features, θi is the immune age prediction coefficient for the i-th statistical feature, xi is the value of the i-th statistical feature, and θ0 is the bias term of the prediction model.
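The coefficient estimation and the linear IA model can be sketched as follows. This assumes the training target is each subject's chronological age, a Gaussian likelihood for that target, and an independent zero-mean normal prior on θ1..θn (the source states only that a normal prior is generally used for continuous features). Under those assumptions the MAP estimate reduces to ridge regression with penalty λ = σ²/τ². The function names, the unpenalized bias, and the value of λ are all hypothetical.

```python
from typing import List, Sequence


def fit_map_linear(X: Sequence[Sequence[float]], y: Sequence[float],
                   lam: float = 1.0) -> List[float]:
    """MAP estimate of [theta0, theta1, ..., thetan] for IA = theta0 + sum_i theta_i * x_i.

    Assumes y_k ~ N(theta0 + theta . x_k, sigma^2) and an independent N(0, tau^2)
    prior on theta1..thetan, so the MAP solution is ridge regression with
    lam = sigma^2 / tau^2. The bias theta0 is left unpenalized (flat prior).
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend an intercept column
    d = len(rows[0])
    # Normal equations (A^T A + lam * P) theta = A^T y, with P = diag(0, 1, ..., 1).
    M = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, d):
            factor = M[k][col] / M[col][col]
            for c in range(col, d):
                M[k][c] -= factor * M[col][c]
            b[k] -= factor * b[col]
    # Back substitution.
    theta = [0.0] * d
    for i in reversed(range(d)):
        theta[i] = (b[i] - sum(M[i][j] * theta[j] for j in range(i + 1, d))) / M[i][i]
    return theta


def predict_ia(theta: Sequence[float], x: Sequence[float]) -> float:
    """Immune age IA = theta0 + sum_i theta_i * x_i."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
```

A larger λ corresponds to a tighter prior and pulls the feature coefficients toward zero; with λ = 0 the fit reduces to ordinary least squares.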
To aid understanding, the principle of maximum a posteriori estimation is explained below.
According to an embodiment of the present invention, IA is computed from the above feature indices, combined with biochemical indicators, using a MAP (maximum a posteriori probability estimate) model. This enables a comprehensive immunity assessment and prediction of health risk. The principle is as follows:
MAP is grounded in Bayesian inference. Bayes' formula is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Expanding P(B), the probability of event B, with the law of total probability gives:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \sim A)\,P(\sim A)}$$
where ~A denotes "not A".
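Bayes' rule with P(B) expanded by the law of total probability can be computed directly. The sketch below is only a numeric illustration with made-up probabilities; `posterior` is a hypothetical helper.

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) via Bayes' rule, expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b
```

With P(B|A) = 0.9, P(A) = 0.1 and P(B|~A) = 0.2, P(B) = 0.09 + 0.18 = 0.27, so P(A|B) = 0.09 / 0.27 = 1/3: a weak prior keeps the posterior modest even when the evidence favors A.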
Biochemical indicators mainly include routine clinical tests, such as a comprehensive metabolic panel and a complete blood count.
The principle of MAP is as follows:
MAP estimation infers the value of a parameter θ given an observation x. If f is the sampling distribution of x, then f(x|θ) is the probability of observing x given θ. If g is the prior distribution of θ (obtainable from training data), then by Bayes' formula the posterior distribution of θ is:
$$g(\theta \mid x) = \frac{f(x \mid \theta)\, g(\theta)}{\int_{\Theta} f(x \mid \vartheta)\, g(\vartheta)\, d\vartheta}$$
For the training data, the prior distribution of the parameters is chosen mainly according to the nature of the selected features: if a feature is continuous and the data volume is large, a normal distribution is generally assumed; if it is discrete, the terms are simply combined by weighted multiplication according to the formula. The training set mainly includes indicators from immune repertoire analysis (V/J gene usage diversity, immune diversity, immune cell types, immune cell homogeneity) and some biochemical indicators (comprehensive metabolic panel, complete blood count, etc.).
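Here Θ is the parameter space of θ. Because Θ is continuous, the denominator is computed as an integral. The denominator does not depend on θ, so the MAP estimate is:
$$\hat{\theta}_{MAP} = \arg\max_{\theta} g(\theta \mid x) = \arg\max_{\theta} f(x \mid \theta)\, g(\theta)$$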
where $\hat{\theta}_{MAP}$, the parameter that maximizes f(x|θ)g(θ), gives the coefficients for predicting Immune Age (IA). If the observation is n-dimensional, i.e., $x = (x_1, x_2, \ldots, x_n)$, then:
The IA prediction formula is as follows:
S500 Determining the immunity index
In this step, the immunity index of the individual is determined from the immune age value.
According to an embodiment of the present invention, the immunity index is determined by the following formula:
where IA is the immune age value determined in step S400, IA_max is the upper limit of IA in a predetermined population, and IA_min is the lower limit of IA in the predetermined population.
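The immunity index formula is not reproduced in this copy. A plausible reading, assumed here and not confirmed by the source, is a min-max rescaling in which a lower immune age yields a higher index: II = (IA_max - IA) / (IA_max - IA_min), clipped to [0, 1]. That direction is consistent with Example 2, where the index declines with age. `immunity_index` is a hypothetical name.

```python
def immunity_index(ia: float, ia_min: float, ia_max: float) -> float:
    """Hypothetical min-max mapping of immune age onto [0, 1]; younger immune age -> higher index."""
    if ia_max <= ia_min:
        raise ValueError("ia_max must exceed ia_min")
    ii = (ia_max - ia) / (ia_max - ia_min)
    return min(1.0, max(0.0, ii))  # clip individuals outside the reference range
```

Clipping keeps individuals whose immune age falls outside the population bounds on the same 0 to 1 scale instead of producing negative or above-one values.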
Once the individual's immunity index has been determined, this technical solution enables highly sensitive, molecular-level assessment of the individual's adaptive immune system, as well as non-invasive early diagnosis, efficacy evaluation, disease monitoring, relapse prediction, and comprehensive immunity assessment.
According to embodiments of the present invention, because the method relies on sequencing, it can be carried out with a small amount of sample, enabling highly sensitive, molecular-level assessment of the adaptive immune system together with non-invasive early diagnosis, efficacy evaluation, disease monitoring, relapse prediction, and comprehensive immunity assessment. For example, according to embodiments of the present invention, PCR can be used to amplify genes from lymphocytes in peripheral blood. This requires only a small blood sample and simple downstream processing, and it avoids both error-prone manual blood-cell microscopy and complex immunolabeling with flow cytometry. For myeloma testing, only peripheral blood is needed and no bone marrow aspiration is required, which reduces harm to the patient. In summary, according to embodiments of the present invention, immune assessment by immune repertoire sequencing not only improves detection sensitivity but also supports early diagnosis, efficacy evaluation, disease monitoring, relapse prediction, and comprehensive immunity assessment.
In a second aspect, the present invention provides a device for determining the immunity index of an individual. According to an embodiment of the present invention, referring to FIG. 3, the device includes:
a sequencing data acquisition unit 100, a sequencing result analysis unit 200, a statistics unit 300, an immune age determination unit 400, and an immunity index determination unit 500. The sequencing data acquisition unit 100 acquires nucleic acid sequencing data of the individual to be tested; the sequencing result analysis unit 200 determines the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing results with reference sequences; the statistics unit 300 determines statistical features based on the V/J sequences and CDR3 sequences contained in the nucleic acid sample, the statistical features including at least one of the following: the V/J gene usage diversity index, the immune cell diversity index, the number of immune cell types, and the immune cell homogeneity index; the immune age determination unit 400 determines the immune age value of the individual based on the statistical features; and the immunity index determination unit 500 determines the immunity index of the individual based on the immune age value.
The device of this embodiment can effectively carry out the method for determining individual immunity described above. The features and advantages described above therefore also apply to the device and are not repeated here.
According to an embodiment of the present invention, referring to FIG. 4, the sequencing data acquisition unit further includes a nucleic acid sample acquisition module 110, a first amplification module 120, a second amplification module 130, and a sequencing module 140. According to an embodiment of the present invention, the nucleic acid sample acquisition module 110 obtains a nucleic acid sample of the individual to be tested, the sample including at least one of DNA and RNA molecules; the first amplification module 120 performs a first amplification with VJ-specific primers to obtain a first amplification product; the second amplification module 130 performs a second amplification on the first amplification product to obtain a second amplification product carrying sequencing adapters; and the sequencing module 140 sequences the second amplification product to obtain sequencing results. The nucleic acid sample is obtained from a blood or tissue sample of the individual.
According to an embodiment of the present invention, the VJ-specific primers contain part of the sequencing adapter sequence.
According to an embodiment of the present invention, the CDR sequence is at least one of the CDR1, CDR2, and CDR3 sequences, preferably the CDR3 sequence.
According to an embodiment of the present invention, at least one of the V/J gene usage diversity index and the immune cell diversity index is the Shannon index.
According to an embodiment of the present invention, immune cell types are defined by CDR3 sequence.
According to an embodiment of the present invention, the immune cell homogeneity index is the Gini index.
According to an embodiment of the present invention, the immune age determination unit is adapted to determine the immune age value from at least one statistical feature by maximum a posteriori estimation.
According to an embodiment of the present invention, the immune age determination unit is configured to: use a predetermined distribution of immune age prediction coefficients to determine, for each statistical feature, the corresponding immune age prediction coefficient; and determine the immune age of the individual according to the formula
$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
where IA is the immune age of the individual, i is the index of a statistical feature, n is the number of statistical features, θi is the immune age prediction coefficient for the i-th statistical feature, xi is the value of the i-th statistical feature, and θ0 is the bias term of the prediction model.
According to an embodiment of the present invention, the immunity index is determined by the following formula:
where IA is the immune age value determined by the immune age determination unit, IA_max is the upper limit of IA in a predetermined population, and IA_min is the lower limit of IA in the predetermined population.
In a third aspect, the present invention provides an electronic device which, according to an embodiment of the present invention, includes a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes these instructions to implement the method for determining an individual immunity index described above.
In a fourth aspect, the present invention provides a machine-readable storage medium. According to an embodiment of the present invention, the medium stores machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement any of the methods for determining an individual immunity index described above.
Example 1:
1. Sequencing data acquisition
5 mL of peripheral blood was collected from each of 1000 volunteers. DNA was extracted from the peripheral blood samples with a DNA extraction kit and amplified with V gene- and J gene-specific primers carrying partial sequencing adapters, yielding V and J gene amplicons with partial sequencing adapters.
The resulting amplicons were further amplified with primers carrying sequencing adapters to build the library, and the sequencing library was subjected to high-throughput sequencing.
2. Sequencing data analysis
After the sequencing run was completed, the data were analyzed as follows:
(1) SOAPnuke (v1.5.3) was used to remove adapter-contaminated reads, low-quality bases, and low-quality reads from the raw data. Reads were filtered on two criteria, the average base quality of the read and the number of N bases it contains: a read was removed if its average base quality was ≤ 20, if it contained ≥ 5 N bases, or both;
(2) The FASTQ files were converted into FASTA files;
(3) blastall (v2.2.25) was used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, followed by re-alignment to select the best alignment;
(4) Structural analysis (error correction and region determination) was performed on the aligned sequences: BGI's structure analysis program corrected errors introduced during PCR and sequencing, and the CDR3 region was then determined from the V/J gene reference sequences and the positions of conserved amino acids using an established computational method.
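The two read-level criteria in step (1) can be expressed as a simple predicate. This is a sketch of the stated thresholds only, not SOAPnuke's implementation or command-line options; it assumes Phred+33 quality encoding, and `passes_filter` is a hypothetical name.

```python
def passes_filter(seq: str, qual: str, min_mean_q: float = 20.0,
                  max_n: int = 5, offset: int = 33) -> bool:
    """Keep a read only if its mean Phred quality exceeds min_mean_q
    and it contains fewer than max_n N bases (Phred+33 assumed)."""
    if not seq or len(seq) != len(qual):
        return False  # empty or inconsistent record
    mean_q = sum(ord(ch) - offset for ch in qual) / len(qual)
    n_count = seq.upper().count("N")
    return mean_q > min_mean_q and n_count < max_n
```

The thresholds are parameters so that the "≤ 20" and "≥ 5" cut-offs from the text map directly onto `min_mean_q` and `max_n`.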
3. Indicator statistics and prediction
Statistics were computed on the immune repertoire feature data, and immunity was predicted and analyzed with an in-house model.
The main statistical features are:
V/J gene usage diversity, i.e., Shannon_index(V-J);
Immune diversity, i.e., Shannon_index(CDR3_aa);
Immune cell types, i.e., Uniq_number(CDR3_aa);
Immune cell homogeneity, i.e., Clone_Gini.
Among the above indicators, Shannon_index denotes the Shannon index, calculated as follows:
where, taking CDR3 as an example, S is the total number of unique CDR3 sequences and p(i) is the frequency of the i-th CDR3.
Uniq_number denotes the number of unique sequences.
Clone_Gini denotes the Gini index, calculated as follows:
where x is the frequency of each immune cell type and n is the number of immune cell types.
Based on the above feature indices, combined with routine blood biochemical indicators, IA was computed with a MAP (maximum a posteriori probability estimate) model to provide a comprehensive immunity assessment and health risk prediction.
where $\hat{\theta}_{MAP}$, the parameter that maximizes f(x|θ)g(θ), gives the coefficients for predicting Immune Age (IA). If the observation is n-dimensional, i.e., $x = (x_1, x_2, \ldots, x_n)$, then:
The IA prediction formula is as follows:
Determining immunity:
Based on the predicted IA and the population distribution, the individual's Immune Index (II) is finally determined with the following model:
where IA is the immune age of the sample being predicted, and IA_max and IA_min are the upper and lower limits of the population distribution, respectively.
Example 2:
1. Sequencing data acquisition
5 mL of peripheral blood was collected from each of 439 volunteers. DNA was extracted from the peripheral blood samples with a DNA extraction kit and amplified with V gene- and J gene-specific primers carrying partial sequencing adapters, yielding V and J gene amplicons with partial sequencing adapters.
The resulting amplicons were further amplified with primers carrying sequencing adapters to build the library, and the sequencing library was subjected to high-throughput sequencing.
2. Sequencing data analysis
After the sequencing run was completed, the data were analyzed as follows:
(1) SOAPnuke (v1.5.3) was used to remove adapter-contaminated reads, low-quality bases, and low-quality reads from the raw data. Reads were filtered on two criteria, the average base quality of the read and the number of N bases it contains: a read was removed if its average base quality was ≤ 20, if it contained ≥ 5 N bases, or both;
(2) The FASTQ files were converted into FASTA files;
(3) blastall (v2.2.25) was used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, followed by re-alignment to select the best alignment;
(4) Structural analysis (error correction and region determination) was performed on the aligned sequences: BGI's structure analysis program corrected errors introduced during PCR and sequencing, and the CDR3 region was then determined from the V/J gene reference sequences and the positions of conserved amino acids using an established computational method.
3. Indicator statistics
Statistics were computed on the immune repertoire feature data; the three main statistical features are:
Immune diversity, i.e., Shannon_index(CDR3_aa);
Immune cell types, i.e., Uniq_number(CDR3_aa);
Sequence diversity, i.e., Uniq_number(seq_aa).
Among the above indicators, Shannon_index denotes the Shannon index, calculated as follows:
where, taking CDR3 as an example, S is the total number of unique CDR3 sequences and p(i) is the frequency of the i-th CDR3.
Uniq_number denotes the number of unique sequences.
4. Preprocessing
Three samples with missing values were removed.
5. Model training
The remaining 436 samples were divided into three age groups (20-30, 30-50, and >50 years). 75% of the samples in each group were drawn at random and pooled to form the training set; the remaining 111 samples formed the test set.
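The age-stratified split described above can be sketched as follows. The handling of the boundary ages 30 and 50, of ages below 20, the rounding rule, and the random seed are all assumptions, since the source does not specify them; `stratified_split` and `age_group` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple


def age_group(age: float) -> str:
    # Boundary handling is an assumption: the source only lists 20-30, 30-50 and >50.
    if age > 50:
        return ">50"
    if age >= 30:
        return "30-50"
    return "20-30"  # ages below 20 (not described in the source) also land here


def stratified_split(ages: Sequence[float], train_frac: float = 0.75,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each age group contributes ~train_frac of its members to training."""
    rng = random.Random(seed)
    groups: Dict[str, List[int]] = {}
    for idx, age in enumerate(ages):
        groups.setdefault(age_group(age), []).append(idx)
    train: List[int] = []
    test: List[int] = []
    for members in groups.values():
        rng.shuffle(members)
        k = int(len(members) * train_frac)  # floor; the source does not state its rounding
        train.extend(members[:k])
        test.extend(members[k:])
    return sorted(train), sorted(test)
```

Sampling within each group keeps the smaller >50 group represented in both sets in roughly the same proportion as in the full cohort.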
Using the training set and the three immune repertoire feature indices above, IA was computed with a MAP (maximum a posteriori probability estimate) model to provide a comprehensive immunity assessment and health risk prediction. The model parameters were trained as follows:
where $\hat{\theta}_{MAP}$, the parameter that maximizes f(x|θ)g(θ), gives the coefficients for predicting Immune Age (IA). Here the observations are 3-dimensional, i.e., $x = (x_1, x_2, x_3)$, so:
Based on the trained parameters, the IA prediction formula is:
Based on the predicted IA and the population distribution, the individual's Immune Index (II) is finally determined with the following formula:
where IA is the immune age of the sample being predicted, and IA_max and IA_min are the upper and lower limits of the population distribution, respectively.
6. II prediction results
As Figures 5 and 6 show, the immunity index tends to decrease with age. Because the >50 age group has a small sample size, the downward trend is less evident in Figure 6, but it is clearer in Figure 5. The results of this example therefore indicate that the immunity index can serve as an indicator for assessing health status.
Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (23)
- 一种确定个体免疫力指数的方法,其特征在于,包括:A method for determining an individual immunity index, characterized in that it comprises:(1)获取待测个体的核酸测序数据;(1) Obtain nucleic acid sequencing data of the individual to be tested;(2)通过将所述测序结果与参考序列比对,确定所述核酸样本中所包含V/J序列以及CDR序列;(2) by aligning the sequencing result with the reference sequence, determine the V/J sequence and the CDR sequence contained in the nucleic acid sample;(3)基于所述核酸样本中所包含V/J序列以及CDR序列,确定统计特征,所述统计特征包括选自下列的至少之一:V/J基因使用多样性指数、免疫细胞多样性指数、免疫细胞种类数目、免疫细胞均一性指数;(3) Determine statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, and the statistical features include at least one selected from the following: V/J gene usage diversity index, immune cell diversity index , number of immune cell types, immune cell homogeneity index;(4)基于所述统计特征,确定所述个体的免疫年龄数值;和(4) determining the immune age value of the individual based on the statistical characteristics; and(5)基于所述免疫年龄数值,确定所述个体的免疫力指数。(5) Determine the immunity index of the individual based on the immune age value.
- The method according to claim 1, wherein the sequencing data are obtained by the following steps: (1-1) obtaining a nucleic acid sample of the individual to be tested, the nucleic acid sample comprising at least one of DNA molecules and RNA molecules; (1-2) performing a first amplification using VJ-specific primers to obtain a first amplification product; (1-3) performing a second amplification on the first amplification product to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter; and (1-4) sequencing the second amplification product to obtain a sequencing result; wherein the nucleic acid sample is obtained from a blood or tissue sample of the individual.
- The method according to claim 1, wherein the VJ-specific primers contain part of the sequence of the sequencing adapter.
- The method according to claim 1, wherein the CDR sequence is at least one of the CDR1, CDR2 and CDR3 sequences.
- The method according to claim 4, wherein the CDR sequence is a CDR3 sequence.
- The method according to claim 1, wherein at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
- The method according to claim 1, wherein the types of immune cells are determined based on the CDR3 sequence.
- The method according to claim 1, wherein the immune cell homogeneity index is a Gini index.
- The method according to claim 1, wherein the immune age value is determined by maximum a posteriori (MAP) estimation based on at least one of the statistical features.
- The method according to claim 1, wherein step (4) further comprises: (4-1) determining, using a predetermined distribution of immune age prediction coefficients, the immune age prediction coefficient corresponding to each of the statistical features; and (4-2) determining the immune age of the individual according to the formula $IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$, wherein $IA$ denotes the immune age of the individual, $i$ denotes the index of a statistical feature, $n$ denotes the number of statistical features, $\theta_i$ denotes the immune age prediction coefficient corresponding to the $i$-th statistical feature, $x_i$ denotes the value of the $i$-th statistical feature, and $\theta_0$ denotes the bias term of the prediction model.
- The method according to claim 10, wherein the immunity index is determined by the following formula: where $IA$ denotes the immune age value determined in step (4), $IA_{\max}$ denotes the upper limit of IA in a predetermined population, and $IA_{\min}$ denotes the lower limit of IA in the predetermined population.
- A device for determining an immunity index of an individual, comprising: a sequencing data acquisition unit for obtaining nucleic acid sequencing data of an individual to be tested; a sequencing result analysis unit for determining the V/J sequences and CDR sequences contained in the nucleic acid sample by aligning the sequencing result with a reference sequence; a statistics unit for determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features comprising at least one selected from: a V/J gene usage diversity index, an immune cell diversity index, a number of immune cell types, and an immune cell homogeneity index; an immune age determination unit for determining an immune age value of the individual based on the statistical features; and an immunity index determination unit for determining the immunity index of the individual based on the immune age value.
- The device according to claim 12, wherein the sequencing data acquisition unit further comprises: a nucleic acid sample acquisition module for obtaining a nucleic acid sample of the individual to be tested, the nucleic acid sample comprising at least one of DNA molecules and RNA molecules; a first amplification module for performing a first amplification using VJ-specific primers to obtain a first amplification product; a second amplification module for performing a second amplification on the first amplification product to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter; and a sequencing module for sequencing the second amplification product to obtain a sequencing result; wherein the nucleic acid sample is obtained from a blood or tissue sample of the individual.
- The device according to claim 12, wherein the VJ-specific primers contain part of the sequence of the sequencing adapter.
- The device according to claim 12, wherein the CDR sequence is at least one of the CDR1, CDR2 and CDR3 sequences, preferably a CDR3 sequence.
- The device according to claim 12, wherein at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
- The device according to claim 12, wherein the types of immune cells are determined based on the CDR3 sequence.
- The device according to claim 12, wherein the immune cell homogeneity index is a Gini index.
- The device according to claim 12, wherein the immune age determination unit is adapted to determine the immune age value by maximum a posteriori (MAP) estimation based on at least one of the statistical features.
- The device according to claim 12, wherein the immune age determination unit is configured to: determine, using a predetermined distribution of immune age prediction coefficients, the immune age prediction coefficient corresponding to each of the statistical features; and determine the immune age of the individual according to the formula $IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$, wherein $IA$ denotes the immune age of the individual, $i$ denotes the index of a statistical feature, $n$ denotes the number of statistical features, $\theta_i$ denotes the immune age prediction coefficient corresponding to the $i$-th statistical feature, $x_i$ denotes the value of the $i$-th statistical feature, and $\theta_0$ denotes the bias term of the prediction model.
- The device according to claim 20, wherein the immunity index is determined by the following formula: where $IA$ denotes the immune age value determined by the immune age determination unit, $IA_{\max}$ denotes the upper limit of IA in a predetermined population, and $IA_{\min}$ denotes the lower limit of IA in the predetermined population.
- An electronic device, comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, wherein the processor executes the machine-executable instructions to implement the method for determining an immunity index of an individual according to any one of claims 1-11.
- A machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method for determining an immunity index of an individual according to any one of claims 1-11.
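Step (2) of claims 1 and 12 assigns V/J segments by aligning reads to a reference sequence, but the claims do not name an aligner. The following is a deliberately simplified sketch, assuming ungapped matching against a tiny hypothetical reference; real pipelines typically rely on dedicated tools such as IgBLAST or MiXCR with gapped alignment to full germline sets. The function names, gene names and sequences are illustrative only.

```python
from typing import Dict, Tuple

def best_ungapped_score(read: str, ref: str) -> int:
    """Highest number of identical bases over every ungapped placement of ref along read."""
    best = 0
    for offset in range(-len(ref) + 1, len(read)):
        matches = sum(
            1
            for j, base in enumerate(ref)
            if 0 <= offset + j < len(read) and read[offset + j] == base
        )
        best = max(best, matches)
    return best

def assign_gene(read: str, references: Dict[str, str]) -> Tuple[str, int]:
    """Return (gene name, score) of the reference segment that best matches the read."""
    scores = {name: best_ungapped_score(read, seq) for name, seq in references.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]

# Hypothetical toy references; real V/J germline sets are far longer and more numerous.
V_REFS = {"V_toy_1": "GACTCAGGAC", "V_toy_2": "TTTTAAAACC"}
read = "AAGACTCAGGACTTT"
print(assign_gene(read, V_REFS))  # ('V_toy_1', 10)
```

The same `assign_gene` call can be repeated with a J reference set to obtain the J call for each read.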
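Several claims single out the CDR3 sequence and use it to define immune cell types, without stating how CDR3 is delimited. A common heuristic, assumed here and not taken from the patent, locates CDR3 in a translated read between the conserved cysteine and the phenylalanine/tryptophan of the F/W-G-x-G motif (FGxG in TCR beta and light chains, WGxG in IGH). This is a sketch only: if another cysteine lies within 30 residues upstream, the match can start too early, which alignment-based annotation avoids.

```python
import re
from typing import Optional

# Heuristic motif: conserved Cys, 3-30 residues (lazy), then Phe/Trp followed by G-x-G.
# Group 1 keeps the Cys...Phe/Trp span, i.e. IMGT-style CDR3 boundaries.
CDR3_PATTERN = re.compile(r"(C[A-Z]{3,30}?[FW])G.G")

def extract_cdr3(protein: str) -> Optional[str]:
    """Return the first motif-delimited CDR3 in a translated read, or None if absent."""
    match = CDR3_PATTERN.search(protein)
    return match.group(1) if match else None

print(extract_cdr3("YLCASSLGQAYEQYFGPGTRLTVL"))  # CASSLGQAYEQYF
```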
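Step (3) of claim 1 and the dependent claims name four statistical features: Shannon indices for V/J gene usage and for immune cells, a count of immune cell types derived from CDR3, and a Gini index for homogeneity. A minimal sketch follows, under these assumptions not stated in the claims: the natural logarithm is used; one distinct CDR3 sequence counts as one immune cell type (clonotype); the V/J usage input may be V calls, J calls or combined "V|J" labels, since the claims do not say which; and "Gini index" is read as the Gini coefficient of clone sizes (some diversity literature instead uses the Gini-Simpson index 1 - sum(p^2)).

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence

def shannon_index(counts: Sequence[int]) -> float:
    """Shannon entropy H = -sum(p * ln p) of a count vector (natural log assumed)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_coefficient(sizes: Sequence[float]) -> float:
    """Gini coefficient of clone sizes: 0 = perfectly even, (n-1)/n = one clone holds all reads."""
    x = sorted(sizes)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((rank + 1) * value for rank, value in enumerate(x))
    return 2 * weighted / (n * total) - (n + 1) / n

def repertoire_features(vj_calls: Iterable[str], cdr3s: Iterable[str]) -> Dict[str, float]:
    """Compute the four statistical features from per-read gene calls and CDR3 sequences."""
    vj_usage = Counter(vj_calls)
    clones = Counter(s for s in cdr3s if s)  # one distinct CDR3 = one cell type (assumption)
    return {
        "vj_usage_shannon": shannon_index(list(vj_usage.values())),
        "clonotype_shannon": shannon_index(list(clones.values())),
        "n_cell_types": float(len(clones)),
        "clonotype_gini": gini_coefficient(list(clones.values())),
    }

features = repertoire_features(
    ["TRBV5-1", "TRBV5-1", "TRBV20-1", "TRBV7-9"],
    ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "CASRDGTEAFF", "CASSPTGGYEQYF"],
)
print(features["n_cell_types"])  # 3.0
```

The read-level inputs above are hypothetical; in practice they would come from the V/J assignment and CDR3 annotation of step (2).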
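The claims state that the immune age is obtained by maximum a posteriori (MAP) estimation from a predetermined distribution of prediction coefficients, and define IA through a bias term θ0, coefficients θi and feature values xi, but they do not spell out the prior or likelihood. One standard reading, assumed here and not confirmed by the text, is Bayesian linear regression of chronological age on the statistical features in a reference cohort: with Gaussian noise of variance σ² and an independent zero-mean Gaussian prior of variance τ² on each θi (flat prior on θ0), the posterior is Gaussian and its mode coincides with ridge regression using penalty λ = σ²/τ². The helper names, λ and the toy training data are hypothetical; `immune_age` evaluates IA = θ0 + Σ θi·xi.

```python
from typing import List, Sequence, Tuple

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def map_coefficients(
    X: Sequence[Sequence[float]], ages: Sequence[float], lam: float
) -> Tuple[float, List[float]]:
    """MAP (theta0, theta) under Gaussian likelihood and zero-mean Gaussian prior on theta.

    Equivalent to ridge regression with an unpenalised intercept; lam = sigma^2 / tau^2.
    """
    n_samples, n_feat = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n_samples for j in range(n_feat)]
    y_mean = sum(ages) / n_samples
    xc = [[row[j] - x_mean[j] for j in range(n_feat)] for row in X]  # centred features
    yc = [y - y_mean for y in ages]  # centred targets
    gram = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(n_feat)]
            for i in range(n_feat)]
    rhs = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(n_feat)]
    theta = _solve(gram, rhs)
    theta0 = y_mean - sum(t * mu for t, mu in zip(theta, x_mean))
    return theta0, theta

def immune_age(theta0: float, theta: Sequence[float], x: Sequence[float]) -> float:
    """IA = theta0 + sum_i theta_i * x_i."""
    if len(theta) != len(x):
        raise ValueError("expected one coefficient per statistical feature")
    return theta0 + sum(t * v for t, v in zip(theta, x))

# Hypothetical toy cohort: two features per individual, ages exactly 2 + 3*x1 - x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ages = [2, 5, 1, 4, 7]
theta0, theta = map_coefficients(X, ages, lam=0.0)
print(round(immune_age(theta0, theta, [2.0, 1.0]), 6))  # 7.0
```

With `lam=0.0` the estimate reduces to ordinary least squares; increasing `lam` shrinks the coefficients toward zero, which is the effect of a tighter prior.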
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180065823.0A CN116391237A (en) | 2021-03-30 | 2021-09-08 | Method, device, electronic device and machine readable storage medium for determining an individual immunity index |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110342463.6 | 2021-03-30 | ||
CN202110342463 | 2021-03-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022205775A1 (en) | 2022-10-06 |
Family
ID=83455556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/117149 WO2022205775A1 (en) | 2021-03-30 | 2021-09-08 | Method and device for determining immunity index of individual, electronic device, and machine-readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116391237A (en) |
WO (1) | WO2022205775A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100062473A1 (en) * | 2006-06-15 | 2010-03-11 | Katsuiku Hirokawa | Immunity evaluation method, immunity evaluation apparatus, immunity evaluation program and data recording medium having the immunity evaluation program stored therein |
US20140235478A1 (en) * | 2013-02-04 | 2014-08-21 | The Board Of Trustees Of The Leland Stanford Junior University | Measurement and Comparison of Immune Diversity by High-Throughput Sequencing |
US20180356403A1 (en) * | 2017-06-09 | 2018-12-13 | The Regents Of The University Of California | Use of Immune Repertoire Diversity For Predicting Transplant Rejection |
WO2019215740A1 (en) * | 2018-05-07 | 2019-11-14 | Technion Research & Development Foundation Limited | Immune age and use thereof |
WO2020178816A1 (en) * | 2019-03-04 | 2020-09-10 | The National Institute for Biotechnology in the Negev Ltd. | Kits, compositions and methods for evaluating immune system status |
CN112331344A (en) * | 2020-11-12 | 2021-02-05 | 深圳泛因医学有限公司 | Immune state evaluation method and application |
2021
- 2021-09-08: WO application PCT/CN2021/117149 (WO2022205775A1), status unknown
- 2021-09-08: CN application CN202180065823.0A (CN116391237A), active, pending
Also Published As
Publication number | Publication date |
---|---|
CN116391237A (en) | 2023-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20190325988A1 (en) | Method and system for rapid genetic analysis | |
CN104271759B (en) | Detection as the type spectrum of the same race of disease signal | |
JP2014503223A (en) | Method for evaluating immune diversity and use thereof | |
WO2018160548A1 (en) | Markers for coronary artery disease and uses thereof | |
US20200357487A1 (en) | Computer-implemented method and system for determining a disease status of a subject from immune-receptor sequencing data | |
WO2021232388A1 (en) | Method for determining base type of predetermined site in embryonic cell chromosome, and application thereof | |
CN105506115A (en) | DNA library for detection and diagnosis of hereditary cardiomyopathy causing genes and application thereof | |
WO2014186036A1 (en) | Methods for evaluating copd status | |
CN110904213B (en) | Ulcerative colitis biomarker based on intestinal flora and application thereof | |
JP2022512890A (en) | Sample quality evaluation method | |
CN107208131A (en) | Method for lung cancer parting | |
Habgood-Coote et al. | Diagnosis of childhood febrile illness using a multi-class blood RNA molecular signature | |
WO2019224668A1 (en) | Method for determining the probability of the risk of chromosomal and genetic disorders from free dna of fetal origin | |
CN109072306A (en) | Isolated nucleic acid and application | |
WO2022205775A1 (en) | Method and device for determining immunity index of individual, electronic device, and machine-readable storage medium | |
WO2023086999A1 (en) | Systems and methods for evaluating immunological peptide sequences | |
CN113178257A (en) | Training method of classification model of pulmonary nodules | |
JP2022533656A (en) | Immune repertoire health assessment system and method | |
CN112118781A (en) | Assessment of transplant rejection status by analysis of T cell receptor subunit pool diversity | |
Ghraichy et al. | Maturation of the human B-cell receptor repertoire with age | |
CN116287207B (en) | Use of biomarkers in diagnosing cardiovascular related diseases | |
WO2022210606A1 (en) | Method for evaluating future risk of developing dementia | |
Pinal-Fernandez | Transcriptome profiling and longitudinal cohort studies of myositis subsets | |
Aterido et al. | Seven chain adaptive immune receptor repertoire analysis in rheumatoid arthritis: association to disease and clinically relevant phenotypes | |
CN108603870A (en) | Marker of coronary artery disease and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21934414; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |