WO2022205775A1

WO2022205775A1 - Method and device for determining immunity index of individual, electronic device, and machine-readable storage medium

Info

Publication number: WO2022205775A1
Application number: PCT/CN2021/117149
Authority: WO
Inventors: Chai Xianghua (柴相花); Yuan Yuying (袁玉英); Wang Mengjie (王梦杰); Qiang Wei (强薇); Li Ning (李宁)
Original assignee: BGI Genomics Co., Ltd. (深圳华大基因股份有限公司)
Priority date: 2021-03-30
Filing date: 2021-09-08
Publication date: 2022-10-06
Also published as: CN116391237A

Abstract

A method and device for determining the immunity index of an individual, an electronic device, and a machine-readable storage medium. The method comprises: acquiring nucleic acid sequencing data of an individual to be tested (S100); determining a V/J sequence and a CDR sequence contained in a nucleic acid sample by comparing a sequencing result with a reference sequence (S200); determining statistical features on the basis of the V/J sequence and the CDR sequence contained in the nucleic acid sample (S300), the statistical features comprising at least one selected from among the following: the usage diversity index of a V/J gene, the diversity index of immune cells, the number of immune cell types, and the homogeneity index of immune cells; determining an immune age value of the individual on the basis of the statistical features (S400); and determining the immunity index of the individual on the basis of the immune age value (S500). The method can be implemented by sequencing only a small amount of sample.

Description

Method, device, electronic device, and machine-readable storage medium for determining an individual's immunity index

Technical Field

The present invention relates to the field of biomedicine, and in particular, the present invention relates to a method, a device, an electronic device and a machine-readable storage medium for determining an individual's immunity index.

Background Art

Immunity is the body's own defense mechanism: the ability to identify and eliminate foreign invaders (viruses, bacteria, etc.), that is, the physiological response by which the human body identifies and rejects "non-self". Human immunity is maintained by the immune system, which is the best doctor the human body is born with.

The immune system consists of two cooperating subsystems that provide innate and adaptive immunity. Innate immunity is a non-specific defense mechanism that protects the body from toxins or foreign substances (called antigens). The rapid response of the innate immune system also activates the adaptive system, which mounts the body's antigen-specific response.

The adaptive immune system consists of two main types of lymphocytes, called B cells and T cells. These lymphocytes have unique antigen receptors, each of which recognizes only one antigen, and this range of specificity is encoded by a fixed number of gene segments. Through a mechanism called V(D)J recombination, these genetic regions undergo irreversible somatic DNA recombination during cell development, resulting in the formation of mature lymphocytes with a single specificity. The immune repertoire refers to all the unique genetic rearrangements of T cell receptors (TCRs) and B cell receptors (BCRs) within the adaptive immune system.

With the development of precision medicine and immunotherapy, immune repertoires are being applied in an increasingly wide range of scenarios, including biomarker discovery, detection of autoimmune and infectious diseases, assessment of immune rejection and tolerance, tumor immune assessment, immune reconstitution, and drug and vaccine evaluation. Immune repertoire NGS testing therefore provides technical support for evaluating the body's adaptive immune system in healthy or diseased states.

The main methods currently on the market for analyzing immune function are:

1) The immunity five-item panel, which measures immunoglobulin and complement levels in the blood. Methods such as single radial immunodiffusion, enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), immunofixation electrophoresis and immunoturbidimetry are used to measure blood levels of immunoglobulin G (IgG), immunoglobulin A (IgA), immunoglobulin M (IgM), and complement C3 and C4. Immunoglobulins and complement are the main effector components of humoral immunity. In certain diseases (such as infections, autoimmune diseases and immunodeficiency diseases), the concentrations of these indicators rise or fall relative to their reference values, which gives them clinical value for evaluating immunity and diagnosing disease. However, the five-item panel targets humoral immunity and cannot assess cellular immunity well. Even for humoral immunity, it measures only the overall levels of IgG, IgA, IgM, C3 and C4 and cannot support in-depth analysis at the molecular sequence level.

2) Routine blood testing, in which leukocytes in peripheral blood are counted; an increased leukocyte count indicates an inflammatory reaction in the body. Leukocytes in peripheral blood are classified and counted by microscopic observation. A total leukocyte count above the upper limit of the reference range is called leukocytosis, and a count below the lower limit is called leukopenia. Changes in the total count are driven mainly by the number of neutrophils, although changes in lymphocyte numbers can also alter the total white blood cell count. Conditions ranging from physiological changes to malignant tumors can produce an abnormal total white blood cell count, and physicians can make clinical diagnoses based on routine blood test results. However, routine blood testing can only roughly indicate the overall level of cellular immunity; it cannot distinguish immunity against specific diseases, nor can it determine the classification and diversity of immune cells at the gene level.

3) Lymphocyte subset analysis, which uses flow cytometry and PCR to analyze the number and relative proportion of each leukocyte subset in peripheral blood. By monitoring the relative and absolute counts of immune cells in peripheral blood and their changes, the immune status in disease states (such as tumors, infectious diseases and immune diseases) can be assessed, assisting diagnosis, tracking disease progression and informing the timing of medication. The most commonly tested subsets include T cells (CD3), B cells (CD19), NK cells (CD16+56), helper T cells (CD3+CD4+) and suppressor T cells (CD3+CD8+). However, there are many types of lymphocyte subsets, and a comprehensive analysis would require an unacceptable volume of peripheral blood, cost and time, while analyzing only a few subsets makes it difficult to obtain a comprehensive picture of the immune system. In addition, lymphocyte subsets have different normal reference ranges at different ages, and the results are affected by many factors, which makes clinical interpretation relatively difficult.

SUMMARY OF THE INVENTION

The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, an object of the present invention is to perform high-sensitivity detection of an individual's adaptive immune system at the molecular sequence level by immune repertoire sequencing, and to use the resulting immune age (Immune Age, IA) to assess the health status of the individual and enable early prediction of health risks.

In the first aspect of the present invention, the present invention proposes a method for determining an individual's immunity index. According to an embodiment of the present invention, the method includes: (1) acquiring nucleic acid sequencing data of the individual to be tested; (2) determining the V/J sequence and the CDR sequence contained in the nucleic acid sample by comparing the sequencing result with a reference sequence; (3) determining statistical features based on the V/J sequence and the CDR sequence contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; (4) determining an immune age value of the individual based on the statistical features; and (5) determining an immunity index of the individual based on the immune age value.

According to the embodiments of the present invention, the method can be implemented by sequencing a small amount of sample, enabling high-sensitivity detection of the individual's adaptive immune system at the molecular level as well as non-invasive early diagnosis, efficacy evaluation, disease tracking, relapse prediction and comprehensive immune assessment. For example, according to the embodiments of the present invention, PCR can be used to amplify the genes contained in lymphocytes in peripheral blood. This requires only a small blood sample, the subsequent sample processing is simple, error-prone manual microscopic observation of blood cells is not needed, and no sophisticated immunolabeling or flow cytometry is required. For myeloma testing, only peripheral blood needs to be drawn and no bone marrow puncture is required, which reduces harm to the patient. In conclusion, according to the embodiments of the present invention, immune evaluation by immune repertoire sequencing not only improves detection sensitivity but also supports early diagnosis, efficacy evaluation, disease tracking, relapse prediction and comprehensive immune assessment.

In a second aspect of the present invention, the present invention provides a device for determining an individual's immunity index. According to an embodiment of the present invention, the device includes: a sequencing data acquisition unit for acquiring nucleic acid sequencing data of an individual to be tested; a sequencing result analysis unit for determining the V/J sequence and CDR sequence contained in the nucleic acid sample by comparing the sequencing result with a reference sequence; a statistical unit for determining statistical features based on the V/J sequence and CDR sequence contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; an immune age determination unit for determining an immune age value of the individual based on the statistical features; and an immunity index determination unit for determining an immunity index of the individual based on the immune age value.

Using the device of an embodiment of the present invention, the previously described method of determining an individual's immunity index can be effectively implemented. Thus, the features and advantages described above also apply to the device and will not be repeated here.

In a third aspect of the present invention, the present invention provides an electronic device. According to an embodiment of the present invention, the electronic device comprises a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executing the machine-executable instructions to implement the aforementioned method of determining an individual's immunity index.

In a fourth aspect of the present invention, the present invention provides a machine-readable storage medium. According to an embodiment of the present invention, the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement the method of determining an individual's immunity index described above.

Description of drawings

FIG. 1 is a schematic flowchart of a method for determining an individual's immunity index according to an embodiment of the present invention;

FIG. 2 is a partial schematic flowchart of a method for determining an individual's immunity index according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a device for determining an individual's immunity index according to an embodiment of the present invention;

FIG. 4 is a partial schematic structural diagram of a device for determining an individual's immunity index according to an embodiment of the present invention;

FIG. 5 shows the predicted immunity index for different age groups in Example 2 of the present invention;

FIG. 6 is a distribution diagram of the relationship between the immunity index and individual age in Example 2 of the present invention.

Detailed Description of Embodiments

Embodiments of the present invention are described in detail below. The embodiments described below are exemplary, only for explaining the present invention, and should not be construed as limiting the present invention. If no specific technique or condition is indicated in the examples, the technique or condition described in the literature in the field or the product specification is used. The reagents or instruments used without the manufacturer's indication are conventional products that can be obtained from the market.

In the first aspect of the present invention, the present invention proposes a method for determining an individual's immunity index. Referring to FIG. 1, according to an embodiment of the present invention, the method includes:

S100 Obtain nucleic acid sequencing data

According to an embodiment of the present invention, in this step, nucleic acid sequencing data from the individual to be tested are first acquired for subsequent analysis. Those skilled in the art will understand that these nucleic acid sequencing data may contain the genetic information of immune cells. For example, according to embodiments of the present invention, blood samples containing immune cells or tissue samples containing immune cells may be used (tissue samples as described herein should be understood in a broad sense and can include at least part of an organ), such as lymph nodes and the non-encapsulated diffuse lymphoid tissue in the mucosa and submucosa of the intestinal tract, respiratory tract, urogenital tract, etc.

According to embodiments of the present invention, nucleic acid sequencing data can be obtained by high-throughput sequencing, for example on second- or third-generation sequencing platforms, including but not limited to MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960 and MGISP-100.

After obtaining the nucleic acid, those skilled in the art can perform sequencing according to the operation manual of the sequencing platform, so as to obtain nucleic acid sequencing data. For example, briefly, according to one embodiment of the present invention, the sequencing process includes:

For blood or tissue samples, DNA or RNA is extracted. For each sample, a starting amount of DNA or RNA is taken, primers for the chain of interest (TCR or BCR) are added, and multiplex PCR amplification is performed. PCR is carried out in two rounds: the first round uses VJ-specific primers (carrying part of the sequencing adapter), and the second round is an ordinary library-construction PCR that adds the sequencing adapters. Multiple samples are then pooled for sequencing, yielding data for each sample. According to an embodiment of the present invention, a tag sequence may also be introduced in the second round of PCR so that samples within a pooled batch can be distinguished.
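For illustration only, distinguishing pooled samples by a tag introduced in the second round of PCR can be sketched in Python as follows. The sketch assumes, since none of this is specified here, that each tag sits at the 5' end of the read, that all tags have the same length, and that up to one mismatch is tolerated; the function names and tag sequences are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_tags, max_mismatch=1):
    """Assign reads to samples by their leading tag (hypothetical read layout).

    reads: iterable of read sequences; the tag is assumed to be at the 5' end.
    sample_tags: dict mapping sample name -> tag sequence (all equal length).
    Returns a dict mapping sample name -> reads with the tag trimmed off;
    reads matching no tag, or more than one tag, are kept untrimmed under None.
    """
    tag_len = len(next(iter(sample_tags.values())))
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:tag_len]
        hits = []
        if len(prefix) == tag_len:  # reads shorter than a tag cannot be assigned
            hits = [name for name, tag in sample_tags.items()
                    if hamming(prefix, tag) <= max_mismatch]
        if len(hits) == 1:
            bins[hits[0]].append(read[tag_len:])
        else:
            bins[None].append(read)
    return dict(bins)


tags = {"S1": "ACGTAC", "S2": "TGCATG"}  # hypothetical tag sequences
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTTCGGG", "NNNNNNAAA"]
print(demultiplex(reads, tags))
```

Rejecting reads that fall within tolerance of more than one tag keeps ambiguous reads out of every sample; in practice tags are designed far enough apart that this is rare.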

Referring to FIG. 2, according to a specific embodiment of the present invention, acquiring nucleic acid sequencing data may further include:

S110 Obtain nucleic acid samples

In this step, a nucleic acid sample of the individual to be tested is obtained, and the nucleic acid sample includes at least one of DNA molecules and RNA molecules. Those skilled in the art can use commercially available kits and follow the manufacturer's instructions for extraction of DNA molecules or RNA molecules. It can be understood by those skilled in the art that, after obtaining RNA molecules, reverse transcription can be easily used to obtain cDNA molecules.

S120 First amplification process

After the nucleic acid sample is obtained, VJ-specific primers can be used to perform a first amplification process, so as to obtain a first amplification product.

It should be noted that the V and J genes, which are the immune cell-specific sequences contained in the nucleic acid sample obtained in step S110, can be amplified with VJ-specific primers.

Herein, VJ-specific primers refer to specific primers that can amplify V and J genes. It is worth noting that, at most loci, V and J genes are grouped into families according to their degree of homology. These VJ-specific primers can be used to analyze the combinatorial diversity of V-J rearrangements at one or more loci selected from TRA, TRB, TRG, TRD, IgH, IgK, IgL and the like.

According to an embodiment of the present invention, the VJ-specific primer used in the present invention has the following nucleotide sequence:

In addition, according to an embodiment of the present invention, the VJ-specific primer contains part of the sequencing adapter sequence, which makes it convenient to introduce the sequencing adapter into the amplification product in the second amplification process.

S130 Second amplification process

A second amplification process is performed on the first amplification product to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter.

The second amplification process can be performed with primers that anneal to the common sequence present in the first amplification product and are designed to introduce the sequencing adapters. The resulting second amplification product thus constitutes a sequencing library ready for sequencing.

Of course, those skilled in the art will understand that, to improve sequencing efficiency or facilitate analysis, other conventional processing, such as hybridization probe screening, may also be performed on the second amplification product; this is not described further here.

S140 Sequencing

The second amplification product is sequenced to obtain sequencing results.

According to an embodiment of the present invention, after the sequencing library is constructed, the sequencing library (the second amplification product) can be sequenced on a sequencing platform. According to embodiments of the present invention, nucleic acid sequencing data can be obtained by high-throughput sequencing, for example on second- or third-generation sequencing platforms, including but not limited to MGISEQ-T7, MGISEQ-2000, MGISEQ-200, BGISEQ-500, BGISEQ-50, MGISP-960 and MGISP-100. Paired-end sequencing is preferred, as it improves the efficiency of subsequent analysis.

S200 Sequence alignment to determine V/J sequences and CDR sequences

After the sequencing data is obtained, according to an embodiment of the present invention, the V/J sequence and the CDR sequence contained in the nucleic acid sample are determined by aligning the sequencing result with the reference sequence.

According to an embodiment of the present invention, before alignment, software such as SOAPnuke (v1.5.3) can be used to remove adapter-contaminated reads, low-quality bases and low-quality reads from the raw sequencing data.
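SOAPnuke's actual parameters are not given here, so the following Python sketch only illustrates the kind of read-level filtering described above, followed by conversion of the surviving FASTQ records to FASTA. The adapter fragment, the Phred+33 quality encoding and all thresholds (Q20, at most 50% low-quality bases, at most 5% N) are illustrative assumptions and are not claimed to match SOAPnuke's settings; the function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (name, seq, qual) tuples from well-formed 4-line FASTQ records."""
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header[1:].split()[0], seq, qual


def keep_read(seq, qual, adapter, low_q=20, max_low_frac=0.5,
              max_n_frac=0.05, offset=33):
    """Return True if a read passes the illustrative adapter/N/quality filters."""
    if not seq or (adapter and adapter in seq):
        return False  # empty read or adapter contamination
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # too many undetermined bases
    low = sum(1 for c in qual if ord(c) - offset < low_q)
    return low / len(seq) <= max_low_frac


def fastq_to_filtered_fasta(lines, adapter):
    """Filter FASTQ records and return the survivors as FASTA-formatted text."""
    out = []
    for name, seq, qual in read_fastq(lines):
        if keep_read(seq, qual, adapter):
            out.append(f">{name}\n{seq}")
    return "\n".join(out) + ("\n" if out else "")


fastq = ["@r1 extra", "ACGTACGTAC", "+", "IIIIIIIIII",
         "@r2", "ACGTACGTAC", "+", "IIII######",
         "@r3", "TTAGATCGGAAGCC", "+", "IIIIIIIIIIIIII"]
print(fastq_to_filtered_fasta(fastq, adapter="AGATCGGAAG"))  # hypothetical adapter
```

Here r2 is dropped because six of its ten bases fall below Q20, and r3 because it contains the adapter fragment.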

The filtered FASTQ files are then converted into FASTA files with an in-house program; if paired-end sequencing was used, COPE (v1.5.3) together with an in-house program is used to assemble each read pair into a single sequence. Next, blastall (v2.2.25) can be used to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, after which an in-house program performs re-alignment and selects the best alignment result: the non-CDR3 and CDR3 regions are scored by different methods, the hit with the highest score is selected, and the origin of each sequenced read is assigned by alignment against the CDR, V and J reference sequences, thereby determining the CDR sequence and the V/J sequence.
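The assembly of paired-end reads mentioned above can be illustrated with a simple overlap merge. This is not COPE's algorithm; it is a minimal Python sketch that assumes the DNA fragment is longer than each read, so that the 3' end of read 1 overlaps the start of the reverse complement of read 2, and it accepts the longest overlap of at least 10 bases with at most 10% mismatches (both thresholds are assumptions).

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Merge read 1 and read 2 into one sequence, or return None if they do not overlap.

    The suffix of r1 is compared with the prefix of revcomp(r2), trying the
    longest overlap first; within the overlap the bases of r1 are kept.
    """
    r2rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            return r1 + r2rc[olen:]
    return None


fragment = "ATGCGTACCTTAGGCATCGATCGGATACGT"  # toy 30-bp insert (hypothetical)
r1, r2 = fragment[:20], revcomp(fragment[10:])  # 20-bp reads overlapping by 10 bp
print(merge_pair(r1, r2) == fragment)
```

A production merger would also use base qualities to resolve disagreements in the overlap; this sketch simply trusts read 1.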

After the V and J gene sequences are obtained, the structure of the immune molecules is analyzed. This part has two main functions: error correction and region determination. First, errors introduced during PCR and sequencing are corrected by an in-house program; then the CDR regions are determined using the V/J gene reference sequences, the rules for conserved amino acids, and established computational methods.
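Determining the CDR3 region from conserved amino acids can be sketched as follows. This is a simplification and an assumption, not the in-house program: it translates an in-frame nucleotide sequence, locates the first J-region motif F/W-G-X-G, and takes the last cysteine before that motif as the start. Following a common convention in repertoire tools, the reported CDR3 includes the conserved Cys and Phe/Trp. The function names and the toy protein sequence are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, built from codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Conserved J-region motif: Phe or Trp, Gly, any residue, Gly.
J_MOTIF = re.compile(r"[FW]G.G")


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def extract_cdr3(protein: str):
    """Return the CDR3 from the last Cys before the first [FW]G.G motif to the F/W."""
    motif = J_MOTIF.search(protein)
    if motif is None:
        return None
    cys = protein.rfind("C", 0, motif.start())
    if cys == -1:
        return None
    return protein[cys:motif.start() + 1]


print(translate("TGTGCCAGCAGC"))
print(extract_cdr3("QPEDSAVYLCASSLGQAYEQYFGPGTRLTVL"))
```

Anchoring on the last cysteine is fragile: a cysteine inside the CDR3 itself would truncate the result, which is one reason real pipelines locate the conserved residues from the V and J reference alignments instead.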

According to the embodiments of the present invention, the CDR sequence can be determined by a common method. According to an embodiment of the present invention, the CDR sequence is at least one of the CDR1, CDR2 and CDR3 sequences, preferably the CDR3 sequence, because CDR3 is the most variable region and directly determines the antigen-binding specificity of the TCR. The CDR3 of the TCR is encoded by the V, D and J genes. During lymphocyte maturation, rearrangement of the V, D and J genes generates a wide variety of recombinant sequences, and together with base substitutions (SNPs) and insertions/deletions (indels), this produces T cell diversity.

The term "V/J" as used herein refers to at least a portion of the result of a V(D)J rearrangement in a particular cell, which may be a V gene sequence, a J gene sequence, or a combination of a V gene sequence and a J gene sequence; the combination may also include a D gene sequence sandwiched between the V gene sequence and the J gene sequence.

S300 Determine statistical features

Statistical features are determined based on the V/J sequences and CDR sequences contained in the nucleic acid sample, and the statistical features include at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index.

According to an embodiment of the present invention, at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index. According to an embodiment of the present invention, the type of immune cells is determined based on the CDR3 sequence.

According to an embodiment of the present invention, the immune cell homogeneity index is the Gini index.

According to an embodiment of the present invention, statistics are computed on the immune repertoire feature data, and the statistical features mainly include the following:

V/J gene usage diversity, i.e. Shannon_index(V-J);

Immune diversity, i.e. Shannon_index(CDR3_aa);

Immune cell type, i.e. Uniq_number (CDR3_aa);

Immune cell homogeneity, i.e. Clone_Gini.

Among the above indicators, Shannon_index represents the Shannon index, and the calculation formula is as follows:

Here, taking CDR3 as an example, S represents the total number of unique CDR3 sequences, and p(i) represents the frequency of the i-th unique CDR3 sequence.

Uniq_number represents the number of unique sequences.
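The Shannon index is conventionally written H = -Σ_{i=1}^{S} p(i) log p(i). The formula image is not reproduced in this text, so the natural logarithm used in the Python sketch below is an assumption (a different base only rescales H by a constant). The same function can compute Shannon_index(V-J) on (V, J) pairs and Shannon_index(CDR3_aa) on CDR3 amino acid sequences, with frequencies taken here as the share of observations (for example reads) carrying each unique item; Uniq_number is the count of distinct items. The clonotype records are hypothetical.

```python
import math
from collections import Counter


def shannon_index(items):
    """Shannon index -sum p(i) * ln p(i) over the unique items in `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def uniq_number(items):
    """Number of unique items, e.g. unique CDR3 amino acid sequences."""
    return len(set(items))


# Hypothetical clonotype observations: (V gene, J gene, CDR3 amino acid sequence).
clones = [("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF"),
          ("TRBV5-1", "TRBJ2-7", "CASSLGQAYEQYF"),
          ("TRBV20-1", "TRBJ1-1", "CSARDRTEAFF"),
          ("TRBV7-9", "TRBJ2-7", "CASSPGQGYEQYF")]

vj_pairs = [(v, j) for v, j, _ in clones]       # input for Shannon_index(V-J)
cdr3_aa = [cdr3 for _, _, cdr3 in clones]       # input for Shannon_index(CDR3_aa)
print(shannon_index(vj_pairs), shannon_index(cdr3_aa), uniq_number(cdr3_aa))
```

With frequencies 0.5, 0.25 and 0.25, both indices equal 1.5 ln 2 (about 1.04), and Uniq_number(CDR3_aa) is 3.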

Clone_Gini represents the Gini index, and the calculation formula is as follows:

Among them, x refers to the frequency of each immune cell type, and n refers to the number of immune cell types.
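Because the Gini formula is not reproduced in this text, the Python sketch below assumes the common mean-absolute-difference definition G = Σ_i Σ_j |x_i - x_j| / (2 n Σ_i x_i), computed in its equivalent sorted form. With x the frequencies of the immune cell types, G is 0 when all types are equally frequent and reaches (n - 1)/n when a single type holds all of the frequency. The function name is hypothetical.

```python
def gini_index(freqs):
    """Gini index of non-negative values (assumed definition, see text).

    Uses the sorted-form identity
    G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n,
    with x_(1) <= ... <= x_(n) and 1-based ranks i, which equals the
    double-sum mean absolute difference divided by 2 * n * sum(x).
    """
    xs = sorted(freqs)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive frequency")
    ranked = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


print(gini_index([0.25, 0.25, 0.25, 0.25]))  # perfectly even repertoire
print(gini_index([1.0, 0.0, 0.0, 0.0]))      # one type dominates
```

The sorted form runs in O(n log n), which matters for repertoires with hundreds of thousands of clone types, whereas the double sum is O(n²).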

S400 Determine the immune age value

In this step, based on the statistical characteristics, the immune age value of the individual is determined.

According to an embodiment of the present invention, the immune age value is determined based on at least one statistical feature using a maximum a posteriori probability estimate.

According to an embodiment of the present invention, step S400 further includes: (4-1) using a predetermined immune age prediction coefficient distribution, determining, based on each statistical feature, the immune age prediction coefficient corresponding to that feature (the prior distribution of the parameters is determined mainly from the characteristics of the selected features; if a selected feature is continuous and the amount of data is large, its distribution is generally taken to be normal); and (4-2) determining the immune age of the individual according to the formula

$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$

where IA represents the immune age of the individual, i is the index of a statistical feature, n represents the number of statistical features, θi represents the immune age prediction coefficient corresponding to the ith statistical feature, xi represents the value of the ith statistical feature, and θ0 represents the bias term in the prediction model.
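Under the linear form implied by these definitions, IA = θ0 + Σ_{i=1}^{n} θi·xi, the prediction step is a weighted sum of the statistical features plus the bias term. The Python sketch below is illustrative only; the function name, the feature order and all coefficient values are hypothetical, not fitted values from the invention.

```python
def immune_age(features, coefficients, bias):
    """IA = bias + sum(theta_i * x_i) for matching feature/coefficient lists."""
    if len(features) != len(coefficients):
        raise ValueError("each statistical feature needs exactly one coefficient")
    return bias + sum(theta * x for theta, x in zip(coefficients, features))


# Hypothetical feature order: Shannon_index(V-J), Shannon_index(CDR3_aa),
# Uniq_number(CDR3_aa) in thousands, Clone_Gini.
x = [4.1, 9.8, 12.0, 0.35]
theta = [-2.0, -1.5, -0.4, 20.0]  # hypothetical prediction coefficients
theta0 = 60.0                     # hypothetical bias term
print(immune_age(x, theta, theta0))
```

Keeping the features and coefficients as parallel lists makes the feature order part of the model definition, so it must match the order used when the coefficients were estimated.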

For ease of understanding, the principle of maximum a posteriori (MAP) estimation is explained as follows:

According to an embodiment of the present invention, based on the above feature indices combined with biochemical indices, a MAP (maximum a posteriori probability estimation) model is used to calculate IA, so as to perform comprehensive immunity assessment and body risk prediction. The specific principles are as follows:

The theoretical basis of MAP is Bayes' theorem, which states:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Expanding P(B) by the law of total probability gives:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \sim A)\,P(\sim A)}$$

where ~A means "not A".
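A numerical check of the expanded Bayes formula shows how the denominator is assembled from both branches, A and ~A. The probabilities below are made-up values for illustration only, and the function name is hypothetical.

```python
def posterior(p_b_given_a: float, p_b_given_not_a: float, p_a: float) -> float:
    """P(A|B) via Bayes' theorem, with P(B) expanded by total probability."""
    p_not_a = 1.0 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a  # law of total probability
    return p_b_given_a * p_a / p_b


# Hypothetical values: prior P(A) = 0.01, P(B|A) = 0.9, P(B|~A) = 0.05.
print(posterior(0.9, 0.05, 0.01))
```

With these values the posterior is 0.009 / (0.009 + 0.0495) = 2/13, about 0.154: even a fairly specific observation B raises a 1% prior only to roughly 15%, because the ~A branch dominates the denominator.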

Biochemical indicators mainly include conventional indicators, such as the comprehensive biochemical panel and routine blood tests.

The principle of MAP is as follows:

MAP estimation seeks the value of the prediction parameter θ given an observed index x. Let f be the sampling distribution of x, so that f(x|θ) is the probability of observing x when the parameter is θ. Assuming that g is the prior distribution of θ (which can be obtained from the training data), Bayes' theorem gives the posterior distribution of θ:

$$f(\theta \mid x) = \frac{f(x \mid \theta)\,g(\theta)}{\int_{\Theta} f(x \mid \vartheta)\,g(\vartheta)\,d\vartheta}$$

Here, the prior distribution of the parameters is determined from the training data, mainly according to the characteristics of the selected features: if the selected features are continuous and the amount of data is large, they are generally taken to be normally distributed; if they are discrete, the corresponding probabilities are directly multiplied according to the formula below. The features in the training set mainly include indicators obtained from immune repertoire analysis (V/J gene usage diversity, immune diversity, immune cell type, immune cell homogeneity) and some biochemical indicators (comprehensive biochemical panel, routine blood test, etc.).

Θ is the parameter space of θ; because Θ is continuous, the denominator is computed as an integral. Then:

$$\hat{\theta}_{\mathrm{MAP}}(x) = \arg\max_{\theta} f(\theta \mid x) = \arg\max_{\theta} \frac{f(x \mid \theta)\,g(\theta)}{\int_{\Theta} f(x \mid \vartheta)\,g(\vartheta)\,d\vartheta} = \arg\max_{\theta} f(x \mid \theta)\,g(\theta)$$

where θ̂_MAP is the parameter that maximizes f(x|θ)g(θ), that is, the immune age (IA) prediction coefficients. The last equality holds because the denominator does not depend on θ. If the observation is n-dimensional (i.e., x = (x1, x2, ..., xn)), the estimate contains one prediction coefficient θi for each component xi, and the prediction formula of IA is:

$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
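The text does not fix the sampling distribution f, so the following Python sketch makes one explicit assumption: Gaussian noise around the linear IA model and an independent zero-mean Gaussian prior on θ1, ..., θn, with the bias θ0 left unpenalized. Under these assumptions, maximizing f(x|θ)g(θ) is equivalent to ridge regression, whose solution satisfies (XᵀX + λD)θ = Xᵀy, where λ is the ratio of noise variance to prior variance and D is the identity matrix with a zero in the bias position. The function names and the toy training data are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_map_coefficients(features, ages, lam):
    """MAP estimate [theta_0, theta_1, ..., theta_n] under the Gaussian assumptions.

    features: list of feature vectors, one per training individual.
    ages: target ages used for training.
    lam: noise variance / prior variance; lam = 0 reduces to least squares.
    """
    rows = [[1.0] + list(x) for x in features]  # leading 1 carries the bias theta_0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ages)) for i in range(p)]
    for i in range(1, p):  # the prior shrinks theta_1..theta_n, not the bias
        xtx[i][i] += lam
    return solve_linear(xtx, xty)


# Toy data generated from IA = 2 + 3*x1 - 1*x2 (hypothetical).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 5, 1, 4, 7]
print(fit_map_coefficients(X, y, lam=0.0))   # close to [2, 3, -1]
print(fit_map_coefficients(X, y, lam=10.0))  # theta_1, theta_2 pulled toward 0
```

A larger λ encodes a tighter prior around zero, which stabilizes the coefficients when the training cohort is small relative to the number of features.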

S500 Determine the immunity index

In this step, the immunity index of the individual is determined based on the immune age value.

According to an embodiment of the present invention, the immunity index is determined by the following formula:

Here, IA represents the immune age value determined in step S400, IAmax represents the upper limit of IA in the predetermined population, and IAmin represents the lower limit of IA in the predetermined population.
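The immunity index formula itself is not reproduced in this text. Purely as a hypothetical illustration of how IA, IAmax and IAmin could be combined, the Python sketch below assumes a min-max normalization in which a lower immune age within the population range yields a higher index, clipped to [0, 1]. Both the orientation and the clipping are assumptions, not the formula of the invention, and the function name is hypothetical.

```python
def immunity_index(ia: float, ia_min: float, ia_max: float) -> float:
    """Hypothetical min-max immunity index: 1 at IAmin, 0 at IAmax (clipped)."""
    if ia_max <= ia_min:
        raise ValueError("IAmax must exceed IAmin")
    scaled = (ia_max - ia) / (ia_max - ia_min)
    return min(1.0, max(0.0, scaled))


print(immunity_index(30.0, ia_min=20.0, ia_max=70.0))
```

Clipping keeps individuals whose immune age falls outside the reference population's range on the same 0 to 1 scale instead of producing negative or greater-than-one values.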

Once the individual's immunity index has been determined, this technical solution enables high-sensitivity detection of the individual's adaptive immune system at the molecular level and supports non-invasive early diagnosis, efficacy evaluation, disease tracking, relapse prediction and comprehensive immunity assessment.


In a second aspect of the present invention, the present invention provides a device for determining an individual's immunity index. According to an embodiment of the present invention, referring to FIG. 3, the device includes:

a sequencing data acquisition unit 100, a sequencing result analysis unit 200, a statistical unit 300, an immune age determination unit 400 and an immunity index determination unit 500. The sequencing data acquisition unit 100 is used to acquire nucleic acid sequencing data of the individual to be tested; the sequencing result analysis unit 200 is used to determine the V/J sequence and CDR sequence contained in the nucleic acid sample by comparing the sequencing result with the reference sequence; the statistical unit 300 is used to determine statistical features based on the V/J sequences and CDR3 sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index; the immune age determination unit 400 is used to determine the immune age value of the individual based on the statistical features; and the immunity index determination unit 500 is used to determine the immunity index of the individual based on the immune age value.

According to an embodiment of the present invention, referring to FIG. 4, the sequencing data acquisition unit further includes: a nucleic acid sample acquisition module 110, a first amplification module 120, a second amplification module 130 and a sequencing module 140. According to the embodiment of the present invention, the nucleic acid sample acquisition module 110 is used to acquire a nucleic acid sample of the individual to be tested, the nucleic acid sample including at least one of DNA molecules and RNA molecules; the first amplification module 120 is used to perform a first amplification process with VJ-specific primers to obtain a first amplification product; the second amplification module 130 is used to perform a second amplification process on the first amplification product to obtain a second amplification product, the second amplification product carrying a sequencing adapter; and the sequencing module 140 is used to sequence the second amplification product to obtain a sequencing result. The nucleic acid sample is obtained from a blood or tissue sample of the individual.

According to an embodiment of the present invention, the VJ-specific primer contains a portion of the sequence of the sequencing adapter.

According to an embodiment of the present invention, the CDR sequence is at least one of CDR1, CDR2 and CDR3 sequences, preferably a CDR3 sequence.

According to an embodiment of the present invention, at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.

According to an embodiment of the present invention, the type of immune cells is determined based on the CDR3 sequence.

According to an embodiment of the invention, the immune age determination unit is adapted to determine the immune age value based on the at least one statistical feature using a maximum a posteriori probability estimate.

According to an embodiment of the present invention, the immune age determination unit is configured to: using a predetermined distribution of immune age prediction coefficients, determine, based on each of the statistical features, the immune age prediction coefficient corresponding to that statistical feature; and determine the immune age of the individual according to the formula

$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$

where IA represents the immune age of the individual, i is the index of a statistical feature, n represents the number of statistical features, θi represents the immune age prediction coefficient corresponding to the i-th statistical feature, xi represents the value of the i-th statistical feature, and θ0 represents the bias term in the prediction model.
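To make the linear predictor concrete, the following minimal sketch evaluates IA = θ0 + Σ θi·xi for one individual. The function name `immune_age` is hypothetical, and the coefficient and feature values in the usage note are placeholders only; the trained coefficients are not disclosed in this text.

```python
from typing import Sequence


def immune_age(theta0: float, theta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate IA = theta0 + sum_i theta_i * x_i (hypothetical helper).

    theta0: bias term of the prediction model
    theta:  immune age prediction coefficients, one per statistical feature
    x:      values of the statistical features for this individual
    """
    if len(theta) != len(x):
        raise ValueError("need exactly one coefficient per statistical feature")
    return theta0 + sum(t * xi for t, xi in zip(theta, x))
```

For example, with placeholder values θ0 = 10.0, θ = (2.0, −0.5) and x = (3.0, 4.0), `immune_age(10.0, [2.0, -0.5], [3.0, 4.0])` gives 10 + 6 − 2 = 14.0.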

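According to an embodiment of the present invention, the immunity index determination unit determines the immunity index from the immune age value by a predetermined formula, in which IA represents the immune age value determined in the immune age determination unit, IAmax represents the upper limit of IA in the predetermined population, and IAmin represents the lower limit of IA in the predetermined population.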
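The formula linking IA, IAmax and IAmin to the immunity index is not reproduced in this text. Purely as an illustration, the sketch below assumes a min-max normalization inverted so that a lower immune age yields a higher index, II = (IAmax − IA)/(IAmax − IAmin), clipped to [0, 1]. This functional form, the clipping, and the helper name `immunity_index` are assumptions, not the disclosed formula.

```python
def immunity_index(ia: float, ia_min: float, ia_max: float) -> float:
    """Hypothetical immunity index from an immune age value.

    ASSUMPTION: inverted min-max normalisation over the reference population,
    so IA == IAmin maps to 1.0 and IA == IAmax maps to 0.0.
    """
    if ia_max <= ia_min:
        raise ValueError("IAmax must be greater than IAmin")
    ii = (ia_max - ia) / (ia_max - ia_min)
    # ASSUMPTION: clip individuals outside the population range to [0, 1]
    return min(1.0, max(0.0, ii))
```

Under these assumptions, an individual with IA = 30 in a population spanning IAmin = 20 to IAmax = 70 would receive an index of 0.8.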

In a third aspect, the present invention provides an electronic device. According to an embodiment of the present invention, the electronic device comprises a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the method for determining an individual's immunity index described above.

In a fourth aspect, the present invention provides a machine-readable storage medium. According to an embodiment of the present invention, the machine-readable storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement any of the methods for determining an individual's immunity index described above.

Example 1:

1. Sequencing data acquisition

Collect 5 mL of peripheral blood from each of 1000 volunteers, extract DNA from the peripheral blood samples using a DNA extraction kit, and amplify the DNA samples using V-gene- and J-gene-specific primers carrying partial sequencing adapter sequences, so as to obtain V/J amplification products carrying partial sequencing adapters.

The resulting amplification products are further amplified with primers carrying the sequencing adapters to construct a library, and the sequencing library is subjected to high-throughput sequencing.

2. Sequencing data analysis

After the sequencing run is complete and the data have been offloaded from the sequencer, the sequencing data are analyzed as follows:

(1) SOAPnuke (v1.5.3) was used to filter the raw sequencing data, removing adapter-contaminated sequences, low-quality bases and low-quality reads. Reads were filtered according to the average base quality value of the read and the number of N bases it contains: a read was discarded if its average base quality value was less than or equal to 20, if it contained 5 or more N bases, or both;

(2) Convert the FASTQ file to a FASTA file;

(3) Use blastall (v2.2.25) to align the preprocessed FASTA sequences to the V(D)J reference gene sequences, performing multiple alignments and selecting the best alignment result;

(4) Perform structural analysis (error correction and region determination) on the aligned sequence data: the BGI gene structure analysis program is used to correct errors introduced during PCR and sequencing, and the CDR3 region is determined from conserved amino acid patterns using an established computational method.
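The four analysis steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SOAPnuke, blastall or the BGI structure-analysis program: it assumes Phred+33 quality encoding, BLAST tabular output (blastall `-m 8`, bit score in column 12) with the highest bit score taken as the "best alignment", and a simplified rule that places the CDR3 between the conserved Cys and the Phe/Trp of the J-region F/W-G-X-G motif. Adapter removal and PCR/sequencing error correction are not shown, and all function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple


# Step (1): read filtering with the stated cut-offs (hypothetical re-implementation).
def keep_read(seq: str, qual: str, q_cutoff: float = 20.0, n_cutoff: int = 5) -> bool:
    """Keep a read only if its mean Phred quality (Phred+33 assumed) is above
    q_cutoff AND it contains fewer than n_cutoff N bases."""
    mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
    return mean_q > q_cutoff and seq.upper().count("N") < n_cutoff


# Step (2): FASTQ -> FASTA, applying the step (1) filter along the way.
def fastq_to_fasta(lines: Iterable[str]) -> List[str]:
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(rows) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    fasta: List[str] = []
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if keep_read(seq, qual):
            fasta += [">" + header[1:], seq]  # quality line is dropped in FASTA
    return fasta


# Step (3): best reference hit per read from BLAST tabular (-m 8) rows.
def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue
        cols = row.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])  # col 12 = bit score
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Step (4): CDR3 boundaries from conserved residues (simplified heuristic).
J_MOTIF = re.compile(r"[FW]G.G")  # J-region Phe/Trp-Gly-X-Gly motif


def find_cdr3(aa: str) -> Optional[str]:
    """Return the span from the last Cys upstream of the J motif to the motif's F/W.
    A Cys inside the CDR3 itself would truncate the result; a production pipeline
    would anchor the Cys using the V-gene alignment instead."""
    motifs = list(J_MOTIF.finditer(aa))
    if not motifs:
        return None
    end = motifs[-1].start()
    start = aa.rfind("C", 0, end)
    return aa[start:end + 1] if start != -1 else None
```

For instance, applied to the translated TRB fragment `LSAVYFCASSLAPGATNEKLFFGQGTRLTVL`, `find_cdr3` returns `CASSLAPGATNEKLFF`, i.e. the span including the conserved Cys and Phe, which is a common convention for reporting CDR3 amino acid sequences.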

3. Feature statistics and prediction

Immune repertoire features are computed, and immunity is predicted and analyzed using a self-developed model.

Statistical features mainly include the following:

V/J gene usage diversity, i.e. Shannon_index(V-J);

Immune diversity, i.e. Shannon_index(CDR3_aa);

Immune cell type, i.e. Uniq_number (CDR3_aa);

Immune cell homogeneity, i.e. Clone_Gini.

Uniq_number represents the number of unique sequences.
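The four statistical features listed above can be computed from annotated reads roughly as follows. This is a minimal sketch under assumptions not stated in the source: Shannon indices use the natural logarithm, a clone is taken to be the set of reads sharing one CDR3 amino acid sequence, and Clone_Gini is the Gini coefficient of clone sizes. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def shannon_index(items: Iterable[Hashable]) -> float:
    """Shannon entropy (natural log) of the frequency distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def gini(values: Iterable[float]) -> float:
    """Gini coefficient: 0 when all clones are equal in size, approaching 1
    as a single clone dominates."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def repertoire_features(reads: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """reads: one (V gene, J gene, CDR3 amino acid sequence) tuple per annotated read."""
    vj = [f"{v}-{j}" for v, j, _ in reads]
    cdr3 = [c for _, _, c in reads]
    clone_sizes = Counter(cdr3).values()  # ASSUMPTION: clone = reads sharing one CDR3_aa
    return {
        "Shannon_index(V-J)": shannon_index(vj),
        "Shannon_index(CDR3_aa)": shannon_index(cdr3),
        "Uniq_number(CDR3_aa)": float(len(set(cdr3))),
        "Clone_Gini": gini(clone_sizes),
    }
```

The same `shannon_index` and unique-count logic would apply to other sequence keys, such as the Uniq_number(seq_aa) feature used in Example 2.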

Based on the above feature indices, combined with routine blood and biochemical indicators, a MAP (maximum a posteriori probability estimate) model was used to calculate IA, so as to conduct a comprehensive immunity assessment and body risk prediction.


The prediction formula of IA is as follows:

Determination of immunity:

Based on the predicted IA, combined with the population distribution characteristics, the individual's Immune Index (II) is determined. The specific model is as follows:

where IA represents the predicted immune age of the sample, and IAmax and IAmin represent the upper and lower bounds of IA in the population distribution, respectively.

Example 2:

1. Sequencing data acquisition

Collect 5 mL of peripheral blood from each of 439 volunteers, extract DNA from the peripheral blood samples using a DNA extraction kit, and amplify the DNA samples using V-gene- and J-gene-specific primers carrying partial sequencing adapter sequences, so as to obtain V/J amplification products carrying partial sequencing adapters.

2. Sequencing data analysis

After the sequencing run is complete and the data have been offloaded from the sequencer, the sequencing data are analyzed as follows:

(2) Convert the FASTQ file to a FASTA file;

3. Feature statistics

The immune repertoire features computed mainly include the following three:

Immune diversity, i.e. Shannon_index(CDR3_aa);

Immune cell type, i.e. Uniq_number (CDR3_aa);

Sequence diversity, i.e. Uniq_number(seq_aa).

Uniq_number represents the number of unique sequences.

4. Preprocessing

Remove 3 samples with missing values.

5. Model training

Divide the remaining 436 samples into 3 groups by age (20-30 years old, 30-50 years old, and >50 years old), randomly select 75% of the samples from each group and combine them into the training set, and use the remaining 111 samples as the test set.
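The age-stratified 75/25 split described above can be sketched as follows. Because the stated age ranges share their endpoints, the sketch assumes that ages of exactly 30 or 50 fall into the younger group; it also rounds each group's training count down, since the source does not give a rounding rule. The function names and the fixed random seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def age_group(age: float) -> str:
    # ASSUMPTION: boundary ages (30, 50) are assigned to the younger group.
    if age <= 30:
        return "20-30"
    if age <= 50:
        return "30-50"
    return ">50"


def stratified_split(samples: Sequence[Tuple[str, float]], train_frac: float = 0.75,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """samples: (sample_id, age) pairs. Returns (train_ids, test_ids)."""
    rng = random.Random(seed)
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, age in samples:
        groups[age_group(age)].append(sample_id)
    train: List[str] = []
    test: List[str] = []
    for ids in groups.values():
        rng.shuffle(ids)                 # random selection within each age group
        k = int(len(ids) * train_frac)   # rounds down (assumption)
        train.extend(ids[:k])
        test.extend(ids[k:])
    return train, test
```

Sampling within each age group keeps the age distribution of the training and test sets similar, which matters here because the >50 group is small.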

Using the training set and the above three immune repertoire feature indices, a MAP (maximum a posteriori probability estimate) model was used to calculate IA, so as to conduct a comprehensive immunity assessment and body risk prediction. The model parameters are trained as follows:

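Here, f(x|θ) is the likelihood and g(θ) is the prior, and the MAP estimate is the parameter that maximizes f(x|θ)g(θ):

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} f(x \mid \theta)\, g(\theta)$$

that is, the coefficients for predicting immune age (IA). The observations here are three-dimensional, i.e., x = (x₁, x₂, x₃).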
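As a hedged illustration of this training step, the sketch below assumes a Gaussian likelihood f(x|θ) with chronological age as the regression target and an independent zero-mean Gaussian prior g(θ) on θ1 to θn, with no prior on the bias θ0. Under those assumptions, maximizing f(x|θ)g(θ) is equivalent to ridge regression with penalty λ = σ²/τ², which has a closed-form solution. The target variable, the prior and the value of λ are assumptions; the source does not state them. The function name is hypothetical.

```python
import numpy as np


def fit_map_linear(X, y, lam: float = 1.0):
    """MAP estimate of (theta0, theta) for y ~ N(theta0 + X @ theta, sigma^2)
    with prior theta_j ~ N(0, tau^2); lam = sigma^2 / tau^2 (hypothetical value).

    X: (n_samples, n_features) feature matrix, e.g. the three Example 2 features
    y: (n_samples,) regression target (ASSUMPTION: chronological age)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    A = np.hstack([np.ones((n_samples, 1)), X])  # leading column of ones carries theta0
    penalty = lam * np.eye(n_features + 1)
    penalty[0, 0] = 0.0  # no shrinkage on the bias term theta0
    # Maximising log f + log g  <=>  minimising ||y - A w||^2 + lam * ||theta||^2
    w = np.linalg.solve(A.T @ A + penalty, A.T @ y)
    return w[0], w[1:]
```

With λ = 0 this reduces to ordinary least squares; larger λ shrinks θ1 to θn toward zero, which is the practical effect of the Gaussian prior when training data are limited.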

Based on the trained parameters, the prediction formula of IA is obtained:

Based on the predicted IA, combined with the population distribution characteristics, the individual's Immune Index (II) was determined. The specific formula is as follows:

6. II prediction results

As can be seen from Figures 5 and 6, the immunity index shows a downward trend with increasing age. Although the sample size of the group older than 50 is small and the downward trend of the immunity index in Figure 6 is not obvious, the downward trend in Figure 5 is more pronounced. The results of this example therefore indicate that the immunity index can serve as an indicator for assessing health status.

Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

A method for determining an individual immunity index, characterized in that it comprises:

(1) Obtain nucleic acid sequencing data of the individual to be tested;

(2) Determine the V/J sequence and the CDR sequence contained in the nucleic acid sample by aligning the sequencing result with the reference sequence;

(3) Determine statistical features based on the V/J sequence and CDR sequence contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index;

(4) determining the immune age value of the individual based on the statistical characteristics; and

(5) Determine the immunity index of the individual based on the immune age value.
The method according to claim 1, wherein the sequencing data is obtained by the following steps:

(1-1) Obtaining a nucleic acid sample of the individual to be tested, the nucleic acid sample includes at least one of a DNA molecule and an RNA molecule;

(1-2) using VJ-specific primers for the first amplification process to obtain the first amplification product;

(1-3) performing a second amplification process on the first amplification product to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter;

(1-4) Sequencing the second amplification product to obtain a sequencing result;

The nucleic acid sample is obtained from a blood or tissue sample of the individual.
The method of claim 1, wherein the VJ-specific primer contains a portion of the sequence of the sequencing adapter.
The method according to claim 1, wherein the CDR sequence is at least one of CDR1, CDR2 and CDR3 sequences.
The method of claim 4, wherein the CDR sequence is a CDR3 sequence.
The method according to claim 1, wherein at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
The method of claim 1, wherein the type of the immune cell is determined based on the CDR3 sequence.
The method according to claim 1, wherein the immune cell homogeneity index is a Gini index.
The method of claim 1, wherein the immune age value is determined using a maximum a posteriori probability estimate based on at least one of the statistical features.
The method according to claim 1, wherein step (4) further comprises:

(4-1) Using a predetermined distribution of immune age prediction coefficients, based on each of the statistical features, respectively determine the immune age prediction coefficient corresponding to each of the statistical features; and

(4-2) determining the immune age of the individual according to the formula
$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$

Wherein, IA represents the immune age of the individual, i is the index of the statistical feature, n represents the number of statistical features, θi represents the immune age prediction coefficient corresponding to the i-th statistical feature, xi represents the value of the i-th statistical feature, and θ0 represents the bias term in the prediction model.
The method of claim 10, wherein the immunity index is determined by the following formula:

Wherein, IA represents the immune age value determined in step (4), IAmax represents the upper limit of IA in the predetermined population, and IAmin represents the lower limit of IA in the predetermined population.
A device for determining an individual immunity index, characterized in that it includes:

A sequencing data acquisition unit for acquiring nucleic acid sequencing data of the individual to be tested;

A sequencing result analysis unit for determining the V/J sequence and the CDR sequence contained in the nucleic acid sample by comparing the sequencing result with a reference sequence;

A statistical unit for determining statistical features based on the V/J sequences and CDR sequences contained in the nucleic acid sample, the statistical features including at least one selected from the following: V/J gene usage diversity index, immune cell diversity index, number of immune cell types, and immune cell homogeneity index;

an immune age determination unit for determining an immune age value of the individual based on the statistical characteristics; and

An immunity index determination unit, configured to determine the immunity index of the individual based on the immune age value.
The device according to claim 12, wherein the sequencing data acquisition unit further comprises:

a nucleic acid sample acquisition module, used for acquiring a nucleic acid sample of an individual to be tested, the nucleic acid sample comprising at least one of DNA molecules and RNA molecules;

a first amplification module for performing a first amplification process using VJ-specific primers, so as to obtain a first amplification product;

a second amplification module, configured to perform a second amplification process on the first amplification product, so as to obtain a second amplification product, wherein the second amplification product carries a sequencing adapter;

a sequencing module for sequencing the second amplification product, so as to obtain a sequencing result;

The nucleic acid sample is obtained from a blood or tissue sample of the individual.
The apparatus of claim 12, wherein the VJ-specific primer contains a portion of the sequence of the sequencing adapter.
The device according to claim 12, wherein the CDR sequence is at least one of CDR1, CDR2 and CDR3 sequences, preferably a CDR3 sequence.
The device according to claim 12, wherein at least one of the V/J gene usage diversity index and the immune cell diversity index is a Shannon index.
The device of claim 12, wherein the type of immune cells is determined based on the CDR3 sequence.
The device according to claim 12, wherein the immune cell homogeneity index is a Gini index.
The apparatus of claim 12, wherein the immune age determination unit is adapted to determine the immune age value based on at least one of the statistical features using a maximum a posteriori probability estimate.
The device according to claim 12, wherein the immune age determination unit is used for:

Using a predetermined distribution of immune age prediction coefficients, based on each of the statistical features, determine the immune age prediction coefficient corresponding to each of the statistical features, respectively; and

determining the immune age of the individual according to the formula
$$IA = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$

Wherein, IA represents the immune age of the individual, i is the index of the statistical feature, n represents the number of statistical features, θi represents the immune age prediction coefficient corresponding to the i-th statistical feature, xi represents the value of the i-th statistical feature, and θ0 represents the bias term in the prediction model.
The device of claim 20, wherein the immunity index is determined by the following formula:

Wherein, IA represents the immune age value determined in the immune age determination unit, IAmax represents the upper limit of IA in the predetermined population, and IAmin represents the lower limit of IA in the predetermined population.
An electronic device, characterized by comprising a processor and a memory, wherein the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the method for determining an individual immunity index according to any one of claims 1 to 11.
A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method for determining an individual immunity index according to any one of claims 1 to 11.