CN112133372B - Method for establishing antigen-specific TCR database and method for evaluating antigen-specific TCR - Google Patents

Method for establishing antigen-specific TCR database and method for evaluating antigen-specific TCR

Info

Publication number
CN112133372B
Authority
CN
China
Prior art keywords
sequence
specific
antigen
tcr
cell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010828988.6A
Other languages
Chinese (zh)
Other versions
CN112133372A (en
Inventor
任树成
宋瑾
梅博源
张恒辉
沈宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhenzhi Medical Technology Co ltd
Original Assignee
Beijing Zhenzhi Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhenzhi Medical Technology Co ltd
Priority to CN202010828988.6A
Publication of CN112133372A
Application granted
Publication of CN112133372B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Abstract

The invention discloses a method for establishing an antigen-specific TCR database and a method for evaluating an antigen-specific TCR. The method for establishing the antigen-specific TCR database comprises the following steps: collecting a T cell sample and isolating specific T cell clones from it; sequencing the specific T cell clones to obtain their CDR3 beta sequence information; filtering the specific T cell clones; trimming the CDR3 beta region sequences of the specific T cell clones to obtain first specific T cell clones; obtaining sequences from a public TCR database and scoring their credibility; selecting sequences whose credibility score is greater than a preset value and trimming their CDR3 beta region sequences to obtain second specific T cell clones; and establishing an antigen-specific TCR database based on the first and second specific T cell clones. The database established according to the embodiments of the present invention can be used as a reference database for antigen-specific TCR analysis.

Description

Method for establishing antigen-specific TCR database and method for evaluating antigen-specific TCR
Technical Field
The invention relates to the field of biotechnology, in particular to a method for establishing an antigen-specific TCR database and a method for evaluating an antigen-specific TCR.
Background
Lymphocytes exert immune functions by recognizing specific antigens through their surface antigen receptors. The specificity of antigen recognition is expressed at the clone level: lymphocytes of the same clone carry the same antigen receptor and recognize the same antigen epitope. The T cell antigen receptor (TCR) is the structure by which T cells specifically recognize and bind antigen peptide-MHC molecules. It is a heterodimer consisting of two distinct subunits. In about 95% of T cells the receptor is composed of an alpha subunit and a beta subunit, and in the remaining 5% it is composed of a gamma subunit and a delta subunit; these proportions may vary with ontogeny or disease.
TRB is the gene locus encoding TCR beta. It comprises four kinds of gene segments: the variable segment (V), the diversity segment (D), the joining segment (J) and the constant region (C), each of which has multiple alleles. During T cell development, gene recombination at the TRB locus provides the molecular basis for TCR beta diversity. In addition, mechanisms such as random insertion and deletion of bases near the recombination sites further diversify the TCR, so that the organism can recognize a wide variety of antigens.
The CDR3 region of TCR beta is composed of the 3' end of the V gene segment, the D gene segment, the 5' end of the J gene segment, and the nucleotides inserted at the junctions between V, D and J. Because the TCR beta chain rearranges before the TCR alpha chain and is subject to allelic exclusion, it better reflects the TCR characteristics of a T cell. In addition, the variable region of each TCR subunit comprises three highly variable complementarity determining regions (CDRs). The most important of these, CDR3, binds directly to the polypeptide presented by the MHC and is highly variable in sequence, so TCR diversity is determined primarily by CDR3. Sequencing the CDR3 region of TCR beta makes it possible to evaluate parameters such as the diversity of TCR beta in an immune repertoire and to further analyze how an immune response arises and develops. For this reason, the identification of antigen-specific T cell receptors often focuses on detecting the beta chain CDR3 coding sequence.
T cell-mediated cellular immunity is a form of adaptive immunity and an important pathway by which the immune system recognizes and eliminates pathogens and tumor cells. Adoptive T cell therapy (ACT), which restores the body's immune surveillance, defense and regulation functions by enhancing the T cell-mediated adaptive immune response, overcomes limitations of conventional treatments. ACT includes tumor-infiltrating lymphocyte (TIL) therapy, CAR-T cell therapy, and TCR-T (T cell receptor-engineered T cell) therapy. TCR-T therapy, currently the most recent ACT technology, uses genetic engineering to directly modify the TCR, the surface receptor with which T cells recognize tumor antigens, thereby strengthening the ability of T cells to recognize and kill tumor cells. It has attracted wide attention and become a research hotspot, has shown good efficacy in the clinical trials conducted so far, and its results have been reported in leading journals such as Science and Nature. The therapeutic potential of this therapy for infectious diseases and specific viruses has also been explored: HIV, HBV and COVID-19, for example, can serve as TCR-T targets, making it an innovative therapy for treating and controlling infectious diseases. In the case of HBV infection, current antiviral therapy merely inhibits viral replication and cures fewer than 5% of patients. Treating these patients with an antiviral drug in combination with CAR/TCR-T cells may be a viable option. The prior art shows that genetically modifying CAR/TCR-T cells by mRNA electroporation limits their functional activity to a short time, which provides an enhanced safety profile suitable for patients with chronic viral diseases.
Identifying antigen-specific TCRs is the most critical step in TCR-T therapy and drug development. A high-throughput method for identifying antigen-specific TCRs can greatly shorten the development cycle of TCR-T therapies and reduce their cost.
Whether the therapy is a tumor neoantigen vaccine that specifically targets tumor-specific antigens, a TCR-T or other immune cell therapy specific to tumor or pathogen antigens, or an immune checkpoint inhibitor (ICI), it ultimately works by mobilizing T cells to recognize antigens and trigger a specific killing response. During immunotherapy, the expansion, persistence and decline of antigen-specific T cell clones directly reflect the effect of the treatment. A method for identifying antigen-specific TCRs can therefore be used both to test whether an immunotherapy effectively stimulates an antigen-specific T cell response and to monitor changes in specific T cells during treatment, and it has potential value as a companion diagnostic for immunotherapy.
In the prior art, 5' RACE and nested PCR are used to amplify the CDR3 region of TCR beta, construct a library, sequence it, and analyze and compare the sequencing results. These methods are restricted to RNA as the starting material, so only a single sample type can be used; they are more demanding to perform than other techniques, the overall workflow is complicated, and reproducibility can suffer. Furthermore, they yield only the total T cell repertoire of the test material and cannot detect antigen-specific TCRs.
Disclosure of Invention
The invention mainly aims to provide a method for establishing an antigen-specific TCR database and a method for evaluating an antigen-specific TCR, so as to solve the problem that the prior art lacks a technology for effectively detecting and identifying the antigen-specific TCR.
According to one aspect of the present invention, there is provided a method for establishing an antigen-specific TCR database, comprising: collecting a T cell sample and isolating specific T cell clones from the T cell sample; sequencing the specific T cell clones to obtain their CDR3 beta sequence information; filtering the specific T cell clones according to their frequency and ranking in the T cell sample; trimming the CDR3 beta region sequences of the filtered specific T cell clones to obtain first specific T cell clones with a consistent sequence format; obtaining sequences from a public TCR database and scoring their credibility, wherein the credibility scoring rules comprise a sequencing method scoring rule, a TCR-pMHC complex extraction method scoring rule and a T cell specificity identification method scoring rule; selecting sequences whose credibility score is greater than a preset value and trimming their CDR3 beta region sequences to obtain second specific T cell clones with a consistent sequence format; and establishing an antigen-specific TCR database based on the first and second specific T cell clones.
Wherein the sequences in the antigen-specific TCR database comprise warehousing basic information and statistical basic information; the warehousing basic information further comprises: antigen gene name, HLA type, antigen epitope amino acid sequence, trimmed CDR3 beta amino acid sequence and VDJ gene; the statistical basic information further includes: frequency occupied by CDR3 beta, VJ ratio, CDR3 partial length and distribution.
Wherein said step of isolating specific T cell clones within said T cell sample comprises: labeling specific T cells in the T cell sample with HLA-restricted MHC-epitope peptide complexes; isolating specific T cell clones within the T cell sample using flow cytometric sorting techniques or magnetic bead cell sorting techniques.
Wherein the step of scoring the credibility of the sequences comprises the following. The sequencing method scoring rule: a score of 3X if single-cell sequencing is included; 2X for amplicon sequencing; for Sanger sequencing, 2X if sequencing results exist for two or more cells and 1X if only one cell was detected; no score if no sequencing method is provided. The sequencing method score is denoted a. The TCR-pMHC complex extraction scoring rule: a score of 1X if extraction is based on cell sorting and the frequency is greater than 0.1; 1X if it is based on cell culture and the frequency is greater than 0.5; no score if no complex extraction method is provided. The TCR-pMHC complex extraction score is denoted b. The T lymphocyte specificity identification scoring rule: a score of 3X if a direct identification method is included; 2X if a target antigen stimulation method is used; 1X if only a staining method is used. The T lymphocyte specificity identification score is denoted c. Here X is an integer, and the credibility score is the smaller of a and the sum of b and c.
According to another aspect of the present invention, there is also provided a method of evaluating an antigen-specific TCR, comprising: obtaining a target CDR3 beta sequence; obtaining a reference sequence through the database; comparing the target CDR3 beta sequence with the reference sequence, and obtaining the antigen recognition comprehensive index of the target CDR3 beta sequence according to the comparison result; outputting an antigen-specific TCR in the target CDR3 β sequence according to the antigen recognition composite index.
Wherein the step of aligning the target CDR3 β sequence to the reference sequence comprises: calculating the Hamming distance between the target CDR3 beta sequence and the reference sequence to obtain a sequence similarity score; calculating the amino acid similarity between the target CDR3 β sequence and the reference sequence to obtain an amino acid similarity score; generating an antigen contact site weighting vector from amino acids at different positions of the target CDR3 β sequence, wherein the weight at an antigen contact site position is greater than the weight at a non-antigen contact site position; and obtaining an alignment result according to the sequence similarity score, the amino acid similarity score and the weighting vector of the position of the amino acid.
Wherein the step of obtaining the antigen recognition comprehensive index of the target CDR3 beta sequence according to the comparison result comprises: and obtaining the antigen recognition comprehensive index according to the sum of the amino acid similarity score and the weighted vector of the position of the amino acid and the sequence similarity score.
Wherein, the level of the antigen recognition comprehensive index is inversely proportional to the similarity of the sequence.
Wherein, the Hamming distance is adopted to calculate the difference of the target CDR3 beta sequence and the reference sequence.
Wherein the amino acid similarity score is calculated using a Blosum matrix.
The antigen-specific TCR database established according to the embodiment of the invention can be used as a reference database for antigen-specific TCR analysis; furthermore, by evaluating the similarity between the CDR3 β sequence of any TCR and the CDR3 β sequence in the antigen-specific TCR database constructed in the present invention, it can be determined whether a certain TCR recognizes a specific antigen.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of building an antigen-specific TCR database according to an embodiment of the invention;
FIG. 2 is a flow chart of a method of assessing an antigen-specific TCR in accordance with an embodiment of the invention;
FIG. 3 is a schematic diagram of a Blosum90 matrix according to an embodiment of the invention;
FIGS. 4A-4D are schematic representations of comparison of AFP and MAGEA1 antigen-specific TCRs according to an embodiment of the invention;
FIGS. 5A and 5B are schematic illustrations of AFP and MAGEA1 tumor-specific CD8+ MHC multimer + T cell flow assays according to embodiments of the invention;
FIG. 6 is a schematic representation of flow detection of CMV-positive donor CMV-specific CD8+ MHC multimer + T cells in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
According to an embodiment of the present invention, a method for establishing an antigen-specific TCR database is provided, in which the antigen-specific TCRs come from two main sources: public data and self-created data. Referring to FIG. 1, the method includes the following steps:
Step S102, collecting a T cell sample and isolating specific T cell clones from the T cell sample.
Specifically, for a cell sample that may contain TCRs specific for a target antigen of interest, the specific T cells in it are labeled with HLA (human leukocyte antigen)-restricted MHC-epitope peptide complexes, and the specific T cell clones in the sample are then isolated by flow cytometric sorting (FACS) or magnetic bead cell sorting (MACS). Whether a T cell sample contains TCRs specific for the potential target antigen can be determined using known techniques.
Step S104, sequencing the specific T cell clones to obtain their CDR3 beta sequence information.
The CDR3 β sequences of the specific T cell clones can be determined by any sequencing method known in the art, including but not limited to next-generation sequencing (NGS) and first-generation (Sanger) sequencing. High-throughput sequencing is a revolutionary change from conventional sequencing: it sequences hundreds of thousands to millions of DNA molecules at a time, which is why some of the literature calls it next-generation sequencing (NGS). Because it makes detailed, global analysis of the transcriptome and genome of a species possible, it is also called deep sequencing.
Step S106, filtering the specific T cell clones according to their frequency and ranking in the T cell sample.
One or more specific T cell clones may be present. Filtering them by their frequency and rank within the sample retains the genuine specific T cell clones to the greatest possible extent. For example, the filter condition may require that a single clone has a frequency greater than 10% and ranks among the five most frequent clones in its sample; alternatively, it may require a frequency greater than 20% together with the same top-five rank.
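The filter condition described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `filter_clones`, its parameters, and the input format (one CDR3 β string per sequenced read) are assumptions introduced here.

```python
from collections import Counter

def filter_clones(cdr3_reads, min_frequency=0.10, top_n=5):
    """Keep clones that rank in the top_n by frequency AND exceed min_frequency.

    cdr3_reads: list of CDR3 beta strings, one per read (hypothetical input format).
    Returns a list of (sequence, frequency) tuples, most frequent first.
    """
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common(top_n) yields the top_n clones ordered by descending count
    ranked = counts.most_common(top_n)
    return [(seq, n / total) for seq, n in ranked if n / total > min_frequency]

# Toy data: three clones with frequencies 0.6, 0.3 and 0.1
reads = ["CLONE_A"] * 6 + ["CLONE_B"] * 3 + ["CLONE_C"] * 1
kept = filter_clones(reads)  # CLONE_C is dropped: 0.1 is not greater than 0.10
```

Raising `min_frequency` to 0.20 gives the stricter variant mentioned in the text.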
Step S108, trimming the CDR3 beta region sequences of the filtered specific T cell clones to obtain first specific T cell clones with a consistent sequence format.
The trimmed CDR3 region sequence can be the segment between the IMGT-defined amino acids C104 (C) and F118 (F).
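As a rough sketch of this trimming step, the function below assumes the input amino acid string already starts at the conserved C104 and ends at F118, and returns the residues between the two anchors. Whether the anchors themselves are kept is not stated explicitly; dropping them is an assumption made here because the CDR3 β sequences listed in Table 1 carry no leading C or trailing F. The name `trim_cdr3` is hypothetical.

```python
def trim_cdr3(cdr3_aa):
    """Return the residues between the C104 and F118 anchors, or None.

    Assumption: cdr3_aa spans C104..F118 inclusive (e.g. 'CASS...F').
    Sequences that do not carry both anchors are rejected rather than guessed at.
    """
    seq = cdr3_aa.strip().upper()
    if len(seq) < 3 or not seq.startswith("C") or not seq.endswith("F"):
        return None
    return seq[1:-1]

trimmed = trim_cdr3("CASSLLLQETQYF")
```

Returning `None` for sequences without both anchors keeps malformed entries out of the database instead of silently storing them in an inconsistent format.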
The self-created data is obtained through the above steps S102-S108.
Step S110, obtaining TCR sequences from public TCR databases.
In the embodiment of the application, the public TCR sources can be public databases such as VDJdb, McPAS-TCR and TCR3d, together with literature reports not included in these databases.
Step S112, scoring the credibility of the TCR sequences obtained from the public TCR databases, wherein the credibility scoring rules comprise: a sequencing method scoring rule, a TCR-pMHC complex extraction method scoring rule and a T cell specificity identification method scoring rule.
Because the standards and quality of sequence data from public sources are inconsistent, the data must first be cleaned. For example, the sequences from the three public databases VDJdb, McPAS-TCR and TCR3d and from literature reports not included in those databases are reviewed and screened to remove entries that are incomplete, contradictory or duplicated, and only entries that meet the warehousing standards are retained.
After data cleaning, the credibility of the sequence data is scored. Specifically, the credibility scoring rules comprise three parts: a sequencing method scoring rule, a TCR-pMHC complex extraction method scoring rule and a T cell specificity identification method scoring rule.
(1) Sequencing scoring rules.
For data whose sequencing method can be determined, this item is scored according to the sequencing method. For example, the score is 3 if single-cell sequencing is included; 2 for amplicon sequencing; for Sanger sequencing, 2 if sequencing results exist for two or more cells and 1 if there is only one cell; and 0 if no sequencing method is provided. This score is denoted the a score.
(2) TCR-pMHC complex extraction scoring rules.
For sequences whose TCR-pMHC complex extraction method can be determined, this item is scored according to the extraction method. For example, the score is 1 if extraction is based on cell sorting and the frequency is greater than 0.1; 1 if it is based on cell culture and the frequency is greater than 0.5; and 0 if no complex extraction method is provided. This score is denoted the b score.
(3) T lymphocyte specific identification scoring rules.
For sequences whose T lymphocyte specificity identification method can be determined, this item is scored according to the principle and method used. The score is 3 if a direct identification method is included, such as a Protein Data Bank structure or another means of directly identifying TCR-pMHC binding; 2 if a target antigen stimulation method is used; and 1 if only staining is used. This score is denoted the c score.
The final score is the smaller of two values, the a score and the sum of the b and c scores, and is capped at 3 points. It should be noted that the scores in the three scoring rules above are only exemplary; in other embodiments the scores may also be integer multiples of these values.
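The three scoring rules and their combination can be sketched as follows. This is an illustrative reading, not a definitive implementation: it interprets the final score as min(a, b + c) capped at 3, and the function names and the string labels for each method ("single_cell", "sort", "direct" and so on) are hypothetical encodings chosen for the example.

```python
def sequencing_score(method, n_cells=0):
    """a score: 3 single-cell, 2 amplicon, Sanger 2 (>=2 cells) or 1 (1 cell), else 0."""
    if method == "single_cell":
        return 3
    if method == "amplicon":
        return 2
    if method == "sanger":
        if n_cells >= 2:
            return 2
        return 1 if n_cells == 1 else 0
    return 0  # no sequencing method provided

def extraction_score(method, frequency=0.0):
    """b score: 1 for sorting with frequency > 0.1 or culture with frequency > 0.5."""
    if method == "sort" and frequency > 0.1:
        return 1
    if method == "culture" and frequency > 0.5:
        return 1
    return 0

def identification_score(method):
    """c score: 3 direct identification, 2 antigen stimulation, 1 staining only."""
    return {"direct": 3, "stimulation": 2, "stain": 1}.get(method, 0)

def credibility_score(a, b, c, cap=3):
    """Assumed combination rule: the smaller of a and (b + c), capped at `cap`."""
    return min(a, b + c, cap)

score = credibility_score(
    sequencing_score("single_cell"),
    extraction_score("sort", frequency=0.2),
    identification_score("stain"),
)  # a = 3, b = 1, c = 1
```

Under this reading, a record with strong sequencing evidence but weak specificity evidence is held back by the b + c term, and vice versa.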
Step S114, selecting sequence data whose credibility score is greater than a preset value and trimming its CDR3 beta region sequence to obtain second specific T cell clones with a consistent sequence format.
The sequences are first filtered, for example by selecting those with a credibility score greater than or equal to 2, and their CDR3 region sequences are then trimmed, for example by retaining the segment between the IMGT-defined amino acids C104 (C) and F118 (F). This makes the sequences obtained from the public TCR databases consistent in format with one another and with the self-created data.
The public data is available through steps S110-S114.
Step S116, an antigen-specific TCR database is established based on the first and second specific T cell clones, which can be used as a reference database for antigen-specific TCR analysis.
In the examples of the present application, the classification of specific T cell clones into first and second types of specific T cell clones does not mean that the data types thereof are inconsistent, but rather that the data sources thereof are inconsistent. Wherein, the specific T cell clone with the data source as self-established data is represented by a first specific T cell clone, and the specific T cell clone with the data source as public data is represented by a second specific T cell clone.
In the embodiment of the present application, the data in the antigen-specific TCR database is annotated and counted in a standardized way, with the clonotype as the unit, according to the warehousing basic information and statistical basic information standards. The warehousing basic information includes the antigen gene name, HLA type, epitope amino acid sequence, trimmed CDR3 beta amino acid sequence, VDJ genes and the like; the statistical basic information includes the frequency of the CDR3 beta, the VJ ratio, the CDR3 length and its distribution, and the like. These results are matched to form the complete data content, and the processed data content is collected in the corresponding tumor-specific antigen database.
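One possible shape for a database entry is sketched below as a Python dataclass. The class name `TcrRecord`, the field names, and the `source` field are assumptions made for illustration; the text only lists which pieces of information are stored, not how. The example values come from the first row of Table 1, and the V, D and J genes are left empty because the table does not give them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TcrRecord:
    # Warehousing basic information
    antigen_gene: str
    hla_type: str
    epitope: str
    cdr3_beta: str                     # trimmed CDR3 beta amino acid sequence
    v_gene: str = ""
    d_gene: str = ""
    j_gene: str = ""
    # Statistical basic information
    frequency: Optional[float] = None
    # Hypothetical provenance flag: "self-created" or "public"
    source: str = "self-created"

    @property
    def cdr3_length(self) -> int:
        """Length of the trimmed CDR3 beta sequence, derived rather than stored."""
        return len(self.cdr3_beta)

example = TcrRecord(
    antigen_gene="AFP",
    hla_type="HLA-A*02",
    epitope="GLSPNLNRFL",
    cdr3_beta="ASSLLLQETQY",
    frequency=0.4843181,
)
```

Making the record frozen keeps clonotype entries immutable once they are admitted, so statistics such as the VJ ratio or the length distribution can be computed over a stable set.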
Referring to FIG. 2, there is also provided a method for evaluating an antigen-specific TCR, which comprises:
Step S202, obtaining a target CDR3 β sequence.
Specifically, the target CDR3 β sequence is extracted and trimmed, retaining the segment from the third amino acid after IMGT position C104 to the second amino acid before position F118, which is the most commonly used segment of this region.
Step S204, acquiring a reference sequence through the database; wherein the database is the antigen-specific TCR database described above, and all or part of the antigens in the database can be selected as reference sequences for alignment.
Step S206, the target CDR3 beta sequence is compared with the reference sequence, and the antigen recognition comprehensive index of the target CDR3 beta sequence is obtained according to the comparison result.
Specifically, the sequence alignment mainly comprises: sequence similarity analysis, amino acid similarity analysis, and antigen contact site weighting, described in detail below.
(1) Sequence similarity analysis.
The Hamming distance between two character strings of equal length is the number of positions at which the corresponding characters differ. In the embodiment of the present application, the difference between sequences is calculated using the Hamming distance. Specifically, the sequence similarity score is obtained by calculating the Hamming distance between the target CDR3 β sequence and the reference sequence. This step gives a quantitative measure of the difference between the target and reference sequences.
In the Hamming distance calculation, the upper limit on the number of amino acid differences between the target sequence and the reference sequence is set to 2. When an amino acid has been inserted or deleted, the remaining amino acid positions are aligned and the position that lacks a counterpart is marked with a null value.
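A minimal sketch of this distance calculation follows. The text does not say how the null position is located when the lengths differ, so the helper below makes an assumption: it pads the shorter sequence with one contiguous block of gap characters at whichever position yields the fewest mismatches. The names `GAP`, `pad_to_length` and `hamming_distance` are hypothetical.

```python
GAP = "-"  # stands in for the "null value" at an inserted or deleted position

def pad_to_length(short, long_):
    """Insert one contiguous gap block into `short` where it minimizes mismatches."""
    diff = len(long_) - len(short)
    candidates = (short[:i] + GAP * diff + short[i:] for i in range(len(short) + 1))
    return min(candidates, key=lambda s: sum(x != y for x, y in zip(s, long_)))

def hamming_distance(target, reference, max_diff=2):
    """Return the Hamming distance, or None if it exceeds max_diff.

    Sequences of unequal length are first padded with gaps; every gap
    position counts as one difference.
    """
    if len(target) < len(reference):
        target = pad_to_length(target, reference)
    elif len(reference) < len(target):
        reference = pad_to_length(reference, target)
    distance = sum(x != y for x, y in zip(target, reference))
    return distance if distance <= max_diff else None

d_equal = hamming_distance("ASSLLLQETQY", "ASSLELSETQY")  # differs at two positions
d_gapped = hamming_distance("ASSLYQETQY", "ASSLLLQETQY")  # one gap plus one mismatch
```

Returning `None` above the limit lets a caller discard reference sequences that are too distant before any further scoring.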
(2) Amino acid similarity analysis.
The Blosum series of matrices is used to score amino acid sequence similarity; the number in a matrix name indicates the identity level of the sequences from which the matrix was constructed. In principle, more closely related sequences are scored with a higher-numbered Blosum matrix. Since the algorithm targets human TCR sequences, which are highly similar to one another, the Blosum90 scoring matrix is preferably used (see FIG. 3). Substituting one amino acid with a similar one tends not to change specificity significantly, so such a substitution receives a score closer to zero, and a dissimilar substitution receives a higher score. If the aligned position is null, it is assigned the maximum value of the selected range. This score is defined as the amino acid similarity score. In this algorithm, the default distance score range may be 0 to 5.
(3) Antigen contact site weighting.
The molecular basis for a T cell's recognition of a specific antigen is revealed by the crystal structure of the TCR-pMHC complex. In that structure, the amino acid segments lying very close to the antigen peptide can be considered to bind it, that is, to play the main role in TCR-pMHC specific recognition. Analysis of published TCR-pMHC crystal structure data, summarizing how the distance between amino acids and the antigen peptide varies with position within the region, shows that in known TCR-pMHC complex structures most of the amino acids close to the antigen peptide fall at relatively fixed positions in the CDR3 β region. On this basis, an antigen contact site weighting vector is generated that assigns a weight to each position of the target CDR3 β sequence: 3 for antigen contact sites and 1 for all other (non-contact) positions.
The final antigen recognition composite index of the sequence of interest (target CDR3 β sequence) relative to the reference sequence was: the sum of the amino acid similarity score and the weighted vector of the positions of the amino acids is added to the sequence similarity score.
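One way to read this combination is sketched below: the per-position amino acid distance is multiplied by the position weight and summed, and the Hamming distance is then added. This reading, the function names, and the `contact_positions` argument are assumptions. In particular, `placeholder_aa_distance` is a stand-in that is NOT derived from Blosum90; a real implementation would map Blosum90 values onto the 0 to 5 range, which the text does not specify in detail. Which positions count as contact sites is also not given here, so they are passed in by the caller.

```python
GAP = "-"            # null position after alignment
MAX_AA_DISTANCE = 5  # upper end of the default 0..5 distance range

def placeholder_aa_distance(a, b):
    """Stand-in distance: 0 if identical, maximum for a gap, otherwise 1.

    Hypothetical; replace with a Blosum90-derived distance in practice.
    """
    if a == b:
        return 0
    if GAP in (a, b):
        return MAX_AA_DISTANCE
    return 1

def composite_index(target, reference, contact_positions,
                    aa_distance=placeholder_aa_distance,
                    contact_weight=3, other_weight=1):
    """Weighted amino acid distance plus Hamming distance (lower = more similar)."""
    if len(target) != len(reference):
        raise ValueError("sequences must be aligned to equal length first")
    weighted = 0
    hamming = 0
    for i, (t, r) in enumerate(zip(target, reference)):
        weight = contact_weight if i in contact_positions else other_weight
        weighted += weight * aa_distance(t, r)
        hamming += t != r
    return weighted + hamming

# Zero-based positions 4..6 are treated as contact sites purely for illustration.
index = composite_index("ASSLLLQETQY", "ASSLELSETQY", contact_positions={4, 5, 6})
```

With this structure, a substitution at a contact site costs three times as much as the same substitution elsewhere, which matches the stated intent of weighting contact positions more heavily.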
Step S208, outputting the antigen-specific TCRs among the target CDR3 β sequences according to the antigen recognition composite index. The index represents the probability that the target CDR3 β sequence and the reference sequence recognize the same antigen; the lower the index, the higher the similarity between the sequences.
Before the result is output, the alignment results for each target sequence are reduced so that, for every clonotype, only the reference sequence with the lowest score, i.e. the highest similarity, is retained. The target is more likely to specifically recognize the antigen of that reference sequence than the antigens of other reference sequences. The result follows a standardized format, can be used in further bioinformatic analysis and statistics, and provides downstream analysis with credible TCR sequences of potential antigen specificity.
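The reduction step can be sketched as a single pass that keeps the lowest-index hit per target clonotype. The tuple layout `(target, reference, antigen, index)` and the function name `best_hits` are assumptions for the example, and the index values in the toy data are made up rather than computed.

```python
def best_hits(alignments):
    """Keep, for each target CDR3, the reference with the lowest composite index.

    alignments: iterable of (target_cdr3, reference_cdr3, antigen, index) tuples
    (hypothetical layout). Ties keep the first hit encountered.
    """
    best = {}
    for target, reference, antigen, index in alignments:
        current = best.get(target)
        if current is None or index < current[2]:
            best[target] = (reference, antigen, index)
    return best

# Toy alignment results; the numeric indices are illustrative only.
hits = best_hits([
    ("ASSLLLQETQY", "ASSLELSETQY", "AFP", 8),
    ("ASSLLLQETQY", "ASSLLLQETQY", "AFP", 0),
    ("ASSLYQETQY", "ASSLLLQETQY", "AFP", 6),
])
```

The returned dictionary has one entry per clonotype, which is a convenient form for writing out a standardized result table.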
In a practical application, whole blood samples were collected from liver cancer patients at different treatment stages, the specific CDR3 β sequences in the patients' tissue and whole blood samples were detected by the above method, and the CDR3 β sequences from the different treatment stages were compared. As shown in FIGS. 4A-4D, AFP and MAGEA1 antigen-specific TCRs in the patients' tissue and whole blood samples increased markedly after tumor-specific antigen immunotherapy.
In embodiments of the present application, methods of assessing antigen-specific TCRs include methods of assessing tumor antigen-specific TCRs and methods of assessing pathogen antigen-specific TCRs. The tumor antigen-specific TCR database is used in the evaluation of tumor antigen-specific TCRs and the pathogen antigen-specific TCR database is used in the evaluation of pathogen antigen-specific TCRs.
Epitope peptides of AFP and MAGEA1 were designed and synthesized and then each prepared as MHC multimers. CD8+ T cells were sorted from the PBMCs of patients treated with reinfused CTLs using a CD8+ T cell isolation kit. The patients' CD8+ T cells were stained with each of the synthesized MHC multimers, and the stained cells were sorted by flow cytometry to obtain CD8+ MHC multimer+ T cells. The CDR3 β sequences of the TCRs of the resulting double-positive T cells were detected using the method of the present invention and aligned with the CDR3 β sequences of the TCRs in the patients' tissue and whole blood samples. The results, shown in FIGS. 5A-5B, indicate that the patients' TCRs are tumor antigen-specific TCRs. The CDR3 β amino acid sequences of the identified TCRs are shown below:
TABLE 1 AFP- and MAGEA1-specific CDR3 β sequences
Antigen   Epitope peptide sequence   HLA type   CDR3 β sequence   Frequency
AFP       GLSPNLNRFL                 HLA-A*02   ASSLLLQETQY       0.4843181
AFP       FMNKFIYEI                  HLA-A*02   ASSLELSETQY       0.2426883
MAGEA1    KVLEYVIKV                  HLA-A*02   ASSLYQETQY        0.1529936
Epitope peptides of CMV were designed and synthesized and then prepared as MHC multimers. CD8+ T cells were isolated from the PBMCs of CMV-positive donors using a CD8+ T cell isolation kit. The donors' CD8+ T cells were stained with the synthesized MHC multimers, and the stained cells were sorted by flow cytometry to obtain CD8+ MHC multimer+ T cells. CMV antigen-specific TCRs were identified by detecting the CDR3 β sequences of the TCRs of the resulting double-positive T cells using the method of the invention (see FIG. 6). The CDR3 β amino acid sequences of the identified TCRs are shown below:
TABLE 2 CMV-specific CDR3 β sequences
Antigen   Epitope peptide sequence   HLA type   CDR3 β sequence   Frequency
CMV       NLVPMVATV                  HLA-A*02   CASRGQGFSYEQYF    0.521353
CMV       NLVPMVATV                  HLA-A*02   CASSFLGLNEQFF     0.3332293
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (9)

1. A method of establishing an antigen-specific TCR database, comprising:
collecting a T cell sample, and isolating specific T cell clones from the T cell sample;
sequencing the specific T cell clones to obtain CDR3 β sequence information of the specific T cell clones;
filtering the specific T cell clones according to their frequency and ranking in the T cell sample;
trimming the CDR3 β region sequences of the filtered specific T cell clones to obtain first specific T cell clones with a consistent sequence format;
obtaining sequences from a public TCR database and assigning each sequence a credibility score, wherein the credibility scoring rules comprise: a sequencing method scoring rule, a TCR-pMHC complex extraction method scoring rule and a T lymphocyte specificity identification method scoring rule;
the sequencing method scoring rule comprises: a score of 3X if single-cell sequencing is included; a score of 2X for amplicon sequencing; for Sanger sequencing, a score of 2X if sequencing results of two or more cells are obtained and a score of 1X if only one cell is obtained; no score if no sequencing method is provided; wherein the sequencing method score is denoted a;
the TCR-pMHC complex extraction method scoring rule comprises: a score of 1X if the extraction is based on cell sorting and the frequency is greater than 0.1; a score of 1X if the extraction is based on cell culture and the frequency is greater than 0.5; no score if no complex extraction method is provided; wherein the TCR-pMHC complex extraction method score is denoted b;
the T lymphocyte specificity identification method scoring rule comprises: a score of 3X if a direct identification method is included; a score of 2X if a target antigen stimulation method is used; a score of 1X if only a staining method is used; wherein the T lymphocyte specificity identification score is denoted c;
wherein X is an integer, and the credibility score is the fraction of the smaller one of the sum of a and b plus c;
selecting sequences whose credibility score is greater than a preset value, and trimming their CDR3 β region sequences to obtain second specific T cell clones with a consistent sequence format;
establishing an antigen-specific TCR database based on the first and second specific T cell clones.
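The credibility scoring rules of claim 1 can be sketched in Python as follows. This is an illustration only: the function names, the method labels (such as `"single_cell"`), and the default X = 1 are assumptions, as is the treatment of cases the claim does not cover (for example, a sorting-based extraction with a frequency of 0.1 or less is scored 0 here). The wording that combines a, b and c is ambiguous in this text; the sketch reads it as min(a, b) + c, which may not be the intended formula.

```python
from typing import Optional


def sequencing_score(method: Optional[str], n_cells: int = 0, x: int = 1) -> int:
    """Score a: sequencing method rule."""
    if method == "single_cell":
        return 3 * x
    if method == "amplicon":
        return 2 * x
    if method == "sanger":
        if n_cells >= 2:
            return 2 * x
        if n_cells == 1:
            return x
    return 0  # no sequencing method provided (or case not covered by the claim)


def extraction_score(method: Optional[str], frequency: float = 0.0, x: int = 1) -> int:
    """Score b: TCR-pMHC complex extraction method rule."""
    if method == "cell_sorting" and frequency > 0.1:
        return x
    if method == "cell_culture" and frequency > 0.5:
        return x
    return 0  # no method provided, or threshold not met (assumption)


def specificity_score(method: Optional[str], x: int = 1) -> int:
    """Score c: T lymphocyte specificity identification rule."""
    table = {"direct": 3 * x, "antigen_stimulation": 2 * x, "staining": x}
    return table.get(method, 0)


def credibility_score(a: int, b: int, c: int) -> int:
    """ASSUMED reading of the ambiguous claim wording: min(a, b) + c."""
    return min(a, b) + c


a = sequencing_score("sanger", n_cells=3)
b = extraction_score("cell_sorting", frequency=0.2)
c = specificity_score("antigen_stimulation")
print(a, b, c, credibility_score(a, b, c))
```

Sequences would then be kept only if this score exceeds the preset value mentioned in the claim.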
2. The method of claim 1, wherein the sequences in the antigen-specific TCR database comprise warehousing basic information and statistical basic information;
the warehousing basic information further comprises: antigen gene name, HLA type, epitope amino acid sequence, trimmed CDR3 β amino acid sequence and VDJ genes;
the statistical basic information further comprises: frequency of the CDR3 β, VJ ratio, and CDR3 length and distribution.
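A database entry carrying the fields listed in claim 2 could be represented as in the Python sketch below. The class name, field names, and types are assumptions made for illustration; only the list of fields comes from the claim, and the example values are taken from Table 1.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TcrRecord:
    # Warehousing basic information
    antigen_gene: str
    hla_type: str
    epitope: str
    cdr3_beta: str                 # trimmed CDR3 beta amino acid sequence
    v_gene: Optional[str] = None
    d_gene: Optional[str] = None
    j_gene: Optional[str] = None
    # Statistical basic information
    cdr3_beta_frequency: Optional[float] = None
    vj_ratio: Optional[float] = None

    @property
    def cdr3_length(self) -> int:
        """CDR3 length, derived from the stored sequence."""
        return len(self.cdr3_beta)


record = TcrRecord(
    antigen_gene="AFP",
    hla_type="HLA-A*02",
    epitope="GLSPNLNRFL",
    cdr3_beta="ASSLLLQETQY",
    cdr3_beta_frequency=0.4843181,
)
print(asdict(record))
print(record.cdr3_length)
```

Deriving the length from the sequence avoids storing two values that could disagree; the length distribution would be computed over all records.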
3. The method of claim 1, wherein said step of isolating specific T cell clones within said T cell sample comprises:
labeling specific T cells in the T cell sample with HLA-restricted MHC-epitope peptide complexes;
isolating specific T cell clones within the T cell sample using flow cytometric sorting techniques or magnetic bead cell sorting techniques.
4. A method of assessing an antigen-specific TCR using the database of any one of claims 1 to 3, comprising:
obtaining a target CDR3 beta sequence;
obtaining a reference sequence through the database;
aligning the target CDR3 β sequence with the reference sequence, and obtaining the antigen recognition composite index of the target CDR3 β sequence from the alignment result;
outputting an antigen-specific TCR in the target CDR3 β sequence according to the antigen recognition composite index.
5. The method of claim 4, wherein the step of aligning the target CDR3 β sequence to the reference sequence comprises:
calculating the Hamming distance between the target CDR3 beta sequence and the reference sequence to obtain a sequence similarity score;
calculating the amino acid similarity between the target CDR3 β sequence and the reference sequence to obtain an amino acid similarity score;
generating an antigen contact site weighting vector from amino acids at different positions of the target CDR3 β sequence, wherein the weight for an antigen contact site position is greater than the weight for a non-antigen contact site position;
and obtaining an alignment result according to the sequence similarity score, the amino acid similarity score and the weighting vector of the position of the amino acid.
6. The method of claim 5, wherein the step of obtaining the antigen recognition composite index of the target CDR3 β sequence from the alignment result comprises:
obtaining the antigen recognition composite index according to the sum of the amino acid similarity score and the weighting vector of the position of the amino acid and the sequence similarity score.
7. The method of claim 6, wherein the magnitude of the antigen recognition composite index is inversely proportional to the similarity of the sequences.
8. The method of claim 5, wherein the Hamming distance is used to calculate the difference between the target CDR3 β sequence and the reference sequence.
9. The method of claim 5, wherein the amino acid similarity score is calculated using a Blosum matrix.
CN202010828988.6A 2020-08-18 2020-08-18 Method for establishing antigen-specific TCR database and method for evaluating antigen-specific TCR Active CN112133372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010828988.6A CN112133372B (en) 2020-08-18 2020-08-18 Method for establishing antigen-specific TCR database and method for evaluating antigen-specific TCR

Publications (2)

Publication Number    Publication Date
CN112133372A (en)     2020-12-25
CN112133372B (en)     2022-06-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant