CN111613269B - Method for predicting HLA match probability and mismatch type - Google Patents
- Publication number
- CN111613269B, CN202010424265.XA
- Authority
- CN
- China
- Prior art keywords
- hla
- database
- mismatch
- dqb1
- haplotype
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Landscapes
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for predicting HLA match probability and mismatch type, comprising the following steps: (1) constructing an HLA database; (2) entering and submitting the genotypes to be compared; (3) converting the format of the alleles entered in step (2) to match the format used in the HLA database; (4) permuting and combining the format-converted genotypes from step (3) to obtain haplotype combinations; (5) comparing the haplotype combinations obtained in step (4) with the HLA database constructed in step (1); and (6) obtaining a prediction result from the comparison. The method helps clinicians select, for confirmatory typing, the preliminarily screened donors with the higher probability of a 10/10 match, and select the best unrelated donor with a permissible mismatch from among 8-9/10 matched donors, thereby reducing the patient's testing costs and contributing to fewer transplantation complications and improved transplant survival.
Description
Technical Field
The invention belongs to the field of biomedicine, and particularly relates to a method for predicting HLA matching probability and mismatch type.
Background
Human leukocyte antigens (HLA) are not only closely related to allogeneic transplantation, but also play an important role in the genetics of human evolution, the occurrence and development of immune diseases, tumor immune escape, and vaccine preparation. For many patients with advanced tumors or organ failure, allogeneic transplantation is ultimately the only life-saving option. HLA is the principal transplantation antigen, and the degree of HLA match or mismatch between patient and donor directly affects the transplantation prognosis and the patient's long-term survival.
At present, potential donors for allogeneic transplantation are related donors and unrelated donors. Before transplantation, the patient's HLA genotyping results are therefore used both to screen related donors and to search the Chinese bone marrow bank for matching unrelated donors. When no related donor is available, selecting the best-matched donor from among the preliminarily matched unrelated donors is critical. The current search procedure of the Chinese bone marrow bank is as follows: 1) Preliminary screening of unrelated donors yields one of two kinds of result: high-resolution results for 10 alleles at the five loci HLA-A, B, C, DRB1 and DQB1; or high- or low-resolution results for 6 alleles at the three loci HLA-A, B and DRB1. 2) Confirmatory typing: samples from the patient and from the preliminarily screened donors, whose degree of match with the patient varies, are retyped to confirm the results.
Different patients have different chances of retrieving preliminarily matched donors, and for preliminarily matched donors that are A, B, DRB1 6/6 matched at high or low resolution, or 9/10 matched, the probability that each HLA locus turns out matched or mismatched at the confirmatory typing stage also differs. The existing search method is therefore largely blind and costly for patients.
The HLA allele and haplotype frequencies of the Chinese population differ markedly from those of other populations. Although the foreign literature mentions using HLA genotyping to predict donor match probability, no complete search system has been established and applied in clinical cases.
Disclosure of Invention
In order to solve the above problems in the prior art, an object of the present invention is to provide a method for predicting HLA match probability and mismatch type.
In order to achieve the above object, the present invention provides the following technical solutions:
a method for predicting HLA match probability and mismatch type, comprising the steps of:
(1) Constructing an HLA database;
(2) Inputting and submitting genotypes to be compared, wherein the input genotypes are HLA-A, B, C, DRB1 and DQB1 locus genotypes;
(3) Converting the format of the alleles entered in step (2) to match the format in the HLA database;
(4) Arranging and combining the genotypes subjected to format conversion in the step (3) to obtain a haplotype combination;
(5) Comparing the haplotype combination obtained in the step (4) with the HLA database constructed in the step (1);
(6) Obtaining a prediction result through the comparison.
Further, the HLA database comprises an HLA allele CWD database, an HLA haplotype frequency database and an HLA negative linkage database; wherein the HLA haplotype frequency database comprises a 10/10 match prediction database, a 9/10 match prediction database and a C, DQB1 prediction database.
Further, the HLA allele CWD database comprises genotype, C/WD/R and frequency; wherein C is a common allele, WD is a well-documented allele, and R is a rare allele (ref: Common and well-documented HLA alleles: report of the Ad-Hoc Committee of the American Society for Histocompatibility and Immunogenetics. Human Immunology. 2007);
the 10/10 match prediction database comprises the A-B-C-DRB1-DQB1 haplotype, the A, B, C, DRB1 and DQB1 genotypes, frequency, rank and C/WD/R;
the 9/10 match prediction database comprises an A mismatch database, a B mismatch database, a C mismatch database, a DRB1 mismatch database and a DQB1 mismatch database; each mismatch database comprises the haplotype lacking the mismatched locus, and the genotypes at the mismatched locus that correspond to that haplotype together with their frequencies;
the C, DQB1 prediction database comprises the A-B-DRB1 haplotype and the C and DQB1 genotypes and frequencies corresponding to that haplotype.
Further, the number of haplotype combinations in step (4) is $2^{n-1}$ groups, wherein n is the number of loci.
Further, the comparison in step (5) comprises 10/10 match search, 9/10 match search, A, B, DRB1 locus 6/6 match search, allele CWD interpretation and negative linkage search.
A system for predicting HLA match probability and mismatch type, which executes the above method for predicting HLA match probability and mismatch type, and comprises an HLA database, a genotype entry module, a format conversion module, a genotype combination module and a comparison module;
the HLA database is used for storing reference data;
the genotype input module is used for inputting genotypes to be compared and checking the format of the genotypes;
the format conversion module is used for processing the entered alleles and converting them into a unified format that matches the HLA database;
the genotype combination module is used for arranging and combining genotypes after format conversion to obtain a haplotype combination;
and the comparison module is used for comparing the haplotype combination with the HLA database constructed in the step (1) to obtain HLA matching probability and mismatch type.
A storage medium for predicting HLA-matched probabilities and mismatch types, the storage medium performing the above-described method of predicting HLA-matched probabilities and mismatch types.
A processor for predicting HLA match probability and mismatch type, the processor being configured to run a program, the program being configured to perform the above-described method of predicting HLA match probability and mismatch type.
A device for predicting HLA match probability and mismatch type for performing the above-described method of predicting HLA match probability and mismatch type.
Beneficial effects: the invention provides a method for predicting HLA match probability and mismatch type. At an early stage of the disease it can predict the probability of retrieving HLA 10/10 matched or 8-9/10 matched donors, as well as the likely C and DQB1 results of donors that are A, B, DRB1 locus 6/6 matched at high or low resolution. This helps clinicians select, for confirmatory typing, the preliminarily screened donors with the higher probability of a 10/10 match, and select the best unrelated donor with a permissible mismatch from among 8-9/10 matched donors, thereby reducing the patient's testing costs and contributing to fewer transplantation complications and improved transplant survival.
The method can be implemented on different platforms and in different programming languages, and is easy to implement, develop and disseminate. It closely meets the practical requirements of clinical transplantation work and can be extended to the fields of tumor immunity and immune diseases. The user only needs to enter the genotype results at the designated positions, and all predictions are completed with a single click; the method is simple to operate and has broad prospects for development and application.
Drawings
FIG. 1 is a flow chart of a technical route of the method of the present invention.
Detailed Description
The invention is further described below in connection with specific embodiments, which are exemplary only and do not limit the scope of the invention in any way. Those skilled in the art will understand that various modifications and substitutions can be made to the details and form of the invention without departing from its spirit and scope.
The invention starts from the clinical need to predict donor search results before transplantation. Building on research findings on HLA allele and haplotype frequencies, CWD status, allele associations and linkage disequilibrium, it converts these findings into a search and prediction tool that is easy to develop, operate and disseminate. First, a background reference database with a specific format is established and compared against virtual haplotypes, in a matching format, derived from the HLA genotypes entered in the foreground, thereby implementing the search function. Then, by setting screening conditions and preference parameters, the method predicts the patient's probability of retrieving 10/10 and 9/10 matched donors, the likely C and DQB1 results of 6/6 matched donors, and the likely mismatch types.
The technical scheme comprises six main parts: establishing the background reference database; entering and submitting genotype results in the foreground; processing the genotype data format; generating virtual haplotypes and linked gene pairs; comparing the search data with the reference database; and giving a prediction result according to screening conditions and preference parameters. The technical route of the method is shown in FIG. 1.
In the first part, an HLA allele CWD database, an HLA haplotype frequency database and an HLA linkage disequilibrium parameter database for the Chinese population are established and written in a format suitable for searching and comparison, to serve as the background reference databases. The database contents are hidden to prevent leakage and erroneous modification and are protected by means such as passwords; they can be updated and maintained after authorization.
(1) HLA allele CWD reference database format
Comprises 3 columns A-C, which are respectively genotype, C/WD/R and frequency, and the format is shown in Table 1:
TABLE 1
Genotype | C/WD/R | Frequency |
A*01:01 | C | 76477 |
A*01:03 | WD | 300 |
A*01:06 | R | 3 |
A*01:127 | R | 2 |
A*01:129 | R | 1 |
A*01:141 | R | 1 |
(2) HLA haplotype frequency and CWD reference database format
1) 10/10 match prediction database
Comprises 9 columns A-I: the A-B-C-DRB1-DQB1 haplotype, the A, B, C, DRB1 and DQB1 genotypes, frequency, rank and C/WD/R. The format is shown in Table 2:
TABLE 2
2) 9/10 match prediction database
Comprises 5 mismatch databases in total: A mismatch (B-C-DRB1-DQB1 haplotype), B mismatch (A-C-DRB1-DQB1 haplotype), C mismatch (A-B-DRB1-DQB1 haplotype), DRB1 mismatch (A-B-C-DQB1 haplotype) and DQB1 mismatch (A-B-C-DRB1 haplotype). Taking A mismatch as an example, the database comprises 3 columns A-C: the B-C-DRB1-DQB1 haplotype, the A genotype corresponding to that haplotype, and the frequency. The format is shown in Table 3:
TABLE 3
3) C, DQB1 predictive database
Comprises 4 columns A-D: the A-B-DRB1 haplotype, the C genotype and the DQB1 genotype corresponding to that haplotype, and the frequency. The format is shown in Table 4:
TABLE 4
(3) HLA negative linkage reference database
Comprises 4 columns A-D: the linked allele pair, D′, r² and P value. The format is shown in Table 5:
TABLE 5
Linkage gene | D′ | r2 | P |
A*01:01-C*01:02:01G | -0.6784 | 0.0025 | 0.0002 |
A*02:01-C*06:02 | -0.6332 | 0.0066 | 0.0000 |
A*02:03-C*03:04 | -0.2285 | 0.0002 | 0.3161 |
A*02:03-C*08:01:01G | -0.5882 | 0.0010 | 0.0172 |
A*02:06-C*03:02 | -0.7246 | 0.0023 | 0.0003 |
In the second part, the HLA-A, B, C, DRB1 and DQB1 locus genotype results are filled into the genotype entry boxes. The entry requirements are: (1) each locus contains two alleles, so when one entry box of a locus is filled, the other cannot be empty; (2) the content may contain only Arabic numerals, the 26 letters (case-insensitive) and the Western colon ":" (a Chinese colon is converted automatically to the Western colon), a letter may not directly follow a colon, and the total length of an entry may not exceed 13 bytes; (3) the A, B and DRB1 entry boxes are mandatory; if they are not completely filled, the search cannot proceed and the prompt "please fill in the complete A, B and DRB1 locus results and resubmit" is shown; (4) if only the A, B and DRB1 results are filled in, only the prediction of likely C and DQB1 results can be executed; all prediction functions are executed only when the results of all five loci A, B, C, DRB1 and DQB1 are filled in. The genotype entry format is exemplified in Table 6 below:
TABLE 6
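The entry rules of the second part can be sketched as a small validator. This is only an illustrative sketch: the function names, the error messages and the returned mode labels (`"all"`, `"c_dqb1_only"`) are assumptions made for the example and are not taken from the patent, and treating a partially filled C or DQB1 pair of boxes as the A, B, DRB1-only case is likewise an assumption.

```python
import re

ALL_LOCI = ("A", "B", "C", "DRB1", "DQB1")
REQUIRED_LOCI = ("A", "B", "DRB1")
ALLOWED = re.compile(r"^[0-9A-Za-z:]+$")  # digits, letters, Western colon


def normalize_entry(text):
    """Trim whitespace and turn a Chinese (full-width) colon into ':'."""
    return text.strip().replace("\uff1a", ":")


def check_allele(text):
    """Return an error message for one entry box, or None if it is acceptable."""
    if not ALLOWED.match(text):
        return "only digits, letters and ':' are allowed"
    if re.search(r":[A-Za-z]", text):
        return "a letter may not directly follow a colon"
    if len(text.encode("utf-8")) > 13:
        return "entry is longer than 13 bytes"
    return None


def check_genotype(entries):
    """entries maps locus -> (allele1, allele2); '' means 'not filled in'.

    Returns (errors, mode); mode is 'all', 'c_dqb1_only' or None on error.
    """
    errors, clean = [], {}
    for locus in ALL_LOCI:
        a1, a2 = (normalize_entry(x) for x in entries.get(locus, ("", "")))
        if bool(a1) != bool(a2):
            errors.append(f"{locus}: both alleles must be filled in")
        for allele in (a1, a2):
            message = check_allele(allele) if allele else None
            if message:
                errors.append(f"{locus} {allele}: {message}")
        clean[locus] = (a1, a2)
    if any(not all(clean[locus]) for locus in REQUIRED_LOCI):
        errors.append("please fill in the complete A, B and DRB1 locus results")
    if errors:
        return errors, None
    complete = all(all(clean[locus]) for locus in ALL_LOCI)
    return errors, "all" if complete else "c_dqb1_only"
```

With only A, B and DRB1 filled in, `check_genotype` reports no errors and returns the `"c_dqb1_only"` mode, mirroring requirement (4).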
In the third part, the entered alleles are processed and converted into a unified format that matches the background database:
(1) Reducing the genotype to four digits
For each entered allele, only the Arabic numerals on either side of the first colon are kept (that is, the first two fields); if the last character is G or P, the value is kept in full. For example: 02:06:01:01, 02:06:01 and 02:06 all become 02:06; 24:02:01:02L, 24:02:01L and 24:02L all become 24:02; 35:108:02:01, 35:108:02 and 35:108 all become 35:108; 12:01:01G remains 12:01:01G.
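A minimal sketch of step (1), assuming the entered allele has already passed the entry checks; the function name `to_two_fields` is hypothetical and chosen for illustration only.

```python
import re


def to_two_fields(allele):
    """Keep only the digits around the first colon, e.g. '02:06:01:01' -> '02:06'.

    Names whose last character is G or P are returned unchanged.
    """
    allele = allele.strip()
    if allele[-1:].upper() in ("G", "P"):
        return allele
    match = re.match(r"^(\d+):(\d+)", allele)
    if match is None:
        raise ValueError(f"not a recognizable allele name: {allele!r}")
    return f"{match.group(1)}:{match.group(2)}"
```

Applied to the examples in the text, `to_two_fields("24:02:01:02L")` gives `"24:02"` and `to_two_fields("12:01:01G")` leaves the name as it is.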
(2) G group judgment
HLA G groups are a public resource available on the website https://www.ebi.ac.uk/ipd/imgt/HLA/. The invention extracts the G group names, and the alleles they include, that are involved in the HLA allele CWD database and the HLA haplotype frequency database, and writes them in a unified format into a G group reference database. The G group reference database comprises two columns, A and B: the G group name (for example 04:01:01G) and the allele name (for example 04:01 or 04:82).
The genotype obtained in step (1) is compared with column B of the background G group reference database; if a match is found, the genotype is converted into the corresponding content of column A. For example, 04:01, 04:82 and 04:01:01G all become 04:01:01G.
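The G group judgment amounts to a lookup from allele name to G group name. The sketch below assumes the reference database is held as a list of (column A, column B) rows; the two rows shown are just the example given in the text, and the function name is hypothetical.

```python
# Column A: G group name, column B: allele name (rows taken from the example in the text).
G_GROUP_REFERENCE = [
    ("04:01:01G", "04:01"),
    ("04:01:01G", "04:82"),
]


def to_g_group(two_field, reference=G_GROUP_REFERENCE):
    """Return the G group name if the allele is listed in column B, else the input unchanged."""
    lookup = {allele: group for group, allele in reference}
    return lookup.get(two_field, two_field)
```

An allele that is already a G group name, or that is not listed at all, passes through unchanged.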
(3) Capturing the suffix letter
The last letter of the entered allele is captured; a null value is returned if the last character is the letter G or P, or is not a letter.
(4) Merging
This step produces two branches. Branch one combines the results of steps (2) and (3); for example, 02:06:01 and 02:06:06 both finally become 02:06; 24:02:01:02L, 24:02:01L and 24:02L all finally become 24:02L; 35:108:02:01, 35:108:02 and 35:108 all finally become 35:108. Branch two combines the results of steps (1) and (3).
(5) Adding the locus
The locus name is added to the merged alleles of branch one and branch two of step (4) to obtain the "locus-added genotype-1" and the "locus-added genotype-2", respectively.
The "locus-added genotype-1" format is consistent with the allele format used in column A of the HLA haplotype frequency and CWD reference database; the results of converting the entered alleles by this step are shown in Table 7 below:
TABLE 7
The "locus-added genotype-2" format is consistent with the allele format used in column A of the allele CWD reference database and of the HLA negative linkage reference database; the results of converting the entered alleles by this step are shown in Table 8 below:
TABLE 8
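Steps (3) to (5) of the third part can be sketched together as follows. The function names are hypothetical, the two-field and G group values are passed in as already-computed inputs, and the `locus*allele` form is assumed from the `A*01:01` style of Table 1. Assigning the 04:82 example to the C locus in the usage note is an assumption made purely for illustration.

```python
def grab_suffix(entered):
    """Return the trailing letter of the entered allele, or '' if it is G, P or not a letter."""
    last = entered.strip()[-1:].upper()
    return last if last.isalpha() and last not in ("G", "P") else ""


def locus_added_genotypes(locus, entered, two_field, g_group):
    """Build the two merged, locus-prefixed forms of one entered allele.

    two_field is the result of step (1), g_group the result of step (2).
    Branch one (steps 2 + 3) gives genotype-1; branch two (steps 1 + 3) gives genotype-2.
    """
    suffix = grab_suffix(entered)
    genotype_1 = f"{locus}*{g_group}{suffix}"
    genotype_2 = f"{locus}*{two_field}{suffix}"
    return genotype_1, genotype_2
```

For instance, `locus_added_genotypes("C", "04:82", "04:82", "04:01:01G")` returns `("C*04:01:01G", "C*04:82")`, so the haplotype databases are searched with the G group name while the CWD and negative linkage databases are searched with the two-field name.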
In the fourth part, the locus-added genotype-1 values are permuted and combined to virtually obtain $2^{n-1}$ groups ($2^{n}$ individual haplotypes) of theoretical haplotypes, where n is the number of loci. Examples:
(1) With all 5 loci A, B, C, DRB1 and DQB1, 16 groups (32 haplotypes) of theoretical A-B-C-DRB1-DQB1 haplotypes are obtained;
(2) When any 1 locus is omitted, taking omission of the A locus as an example, 8 groups (16 haplotypes) of theoretical B-C-DRB1-DQB1 haplotypes are obtained;
(3) When any 2 loci are omitted, taking omission of the C and DQB1 loci as an example, 4 groups (8 haplotypes) of theoretical A-B-DRB1 haplotypes are obtained. The alleles of the loci are then joined, separated by the Western hyphen "-", for subsequent comparison with the reference database. The three types of arrangement are shown in Tables 9 to 11 below, respectively:
TABLE 9 A-B-C-DRB1-DQB1 haplotype combinations
TABLE 10 B-C-DRB1-DQB1 (lacking the A locus) haplotype combinations
TABLE 11 A-B-DRB1 (lacking the C and DQB1 loci) haplotype combinations
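The enumeration of virtual haplotypes can be sketched with `itertools.product`. Fixing the allele chosen at the first locus ensures each complementary pair is produced once, which gives $2^{n-1}$ groups. The function name and the genotype used in the usage note are illustrative assumptions, not values from the patent's tables.

```python
from itertools import product


def haplotype_groups(genotype):
    """genotype maps locus -> (allele1, allele2), in the desired locus order.

    Returns 2**(n-1) groups; each group is a pair of complementary haplotype
    strings whose alleles are joined with '-'.
    """
    loci = list(genotype)
    groups = []
    # The first locus always contributes allele 0 to the first haplotype,
    # so a pair and its mirror image are not both generated.
    for rest in product((0, 1), repeat=len(loci) - 1):
        picks = (0,) + rest
        hap1 = "-".join(genotype[locus][p] for locus, p in zip(loci, picks))
        hap2 = "-".join(genotype[locus][1 - p] for locus, p in zip(loci, picks))
        groups.append((hap1, hap2))
    return groups
```

For a three-locus genotype such as `{"A": ("A*02:01", "A*24:02"), "B": ("B*46:01", "B*40:01"), "DRB1": ("DRB1*09:01", "DRB1*15:01")}` the function returns 4 groups (8 haplotypes); with five loci it returns 16 groups (32 haplotypes). Homozygous loci yield repeated haplotypes, which this sketch does not deduplicate.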
Fifth part, comparing the retrieved data with the reference database
(1) Allelic CWD interpretation
The 10 locus-added genotype-2 values are each searched for in column A of the background allele CWD reference database; if a matching genotype is found, the C/WD/R and frequency from columns B and C are assigned to the designated background position.
(2) Complete match (10/10 match) search
A-B-C-DRB1-DQB1 haplotype search: the 32 virtual haplotypes are searched for in column A of the background A-B-C-DRB1-DQB1 haplotype reference database. If a matching haplotype is found, the genotypes, frequency, rank and C/WD/R from columns B-I are assigned to the designated positions of the background calculation library; if no match is found, a null value is assigned. Each combination comprises two haplotypes; when one haplotype of a combination is matched, the other is still displayed at its designated position even if it is unmatched, with its haplotype CWD set to "no match" and its rank, frequency and other fields left null.
(3) Gene mismatch (9/10 match) search
The A locus mismatch search is a search of the B-C-DRB1-DQB1 haplotypes lacking the A locus; the B locus mismatch search is a search of the A-C-DRB1-DQB1 haplotypes lacking the B locus; the C locus mismatch search is a search of the A-B-DRB1-DQB1 haplotypes lacking the C locus; the DRB1 locus mismatch search is a search of the A-B-C-DQB1 haplotypes lacking the DRB1 locus; and the DQB1 locus mismatch search is a search of the A-B-C-DRB1 haplotypes lacking the DQB1 locus. There are 16 haplotypes for each mismatch type. Taking the A locus mismatch search as an example, all values in column A of the background A mismatch reference database are compared one by one with the 16 virtual A-mismatch haplotypes; if a match is found, the A genotype and frequency from columns B and C are assigned to the designated positions of the background calculation library; if no match is found, a null value is assigned.
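A minimal sketch of a single-locus mismatch search. It assumes haplotypes are hyphen-joined strings of `locus*allele` names and that each mismatch database is a dictionary from the partial haplotype to a list of (genotype, frequency) rows; both the representation and the function names are assumptions, and the database row in the usage note is invented for illustration.

```python
def drop_locus(haplotype, locus):
    """Remove the allele of one locus from a hyphen-joined haplotype string."""
    return "-".join(a for a in haplotype.split("-") if a.split("*")[0] != locus)


def mismatch_search(haplotypes, locus, mismatch_db):
    """Look up every partial haplotype (with `locus` removed) in its mismatch database.

    Returns a dict partial haplotype -> matching rows, or None where nothing
    matched (None plays the role of the null value in the text).
    """
    hits = {}
    for haplotype in haplotypes:
        key = drop_locus(haplotype, locus)
        hits[key] = mismatch_db.get(key)
    return hits
```

For example, with `db = {"B*46:01-C*01:02-DRB1*09:01-DQB1*03:03": [("A*02:07", 120)]}`, searching the haplotype `"A*02:01-B*46:01-C*01:02-DRB1*09:01-DQB1*03:03"` for an A mismatch returns the row for A*02:07 as a candidate mismatched genotype.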
(4) A, B, DRB1 locus 6/6 match search
A-B-DRB1 haplotype search: all contents of column A of the background A-B-DRB1 haplotype reference database are compared one by one with the 8 virtual haplotypes. If a match is found, the C and DQB1 results and the frequency from columns B-D are assigned to the designated positions of the background calculation library; if no match is found, a null value is assigned.
(5) Negative linkage search
The locus-added genotype-2 values are combined pairwise across loci, namely A-B, A-C, A-DRB1, A-DQB1, B-C, B-DRB1, B-DQB1, C-DRB1 and C-DQB1, and compared with column A of the background negative linkage reference database; if a match is found, the matched content of column A is assigned to the designated background position.
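The pairwise negative linkage search can be sketched as below. The database rows are the first, second and fifth rows of Table 5; the function name is hypothetical. Note one simplifying assumption: the sketch pairs every two loci present in the genotype, which may include locus pairs beyond those listed in the text.

```python
from itertools import combinations, product

# linked pair -> (D', r2, P), rows copied from Table 5
NEGATIVE_LINKAGE = {
    "A*01:01-C*01:02:01G": (-0.6784, 0.0025, 0.0002),
    "A*02:01-C*06:02": (-0.6332, 0.0066, 0.0000),
    "A*02:06-C*03:02": (-0.7246, 0.0023, 0.0003),
}


def negative_linkage_hits(genotype, db=NEGATIVE_LINKAGE):
    """genotype maps locus -> (allele1, allele2) in locus-added genotype-2 form.

    Returns the linked pairs that appear in the negative linkage database.
    """
    hits = []
    for locus1, locus2 in combinations(genotype, 2):
        for allele1, allele2 in product(genotype[locus1], genotype[locus2]):
            key = f"{allele1}-{allele2}"
            if key in db:
                hits.append((key, *db[key]))
    return hits
```

A genotype carrying A*02:01 together with C*06:02 would be flagged, signalling that the two alleles are rarely found on the same haplotype.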
A sixth section for giving a prediction result based on the screening conditions and the preference parameters and displaying the prediction result in the foreground
(1) Allelic CWD interpretation
The C/WD/R category and the number of cases (frequency) are displayed in the foreground "genotype CWD" display box: for C, only "C" is displayed; for WD or R, "WD (frequency)" or "R (frequency)" is displayed.
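The CWD lookup and its display rule can be sketched as follows, using the first three rows of Table 1 as sample data. The function name and the empty string returned for an unmatched allele are assumptions.

```python
# genotype -> (C/WD/R, frequency), rows copied from Table 1
CWD_DB = {
    "A*01:01": ("C", 76477),
    "A*01:03": ("WD", 300),
    "A*01:06": ("R", 3),
}


def cwd_label(allele, db=CWD_DB):
    """Return the text for the 'genotype CWD' box: 'C', 'WD (n)' or 'R (n)'."""
    entry = db.get(allele)
    if entry is None:
        return ""  # no match in the CWD database
    category, frequency = entry
    return category if category == "C" else f"{category} ({frequency})"
```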
(2) Complete match (10/10 match) search
1) Calculation of the theoretical 10/10 match probability: half of the sum of the two haplotype frequencies (Haplotype Frequency, HF) in a group is the theoretical probability (Haplotype Matching Probability, HMP) that the haplotype combination is 10/10 matched, i.e. $\mathrm{HMP} = (\mathrm{HF}_1 + \mathrm{HF}_2)/2$. The total probability of a 10/10 match over the 16 groups of haplotypes (Total Haplotype Matching Probability, THMP) is the sum of the 16 HMP values, i.e. $\mathrm{THMP} = \sum_{i=1}^{16} \mathrm{HMP}_i$. The THMP value is displayed in the foreground "10/10 match probability prediction" display box.
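The two quantities can be computed directly. In this sketch an unmatched haplotype (null frequency) is represented by `None` and counted as zero, which is an assumption about how null values enter the sum; the function names are hypothetical.

```python
def hmp(hf1, hf2):
    """Haplotype matching probability of one group: (HF1 + HF2) / 2."""
    return ((hf1 or 0.0) + (hf2 or 0.0)) / 2


def thmp(groups):
    """Total haplotype matching probability: the sum of HMP over all groups.

    groups is an iterable of (HF1, HF2) pairs, one pair per haplotype group.
    """
    return sum(hmp(hf1, hf2) for hf1, hf2 in groups)
```

For a group with haplotype frequencies 0.6% and 0.2%, `hmp(0.6, 0.2)` is 0.4%; the THMP is then the sum of such values over the 16 groups.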
2) 10/10 match opportunity judgment
The rating is divided into 9 levels: "high+, high, high-, medium+, medium, medium-, low+, low, low-". (1) Each 10/10 haplotype combination is first assigned one of three tiers, high, medium or low: if both haplotypes are Common or WD, the tier is "high"; if one is Common or WD and the other is Rare, no match or null, the tier is "medium"; if both are Rare, no match or null, the tier is "low". (2) Each tier is then subdivided according to the HMP value of the combination: "high" is divided into "high+, high, high-" with HMP values of 0.4% and 0.2% as boundaries; "medium" into "medium+, medium, medium-" with 0.1% and 0.05% as boundaries; "low" into "low+, low, low-" with 0.05% and 0.02% as boundaries. (3) The number of occurrences of each of the 9 levels is counted. (4) When the same level occurs twice or more, it is promoted by one level; finally the highest level is taken as the overall rating and displayed in the foreground "10/10 match probability prediction" display box.
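The rating rules can be sketched as a two-stage procedure: classify each combination, then promote repeated levels and take the highest. This is an illustrative reading of the rules rather than the patent's implementation. Assumptions: HMP is given in percent; a value exactly on a boundary falls into the middle level; CWD categories are passed as `"C"`, `"WD"`, `"R"`, or anything else for no match or null; promotion cascades upward through all nine levels, as the worked examples in the text suggest.

```python
LEVELS = ["low-", "low", "low+", "medium-", "medium", "medium+", "high-", "high", "high+"]
BOUNDS = {"high": (0.2, 0.4), "medium": (0.05, 0.1), "low": (0.02, 0.05)}  # HMP in percent


def combination_level(cwd1, cwd2, hmp_percent):
    """Rate one haplotype combination from its two CWD categories and its HMP."""
    well_known = sum(1 for c in (cwd1, cwd2) if c in ("C", "WD"))
    tier = {2: "high", 1: "medium", 0: "low"}[well_known]
    lower, upper = BOUNDS[tier]
    if hmp_percent > upper:
        return tier + "+"
    if hmp_percent < lower:
        return tier + "-"
    return tier


def overall_rating(levels):
    """Promote any level occurring at least twice by one step, then take the highest."""
    counts = [levels.count(name) for name in LEVELS]
    for i in range(len(LEVELS) - 1):
        if counts[i] >= 2:
            counts[i + 1] += 1  # a repeated level contributes one of the next level up
    present = [i for i, count in enumerate(counts) if count > 0]
    return LEVELS[present[-1]] if present else None
```

Three "medium+" combinations are promoted to one "high-", so `overall_rating(["medium+"] * 3)` returns `"high-"`; three "low-" plus two "low" cascade to `"low+"`.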
3) The retrieved haplotype combinations are sorted from high to low by theoretical 10/10 match probability, and the haplotype combinations with their corresponding CWD, rank, frequency and rating are displayed in the foreground.
(3) Gene mismatch (9/10 match) search
1) The retrieved mismatched haplotypes for A, B, C, DRB1 and DQB1 are sorted by frequency from high to low;
2) The theoretical probability of each mismatch type (Haplotype Mismatching Probability, HMMP) is obtained as frequency / number of haplotypes in the database / 16;
3) Calculation of the total 9/10 match probability: half of the sum of the HMMP values of the A, B, C, DRB1 and DQB1 loci and the THMP, i.e. $\mathrm{THMMP} = (\mathrm{HMMP}_{A} + \mathrm{HMMP}_{B} + \mathrm{HMMP}_{C} + \mathrm{HMMP}_{DRB1} + \mathrm{HMMP}_{DQB1} + \mathrm{THMP})/2$;
4) 9/10 match rating: the THMMP value is divided into three levels, high, medium and low, with 1% and 0.1% as boundaries;
5) The HMMP values, the overall rating, and the likely mismatched genotypes and probabilities at each of the A, B, C, DRB1 and DQB1 loci are displayed in the foreground.
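The 9/10 calculations can be sketched as three small functions. Assumptions in this sketch: the per-locus HMMP values and the THMP are expressed in the same unit when summed, the rating function takes THMMP in percent, and a value exactly on a boundary is rated "medium". The function names are hypothetical.

```python
def hmmp(frequency, database_haplotype_count):
    """Theoretical probability of one mismatch type: frequency / database size / 16."""
    return frequency / database_haplotype_count / 16


def thmmp(hmmp_by_locus, thmp):
    """Total 9/10 probability: half of (sum of the per-locus HMMP values + THMP)."""
    return (sum(hmmp_by_locus.values()) + thmp) / 2


def rate_9_of_10(thmmp_percent):
    """Three-level rating with 1% and 0.1% as boundaries (THMMP given in percent)."""
    if thmmp_percent > 1.0:
        return "high"
    if thmmp_percent < 0.1:
        return "low"
    return "medium"
```

As a usage example, per-locus HMMP values of 0.4, 0.3, 0.2, 0.1 and 0.2 percent with a THMP of 0.8 percent give a THMMP of 1.0 percent.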
(4) Prediction of C, DQB1 locus results for an A, B, DRB1 locus 6/6 match
1) The 8 virtual haplotypes (4 groups) and the retrieved C, DQB1 results are paired within each group. For example, haplotype-1 and haplotype-2 form a group; haplotype-1 retrieves 2 C, DQB1 results, namely C, DQB1-1 and C, DQB1-2, and haplotype-2 retrieves 2 C, DQB1 results, namely C, DQB1-3 and C, DQB1-4. There are then 4 possible C, DQB1 outcomes for the group, namely C, DQB1-1 & C, DQB1-3; C, DQB1-1 & C, DQB1-4; C, DQB1-2 & C, DQB1-3; and C, DQB1-2 & C, DQB1-4.
2) The possible C, DQB1 results of the 4 groups of virtual haplotypes are pooled and sorted by frequency from high to low; the probability of a given C, DQB1 result is frequency / total number of haplotypes in the database / 2;
3) All possible combinations and probabilities of C, DQB1 are displayed in the foreground.
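The within-group pairing is a Cartesian product of the results retrieved by the two haplotypes of a group. In the sketch below the result labels are placeholders and the function names are hypothetical; how the frequency of a paired outcome is derived from its two members is not spelled out in the text, so only the per-result probability is shown.

```python
from itertools import product


def pair_results(results_hap1, results_hap2):
    """All possible C, DQB1 outcomes of one group: every result of haplotype-1
    paired with every result of haplotype-2."""
    return list(product(results_hap1, results_hap2))


def result_probability(frequency, database_haplotype_count):
    """Probability of one C, DQB1 result: frequency / database size / 2."""
    return frequency / database_haplotype_count / 2
```

With two results per haplotype, `pair_results(["C,DQB1-1", "C,DQB1-2"], ["C,DQB1-3", "C,DQB1-4"])` yields the 4 outcomes listed in the text. If either haplotype retrieves nothing, the product is empty, so the group contributes no prediction.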
(5) Negative linkage search
Example 1
Example with a 10/10 match probability rating of "high+". Combination one consists of 2 Common haplotypes, so it is first rated "high" and then, because its HMP value is >0.4%, "high+". Combination two consists of 1 Common and 1 Rare, so it is first rated "medium" and then, because 0.1% > HMP > 0.05%, "medium". Combination three consists of 2 Rare, so it is first rated "low" and then, because its HMP value is >0.02%, "low". Two further combinations each consist of 1 Rare and 1 no-match haplotype, so they are first rated "low" and then, because HMP < 0.02%, "low-". The two "low-" are promoted to 1 "low", which together with the original 1 "low" is promoted to 1 "low+"; finally the highest level, "high+", is taken as the final 10/10 match rating. The total 9/10 match probability THMMP is >1%, so the 9/10 rating is "high".
Table 12
Example 2
Example with a 10/10 match probability rating of "high-": 3 "medium+" are promoted to 1 "high-", so the highest level, "high-", is taken as the final 10/10 match rating. The total 9/10 match probability satisfies 1% > THMMP > 0.1%, so the 9/10 rating is "medium".
TABLE 13
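Examples 1, 2 and 4 rate the total 9/10 match probability THMMP with two cut-offs, 1% and 0.1%. A minimal Python sketch of that rating is given below. The function name is hypothetical, THMMP is assumed to be passed as a percentage, and the handling of values exactly equal to a cut-off is an assumption because the examples only show strict inequalities.

```python
def rate_total_9_of_10(thmmp_percent):
    """Rate the total 9/10 match probability (THMMP, in percent).

    THMMP > 1%          -> "high"    (Example 1)
    1% > THMMP > 0.1%   -> "medium"  (Example 2)
    THMMP < 0.1%        -> "low"     (Example 4)
    Assumption: a value exactly on a cut-off falls into the lower band.
    """
    if thmmp_percent > 1.0:
        return "high"
    if thmmp_percent > 0.1:
        return "medium"
    return "low"
```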
Example 3
Example in which the 10/10 match probability rating is "medium+": the C, DQB1 prediction is empty because every group of haplotypes contains 1 mismatch, so the within-group permutation and combination step yields no result; the C, DQB1 prediction for the 9/10 match should be consulted instead.
TABLE 14
Example 4
Example in which the 10/10 match probability rating is "low+": 3 "low-" ratings upgrade to 1 "low", which together with the original 2 "low" ratings upgrades to 1 "low+", so the highest level, "low+", is taken as the final rating. The total 9/10 match probability THMMP is < 0.1%, so the rating is "low".
TABLE 15
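The upgrade steps described in Examples 1, 2 and 4 can be sketched as a carry over an ordered ladder of levels. This is a sketch under assumptions, not the rule as defined elsewhere in the patent: the ladder order between bands (for example "low+" to "medium-") and the choice that every 2 ratings of one level merge into 1 rating of the next level are inferred from the three examples, and the names `LADDER` and `final_rating` are hypothetical.

```python
# Assumed ordering of rating levels, lowest to highest.
LADDER = ["low-", "low", "low+",
          "medium-", "medium", "medium+",
          "high-", "high", "high+"]


def final_rating(ratings, group_size=2):
    """Merge ratings upward and return the highest level present.

    Assumption: every `group_size` ratings of the same level combine
    into one rating of the next level; leftovers stay where they are.
    Returns None for an empty input.
    """
    counts = [ratings.count(level) for level in LADDER]
    for i in range(len(LADDER) - 1):
        carried = counts[i] // group_size
        counts[i] -= carried * group_size
        counts[i + 1] += carried
    for i in range(len(LADDER) - 1, -1, -1):
        if counts[i] > 0:
            return LADDER[i]
    return None


# Ratings as described in Example 4: three "low-" and two "low".
example_4 = final_rating(["low-", "low-", "low-", "low", "low"])
```

Under these assumptions the sketch reproduces the final ratings stated in the text: "low+" for Example 4, "high-" for three "medium+" ratings (Example 2), and "high+" for Example 1, where the upgraded "low+" is outranked by the existing "high+".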
Example 5
Example of genotype CWD interpretation: the allele at locus A is WD, so its number of cases needs to be displayed; the remaining alleles are C, so their numbers of cases do not need to be displayed.
TABLE 16
Example 6
Example of C, DQB1 result prediction for donors matched 6/6 at the A, B, DRB1 loci: 2 sets of C, DQB1 results were predicted, but the first set was significantly more probable than the second.
TABLE 17
For each of Examples 1 to 4, an unrelated donor search was performed. The unrelated donor search page displays only the first 15 retrieved donors: if more than 15 donors are retrieved, only that first page is shown for the time being, and if fewer than 15 are retrieved, all of them are shown. The results displayed on the search page were compared with the results predicted by the present invention to verify the reliability of the predictions. In these cases, the predicted 10/10 and 9/10 match probabilities are highly consistent with the patients' actual chances of retrieving unrelated donors, and the mismatch types shown on the web page fall largely within the predicted mismatch types, as detailed in the following table:
TABLE 18
The patient in Example 6 had two donors matched 6/6 at the A, B, DRB1 loci who underwent confirmatory typing, so the predicted C, DQB1 results were compared with the actual confirmatory typing results. The actual results fully matched the high-probability predicted result, as shown in the following table:
TABLE 19
Claims (7)
1. A method for predicting HLA match probability and mismatch type, comprising the steps of: (1) constructing an HLA database; (2) inputting and submitting genotypes to be compared, wherein the input genotypes are HLA-A, B, C, DRB1 and DQB1 locus genotypes; (3) converting the format of the alleles entered in step (2) to match the format in the HLA database; (4) permuting and combining the format-converted genotypes from step (3) to obtain haplotype combinations; (5) comparing the haplotype combinations obtained in step (4) with the HLA database constructed in step (1); (6) obtaining a prediction result from the comparison; the HLA database comprises an HLA allele CWD database, an HLA haplotype frequency database and an HLA negative linkage database; wherein the HLA haplotype frequency database comprises a 10/10 match prediction database, a 9/10 match prediction database and a C, DQB1 prediction database;
the HLA allele CWD database comprises genotype, C/WD/R, frequency;
the 10/10 match prediction database comprises A-B-C-DRB1-DQB1 haplotypes, A, B, C, DRB1, DQB1 genotypes, frequency, ranking, C/WD/R;
the 9/10 match prediction database comprises an A mismatch database, a B mismatch database, a C mismatch database, a DRB1 mismatch database and a DQB1 mismatch database; each mismatch database comprises mismatched haplotypes and the mismatched genotypes and frequencies corresponding to those haplotypes;
the C, DQB1 prediction database comprises A-B-DRB1 haplotypes and the C, DQB1 genotypes and frequencies corresponding to each haplotype.
2. The method of claim 1, wherein the number of haplotype combinations in step (4) is 2^(n-1), where n is the number of genetic loci.
3. The method of claim 1, wherein the comparison of step (5) comprises a 10/10 match search, a 9/10 match search, an A, B, DRB1 locus 6/6 match search, allele CWD interpretation, and a negative linkage search.
4. A system for predicting HLA match probability and mismatch type, wherein the system performs the method for predicting HLA match probability and mismatch type according to any one of claims 1 to 3, and the system comprises an HLA database, a genotype entry module, a format conversion module, a genotype combination module, and a comparison module;
the HLA database is used for storing reference data;
the genotype input module is used for inputting genotypes to be compared and checking the format of the genotypes;
the format conversion module is used for processing and converting the input alleles and unifying them into a format that matches the HLA database;
the genotype combination module is used for arranging and combining genotypes after format conversion to obtain a haplotype combination;
and the comparison module is used for comparing the haplotype combination with the HLA database constructed in the step (1) to obtain HLA matching probability and mismatch type.
5. A storage medium for predicting HLA match probability and mismatch type, wherein the storage medium performs the method for predicting HLA match probability and mismatch type according to any one of claims 1 to 3.
6. A processor for predicting HLA match probability and mismatch type, wherein the processor is configured to run a program, the program being configured to execute the method for predicting HLA match probability and mismatch type according to any one of claims 1 to 3.
7. A device for predicting HLA match probability and mismatch type, characterized in that it is used for carrying out the method for predicting HLA match probability and mismatch type according to any one of claims 1-3.
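The haplotype combination step of claim 1, step (4), can be illustrated with a short enumeration. The sketch below assumes the count in claim 2 is 2^(n-1), which agrees with the 8 virtual haplotypes in 4 groups described for the three loci A, B and DRB1. The function name, the dictionary input format and the allele labels are hypothetical placeholders, not data from the patent.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Enumerate the possible haplotype pairs of an unphased genotype.

    `genotype` maps each locus to its two alleles. The orientation of
    the first locus is fixed so that mirror-image pairs are counted
    once, which gives 2 ** (n - 1) pairs for n loci.
    """
    loci = list(genotype)
    first, rest = loci[0], loci[1:]
    pairs = []
    for choice in product((0, 1), repeat=len(rest)):
        hap_1 = [genotype[first][0]]
        hap_2 = [genotype[first][1]]
        for locus, pick in zip(rest, choice):
            hap_1.append(genotype[locus][pick])      # allele given to hap_1
            hap_2.append(genotype[locus][1 - pick])  # the other allele
        pairs.append(("-".join(hap_1), "-".join(hap_2)))
    return pairs


# Placeholder allele labels for three loci (hypothetical input).
three_loci = {"A": ("a1", "a2"), "B": ("b1", "b2"), "DRB1": ("d1", "d2")}
groups = haplotype_pairs(three_loci)
```

For three loci this yields 4 groups (8 haplotypes), and for the five loci A, B, C, DRB1 and DQB1 it yields 16. The sketch does not remove repeated pairs that arise when a locus is homozygous.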
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010424265.XA CN111613269B (en) | 2020-05-19 | 2020-05-19 | Method for predicting HLA match probability and mismatch type |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111613269A CN111613269A (en) | 2020-09-01 |
CN111613269B CN111613269B (en) | 2024-01-05 |
Family
ID=72198908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010424265.XA Active CN111613269B (en) | 2020-05-19 | 2020-05-19 | Method for predicting HLA match probability and mismatch type |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111613269B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103221551A (en) * | 2010-11-23 | 2013-07-24 | 深圳华大基因科技有限公司 | HLA genotype-SNP linkage database, its constructing method, and HLA typing method |
JP2015228819A (en) * | 2014-06-04 | 2015-12-21 | ジェノダイブファーマ株式会社 | Dna typing method for hla gene, and computer program used for data analysis of the same method |
CN106434863A (en) * | 2016-06-15 | 2017-02-22 | 广州医科大学附属第二医院 | Method for identifying HLA-DQB1 exon 2 haplotype |
CN108241792A (en) * | 2016-12-23 | 2018-07-03 | 深圳华大基因科技服务有限公司 | A kind of method and apparatus for integrating multi-platform genotypic results |
CN110423818A (en) * | 2019-08-07 | 2019-11-08 | 南京实践医学检验有限公司 | A kind of primer, probe, kit and method for HLA-LOSS detection and individual identification |
Non-Patent Citations (3)
Title |
---|
Bone marrow transplantation with unrelated donors: what is the probability of identifying an HLA-A/B/Cw/DRB1/B3/B5/DQB1-matched donor?; J-M Tiercy et al.; Bone Marrow Transplantation; Vol. 26; pp. 437-441 * |
Using HLA-A, B and DRB1 genetic data from patient populations with different types of leukemia to estimate the probability of finding an HLA-matched donor; 高素青 (Gao Suqing) et al.; 中国输血杂志 (Chinese Journal of Blood Transfusion); Vol. 22 (No. 5); pp. 358-362 * |
Also Published As
Publication number | Publication date |
---|---|
CN111613269A (en) | 2020-09-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||