CN111613269A - Method for predicting HLA matching probability and mismatch type - Google Patents

Method for predicting HLA matching probability and mismatch type

Info

Publication number
CN111613269A
Authority
CN
China
Prior art keywords
database
hla
mismatch
dqb1
haplotype
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010424265.XA
Other languages
Chinese (zh)
Other versions
CN111613269B (en)
Inventor
李杨
何军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
First Affiliated Hospital of Suzhou University
Original Assignee
First Affiliated Hospital of Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Affiliated Hospital of Suzhou University filed Critical First Affiliated Hospital of Suzhou University
Priority to CN202010424265.XA priority Critical patent/CN111613269B/en
Publication of CN111613269A publication Critical patent/CN111613269A/en
Application granted granted Critical
Publication of CN111613269B publication Critical patent/CN111613269B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for predicting HLA matching probability and mismatch types, which comprises the following steps: (1) constructing an HLA database; (2) entering and submitting the genotypes to be compared; (3) converting the format of the alleles entered in step (2) into a format that matches the HLA database; (4) permuting and combining the format-converted genotypes from step (3) to obtain haplotype combinations; (5) comparing the haplotype combinations obtained in step (4) with the HLA database constructed in step (1); and (6) obtaining a prediction result from the comparison. The method helps clinicians select initially screened 10/10 donors with a higher matching probability for confirmatory typing, and select the optimal unrelated donor with a permissible mismatch from among 8-9/10 mismatched donors, thereby saving testing costs for patients and contributing significantly to reducing transplantation complications and improving post-transplant survival.

Description

Method for predicting HLA matching probability and mismatch type
Technical Field
The invention belongs to the field of biomedicine, and particularly relates to a method for predicting HLA matching probability and mismatch types.
Background
Human leukocyte antigen (HLA) is closely associated with allogeneic transplantation and also plays an important role in human evolutionary genetics, the onset and progression of immune diseases, tumor immune escape and vaccine development. For many patients with malignant tumors or organ failure, allogeneic transplantation is ultimately the only means of survival. HLA is the principal transplantation antigen, and the degree of HLA match or mismatch between patient and donor directly affects transplantation prognosis and the patient's long-term survival.
At present, potential donors for allogeneic transplantation include related donors and unrelated donors. Before transplantation, the patient's HLA genotyping result is therefore used to screen related donors and to search for matching unrelated donors in the Chinese bone marrow bank. When the patient has no related donor available for transplantation, it is important to select the best-matched donor from among the preliminarily matched unrelated donors. The current search process of the Chinese bone marrow bank is as follows: 1) initial screening of unrelated donors returns two kinds of results: high-resolution results for 10 alleles at the five loci HLA-A, B, DRB1, C and DQB1; or high- or low-resolution results for 6 alleles at the three loci HLA-A, B and DRB1. 2) Confirmatory typing re-tests the samples and reconfirms the results of the patient and of initially screened donors with different degrees of matching.
The chance of retrieving a preliminarily matched donor varies from patient to patient. For initially matched donors that are 6/6 matched at the A, B and DRB1 loci at high or low resolution, or 9/10 matched, the probability of a match or mismatch at each HLA locus differs at the confirmatory typing stage. The current search method therefore suffers from considerable blindness and high testing costs for patients.
The HLA allele and haplotype frequencies of the Chinese population differ markedly from those of other populations. Although the foreign literature describes predicting the probability of finding a matched donor from HLA genotyping, no complete search system has been established and applied to clinical cases.
Disclosure of Invention
In order to solve the above problems of the prior art, the present invention provides a method for predicting HLA match probability and mismatch type.
In order to achieve the purpose, the invention provides the following technical scheme:
a method for predicting HLA match probability and mismatch type, comprising the steps of:
(1) constructing an HLA database;
(2) recording and submitting genotypes to be compared, wherein the recorded genotypes are HLA-A, B, C, DRB1 and DQB1 locus genotypes;
(3) converting the format of the allele entered in step (2) to match the format in the HLA database:
(4) arranging and combining the genotypes subjected to format conversion in the step (3) to obtain a haplotype combination;
(5) comparing the haplotype combination obtained in the step (4) with the HLA database constructed in the step (1);
(6) and obtaining a prediction result through comparison.
Further, the HLA database comprises an HLA allele CWD database, an HLA haplotype frequency database and an HLA negative linkage database; wherein the HLA haplotype frequency database comprises 10/10 concordance prediction database, 9/10 concordance prediction database and a C, DQB1 prediction database.
Further, the HLA allele CWD database comprises genotype, C/WD/R and frequency; wherein C is a common allele, WD is a well-documented allele, and R is a rare allele (reference: Common and well-documented HLA alleles: report of the Ad-Hoc committee of the American Society for Histocompatibility and Immunogenetics, 2007);
the 10/10 concordance prediction database comprises A-B-C-DRB1-DQB1 haplotype, A, B, C, DRB1, DQB1 genotype, frequency, sequence, C/WD/R;
the 9/10 concordant predictive databases include an A mismatch database, a B mismatch database, a C mismatch database, a DRB1 mismatch database, a DQB1 mismatch database; each mismatch database comprises mismatched haplotypes, mismatched genotypes corresponding to the haplotypes and frequency;
the C, DQB1 prediction database contains the haplotype of A-B-DRB1, the C, DQB1 genotype corresponding to the haplotype, and frequency.
Further, the number of haplotype combinations in step (4) is $2^{n-1}$ groups, where n is the number of gene loci.
Further, the alignment mode of the step (5) comprises 10/10 concordance search, 9/10 concordance search, A, B, DRB1 locus 6/6 concordance search, allele CWD interpretation and negative linkage search.
A system for predicting HLA match probability and mismatch type, wherein the system executes the above method for predicting HLA match probability and mismatch type and comprises an HLA database, a genotype input module, a format conversion module, a genotype combination module and a comparison module;
the HLA database is used for storing reference data;
the genotype input module is used for inputting the genotypes to be compared and checking the formats of the genotypes;
the format conversion module is used for processing and converting the recorded alleles to unify the recorded alleles into a format matched with an HLA database:
the genotype combination module is used for arranging and combining the genotypes after format conversion to obtain haplotype combination;
and the comparison module is used for comparing the haplotype combination with the HLA database constructed in the step (1) to obtain HLA matching probability and mismatch types.
A storage medium for predicting HLA match probability and mismatch type, the storage medium storing a program that, when run, performs the above-described method of predicting HLA match probability and mismatch type.
A processor for predicting HLA match probability and mismatch type, the processor being configured to execute a program that executes the above-described method of predicting HLA match probability and mismatch type.
A device for predicting HLA match probability and mismatch type for implementing the above-described methods of predicting HLA match probability and mismatch type.
Advantageous effects: the method can predict, early in the disease, the probability of finding an HLA 10/10 matched donor or an 8-9/10 matched donor, and can predict the likely C and DQB1 results of donors that are 6/6 matched at the A, B and DRB1 loci at high or low resolution. This helps clinicians select initially screened 10/10 donors with a higher matching probability for confirmatory typing, and select the optimal unrelated donor with a permissible mismatch from among 8-9/10 mismatched donors, thereby saving testing costs for patients and contributing significantly to reducing transplantation complications and improving post-transplant survival.
The method of the invention can be used for code development on different platforms and by adopting different programming languages, and is easy to realize, develop and popularize; closely meets the actual requirements in clinical transplantation work, has strong practicability, and can be extended to the fields of tumor immunity and immune diseases; the user can complete the prediction of all the results by only inputting the genotype results into the designated positions, the operation is simple, and the method has wide development and application prospects.
Drawings
FIG. 1 is a technical route flow diagram of the method of the present invention.
Detailed Description
The present invention is further described below with reference to specific examples, which are only exemplary and do not limit the scope of the present invention in any way. It will be understood by those skilled in the art that modifications or substitutions, additions or deletions to the details and forms of the present invention may be made without departing from the spirit and scope of the invention, and these modifications and substitutions are within the scope of the invention.
The invention starts from the need in clinical transplantation to predict donor search results and, building on a thorough understanding of research findings such as HLA allele and haplotype frequencies, CWD status, allele associations and linkage disequilibrium, converts these findings into a retrieval and prediction tool that is easy to develop, operate and popularize. First, a background reference database with a specific format is established; foreground haplotypes in a matching format are virtualized from the HLA genotype and compared against it to implement the retrieval function. Then, by setting screening conditions and preferred parameters, the method predicts the probability that the patient retrieves 10/10 and 9/10 matched donors, the likely C and DQB1 results of 6/6 matched donors, and the possible donor-patient mismatch types.
The technical scheme of the invention comprises six main parts: establishing a background reference database; entering and submitting the foreground genotype results; processing the genotype data format; generating virtual haplotypes and linked genes; comparing the search data with the reference database; and giving a prediction result according to the screening conditions and preferred parameters. The technical route flow chart of the method is shown in FIG. 1.
Part one: a CWD database, an HLA haplotype frequency database and an HLA linkage disequilibrium parameter database of the Chinese population are established and compiled into a format suitable for retrieval and comparison, to serve as the background reference database. The database contents are hidden to avoid leakage and erroneous modification, the database is password-protected, and it can be updated and maintained only after authorization.
(1) HLA allele CWD reference database format
Contains 3 columns A-C, which are genotype, C/WD/R and frequency respectively, and the format is shown in Table 1:
TABLE 1
Genotype     C/WD/R   Frequency
A*01:01      C        76477
A*01:03      WD       300
A*01:06      R        3
A*01:127     R        2
A*01:129     R        1
A*01:141     R        1
(2) HLA haplotype frequency and CWD reference database format
1) 10/10 match prediction database
Contains 9 columns, A to I: the A-B-C-DRB1-DQB1 haplotype; the A, B, C, DRB1 and DQB1 genotypes; frequency; rank; and C/WD/R. The format is shown in Table 2:
TABLE 2
Figure BDA0002498079680000041
2) 9/10 match prediction database
There are 5 mismatch databases in total: A mismatch (B-C-DRB1-DQB1 haplotype), B mismatch (A-C-DRB1-DQB1 haplotype), C mismatch (A-B-DRB1-DQB1 haplotype), DRB1 mismatch (A-B-C-DQB1 haplotype) and DQB1 mismatch (A-B-C-DRB1 haplotype). Taking the A mismatch as an example, the database contains 3 columns, A to C: the A-mismatch haplotype (B-C-DRB1-DQB1), the A genotype corresponding to the haplotype, and the frequency. The format is shown in Table 3:
TABLE 3
Figure BDA0002498079680000042
3) C, DQB1 prediction database
Contains the A-B-DRB1 haplotype, the C and DQB1 genotypes corresponding to the haplotype, and the frequency. The format is shown in Table 4:
TABLE 4
Figure BDA0002498079680000051
(3) HLA negative linkage reference database
Contains 4 columns, A to D: the linked gene pair, D′, r² and P value. The format is shown in Table 5:
TABLE 5
Linked genes             D′        r²       P
A*01:01-C*01:02:01G      -0.6784   0.0025   0.0002
A*02:01-C*06:02          -0.6332   0.0066   0.0000
A*02:03-C*03:04          -0.2285   0.0002   0.3161
A*02:03-C*08:01:01G      -0.5882   0.0010   0.0172
A*02:06-C*03:02          -0.7246   0.0023   0.0003
Part two: the genotype results for the HLA-A, B, C, DRB1 and DQB1 loci are entered in the genotype entry boxes. The entry requirements are:
(1) each locus comprises two alleles; when one of the two genotype input boxes of a locus contains content, the other cannot be empty;
(2) an entry may contain only Arabic numerals, the 26 letters (case-insensitive) and the Western colon ":" (a Chinese colon is automatically converted to the Western form); a letter may not directly follow a colon, and the total length of an entry cannot exceed 13 bytes;
(3) the A, B and DRB1 input boxes are required; if they are not completely filled in, the search cannot proceed and a prompt asks the user to complete the A, B and DRB1 results before submitting the data;
(4) if only the A, B and DRB1 results are filled in, only the prediction of the likely C and DQB1 results can be performed; the results of all five loci, A, B, C, DRB1 and DQB1, must be filled in to perform all prediction functions.
Examples of the genotype entry format are shown in Table 6 below:
TABLE 6
Figure BDA0002498079680000052
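The entry rules of part two can be illustrated with a minimal Python sketch. The function names are hypothetical, and two details are assumptions not stated in the text: entries are upper-cased (the text only says input is case-insensitive), and the 13-byte limit is measured after colon conversion.

```python
import re

# Digits, the 26 letters and the Western colon are the only allowed characters.
_ALLOWED = re.compile(r"^[0-9A-Za-z:]+$")

def normalize_entry(raw: str) -> str:
    """Convert a Chinese (full-width) colon to ':' and upper-case the letters."""
    return raw.strip().replace("\uff1a", ":").upper()

def entry_errors(raw: str) -> list[str]:
    """Return the rule violations for one non-empty allele entry."""
    value = normalize_entry(raw)
    errors = []
    if not _ALLOWED.match(value):
        errors.append("only digits, letters and ':' are allowed")
    if re.search(r":[A-Za-z]", value):
        errors.append("a letter may not directly follow a colon")
    if len(value.encode("utf-8")) > 13:
        errors.append("entry is longer than 13 bytes")
    return errors

def locus_errors(first: str, second: str) -> list[str]:
    """Rule (1): both alleles of a locus are filled in, or both are empty."""
    if bool(first.strip()) != bool(second.strip()):
        return ["both alleles of a locus must be entered"]
    return []
```

Returning a list of messages rather than a boolean is a design choice of this sketch; it lets a front end show every problem with an entry at once.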
Part three: the entered alleles are processed and converted into a unified format that matches the background database:
(1) Truncating the genotype to four digits
For every "entered allele", only the first colon and the Arabic numerals immediately before and after it are kept; if the last character is G or P, the full value is kept. For example: 02:06:01:01, 02:06:01 or 02:06 becomes 02:06; 24:02:01:02L, 24:02:01L or 24:02L becomes 24:02; 35:108:02:01, 35:108:02 or 35:108 becomes 35:108; and 12:01:01G remains 12:01:01G.
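A minimal sketch of the four-digit truncation in step (1), assuming the entry has already been validated and carries no locus prefix (the locus is added later, in step (5)). The function name `two_field` is hypothetical.

```python
import re

def two_field(allele: str) -> str:
    """Keep the first colon and the digits around it; keep G/P names whole."""
    allele = allele.strip().upper()
    if allele.endswith(("G", "P")):
        # G and P group names are taken completely, as described above.
        return allele
    match = re.match(r"\d+:\d+", allele)
    if match is None:
        raise ValueError(f"no two colon-separated fields in {allele!r}")
    return match.group(0)
```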
(2) G group judgment
The HLA G groups are a public online resource (https://www.ebi.ac.uk/ipd/imgt/HLA/). The invention extracts the names of the G groups, and the alleles they contain, that occur in the HLA allele CWD database and the HLA haplotype frequency database, and compiles them in a unified format into a G group reference database with two columns, A and B: the G group name (for example 04:01:01G) and the allele name (for example 04:01 or 04:82).
The four-digit genotype from step (1) is compared with column B of the background G group reference database; if a match is found, the four-digit allele is replaced by the content of column A. For example: 04:01, 04:82 or 04:01:01G becomes 04:01:01G.
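The G group judgment amounts to a lookup from column B to column A. The sketch below uses only the two example rows given in the text; the dictionary and function names are hypothetical. It also ignores the locus, as the two-column description does, whereas a real table would probably need the locus in the key because the same digits can occur at different loci.

```python
# Excerpt of the G group reference database, keyed by column B (allele name)
# with column A (G group name) as the value. Only the rows from the text.
G_GROUP_BY_ALLELE = {
    "04:01": "04:01:01G",
    "04:82": "04:01:01G",
}

def to_g_group(four_digit_allele: str) -> str:
    """Return the G group name if column B lists the allele, else the input."""
    return G_GROUP_BY_ALLELE.get(four_digit_allele, four_digit_allele)
```

A value that is already a G group name, such as 04:01:01G, is not in column B and therefore passes through unchanged.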
(3) Extracting the letter
The last letter is extracted from the "entered allele"; a null value is returned if the last character is the letter G or P, or is not a letter.
(4) Merging
This step produces two branches. Branch one combines the results of steps (2) and (3), for example: 02:06:01:01 or 02:06 finally becomes 02:06; 24:02:01:02L, 24:02:01L or 24:02L finally becomes 24:02L; 35:108:02:01, 35:108:02 or 35:108 finally becomes 35:108. Branch two combines the results of steps (1) and (3).
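Steps (3) and (4) can be sketched as follows. The function names are hypothetical, and the base value passed to `merge` is assumed to come from step (2) for branch one or from step (1) for branch two.

```python
def trailing_letter(entered: str) -> str:
    """Step (3): last letter of the entered allele, '' for G, P or no letter."""
    last = entered.strip().upper()[-1:]
    return last if last.isalpha() and last not in ("G", "P") else ""

def merge(base: str, entered: str) -> str:
    """Step (4): append the extracted letter to a step (1) or step (2) result."""
    return base + trailing_letter(entered)
```

For instance, `merge("24:02", "24:02:01:02L")` restores the suffix that the four-digit truncation removed, while a G group name is left untouched because its trailing G is not treated as a suffix.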
(5) Adding the locus
The locus name is added to the merged alleles of branch one and branch two from step (4), giving the "locus-added genotype-1" and the "locus-added genotype-2" respectively.
The format of the "locus-added genotype-1" is consistent with the allele format used in column A of the HLA haplotype frequency and CWD reference database; the results of converting the "entered allele" through this step are shown in Table 7 below:
TABLE 7
Figure BDA0002498079680000061
The format of the "locus-added genotype-2" is consistent with the allele format used in column A of the allele CWD reference database and of the HLA negative linkage reference database; the results of converting the "entered allele" through this step are shown in Table 8 below:
TABLE 8
Figure BDA0002498079680000071
Part four: the "locus-added genotype-1" values are permuted and combined to virtualize $2^{n-1}$ groups ($2^{n}$ haplotypes) of theoretical haplotypes, where n is the number of loci. For example:
(1) with all 5 loci, A, B, C, DRB1 and DQB1, 16 groups (32 haplotypes) of theoretical A-B-C-DRB1-DQB1 haplotypes can be virtualized;
(2) when any 1 locus is omitted, taking the A locus as an example, 8 groups (16 haplotypes) of theoretical B-C-DRB1-DQB1 haplotypes can be virtualized;
(3) when any 2 loci are omitted, taking the C and DQB1 loci as an example, 4 groups (8 haplotypes) of theoretical A-B-DRB1 haplotypes can be virtualized.
The alleles at each locus are then joined, separated by the Western hyphen "-", for subsequent comparison with the reference database. The three types of permutation and combination are shown in Tables 9-11 below, respectively:
TABLE 9 A-B-C-DRB1-DQB1 haplotype combinations
Figure BDA0002498079680000072
TABLE 10 B-C-DRB1-DQB1 (lacking the A locus) haplotype combinations
Figure BDA0002498079680000081
TABLE 11 A-B-DRB1 (lacking the C, DQB1 loci) haplotype combinations
Figure BDA0002498079680000082
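The virtual haplotype generation of part four can be sketched as follows. The function name is hypothetical, and the genotype is assumed to be passed as a mapping from locus to its two locus-added alleles, in the locus order to be used in the haplotype string. Fixing the first locus to its first allele produces each complementary pair exactly once, which is where the $2^{n-1}$ group count comes from.

```python
from itertools import product

def haplotype_groups(genotype: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Enumerate the 2**(n-1) complementary haplotype pairs for n loci."""
    loci = list(genotype)
    groups = []
    for picks in product((0, 1), repeat=len(loci) - 1):
        choice = (0, *picks)  # first locus fixed, so no pair is repeated
        first = "-".join(genotype[l][c] for l, c in zip(loci, choice))
        second = "-".join(genotype[l][1 - c] for l, c in zip(loci, choice))
        groups.append((first, second))
    return groups
```

Omitting a locus from the mapping gives the one-locus-lacking and two-loci-lacking cases. If a locus is homozygous, some of the generated haplotype strings coincide; this sketch does not remove such duplicates.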
Part five: comparing the search data with the reference database
(1) Allele CWD interpretation
Each of the 10 "locus-added genotype-2" values is searched for in column A of the background genotype CWD reference database; if a matching genotype is found, the C/WD/R value and frequency from columns B and C are assigned to the designated background position.
(2) Perfect match (10/10 congruence) retrieval
A-B-C-DRB1-DQB1 haplotype search: each of the 32 virtualized haplotypes is searched for in column A of the background A-B-C-DRB1-DQB1 haplotype reference database. If a matching haplotype is found, the genotypes, frequency, rank and C/WD/R from columns B-I are assigned to the designated positions of the background calculation library; if no match is found, a null value is assigned. Each combination comprises two haplotypes: when one haplotype of a combination is matched, the other haplotype is still displayed at its designated position even if it is unmatched, with its haplotype CWD set to "no match" and its rank, frequency and other fields set to null.
(3) One gene mismatch (9/10 consensus) search
A locus mismatch search is a search for B-C-DRB1-DQB1 haplotypes lacking the A locus; B locus mismatch search is a search for A-C-DRB1-DQB1 haplotypes lacking the B locus; C locus mismatch search is a search for A-B-DRB1-DQB1 haplotypes lacking the C locus; DRB1 locus mismatch search is a search for A-B-C-DQB1 haplotypes lacking the DRB1 locus; DQB1 locus mismatch search is a search for A-B-C-DRB1 haplotypes lacking the DQB1 locus. There are 16 haplotypes for each mismatch type. Taking the A locus mismatch search as an example, the 16 virtual A-mismatch haplotypes are compared one by one with all values in column A of the background reference database; if a match is found, the A genotype and frequency from columns B and C are assigned to the designated positions of the background calculation library; if no match is found, a null value is assigned.
(4) A, B, DRB1 site 6/6 consensus search
A-B-DRB1 haplotype search: the 8 virtual haplotypes are compared one by one with all entries in column A of the background A-B-DRB1 haplotype reference database; if a match is found, the C and DQB1 results and frequency from columns B-E are assigned to the designated positions of the background calculation library; if no match is found, a null value is assigned.
(5) Negative linkage search
The "locus-added genotype-2" values are combined pairwise by locus, in the order A-B, A-C, A-DRB1, A-DQB1, B-C, B-DRB1, B-DQB1, C-DRB1, C-DQB1 and DRB1-DQB1, and compared with column A of the background negative linkage reference database; if a match is found, the content of column A is assigned to the designated background position.
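The negative linkage search can be sketched as a pairwise enumeration followed by a dictionary lookup. The function and variable names are hypothetical; the two reference rows are taken from Table 5, and the genotype is assumed to map each locus to its two "locus-added genotype-2" values.

```python
from itertools import combinations, product

LOCI = ("A", "B", "C", "DRB1", "DQB1")

# Excerpt of the negative linkage reference database (Table 5):
# linked gene pair -> (D', r2, P).
NEGATIVE_LINKAGE = {
    "A*02:01-C*06:02": (-0.6332, 0.0066, 0.0000),
    "A*02:06-C*03:02": (-0.7246, 0.0023, 0.0003),
}

def linkage_keys(genotype: dict[str, tuple[str, str]]) -> list[str]:
    """All allele pairs over the 10 locus pairs A-B, A-C, ..., DRB1-DQB1."""
    keys = []
    for locus1, locus2 in combinations(LOCI, 2):
        for allele1, allele2 in product(genotype[locus1], genotype[locus2]):
            keys.append(f"{allele1}-{allele2}")
    return keys

def negative_hits(genotype: dict[str, tuple[str, str]]) -> list[str]:
    """Allele pairs of the genotype that appear in the reference database."""
    return [key for key in linkage_keys(genotype) if key in NEGATIVE_LINKAGE]
```

`itertools.combinations` over the tuple `LOCI` yields the locus pairs in exactly the order listed in the text, and each locus pair contributes four allele pairs, giving 40 keys for a five-locus genotype.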
Part six: the prediction result is given according to the screening conditions and preferred parameters and is displayed in the foreground
(1) Allele CWD interpretation
The C/WD/R value and the frequency are displayed in the foreground genotype CWD display box: for C, only "C" is displayed; WD and R are displayed with their frequency, as "WD (frequency)" or "R (frequency)".
(2) Perfect match (10/10 congruence) retrieval
1) Calculating the theoretical 10/10 match probability: half of the sum of the haplotype frequencies (HF) of the two haplotypes in a group is the theoretical probability that the haplotype combination is 10/10 matched (HMP), i.e. $\mathrm{HMP} = (\mathrm{HF}_1 + \mathrm{HF}_2)/2$. The total 10/10 matching probability of the 16 haplotype combinations (THMP) is the sum of the 16 HMP values, i.e. $\mathrm{THMP} = \sum_{i=1}^{16} \mathrm{HMP}_i$. The THMP value is displayed in the foreground "10/10 match probability prediction" display box.
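A minimal sketch of the HMP and THMP calculation. The function names are hypothetical, and treating a null (unmatched) haplotype frequency as zero is an assumption; the text only says that unmatched haplotypes are assigned null values.

```python
from typing import Optional

def hmp(hf1: Optional[float], hf2: Optional[float]) -> float:
    """HMP = (HF1 + HF2) / 2, with a null frequency counted as 0 (assumed)."""
    return ((hf1 or 0.0) + (hf2 or 0.0)) / 2

def thmp(groups: list[tuple[Optional[float], Optional[float]]]) -> float:
    """THMP: sum of the HMP values of all haplotype groups (16 for 5 loci)."""
    return sum(hmp(hf1, hf2) for hf1, hf2 in groups)
```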
2) Judging the 10/10 match chance
The chance is divided into 9 levels: high+, high, high-, middle+, middle, middle-, low+, low and low-.
First, the 10/10 match chance of each haplotype combination is divided into three tiers, high, middle and low: if both haplotypes are Common or WD, the tier is "high"; if one is Common or WD and the other is Rare, no match or null, the tier is "middle"; if both are Rare, no match or null, the tier is "low".
Second, each tier is subdivided according to the HMP value of the combination: "high" is divided into "high+, high, high-" at the HMP boundaries 0.4% and 0.2%; "middle" is divided into "middle+, middle, middle-" at 0.1% and 0.05%; "low" is divided into "low+, low, low-" at 0.05% and 0.02%.
Third, the number of occurrences of each of the 9 levels is counted.
Fourth, the overall chance of the 16 haplotype combinations is evaluated: when the same level occurs twice or more, it is promoted by one level; finally the highest level is taken as the overall rating and displayed in the foreground 10/10 match probability prediction display box.
3) The retrieved haplotype combinations are ranked from high to low by theoretical 10/10 match probability and displayed in the foreground together with the corresponding CWD, rank, frequency and chance.
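The rating procedure can be sketched as below, with the nine levels ordered low- < low < low+ < middle- < middle < middle+ < high- < high < high+. The names are hypothetical. Two points are assumptions read from Examples 1, 2 and 4 rather than stated rules: a level that occurs at least twice adds one occurrence to the next level up, cascading from the lowest level, and an HMP exactly on a boundary stays in the plain tier.

```python
LEVELS = ["low-", "low", "low+", "middle-", "middle", "middle+",
          "high-", "high", "high+"]

# (upper, lower) HMP boundaries in percent for each tier.
BOUNDS = {"high": (0.4, 0.2), "middle": (0.1, 0.05), "low": (0.05, 0.02)}

def tier(cwd1, cwd2):
    """Tier from the CWD status of both haplotypes ('C', 'WD', 'R', None...)."""
    documented = sum(c in ("C", "WD") for c in (cwd1, cwd2))
    return ("low", "middle", "high")[documented]

def grade(cwd1, cwd2, hmp_percent):
    """One of the 9 levels for a single haplotype combination."""
    base = tier(cwd1, cwd2)
    upper, lower = BOUNDS[base]
    if hmp_percent > upper:
        return base + "+"
    if hmp_percent < lower:
        return base + "-"
    return base

def overall(grades):
    """Promote levels that occur twice or more, then take the highest level."""
    counts = [grades.count(level) for level in LEVELS]
    for i in range(len(LEVELS) - 1):
        if counts[i] >= 2:
            counts[i + 1] += 1  # promoted occurrence joins the next level
    present = [i for i, count in enumerate(counts) if count]
    return LEVELS[present[-1]] if present else None
```

Under these assumptions, three "middle+" combinations give an overall "high-", and three "low-" together with two "low" give "low+".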
(3) One gene mismatch (9/10 consensus) search
1) The retrieved A, B, C, DRB1 and DQB1 mismatched haplotypes are sorted by frequency from high to low;
2) the theoretical probability of each mismatch type (HMMP) is obtained as $\mathrm{HMMP} = \text{frequency} / \text{total number of haplotypes in the database} / 16$;
3) calculating the total 9/10 match probability (THMMP): THMMP is half of the sum of the HMMP values at the A, B, C, DRB1 and DQB1 loci and THMP, i.e. $\mathrm{THMMP} = (\mathrm{HMMP}_{A} + \mathrm{HMMP}_{B} + \mathrm{HMMP}_{C} + \mathrm{HMMP}_{DRB1} + \mathrm{HMMP}_{DQB1} + \mathrm{THMP})/2$;
4) judging the 9/10 match chance: THMMP is divided into three tiers, high, medium and low, at the boundaries 1% and 0.1%;
5) the HMMP values, the overall rating, and the possible mismatched genotypes at each of the A, B, C, DRB1 and DQB1 loci with their probabilities are displayed in the foreground.
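The 9/10 calculation can be sketched as follows. The function names are hypothetical. The THMMP expression follows one reading of step 3), namely half of the sum of the five per-locus HMMP values and THMP; how several hits at one locus are combined into that locus's HMMP is not specified in the text, so the per-locus values are taken as inputs here. Probabilities are fractions, and the 1% and 0.1% thresholds follow the examples.

```python
def hmmp(frequency: float, total_haplotypes: float) -> float:
    """HMMP = frequency / total number of haplotypes in the database / 16."""
    return frequency / total_haplotypes / 16

def thmmp(hmmp_by_locus: dict[str, float], thmp: float) -> float:
    """THMMP = (HMMP_A + HMMP_B + HMMP_C + HMMP_DRB1 + HMMP_DQB1 + THMP) / 2."""
    return (sum(hmmp_by_locus.values()) + thmp) / 2

def rate_9_of_10(thmmp_value: float) -> str:
    """'high' above 1%, 'low' below 0.1%, otherwise 'medium'."""
    percent = thmmp_value * 100
    if percent > 1:
        return "high"
    if percent < 0.1:
        return "low"
    return "medium"
```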
(4) Prediction of C, DQB1 site results when A, B, DRB1 site 6/6 are combined
1) The C, DQB1 results retrieved for the 8 virtual haplotypes (4 groups) are paired within each group. For example: haplotype-1 and haplotype-2 form a group; haplotype-1 retrieves 2 C, DQB1 results, C, DQB1-1 and C, DQB1-2; haplotype-2 retrieves 2 C, DQB1 results, C, DQB1-3 and C, DQB1-4; the group then has 4 possible C, DQB1 results, namely C, DQB1-1 & C, DQB1-3; C, DQB1-1 & C, DQB1-4; C, DQB1-2 & C, DQB1-3; and C, DQB1-2 & C, DQB1-4.
2) The possible C and DQB1 results of the 4 groups of virtual haplotypes are pooled and sorted by frequency from high to low; the probability of a C and DQB1 result is frequency / total number of haplotypes in the database / 2;
3) all possible combinations and probabilities of C, DQB1 are displayed in the foreground.
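The within-group pairing of step 1) is a Cartesian product of the results retrieved for the two haplotypes of a group. A minimal sketch with hypothetical function names, taking each result as an opaque label:

```python
from itertools import product

def pair_results(results1: list[str], results2: list[str]) -> list[tuple[str, str]]:
    """All pairings of the C, DQB1 results of the two haplotypes in a group."""
    return list(product(results1, results2))

def result_probability(frequency: float, total_haplotypes: float) -> float:
    """Probability of one result: frequency / total haplotypes / 2."""
    return frequency / total_haplotypes / 2
```

When one haplotype of a group retrieves nothing, the product is empty, which corresponds to the empty C, DQB1 prediction described in Example 3.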
(5) Negative linkage search
Example 1
Example with a 10/10 match probability rating of "high+": combination one has 2 Common haplotypes, so it is first judged "high" and then, because its HMP value > 0.4%, "high+"; combination two has 1 Common and 1 Rare haplotype, so it is first judged "middle" and then, because 0.1% > HMP value > 0.05%, "middle"; combination three has 2 Rare haplotypes, so it is first judged "low" and then, because 0.05% > HMP value > 0.02%, "low"; the combination of three and four has 1 Rare haplotype and 1 haplotype with no match, so it is first judged "low" and then, because its HMP value < 0.02%, "low-". The two "low-" are promoted to 1 "low", which together with the original 1 "low" is promoted to 1 "low+"; finally the highest level, "high+", is taken as the final 10/10 match rating. The total 9/10 match probability THMMP > 1%, so the rating is "high".
TABLE 12
Figure BDA0002498079680000111
Example 2
Example with a 10/10 match probability rating of "high-": 3 "middle+" are promoted to 1 "high-", so the highest level, "high-", is finally taken as the final 10/10 match rating. The total 9/10 match probability satisfies 1% > THMMP > 0.1%, so the rating is "medium".
TABLE 13
Figure BDA0002498079680000112
Example 3
Example with a 10/10 match probability rating of "middle+": the predicted C, DQB1 result is empty because each group of haplotypes contains 1 unmatched haplotype, so the pairing step within the group yields no result; the C, DQB1 results predicted in the 9/10 match search can be consulted instead.
TABLE 14
Figure BDA0002498079680000113
Example 4
Example with a 10/10 match probability rating of "low+": 3 "low-" are promoted to 1 "low", which together with the original 2 "low" is promoted to 1 "low+", so the highest level, "low+", is finally taken as the final rating. The total 9/10 match probability THMMP < 0.1%, so the rating is "low".
TABLE 15
Figure BDA0002498079680000121
Example 5
Example of genotype CWD interpretation results: genotype A*24:10 is WD, so its frequency is displayed; the remaining genotypes are C, so their frequencies are not displayed.
TABLE 16
Figure BDA0002498079680000122
Example 6
Example of predicting the C, DQB1 results of donors 6/6 matched at the A, B and DRB1 loci: 2 groups of C, DQB1 results are predicted, but the first group is significantly more probable than the second.
TABLE 17
Figure BDA0002498079680000123
In each of examples 1 to 4, irrelevant donor search was performed, and the irrelevant donor search page only displays the first 15 search donors at a time, and when the irrelevant donor search page exceeds 15, the page is not displayed for a while, when the irrelevant donor search page is less than 15, the page is displayed. Therefore, the result displayed on the retrieval page is compared with the prediction result of the invention to verify the reliability of the prediction of the invention. These cases are 10/10 matched and 9/10 matched predictions and mismatch types are highly consistent with actual chances of patient retrieval of unrelated donor retrieval, and the mismatch types shown on the web page fall largely within the predicted mismatch types, as detailed in the following table:
TABLE 18
Figure BDA0002498079680000131
For the patient in Example 6, confirmatory typing was performed on two donors matched 6/6 at the A, B, DRB1 loci. The predicted C, DQB1 results were compared with the actual confirmatory typing results, which agreed fully with the high-probability prediction, as shown in the table below:
TABLE 19
Figure BDA0002498079680000132

Claims (9)

1. A method for predicting HLA match probability and mismatch type, comprising the steps of: (1) constructing an HLA database; (2) entering and submitting the genotypes to be compared, wherein the entered genotypes are the HLA-A, B, C, DRB1 and DQB1 locus genotypes; (3) converting the format of the alleles entered in step (2) into a format matching that of the HLA database; (4) permuting and combining the genotypes converted in step (3) to obtain haplotype combinations; (5) comparing the haplotype combinations obtained in step (4) with the HLA database constructed in step (1); and (6) obtaining a prediction result through the comparison.
2. The method of claim 1, wherein the HLA database comprises an HLA allele CWD database, an HLA haplotype frequency database, and an HLA negative linkage database; wherein the HLA haplotype frequency database comprises a 10/10 concordance prediction database, a 9/10 concordance prediction database and a C, DQB1 prediction database.
3. The method of claim 2, wherein the HLA allele CWD database comprises genotype, C/WD/R, frequency;
the 10/10 concordance prediction database comprises A-B-C-DRB1-DQB1 haplotype, A, B, C, DRB1, DQB1 genotype, frequency, sequence, C/WD/R;
the 9/10 concordance prediction database comprises an A mismatch database, a B mismatch database, a C mismatch database, a DRB1 mismatch database and a DQB1 mismatch database; each mismatch database comprises the mismatched haplotypes, the mismatched genotypes corresponding to those haplotypes, and frequency;
the C, DQB1 prediction database contains the haplotype of A-B-DRB1, the C, DQB1 genotype corresponding to the haplotype, and frequency.
4. The method of claim 1, wherein the number of haplotype combinations in step (4) is $2^{n-1}$ groups, where n is the number of gene loci.
5. The method of claim 2, wherein the comparison of step (5) comprises a 10/10 concordance search, a 9/10 concordance search, a 6/6 concordance search at the A, B, DRB1 loci, allele CWD interpretation, and a negative linkage search.
6. A system for predicting HLA match probability and mismatch type according to any one of claims 1 to 5, the system comprising an HLA database, a genotype input module, a format conversion module, a genotype combination module, and a comparison module;
the HLA database is used for storing reference data;
the genotype input module is used for entering the genotypes to be compared and checking their format;
the format conversion module is used for processing and converting the entered alleles so that they are unified into a format matching the HLA database;
the genotype combination module is used for permuting and combining the format-converted genotypes to obtain haplotype combinations;
and the comparison module is used for comparing the haplotype combinations with the HLA database constructed in step (1) to obtain the HLA match probability and mismatch types.
7. A storage medium for predicting HLA match probability and mismatch type, the storage medium performing the method of any one of claims 1-5.
8. A processor for predicting HLA match probability and mismatch type, the processor being configured to execute a program, the program executing the method for predicting HLA match probability and mismatch type according to any one of claims 1-5.
9. An apparatus for predicting HLA match probability and mismatch type, wherein the apparatus is used for implementing the method for predicting HLA match probability and mismatch type according to any one of claims 1-5.
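The haplotype-combination count stated in claim 4 can be illustrated with a short Python sketch. For an unphased genotype with two alleles at each of n loci, fixing the first locus and choosing, for each remaining locus, which allele joins the first haplotype gives 2^(n-1) haplotype pairs. The function name `haplotype_pairs`, the dictionary layout, and the placeholder allele labels are hypothetical and serve only as illustration; they are not taken from the patent's implementation.

```python
from itertools import product


def haplotype_pairs(genotype):
    """Enumerate candidate haplotype pairs of an unphased genotype.

    `genotype` maps each locus name to a tuple of its two alleles,
    in the locus order used for the haplotype string.
    """
    loci = list(genotype)
    if not loci:
        return []
    first, rest = loci[0], loci[1:]
    pairs = []
    # Fix the first locus, then for every other locus choose which
    # of its two alleles is assigned to the first haplotype.
    for choice in product((0, 1), repeat=len(rest)):
        hap1 = [genotype[first][0]]
        hap2 = [genotype[first][1]]
        for locus, pick in zip(rest, choice):
            hap1.append(genotype[locus][pick])
            hap2.append(genotype[locus][1 - pick])
        pairs.append(("-".join(hap1), "-".join(hap2)))
    return pairs


# Placeholder allele labels, not real typing data.
example = {
    "A": ("A1", "A2"),
    "B": ("B1", "B2"),
    "C": ("C1", "C2"),
    "DRB1": ("DR1", "DR2"),
    "DQB1": ("DQ1", "DQ2"),
}
pairs = haplotype_pairs(example)
print(len(pairs))  # 2**(5-1) = 16 pairs for five loci
```

Fixing the first locus avoids counting each pair twice in swapped order. If a locus is homozygous, some of the enumerated pairs coincide; this sketch does not remove such duplicates.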
CN202010424265.XA 2020-05-19 2020-05-19 Method for predicting HLA match probability and mismatch type Active CN111613269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010424265.XA CN111613269B (en) 2020-05-19 2020-05-19 Method for predicting HLA match probability and mismatch type

Publications (2)

Publication Number Publication Date
CN111613269A true CN111613269A (en) 2020-09-01
CN111613269B CN111613269B (en) 2024-01-05

Family

ID=72198908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010424265.XA Active CN111613269B (en) 2020-05-19 2020-05-19 Method for predicting HLA match probability and mismatch type

Country Status (1)

Country Link
CN (1) CN111613269B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103221551A (en) * 2010-11-23 2013-07-24 深圳华大基因科技有限公司 HLA genotype-SNP linkage database, its constructing method, and HLA typing method
JP2015228819A (en) * 2014-06-04 2015-12-21 ジェノダイブファーマ株式会社 Dna typing method for hla gene, and computer program used for data analysis of the same method
CN106434863A (en) * 2016-06-15 2017-02-22 广州医科大学附属第二医院 Method for identifying HLA-DQB1 exon 2 haplotype
CN108241792A (en) * 2016-12-23 2018-07-03 深圳华大基因科技服务有限公司 A kind of method and apparatus for integrating multi-platform genotypic results
CN110423818A (en) * 2019-08-07 2019-11-08 南京实践医学检验有限公司 A kind of primer, probe, kit and method for HLA-LOSS detection and individual identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J-M TIERCY et al.: "Bone marrow transplantation with unrelated donors: what is the probability of identifying an HLA-A/B/Cw/DRB1/B3/B5/DQB1-matched donor?", BONE MARROW TRANSPLANTATION, vol. 26, pages 437 - 441, XP037754740, DOI: 10.1038/sj.bmt.1702529 *
GAO Suqing et al.: "Estimating the probability of finding an HLA-matched donor using HLA-A, B and DRB1 genetic data from patient populations with different types of leukemia", Chinese Journal of Blood Transfusion, vol. 22, no. 5, pages 358 - 362 *

Also Published As

Publication number Publication date
CN111613269B (en) 2024-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant