NL1044005B1

NL1044005B1 - Method for analysing human blood group genotype based on high-through sequencing, and application thereof

Info

Publication number: NL1044005B1
Application number: NL1044005A
Authority: NL
Inventors: Hong Wenxu; Xu Yunping; Liu Jinhong; Peng Long; Lu Liang; Wu Fan; Su Yuqing; Zhu Weigang; Liang Shuang; Liang Yanlian
Original assignee: Shenzhen Blood Center Shenzhen Inst Of Transfusion Medicine
Priority date: 2020-04-22
Filing date: 2021-04-20
Publication date: 2022-06-03
Also published as: NL1044005A; CN111534602A

Abstract

The invention discloses a method for analysing human blood group genotype based on high-throughput sequencing, in the field of bioinformatics. The method first obtains high-throughput sequencing data of DNA from human blood samples. It then processes the data by sequence comparison, mutation detection, and gene annotation to complete blood group genotype analysis and verification. The invention is the first to establish a cloud platform for blood group typing by whole-gene sequencing using NGS technology. Using this platform, it:

- reveals the molecular mechanism of human GPA, GPB, and GPE;
- performs sequencing analysis on the complex genes GP(A-B-A), GP(B-A-B), and GP(A-B);
- analyses in detail the polymorphism characteristics of the hybrid glycoprotein genes using BWA/GATK and other bioinformatics software;
- determines the corresponding glycoprotein molecular type, over-expresses the mutant gene, and verifies the antigen type of the MNS system.

It thereby addresses difficult problems in diagnosing and treating clinical blood transfusion reactions and immune diseases caused by polymorphic hybrid glycoprotein molecules.

Description

ref: P 2021 NL 009 TITLE: METHOD FOR ANALYSING HUMAN BLOOD GROUP GENOTYPE BASED ON HIGH-THROUGH SEQUENCING, AND APPLICATION THEREOF

TECHNICAL FIELD OF THE INVENTION

The invention belongs to the field of bioinformatics, and specifically relates to a method for analysing human blood group genotype based on high-throughput sequencing, and an application thereof.

DESCRIPTION OF THE PRIOR ART

GPA, GPB, and GPE are blood group glycoproteins of human red blood cells, and all three are erythrocyte transmembrane glycoproteins. Their coding genes are arranged on chromosome 4 in the order 5'-GYPA-GYPB-GYPE-3', and the three genes share 95% homology. Exchange, fusion, recombination, or deletion between them during inheritance generates new antigenic determinants on the surface of red blood cells. The base sequences of the three genes determine the polymorphism of MNS blood group system antigens on human red blood cells.

The MNS blood group system was the second blood group system discovered, after ABO. It is closely related to clinical blood transfusion and immune hemolytic diseases, and is of great significance in organ transplantation, forensic identification, and human population genetics. Red blood cells are a main component of blood, and blood is a special kind of drug. Adverse reactions to transfusion occur more frequently, and are more harmful, than adverse drug reactions, with higher mortality and morbidity.

At present, only 46 MNS blood group antigens have been identified, of which at least 16 derive from genetic recombination. The specificity of many MNS antigens determined by hybrid glycoproteins such as GP(AB), GP(BA), GP(ABA), or GP(BAB) has not been clearly established. Such unclassified antigens make the following difficult to diagnose:

- heterotypic (incompatible) transfusion reactions;
- rejection of allogeneic organ transplants;
- intrauterine hemolysis of the fetus;
- neonatal alloimmune diseases.

They therefore pose a serious threat to human health.

Identifying which MNS blood group antigens are expressed by hybrid glycoproteins, and the specificity of those antigens, has long been a problem in diagnosing and treating transfusion reactions and hemolytic diseases. At present, detection of the MNS blood group system mainly relies on serological methods, simple PCR-SSP, first-generation (Sanger) sequencing, DNA-probe colorimetric detection, and similar methods. None of these methods can accurately identify mutations in the MNS blood group genes together with their corresponding antigens, and many variant antigens remain unidentified. In addition, antibody reagents for the MNS blood group system are lacking, so no laboratory can carry out complete serological testing for this blood group. This makes it difficult to diagnose and treat transfusion reactions and immunological diseases.

SUMMARY OF THE INVENTION

The purpose of the invention is to provide a cloud platform for blood group typing by whole-gene sequencing using NGS technology. The platform is used to reveal the molecular mechanism of human GPA, GPB, and GPE, and to perform sequencing analysis on the complex genes GP(A-B-A), GP(B-A-B), and GP(A-B) to determine the polymorphism of hybrid glycoprotein molecules. This solves difficult problems in clinical blood transfusion and immunological disease diagnosis caused by the MNS blood group.

To this end, the first aspect of the invention provides a method for analysing human blood group genotype based on high-throughput sequencing, comprising the following steps:

S1, obtaining or collecting genomic DNA from human blood samples;
S2, performing high-throughput sequencing on the genomic DNA;
S3, after pre-processing the sequencing data, performing comparison with sequence comparison software and mutation detection with mutation detection software;
S4, obtaining a mutant gene by cloning, introducing it into K562 cells for overexpression, and verifying the antigen type by Western blot;
S5, obtaining blood group genotype analysis data through association analysis.

In some embodiments of the invention, in step S2, high-throughput sequencing is performed on a sequencing platform such as, but not limited to, Illumina HiSeq.

In some embodiments of the invention, the sequence comparison software in step S3 is BWA.

In some embodiments of the invention, the mutation is selected from at least one of SNP, CNV, Indel, and SV.

In some embodiments of the invention, the blood group genotype refers to the MNS blood group genotype.

In some embodiments of the invention, the comparison in step S3 is performed for GYPA, GYPB, and GYPE.
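To make the four mutation classes concrete, here is a minimal sketch that assigns a VCF-style record to SNP, Indel, CNV, or SV. The `Variant` record, the function name, and the cut-offs are illustrative assumptions, not rules stated in this patent. Specifically, it assumes symbolic `<DEL>`/`<DUP>`/`<CNV>` alleles count as CNV, while other symbolic or breakend alleles and length changes of 50 bp or more count as SV. Real variant callers report these classes in their own ways.

```python
from dataclasses import dataclass

# Symbolic ALT alleles treated here as copy-number changes (an assumption).
CNV_SYMBOLS = {"<DEL>", "<DUP>", "<CNV>"}


@dataclass(frozen=True)
class Variant:
    """Minimal VCF-style record; hypothetical, not a real VCF parser."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_variant(v: Variant, sv_min_len: int = 50) -> str:
    """Return 'SNP', 'Indel', 'CNV', 'SV', or 'MNV' for one REF/ALT pair."""
    if v.alt in CNV_SYMBOLS:
        return "CNV"
    # Other symbolic alleles (<INV>, <INS>, ...) and breakends ('[' or ']') -> SV.
    if v.alt.startswith("<") or "[" in v.alt or "]" in v.alt:
        return "SV"
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNP"
    length_change = abs(len(v.ref) - len(v.alt))
    if length_change >= sv_min_len:
        return "SV"  # sequence-resolved large insertion or deletion
    if length_change > 0:
        return "Indel"
    return "MNV"  # equal-length multi-base substitution, outside the four classes
```

For example, `classify_variant(Variant("chr4", 1000, "C", "T"))` returns `"SNP"`. The position is arbitrary, and only REF and ALT drive the decision.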

In some embodiments of the invention, step S4 further includes using Blood Typer software to predict, from the sequencing results, the phenotype-related antigens of fusion molecules of GPA, GPB, and GPE.

In some embodiments of the invention, step S3 further includes performing comparison and mutation detection on the full-length sequences of fusion genes such as GYP(A-B-A), GYP(B-A-B), and GYP(A-B).

In some specific embodiments of the invention, the full-length sequence includes the exon, intron, and UTR region sequences. This allows the following:

- analysing the polymorphism characteristics of the glycoprotein hybrid genes;
- determining the corresponding glycoprotein molecular type;
- overexpressing the mutant gene;
- verifying the antigen type of the MNS system.
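Because the analysis covers exons, introns, and UTRs, each variant must be placed in one of these regions. The sketch below does this for a single forward-strand transcript, given its exon coordinates and coding start/end. The coordinates in the test are toy values, not real GYPA/GYPB/GYPE positions, and the function name is hypothetical. For minus-strand genes, the 5'/3' UTR labels would have to be swapped.

```python
from typing import List, Tuple


def classify_position(pos: int, exons: List[Tuple[int, int]],
                      cds_start: int, cds_end: int) -> str:
    """Label a 1-based position against a forward-strand transcript model.

    exons: sorted, non-overlapping (start, end) pairs, inclusive.
    cds_start / cds_end: first and last coding base, inclusive.
    """
    tx_start, tx_end = exons[0][0], exons[-1][1]
    if pos < tx_start or pos > tx_end:
        return "intergenic"
    if not any(start <= pos <= end for start, end in exons):
        return "intron"  # inside the transcript span but not in any exon
    if pos < cds_start:
        return "5'UTR"
    if pos > cds_end:
        return "3'UTR"
    return "CDS"
```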

In some embodiments of the invention, before step S3, the method further includes a step of performing quality control on the sequencing data.

In some specific embodiments of the invention, SOAPnuke is used for the quality control.

In the invention, a case-control design is used to confirm the mutated bases and to perform association analysis on the data. Association analysis is performed on the coding-region genes, non-coding-region genes, regulatory genes, and potentially associated genes corresponding to GPA, GPB, and GPE in all samples. Population-specificity screening, using the Chinese Han population data of the 1000 Genomes Project, is then performed to exclude association sites caused by ethnic specificity.
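The patent does not name the statistic used for the case-control association analysis. As one common choice, the sketch below runs a 2x2 allelic chi-square test (1 degree of freedom, no continuity correction) on allele counts in cases and controls. The function name and inputs are hypothetical. The p-value uses the identity that a chi-square variable with 1 degree of freedom exceeds x with probability erfc(sqrt(x/2)).

```python
import math
from typing import Tuple


def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """2x2 allelic test returning (odds_ratio, chi2, p_value), 1 df.

    Inputs are allele counts (two per diploid sample).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square (1 df) survival function
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    return odds_ratio, chi2, p_value
```

With 30/70 alternate/reference alleles in cases and 10/90 in controls, the statistic is 12.5 and the odds ratio is about 3.86. With sparse counts, an exact test would be the safer choice.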

The second aspect of the invention provides use of the reagents employed in the method of the first aspect of the invention for preparing a detection kit for analysing human blood group genotype.

The third aspect of the invention provides a system for analysing human blood group genotype based on high-throughput sequencing, which includes:

- a data-storage unit for storing sequencing data;
- a data-processing unit, connected to the data-storage unit, for processing the sequencing data, including sequence comparison, mutation detection, and gene annotation;
- a blood group genotype database unit, connected to the data-processing unit, for storing blood group genotype data.

When the data-processing unit finds a new blood group genotype, that genotype is uploaded to the blood group genotype database unit, thereby updating the blood group genotype database.
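The three units and the update rule can be sketched as plain Python classes. This is a structural illustration only, under stated assumptions:

- Everything is in memory.
- Genotype calling (comparison, mutation detection, annotation) is passed in as a function.
- The class names and genotype strings are hypothetical.

None of this is the platform's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class DataStorageUnit:
    """Holds sequencing data per sample (in memory here; object storage in practice)."""
    samples: Dict[str, List[str]] = field(default_factory=dict)

    def put(self, sample_id: str, reads: List[str]) -> None:
        self.samples[sample_id] = reads


@dataclass
class GenotypeDatabaseUnit:
    """Set of known blood group genotypes; grows when a new genotype is uploaded."""
    known: Set[str] = field(default_factory=set)

    def upload(self, genotype: str) -> None:
        self.known.add(genotype)


@dataclass
class DataProcessingUnit:
    """Connected to both units; the genotype caller is injected as a function."""
    storage: DataStorageUnit
    database: GenotypeDatabaseUnit
    # Stands in for sequence comparison, mutation detection, and annotation.
    call_genotype: Callable[[List[str]], str]

    def process(self, sample_id: str) -> bool:
        """Call one sample's genotype; upload it and return True only if it is new."""
        genotype = self.call_genotype(self.storage.samples[sample_id])
        if genotype in self.database.known:
            return False
        self.database.upload(genotype)
        return True
```

Injecting the caller keeps the update rule independent of which aligner or variant caller is used.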

In some embodiments of the invention, the blood group genotype database refers to the MNS blood group genotype database.

The detailed technical solution of the invention is as follows.

Obtaining the full-length sequence polymorphism and allele frequency distribution of the genes corresponding to the GPA, GPB, and GPE molecules from blood samples of a random population.

We randomly select blood samples from voluntary blood donors at a blood centre. Blood group serology is used to identify the MNS blood group and to detect the dose effect of MNS antigens on red blood cells. Representative specimens are then selected for analysis of the MNS blood group genes GYPA, GYPB, and GYPE by high-throughput sequencing (NGS). The inserted gene fragments are sequenced on a PE150 (Illumina) platform. Once the genetic test data are obtained, SOAPnuke software is used for quality control of the raw (off-machine) data. Finally, BWA software is used for sequence comparison to clarify the molecular characteristics of human red blood cell GPA, GPB, and GPE.

Analysing the correlation between the polymorphisms of the GPA, GPB, and GPE molecules and the immunogenicity of the expressed MNS antigens.

We analyse the intrinsic relationship and genetic characteristics linking nucleotide mutations to antigen expression across the full-length GYPA, GYPB, and GYPE sequences of the samples. We then determine the correlation of each of the GYPA, GYPB, and GYPE genes with MNS blood group antigens and immune antibodies, based on their pathogenicity in clinical blood transfusion and in fetal and neonatal alloimmunization.

Verification by recombinant expression.

We introduce a mutant gene into K562 cells for overexpression and verify it by Western blot. A newly identified gene is likewise overexpressed in K562 cells. The expressed antigen protein is extracted and used to immunize animals, and the resulting antibody serum is used to verify the antigen type.

Building a human erythrocyte blood group database using the cloud platform for whole-genome blood group typing.

The applicant established a cloud platform for whole-genome blood group typing, the first within the national blood station system. Using this platform, we can screen all human blood group genes and build a rare blood group database. The whole-gene sequencing data can then be used to solve difficult problems in clinical blood transfusion and immunological disease diagnosis.

The beneficial effects of the invention

Compared with the prior art, the invention has the following beneficial effects:

(1) Precise molecular typing allows the glycoprotein type, and the antibodies produced by immunization against it, to be identified accurately. High-throughput sequencing and the cloud platform are used to identify the structure and sequence characteristics of the genes related to the red blood cell glycoproteins GPA, GPB, and GPE. They are also used to clarify the antigen types corresponding to GYPA/GYPB/GYPE. This solves a series of difficult problems caused by GPA, GPB, and GPE antigens, such as in clinical blood transfusion and alloimmune disease diagnosis.

(2) The invention can be used to reveal the correlation between the genes related to human erythrocyte hybrid glycoproteins and the immunogenicity of MNS blood group antigens on red blood cells.

(3) By establishing the cloud platform for whole-genome blood group typing, the invention uses NGS technology to reveal the molecular mechanism of human GPA, GPB, and GPE. It performs sequencing analysis on the complex genes GP(A-B-A), GP(B-A-B), and GP(A-B) to determine the molecular polymorphisms of hybrid glycoproteins. This solves difficult problems in clinical blood transfusion and immunological disease diagnosis caused by the MNS blood group.

(4) The invention applies high-throughput sequencing to the whole human genome for blood group analysis, and the related technology is used to build a rare blood group database. This provides strong technical support for the safety of clinical blood transfusion and for the prevention, diagnosis, and treatment of immune hemolytic diseases.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a technical roadmap of one embodiment of the invention.

FIG. 2 shows the coverage of genes related to all blood group systems by whole-genome high-throughput sequencing.

FIG. 3 shows the whole-genome analysis process, including analysis of the sequencing data such as quality control, comparison statistics, mutation detection, and mutation site annotation.

FIG. 4 shows the analysis results of 30x whole-genome sequencing data.

FIG. 5 shows the SNP and INDEL gene annotation information at sites within the GYPA gene interval of one sample (Sample_1). Note: the ellipse marks the mutated base. The arrow indicates whether the mutation site causes a missense mutation. The horizontal line indicates the gene region (for example, protein-coding region or promoter region) affected by the mutation site.

FIG. 6 shows a summary of the annotation and mutation detection results for the whole-genome SNVs of Sample_1.

FIG. 7 shows the analysis of mutated bases in 49 samples, with a total of 10,782,324 mutations.

FIG. 8 shows the combined mutations co-existing in the 49 samples.

FIG. 9 shows the storage, server, and database architecture of the nucleic acid sequencing analysis platform.

FIG. 10 shows the coverage and depth of detection for 36 blood groups, nearly 60 blood group genes, and related genes.

DETAILED DESCRIPTION OF THE INVENTION

To make the technical problems addressed, the technical solutions, and the beneficial effects of the invention clearer, the invention is described in further detail below with reference to embodiments.

Embodiments

The following examples demonstrate preferred embodiments of the invention. A person skilled in the art will understand that the techniques disclosed in these examples are techniques the inventors found can be used to implement the invention. They can therefore be regarded as preferred ways of implementing the invention. However, in light of this description, a person skilled in the art should understand that many modifications can be made to the specific embodiments disclosed herein. The same or similar results can still be obtained without departing from the essence or scope of the invention.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by a person skilled in the art to which the invention belongs. The publications cited herein, and the materials cited in them, are incorporated by reference.

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents of the specific embodiments described herein. Such equivalents are intended to be encompassed by the claims.

Embodiment 1

The technical route of this embodiment is shown in FIG. 1. The specific implementation is as follows. We randomly selected 3,000 blood samples from voluntary blood donors, together with 175 representative samples and family-survey samples accumulated during routine work. First, blood group serology was used to detect two things: the MNS blood group antigen type corresponding to the erythrocyte glycoproteins, and the dose of the erythrocyte surface antigen (agglutination intensity with antibody serum). Representative serological samples were then selected for DNA collection and high-throughput whole-genome sequencing.

Library preparation for high-throughput sequencing (NGS) proceeded as follows:

- the genomic DNA of the selected samples was randomly fragmented;
- DNA fragments of the required length (0.2~5 kb) were recovered by electrophoresis;
- end repair and A-tailing (addition of deoxyadenylate), adapter ligation, polymerase chain reaction, and single-strand circularization were performed to prepare DNA nanoball loading chips.

The inserted gene fragments were then sequenced on the PE100 (BGISEQ) platform. The whole-genome coverage of one sample is shown in FIG. 2. Whole-genome coverage reaches 95%, and the 20x coverage rate is 80%, indicating good genome-wide coverage.
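The coverage figures above are summaries of per-base read depth. The sketch below computes mean depth and the fraction of bases covered at or above given thresholds. It assumes the per-base depths are already available, for example as the third column of `samtools depth -a` output, which also reports zero-depth positions. The function name and output keys are hypothetical.

```python
from typing import Dict, Sequence


def depth_summary(depths: Sequence[int],
                  thresholds: Sequence[int] = (1, 20)) -> Dict[str, float]:
    """Mean depth plus the fraction of bases with depth >= each threshold."""
    if not depths:
        raise ValueError("no positions supplied")
    n = len(depths)
    summary = {"mean_depth": sum(depths) / n}
    for t in thresholds:
        summary[f"frac_ge_{t}x"] = sum(1 for d in depths if d >= t) / n
    return summary
```

Under these definitions, "coverage" corresponds to `frac_ge_1x` and the 20x coverage rate to `frac_ge_20x`.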

After obtaining 30x whole-genome sequencing (WGS) data, we used SOAPnuke software to perform quality control on the raw (off-machine) data and filter out low-quality reads. After filtering, at least 90% of reads were Q20-qualified and 85% were Q30-qualified. We focused on analysing the full-length sequences corresponding to the GPA, GPB, and GPE molecules, as shown in FIG. 3. These comprise the GYPA, GYPB, and GYPE genes and the fusion genes GYP(A-B-A), GYP(B-A-B), and GYP(A-B), including all exon, intron, and UTR region sequence information. In the inventors' tests on the 30x whole-genome sequencing data:

- the blood group gene detection rate was 100%;
- the average depth was 30.3x;
- the average coverage was 100% for each gene.
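SOAPnuke reports Q20/Q30 rates itself. The sketch below only illustrates what those figures measure. A Phred score is Q = -10 log10(P), where P is the base-call error probability. Q20 therefore means 1 expected error in 100 bases, and Q30 means 1 in 1,000. The sketch assumes Phred+33 FASTQ quality strings and computes base-level rates, the usual convention. The patent phrases its figures per read, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def base_quality_rates(quality_strings: Iterable[str],
                       thresholds: Sequence[int] = (20, 30),
                       offset: int = 33) -> Dict[int, float]:
    """Fraction of all bases whose Phred score is >= each threshold (Q20/Q30 rates)."""
    total = 0
    passing = {t: 0 for t in thresholds}
    for qs in quality_strings:
        for q in phred_scores(qs, offset):
            total += 1
            for t in thresholds:
                if q >= t:
                    passing[t] += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return {t: passing[t] / total for t in thresholds}
```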

The coverage of the MNS-related genes GYPA/GYPB/GYPE is 98.3%, with an average depth of 25x, as shown in FIG. 4. This meets the requirements of research on the molecular polymorphism of human blood group glycoproteins, in particular research on difficult ("knotty") MNS blood groups.

We performed sequence comparison on GYPA/GYPB/GYPE using BWA software. Using GATK, we then detected mutations such as SNPs, CNVs, indels, and SVs in the genes corresponding to the GPA, GPB, and GPE molecules of each individual. We annotated the mutation sites and, using bioinformatics, analysed the mutation types of the GPA, GPB, and GPE genes of different individuals. The test result for one sample is shown in FIG. 5, and the annotation information is shown in FIG. 6.
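The patent names BWA and GATK but gives no command lines. The helper below is a sketch, under stated assumptions, of one conventional way to chain these tools:

- `bwa mem` aligns the reads, `samtools sort` and `samtools index` prepare the BAM, and GATK4 `HaplotypeCaller` calls variants restricted to a target interval.
- The file names, sample ID, and `gyp_region.bed` (a BED file of the GYPA/GYPB/GYPE interval) are hypothetical.
- The reference is assumed to be prepared beforehand with `bwa index`, `samtools faidx`, and `gatk CreateSequenceDictionary`.
- Duplicate marking and base-quality recalibration are omitted.

HaplotypeCaller reports SNPs and short indels only. CNVs and SVs would need dedicated callers.

```python
import shlex
from typing import List


def build_commands(ref: str, fq1: str, fq2: str, sample: str,
                   region_bed: str, threads: int = 8) -> List[str]:
    """Return shell command strings for alignment, sorting/indexing, and variant calling."""
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.gyp.vcf.gz"
    # bwa expands the literal "\t" escapes in the read-group string itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    align = (f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(fq1)} {q(fq2)}"
             f" | samtools sort -@ {threads} -o {q(bam)} -")
    index = f"samtools index {q(bam)}"
    call = (f"gatk HaplotypeCaller -R {q(ref)} -I {q(bam)}"
            f" -L {q(region_bed)} -O {q(vcf)}")
    return [align, index, call]
```

Each string can be run with `subprocess.run(cmd, shell=True, check=True)`. A shell is required because the first command pipes `bwa` into `samtools`.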

We cloned the mutation regions identified by sequencing analysis. Blood Typer software was used to predict, from the sequencing results, the antigens related to the phenotypes of GPA, GPB, and GPE fusion molecules (for example, MNS blood group antigens). The new genes were introduced into K562 cells for overexpression and verification. The results were then summarized and classified together with data on clinical blood transfusion reactions and on the prevention, diagnosis, and follow-up of alloimmune diseases.
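Blood Typer's own rules are not reproduced here. To show the principle of sequence-based MNS antigen prediction, the sketch below applies two well-established residue rules, both in mature-protein numbering:

- GPA positions 1 and 5: Ser1/Gly5 gives M, Leu1/Glu5 gives N.
- GPB position 29: Met29 gives S, Thr29 gives s.

It deliberately ignores hybrid glycophorins, GPB's own 'N'-like terminus, gene deletions, and silent alleles, which are exactly the hard cases this patent targets. The function name and input format are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Residues are one-letter amino-acid codes (the 'S' key below is serine, not antigen S).
MN_RULES: Dict[Tuple[str, str], str] = {("S", "G"): "M", ("L", "E"): "N"}
SS_RULES: Dict[str, str] = {"M": "S", "T": "s"}


def predict_mns(gpa_alleles: Sequence[Dict[int, str]],
                gpb_alleles: Sequence[Dict[int, str]]) -> str:
    """Simplified M/N and S/s phenotype from two GPA and two GPB alleles.

    Each allele maps mature-protein position -> residue, e.g. {1: "S", 5: "G"}.
    """
    mn = {MN_RULES.get((a.get(1), a.get(5)), "?") for a in gpa_alleles}
    ss = {SS_RULES.get(b.get(29), "?") for b in gpb_alleles}
    parts = [f"{ag}{'+' if ag in mn else '-'}" for ag in ("M", "N")]
    parts += [f"{ag}{'+' if ag in ss else '-'}" for ag in ("S", "s")]
    if "?" in mn or "?" in ss:
        parts.append("(unresolved allele: review manually)")
    return " ".join(parts)
```

Any residue combination outside the two rule tables is flagged rather than guessed, which is where cloning and expression in K562 cells would take over.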

We analysed the mechanism linking the GPA/GPB/GPE fusion genes to the expression of glycoprotein molecules on red blood cells. We also clarified the correlation between the molecular mechanism of human hybrid glycophorins and clinical alloimmunity.

The inventors established a cloud platform for analysing the full-length gene sequences of human blood groups, and adopted modern cell biology and molecular biology techniques such as the WGS system. With these, they completed full-length gene detection for 49 samples showing an MNS blood group antigen dose effect and obtained the full-length GYPA/GYPB/GYPE sequences of these samples. A case-control design was used to confirm the mutated bases and perform association analysis of the data. This association analysis covered the coding-region genes, non-coding-region genes, regulatory genes, and potentially associated genes corresponding to GPA, GPB, and GPE in all samples. Population-specificity screening using the Chinese Han population data of the 1000 Genomes Project excluded association sites caused by ethnic specificity. For the first time, all 49 samples with an M antigen dose effect were found to share the same combination of mutations (as shown in FIG. 7 and FIG. 8).
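Finding a mutation combination shared by every sample amounts to a set intersection over per-sample variant sites. Population-specificity screening can be approximated by dropping sites that are common in a reference population. The sketch below shows both steps under stated assumptions:

- sites are `(chrom, pos, ref, alt)` tuples;
- reference allele frequencies are supplied as a dictionary, for example taken from the 1000 Genomes Chinese Han samples;
- sites more frequent than a 5% cut-off are treated as population background.

The cut-off and all function names are illustrative, not the inventors' settings.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def shared_sites(sample_sites: Iterable[Set[Site]]) -> Set[Site]:
    """Variant sites present in every sample, i.e. the co-existing combination."""
    sets = list(sample_sites)
    if not sets:
        return set()
    return set.intersection(*sets)


def population_filter(sites: Set[Site], control_af: Dict[Site, float],
                      max_af: float = 0.05) -> Set[Site]:
    """Drop sites whose reference-population allele frequency exceeds max_af.

    Sites missing from control_af are kept (treated as allele frequency 0).
    """
    return {s for s in sites if control_af.get(s, 0.0) <= max_af}
```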

Embodiment 2 Establishment of the cloud platform

As shown in FIG. 9, the inventors have established a cloud platform for whole-genome blood group typing. It comprises an OSS cloud storage module, an ECS cloud server module, and an RDS cloud database module.

The OSS cloud storage module has a storage capacity of 107 and is used to store sequencing data. It supports long-term storage and read/write operations on the data.

The ECS cloud server module is a cloud server with 24 cores, 96 GB of memory, and a 500 GB high-efficiency SSD. It can process multiple whole genomes in parallel, meeting the requirements of bioinformatics computations such as sequence comparison, mutation detection, and gene annotation.

The RDS cloud database archives blood group genetic information, both previously researched and newly published, to establish a blood group information database.

In the prior art, the whole-genome sequencing typing algorithm Blood Typer was used to predict 38 red blood cell antigen phenotypes across 12 blood groups, with 99.5% concordance with the original serological and SNP results. This method can predict blood group types arising from known nucleotide polymorphisms and copy number variations. However, it cannot yet meet typing requirements for newly discovered or difficult blood groups. In particular, difficult and unknown MNS types in the highly homologous gene regions (GYPA/GYPB/GYPE) cannot be typed effectively.

The inventors tested and analysed 10x whole-genome sequencing data. The results were:

- the blood group gene detection rate was as high as 98% (56/57);
- the average depth was 4.6x;
- the average coverage of each gene was as high as 94%.

Among these, the coverage of the MNS-related genes GYPA/GYPB/GYPE reached 98.3%, with an average depth of 4.4x, as shown in FIG. 10.
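Copy-number changes such as a homozygous GYPB deletion (the basis of the S-s-U- phenotype) are often screened from read depth. The usual first approximation is copy number ≈ ploidy × (mean region depth / genome-wide mean depth). The sketch below implements only that ratio and assumes uniform coverage and a diploid baseline. At the low depths reported above, and with 95% homology between GYPA, GYPB, and GYPE causing ambiguous read placement, such an estimate is at best a screening aid. The function name is hypothetical.

```python
from typing import Sequence


def estimate_copy_number(region_depths: Sequence[float],
                         genome_mean_depth: float, ploidy: int = 2) -> float:
    """Depth-ratio copy-number estimate for one region (diploid baseline by default)."""
    if not region_depths:
        raise ValueError("no region depths supplied")
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    region_mean = sum(region_depths) / len(region_depths)
    return ploidy * region_mean / genome_mean_depth
```

A region averaging half the genome-wide depth yields an estimate near 1 copy, suggesting a heterozygous deletion.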

Finally, the most complete blood group gene database was produced, covering genotyping data for 39 blood groups such as MNS, ABO, and RH. It combines the latest international ISBT blood group subtype data with the human genome diversity blood group subtype models of multi-omics projects such as the 1000 Genomes Project.

All documents mentioned in this application are incorporated by reference, as if each document were individually incorporated by reference.

In addition, it should be understood that after reading the above content of the invention, a skilled person in the art can make various changes or modifications to the invention, and these equivalents also fall within the scope defined by the claims attached to the application.

Claims


1. A method for analyzing human blood group genotype based on high-throughput sequencing, comprising the steps of:
S1, obtaining or collecting genomic DNA from samples of human blood;
S2, performing high-throughput sequencing on the genomic DNA;
S3, after preprocessing the sequencing data, performing comparison using sequence comparison software and mutation detection using mutation detection software;
S4, obtaining a mutant gene by cloning, introducing it into K562 cells for overexpression, and then verifying the antigen type by Western blot;
S5, obtaining blood group genotype analysis data by association analysis.

2. The method of claim 1, wherein the mutation is selected from at least one of SNP, CNV, Indel, and SV.

3. The method of claim 2, wherein the comparison in step S3 is performed for GYPA, GYPB, and GYPE.

4. The method of claim 3, wherein step S4 further comprises using Blood Typer software to predict, from the sequencing results, the phenotype-related antigens of fusion molecules of GPA, GPB, and GPE.

5. The method of claim 4, wherein step S3 further comprises performing comparison and mutation detection on the full-length sequences of fusion genes such as GYP(A-B-A), GYP(B-A-B), and GYP(A-B).

6. The method of claim 5, wherein the full-length sequence comprises exon, intron, and UTR region sequences.

7. The method according to any one of claims 1 to 6, further comprising, before step S3, a step of performing quality control on the sequencing data.

8. Use of the reagents employed in the method of claim 1 for preparing a detection kit for analyzing human blood group genotype.

9. A system for analyzing human blood group genotype based on high-throughput sequencing, comprising:
a data storage unit for storing sequencing data;
a data processing unit, connected to the data storage unit, for processing the sequencing data, including sequence comparison, mutation detection, and gene annotation; and
a blood group genotype database unit, connected to the data processing unit, for storing blood group genotype data,
wherein, when the data processing unit finds a new blood group genotype, the blood group genotype is uploaded to the blood group genotype database unit to complete updating of the blood group genotype database.

10. The system of claim 9, wherein the blood group genotype database is an MNS blood group genotype database.