CN115433768A - IGH (immunoglobulin heavy chain) hypermutation detection method and system based on NGS (next-generation sequencing) amplicon sequencing technology - Google Patents


Info

Publication number
CN115433768A
Authority
CN
China
Prior art keywords
leader
sequence
primer
sequencing
sequences
Prior art date
Legal status
Granted
Application number
CN202211106780.9A
Other languages
Chinese (zh)
Other versions
CN115433768B (en)
Inventor
丁雨
杨雪雨
邓望龙
张超
任用
李诗濛
Current Assignee
Jiangsu Xiansheng Medical Devices Co ltd
Nanjing Xiansheng Medical Laboratory Co ltd
Jiangsu Xiansheng Medical Diagnosis Co ltd
Original Assignee
Jiangsu Xiansheng Medical Devices Co ltd
Nanjing Xiansheng Medical Laboratory Co ltd
Jiangsu Xiansheng Medical Diagnosis Co ltd
Priority date
2022-09-09
Filing date
2022-09-09
Publication date
Application filed by Jiangsu Xiansheng Medical Devices Co ltd, Nanjing Xiansheng Medical Laboratory Co ltd, Jiangsu Xiansheng Medical Diagnosis Co ltd
Priority to CN202211106780.9A
Publication of CN115433768A
Application granted
Publication of CN115433768B
Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6858 Allele-specific amplification
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis

Abstract

The application relates to the field of bioinformatics, and in particular provides an IGH (immunoglobulin heavy chain) hypermutation detection method and system based on NGS (next-generation sequencing) amplicon sequencing technology.

Description

IGH (immunoglobulin heavy chain) hypermutation detection method and system based on NGS (next-generation sequencing) amplicon sequencing technology
Technical Field
The application relates to the technical field of bioinformatics analysis, and in particular to an IGH (immunoglobulin heavy chain) hypermutation detection method and system based on NGS (next-generation sequencing) amplicon sequencing technology.
Background Art
IGH somatic hypermutation (SHM) refers to somatic hypermutation of the IGHV gene in B lymphocytes stimulated by external antigens. It occurs after B lymphocytes mature and has a high mutation frequency. A rearranged V gene that differs from its germline reference V gene segment by 2% or more is generally defined as hypermutated; a difference of less than 2% is considered unmutated. IGH hypermutation status is a prognostic marker in chronic lymphocytic leukemia (CLL) and lymphoma, and is recommended as a required test in the international CLL-IPI scoring system and in the iwCLL diagnosis and treatment guidelines.
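The 2% rule above can be written as a small decision function. This is a minimal sketch: the name `shm_status` is hypothetical, and counting a difference of exactly 2% as hypermutated follows the "greater than or equal to 2%" wording in this paragraph (step 8 of Example 1 says "higher than 2%", so the boundary handling is an assumption).

```python
def shm_status(divergence_pct: float, threshold_pct: float = 2.0) -> str:
    """Classify IGHV somatic hypermutation status (hypothetical helper).

    divergence_pct: percentage of V-gene positions that differ from the
    germline reference segment (i.e. 100 minus percent identity).
    """
    if not 0.0 <= divergence_pct <= 100.0:
        raise ValueError("divergence_pct must lie between 0 and 100")
    # Per the definition above, a difference of >= 2% counts as hypermutated.
    return "mutated" if divergence_pct >= threshold_pct else "unmutated"


# Usage: the hypermutation-negative sample in Example 2 differs by 1.7%.
print(shm_status(1.7))  # unmutated
```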
During B lymphocyte development, the heavy chain undergoes V/D/J gene rearrangement and the light chain undergoes V/J gene rearrangement. Because the gene segments joined during rearrangement are selected at random, 10^8 to 10^10 distinct immunoglobulin molecules can be generated; IGH hypermutation and the pairing of heavy and light chains further increase antibody diversity. The length and sequence of the rearrangement in each cell are unique. Multiplex PCR assays target conserved regions within the immunoglobulin genes and use a set of mixed primer pairs of known sequence to identify the specific gene rearrangement in each cell, and thereby identify B lymphocyte populations derived from a single cell.
To detect IGH VDJ rearrangements, primers are generally designed to bind target sequences in the conserved framework regions (FR1, FR2, FR3) of the IGHV gene and in the terminal region of the IGHJ gene. Primers designed in FR2/FR3 yield short target fragments with low specificity, so the rearranged fragments cannot be typed accurately. For IGH hypermutation detection, primers are therefore usually designed in the Leader or FR1 region. Hypermutation detection requires high resolution: FR1 primers do not cover the complete V gene region, which affects the calculated hypermutation ratio. Leader primers do cover the complete V region, but the target fragment is longer; as the positional layout of the IGH genes shows (FIG. 1), the Leader/FR1 region lies far from the J region, so the complete rearranged IGH-VDJ fragment cannot be obtained with conventional NGS sequencing modes.
In summary, IGH hypermutation detection is affected by several factors, including the primer design position and the sequencing read length; the present application was made in view of this. The application develops an IGH hypermutation detection method that uses primer pairs in the Leader-J and FR1-J regions together with the NGS-PE250 sequencing platform. The method substantially reduces experimental cost while maintaining accurate IGH hypermutation detection, which facilitates its adoption in routine clinical testing.
Disclosure of Invention
To solve these technical problems, the application combines a purpose-designed amplification scheme with a matching sequencing mode: two sets of PCR amplification primer pairs (Leader-J and FR1-J) and PE250 sequencing are used to assemble the full-length IGH-VDJ sequence by ordered joining of fragments, and accurate clonotyping and hypermutation ratios are then obtained by alignment and calculation.
Specifically, the following technical scheme is provided in the application:
The application first provides an IGH hypermutation sequence processing method based on NGS amplicon sequencing technology, comprising the following steps:
1) Amplify the same sample separately with a Leader-J primer pair and an FR1-J primer pair, construct libraries, and sequence each library by NGS to obtain raw sequencing data;
2) Filter low-quality reads from the raw data; preferably, also remove adapter sequences;
3) Filter out non-specifically amplified fragments and assign the sequencing data to the corresponding primer classes;
4) Trim the 5' PCR amplification primer from each insert;
5) Align and splice the reads processed in step 4) to obtain the complete FR1-J amplicon sequence mergeFR1, and the upstream read Leader-R1 and downstream read Leader-R2 of the Leader-J amplicon;
6) Perform unsupervised clustering separately on mergeFR1 and on Leader-R2;
7) After clustering, assemble the full-length target sequence based on the overlap similarity score between the two sequences.
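Step 2) does not state the quality criteria. A minimal sketch of read-level quality filtering, assuming Phred+33 encoded FASTQ qualities and illustrative thresholds (mean quality of at least Q30 and at least 90% of bases at Q20 or above) chosen only for this example. In paired-end data both mates would normally be kept or discarded together; this sketch filters single reads.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # Phred+33 encoded, one character per base


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def passes_quality(read: Read, min_mean_q: float = 30.0, min_frac_q20: float = 0.9) -> bool:
    """Keep a read only if its mean quality and its fraction of Q>=20 bases are high enough."""
    scores = phred_scores(read.qual)
    if not scores or len(scores) != len(read.seq):
        return False  # empty or malformed record
    mean_q = sum(scores) / len(scores)
    frac_q20 = sum(s >= 20 for s in scores) / len(scores)
    return mean_q >= min_mean_q and frac_q20 >= min_frac_q20


def filter_reads(reads: Iterable[Read], **thresholds: float) -> Iterator[Read]:
    """Lazily yield the reads that pass `passes_quality` (thresholds are hypothetical)."""
    return (r for r in reads if passes_quality(r, **thresholds))
```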
Further, in step 1), library construction by amplification specifically comprises: designing a multiplex PCR primer pair (the FR1-J primer pair) in the conserved FR1 and J-terminal regions to amplify the full-length CDR3 region, the J region and part of the V region; designing a multiplex PCR primer pair (the Leader-J primer pair) in the Leader region and the conserved J-terminal region to additionally amplify the interval between the Leader and FR1; and amplifying the same sample separately with the Leader-J and FR1-J primer pairs and constructing libraries.
Further, in step 1), NGS sequencing is performed on the NGS-PE250 platform.
Further, in step 3), filtering non-specifically amplified fragments specifically comprises: aligning the reads to their respective primer sequences, setting an alignment threshold, and filtering out non-specifically amplified fragments.
Further, step 3) also comprises: calculating the proportion of primer-specific amplified reads to evaluate the amplification efficiency of the primers and the effective proportion of the experimental data.
Further, in step 5), splicing specifically comprises: because the FR1-J amplicon is short, the sequenced R1 and R2 reads overlap and are spliced into the complete FR1-J amplicon sequence mergeFR1; because the Leader-J primers are far apart, their upstream and downstream reads do not overlap and cannot be spliced, so they remain independent upstream and downstream reads, Leader-R1 and Leader-R2.
Further, step 7) specifically comprises: after clustering, selecting the clusters containing more than 5% of the sequences in the mergeFR1 clustering result and the clusters containing more than 5% of the sequences in the Leader-R2 clustering result, and assembling the full-length target sequence based on the overlap similarity score between the sequences of the two.
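The alignment threshold in step 3) is not specified. The sketch below assumes a simple rule: a read is assigned to the primer that matches its 5' end with the fewest mismatches, up to a hypothetical `max_mismatches` limit, and reads that match no primer are treated as non-specific. Because the primer position is found at the same time, the sketch also trims it, which is the operation of step 4). Function names are hypothetical.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Mismatches between two strings compared position by position (equal length assumed)."""
    return sum(x != y for x, y in zip(a, b))


def classify_and_trim(
    seq: str, primers: dict[str, str], max_mismatches: int = 2
) -> tuple[Optional[str], str]:
    """Assign a read to the best-matching 5' primer and strip that primer.

    Returns (primer_name, insert). primer_name is None when no primer matches
    within `max_mismatches`, i.e. the read is treated as non-specific.
    """
    best_name: Optional[str] = None
    best_mm = max_mismatches + 1
    for name, primer in primers.items():
        if len(seq) < len(primer):
            continue  # read too short to contain this primer
        mm = hamming(seq[: len(primer)], primer)
        if mm < best_mm:  # ties keep the first primer encountered
            best_name, best_mm = name, mm
    if best_name is None:
        return None, seq
    return best_name, seq[len(primers[best_name]):]
```

Real primer panels contain many degenerate multiplex primers; a production tool would also search for the primer with small offsets and indels, which this sketch does not.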
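The proportion of primer-specific reads can be computed directly from per-read primer assignments. A minimal sketch, assuming each read has already been labelled with its primer name, or `None` when it was classed as non-specific; the function name `amplification_summary` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def amplification_summary(assignments: Sequence[Optional[str]]) -> dict:
    """Overall and per-primer fractions of specifically amplified reads."""
    total = len(assignments)
    if total == 0:
        raise ValueError("no reads to summarise")
    counts = Counter(a for a in assignments if a is not None)
    specific = sum(counts.values())
    return {
        "specific_fraction": specific / total,  # effective proportion of the data
        "per_primer_fraction": {p: n / total for p, n in counts.items()},
    }
```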
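Splicing in step 5) amounts to merging the two mates on their shared overlap. A minimal sketch, assuming R2 is reported on the opposite strand (so it is reverse-complemented first), that the trimmed insert is longer than either read, and illustrative cut-offs for minimum overlap length and identity. Dedicated read mergers such as FLASH or PEAR also use base qualities to resolve mismatches; this sketch simply keeps the R1 base.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    r1: str, r2: str, min_overlap: int = 20, min_identity: float = 0.9
) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 on their longest acceptable overlap.

    Returns None when no overlap of at least `min_overlap` bases reaches
    `min_identity`, which is the expected outcome for Leader-R1/Leader-R2.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2rc = revcomp(r2)
    # Try the longest possible overlap first, then shorter ones.
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        matches = sum(a == b for a, b in zip(r1[-ov:], r2rc[:ov]))
        if matches / ov >= min_identity:
            return r1 + r2rc[ov:]  # R1 base kept at any mismatching position
    return None
```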
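The application does not define the overlap similarity score. The sketch below assumes it is the longest suffix/prefix overlap between a Leader-side consensus and a mergeFR1 consensus that reaches an identity cut-off, and that both consensus sequences are already in the same orientation with the Leader side upstream. The 5% abundance rule is taken from the text; all function names and the other defaults are hypothetical.

```python
def abundant_clusters(cluster_sizes: dict[str, int], min_fraction: float = 0.05) -> list[str]:
    """IDs of clusters holding more than `min_fraction` of all clustered sequences."""
    total = sum(cluster_sizes.values())
    if total == 0:
        return []
    return [cid for cid, n in cluster_sizes.items() if n / total > min_fraction]


def best_overlap(
    upstream: str, downstream: str, min_overlap: int = 15, min_identity: float = 0.95
) -> tuple[int, float]:
    """Longest suffix(upstream)/prefix(downstream) overlap meeting the identity cut-off."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for ov in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        identity = sum(a == b for a, b in zip(upstream[-ov:], downstream[:ov])) / ov
        if identity >= min_identity:
            return ov, identity
    return 0, 0.0


def assemble(
    leader_consensus: dict[str, str], fr1_consensus: dict[str, str], **kwargs
) -> list[tuple[str, str, int, float, str]]:
    """Join every Leader-side consensus to every mergeFR1 consensus it overlaps."""
    joined = []
    for lid, lseq in leader_consensus.items():
        for fid, fseq in fr1_consensus.items():
            ov, identity = best_overlap(lseq, fseq, **kwargs)
            if ov:
                joined.append((lid, fid, ov, identity, lseq + fseq[ov:]))
    # Highest identity first, then longest overlap.
    return sorted(joined, key=lambda t: (t[3], t[2]), reverse=True)
```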
The application also provides an IGH hypermutation detection method based on NGS amplicon sequencing technology, comprising any of the above sequence processing methods and further comprising the following step:
8) Align the assembled sequences to the IGH-VDJ germline gene segments in the IMGT database, calculate the proportion of each clone from the alignment results, and determine the dominant clone; then calculate the percentage difference of the V gene sequence of the dominant clone from germline and determine its hypermutation status.
The present application further provides an electronic device comprising a processor and a memory, the processor being connected to the memory, wherein the memory stores a computer program and the processor invokes the computer program to perform any of the above methods.
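Counting clone proportions in step 8) can be sketched as follows. The sketch assumes each assembled sequence already carries its V, D and J calls from the germline alignment plus a supporting read count, and defines a clone by its V/D/J combination alone (many pipelines also include the CDR3 sequence). The `CloneCall` type and `clone_proportions` name are hypothetical; the application does not give a numeric cut-off for calling a clone dominant, so the sketch only ranks clones.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CloneCall(NamedTuple):
    v_gene: str
    d_gene: str
    j_gene: str
    reads: int  # number of reads supporting this assembled sequence


def clone_proportions(
    calls: Iterable[CloneCall],
) -> list[tuple[tuple[str, str, str], float]]:
    """Fraction of supporting reads per V/D/J clone, most abundant (dominant) first."""
    counts: Counter = Counter()
    for call in calls:
        counts[(call.v_gene, call.d_gene, call.j_gene)] += call.reads
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no supporting reads")
    return [(clone, n / total) for clone, n in counts.most_common()]
```

The first entry of the returned list is the candidate dominant clone, whose V gene is then compared with germline.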
The present application also provides a computer storage medium having stored thereon a computer program comprising program instructions which, when executed by a processor, perform the method according to any of the above.
The application has at least the following beneficial technical effects:
1) The application develops an IGH hypermutation detection method based on an NGS amplicon sequencing technology platform for the first time.
2) The application combines a purpose-designed amplification scheme, a specific sequencing mode and a matching sequence assembly strategy to achieve ordered joining of fragments, effectively overcoming the difficulty that full-length rearranged sequences for IGH hypermutation analysis cannot be obtained on conventional sequencing platforms.
Drawings
FIG. 1 shows the distances between the regions of the IGH locus;
FIG. 2 is a flow chart of the data analysis;
FIG. 3 shows the ordered joining of fragmented sequencing data to obtain the full-length target sequence;
FIG. 4 shows the alignment of the assembled sequences with the IGH-VDJ germline gene segments in the IMGT database;
FIG. 5 shows the correlation between the hypermutation ratio of the dominant sequence assembled from PE250 data and the clinical hypermutation ratio.
Detailed Description
Embodiments of the present application are described in detail below with reference to examples. Those skilled in the art will appreciate that the following examples only illustrate the present application and should not be construed as limiting its scope. Where specific conditions are not indicated, the examples were carried out under conventional conditions or under conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
Definition of partial terms
Unless defined otherwise below, all technical and scientific terms used in the detailed description of the present application are intended to have the same meaning as commonly understood by one of ordinary skill in the art. While the following terms are believed to be well understood by those skilled in the art, the following definitions are set forth to better explain the present application.
As used in this application, the singular forms "a", "an" and "the" include the plural of the noun they refer to.
As used in this application, the terms "comprising," "including," "having," "containing," or "involving" are inclusive or open-ended and do not exclude additional unrecited elements or method steps. The term "consisting of" is considered a preferred embodiment of the term "comprising". If a group is defined below as comprising at least a certain number of embodiments, this should also be understood as disclosing a group that preferably consists only of these embodiments.
The term "about" in the present application denotes an interval of accuracy that can be understood by a person skilled in the art, which still guarantees the technical effect of the feature in question. The term generally denotes a deviation of ± 10%, preferably ± 5%, from the indicated value.
Furthermore, the terms first, second, third, (a), (b), (c), and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments described herein are capable of operation in other sequences than described or illustrated herein.
The above terms or definitions are provided only to aid in understanding the present application. These definitions should not be construed to have a scope less than understood by those skilled in the art.
The NGS amplicon-based IGH hypermutation detection method of the present application combines a specific primer design with a specific sequencing mode and a matching sequence assembly strategy to achieve ordered joining of fragments, assemble the full-length IGH-VDJ sequence, and obtain accurate clonotyping and hypermutation ratios by alignment and calculation.
The general method comprises the following steps:
1) Amplify the same sample separately with a Leader-J primer pair and an FR1-J primer pair, construct libraries, and sequence each library by NGS to obtain raw sequencing data;
2) Filter low-quality reads from the raw data;
3) Filter out non-specifically amplified fragments and assign the sequencing data to the corresponding primer classes;
4) Trim the 5' PCR amplification primer from each insert;
5) Align and splice the reads processed in step 4) to obtain the complete FR1-J amplicon sequence mergeFR1, and the upstream read Leader-R1 and downstream read Leader-R2 of the Leader-J amplicon;
6) Perform unsupervised clustering separately on mergeFR1 and on Leader-R2;
7) After clustering, assemble the full-length target sequence based on the overlap similarity score between the two sequences;
8) Align the assembled sequences to the IGH-VDJ germline gene segments in the IMGT database, calculate the proportion of each clone from the alignment results, and determine the dominant clone; then calculate the percentage difference of the V gene sequence of the dominant clone from germline and determine its hypermutation status.
In some embodiments, library construction by amplification in step 1) may be: weighing sequence conservation and primer accessibility, designing a multiplex PCR primer pair (the FR1-J primer pair) in the conserved FR1 and J-terminal regions to amplify the full-length CDR3 region, the J region and part of the V region; designing a multiplex PCR primer pair (the Leader-J primer pair) in the Leader region and the conserved J-terminal region to additionally amplify the interval between the Leader and FR1; and constructing libraries after amplification.
In some embodiments, NGS sequencing in step 1) is preferably performed on the NGS-PE250 platform, whose read length is sufficient for sequencing these amplicons.
In some embodiments, step 2) further comprises removing adapter sequences.
In some embodiments, filtering non-specifically amplified fragments in step 3) may be: aligning the reads to their respective primer sequences, setting an alignment threshold, and filtering out non-specifically amplified fragments.
In some embodiments, step 3) may further comprise: calculating the proportion of primer-specific amplified reads to evaluate the amplification efficiency of the primers and the effective proportion of the experimental data.
In some embodiments, splicing in step 5) may be: because the FR1-J amplicon is short, the sequenced R1 and R2 reads overlap and are spliced into the complete FR1-J amplicon sequence mergeFR1; because the Leader-J primers are far apart, their upstream and downstream reads do not overlap and cannot be spliced, so they remain independent upstream and downstream reads, Leader-R1 and Leader-R2.
In some embodiments, step 7) may be: after clustering, selecting the clusters containing more than 5% of the sequences in the mergeFR1 clustering result and the clusters containing more than 5% of the sequences in the Leader-R2 clustering result, and assembling the full-length target sequence based on the overlap similarity score between the sequences of the two clusters.
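Adapter removal can be sketched as a 3' trim. The default adapter below is the common Illumina adapter prefix `AGATCGGAAGAGC`, used only as an assumption because the application does not name the library kit. The sketch requires exact matches, whereas tools such as cutadapt also tolerate sequencing errors in the adapter; the function name is hypothetical.

```python
def trim_3prime_adapter(seq: str, adapter: str = "AGATCGGAAGAGC", min_partial: int = 5) -> str:
    """Remove a 3' adapter: a full internal match, or a partial adapter prefix at the read end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]  # insert shorter than the read: cut at the adapter start
    # The read may end inside the adapter: test progressively shorter adapter prefixes.
    for k in range(min(len(adapter) - 1, len(seq)), min_partial - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[: len(seq) - k]
    return seq
```

The `min_partial` floor avoids trimming reads that merely end in a few adapter-like bases by chance.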
Suitable modifications of the method that do not depart from the design concept of the present application fall within its scope. The application is illustrated below with specific examples.
Example 1 Construction of the method of the present application
1. Primer selection and design and sequencing mode
After a comprehensive analysis of primer accessibility in the variable and conserved regions of the rearranged VDJ gene, multiplex PCR primers were first designed in the conserved FR1 and J-terminal regions to amplify the full-length CDR3 region, the J region and part of the V region. Second, a primer pair in the Leader region and at the J end was used to cover the sequence interval between the Leader and FR1.
The FR1-J amplicon is about 350 bp long, and the Leader primer lies about 190 bp upstream of the FR1 primer. Given the distance between the Leader and FR1 primer positions and the position of J, the analysis showed that a PE250 sequencing plus assembly strategy is required, and other sequencing modes cannot cover the full length. The method therefore sequences the FR1-J and Leader-J amplicons separately in PE250 mode, and the sequencing data from both primer combinations are combined to analyze IGH hypermutation.
2. Analysis method
The analysis workflow is shown in FIG. 2 and comprises the following steps:
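The read-length reasoning above can be checked with simple arithmetic. This sketch uses the approximate lengths given in the text (FR1-J amplicon about 350 bp, Leader primer about 190 bp upstream of FR1), assumes the Leader-J amplicon length is roughly their sum, and ignores primer and adapter trimming, which shortens every overlap. Function names are hypothetical.

```python
def mate_overlap(amplicon_len: int, read_len: int) -> int:
    """Bases shared by R1 and R2 when both mates are read in full from opposite ends.

    A negative value is the gap between the mates, i.e. they cannot be merged.
    """
    return 2 * read_len - amplicon_len


def bridge_overlap(leader_to_fr1: int, read_len: int) -> int:
    """Bases by which a read starting at the Leader primer runs past the FR1 primer position."""
    return read_len - leader_to_fr1


FR1_J = 350                       # approximate FR1-J amplicon length (from the text)
LEADER_TO_FR1 = 190               # approximate Leader-to-FR1 primer distance (from the text)
LEADER_J = FR1_J + LEADER_TO_FR1  # rough Leader-J length; assumes the two simply add

for read_len in (150, 250):
    print(
        f"PE{read_len}: FR1-J mates {mate_overlap(FR1_J, read_len)} bp, "
        f"Leader-J mates {mate_overlap(LEADER_J, read_len)} bp, "
        f"Leader read past FR1 {bridge_overlap(LEADER_TO_FR1, read_len)} bp"
    )
```

With 250 bp reads the FR1-J mates share roughly 150 bp, the Leader-J mates fall about 40 bp short of each other, and a read starting at the Leader primer extends roughly 60 bp past the FR1 primer position, which is what allows it to be joined to mergeFR1. With 150 bp reads all three values are negative.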
1) Amplify the same sample with the Leader-J and FR1-J primer pairs to obtain the target fragments, and sequence them on the NGS-PE250 platform to obtain raw sequencing data;
2) Filter low-quality reads and remove adapter sequences using analysis software;
3) Align the reads to their respective primer sequences with a set alignment threshold, filter out large non-specifically amplified fragments, and at the same time assign the sequencing data to the corresponding primer classes. Calculate the proportion of primer-specific amplified reads to evaluate the amplification efficiency of the primers and the effective proportion of the experimental data;
4) Trim the 5' PCR amplification primer from each insert using primer-trimming software;
5) Align and splice reads R1 and R2 using read-merging software. The FR1-J amplicon is short, so the R1 and R2 reads obtained in PE250 mode overlap, and splicing yields the complete FR1-J amplicon sequence mergeFR1 (FIG. 3). The Leader-J primers are far apart and their R1 and R2 reads do not overlap, so they cannot be spliced at this step and remain independent reads, Leader-R1 and Leader-R2 (FIG. 3);
6) Use sequence-clustering software to perform unsupervised clustering separately on the complete FR1-J amplicon sequence mergeFR1 and on the downstream read of the Leader primer pair (Leader-R2);
7) After clustering, select the clusters containing more than 5% of the sequences in the mergeFR1 clustering result and in the Leader-R2 clustering result, and assemble the full-length target sequence based on the overlap similarity score between the two (FIG. 3);
8) Align the assembled sequences to the IGH-VDJ germline gene segments in the IMGT database using alignment software, calculate the proportion of each clone from the alignment results, and determine the final dominant clone; then determine the hypermutation status by checking whether the V gene sequence of the dominant clone differs from germline by more than 2%.
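The application does not name its clustering software or similarity cut-off. As one possible stand-in for step 6), the sketch below implements greedy centroid clustering (the most abundant unique sequences become centroids first) with Python's `difflib.SequenceMatcher.ratio()` as the similarity measure; both that measure and the 0.97 default cut-off are assumptions, and the function name is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def greedy_cluster(seqs: list[str], min_similarity: float = 0.97) -> dict[str, list[str]]:
    """Greedy centroid clustering: abundant unique sequences become centroids first.

    Returns {centroid: member sequences, including duplicates}.
    """
    counts = Counter(seqs)
    clusters: dict[str, list[str]] = {}
    # most_common() orders by abundance; ties keep first-seen order.
    for seq, n in counts.most_common():
        for centroid in clusters:
            sim = SequenceMatcher(None, centroid, seq, autojunk=False).ratio()
            if sim >= min_similarity:
                clusters[centroid].extend([seq] * n)  # join the first similar centroid
                break
        else:
            clusters[seq] = [seq] * n  # no similar centroid: start a new cluster
    return clusters
```

Cluster sizes (`len(members)`) are what the 5% abundance rule in step 7) would be applied to. The pairwise comparison is quadratic in the number of unique sequences, which is acceptable for a sketch but not for large repertoires.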
Example 2 validation of clinical data
1) Accuracy assessment of hypermutation and dominant clone calls
Six clinical samples positive for a dominant clone were selected, three of them hypermutation-positive and three hypermutation-negative. Table 1 lists the specific dominant clonotypes and hypermutation statuses.
Taking a hypermutation-negative sample as an example, sequence assembly proceeded as follows. Adapter and primer sequences were removed from the raw reads, as detailed below. For the FR1-J amplicon, the R1 read is 227 bp and the R2 read is 225 bp; for the Leader-J amplicon, the R1 and R2 reads are both 227 bp.
[Image: trimmed R1 and R2 reads of the FR1-J and Leader-J amplicons]
The R1 and R2 reads of the FR1-J amplicon overlap by 164 bp with 100% identity; the spliced sequence mergeFR1 is 288 bp long.
[Image: spliced FR1-J sequence]
The R2 read of the Leader-J amplicon overlaps the mergeFR1 sequence by 23 bp with 100% identity; the complete assembled sequence is 492 bp long, as follows:
[Image: assembled full-length sequence (492 bp)]
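The lengths reported in this example are consistent with joining sequences on their overlaps (joined length = first length + second length - overlap). A quick check of the two joins; the helper name is hypothetical:

```python
def merged_length(upstream_len: int, downstream_len: int, overlap: int) -> int:
    """Length of two sequences joined on a shared overlap."""
    if not 0 <= overlap <= min(upstream_len, downstream_len):
        raise ValueError("overlap must fit inside both sequences")
    return upstream_len + downstream_len - overlap


merge_fr1 = merged_length(227, 225, 164)         # FR1-J R1 + R2 -> 288 bp
full_length = merged_length(merge_fr1, 227, 23)  # mergeFR1 + Leader-R2 -> 492 bp
print(merge_fr1, full_length)  # 288 492
```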
The assembled sequence was aligned to the IGH-VDJ germline gene segments in the IMGT database; the alignment is shown in FIG. 4. The V, D and J genes were typed as IGHV3-23*01, IGHD3-10*01 and IGHJ4*02, respectively. The IGH hypermutation ratio was 1.7%.
The results for all samples are shown in Table 1. The IGHV/D/J typing of the dominant clone sequences obtained with the PE250 assembly method of the present application agrees with the clinical results. The hypermutation status determined with the 2% threshold also agrees with the clinical results, and the hypermutation ratio is highly correlated with the clinical values (R² = 99.52%, FIG. 5).
Table 1 Dominant clone sequence typing and hypermutation status
[Image: Table 1]
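The hypermutation ratio reported here is the percentage of V-gene positions that differ from the germline segment. A minimal sketch, assuming the query and germline V regions are supplied as a gapped pairwise alignment of equal length and that gap columns are excluded rather than counted as mutations (the application does not state how indels are handled); the function name is hypothetical.

```python
def v_divergence_percent(query_aln: str, germline_aln: str) -> float:
    """Percent of aligned V-gene positions where the query differs from germline."""
    if len(query_aln) != len(germline_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for q, g in zip(query_aln.upper(), germline_aln.upper()):
        if q == "-" or g == "-":
            continue  # assumption: indel columns are not counted as substitutions
        compared += 1
        mismatches += q != g
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared
```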
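R² here is presumably the coefficient of determination between the assembled-sequence and clinical hypermutation ratios; for a least-squares line with an intercept this equals the squared Pearson correlation. The per-sample values in Table 1 are not reproduced in this text, so the sketch below only shows the calculation; the function name is hypothetical.

```python
from math import fsum
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (equals R^2 of a least-squares line with intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance in one of the inputs")
    return sxy * sxy / (sxx * syy)
```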
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A sequence processing method for IGH hypermutation based on NGS amplicon sequencing technology, characterized by comprising the following steps:
1) Amplify the same sample separately with a Leader-J primer pair and an FR1-J primer pair, construct libraries, and sequence each library by NGS to obtain raw sequencing data;
2) Filter low-quality reads from the raw data; preferably, also remove adapter sequences;
3) Filter out non-specifically amplified fragments and assign the sequencing data to the corresponding primer classes;
4) Trim the 5' PCR amplification primer from each insert;
5) Align and splice the reads processed in step 4) to obtain the complete FR1-J amplicon sequence mergeFR1, and the upstream read Leader-R1 and downstream read Leader-R2 of the Leader-J amplicon;
6) Perform unsupervised clustering separately on mergeFR1 and on Leader-R2;
7) After clustering, assemble the full-length target sequence based on the overlap similarity score between the sequences of the two.
2. The analysis method according to claim 1, wherein in the step 1), the amplification library is specifically: selecting FR1 and J-terminal conserved regions to design a multiple PCR primer FR1-J primer pair, and amplifying the sequences of the full length, J region and partial V region of a CDR3 region; selecting a Leader region and a J-end conserved region to design a multiplex PCR primer pair Leader-J primer pair, and performing complementary amplification on sequence intervals between the Leader and FR1; and respectively amplifying the same sample by using a Leader-J primer pair and an FR1-J primer pair and constructing a library.
3. The analytical method of claim 1, wherein in the step 1), the NGS sequencing is NGS-PE250 technology platform sequencing.
4. The analytical method according to claim 1, wherein in step 3), the filtering of non-specifically amplified fragments is specifically: and (3) comparing the sequencing sequence with respective primer sequences, setting a related comparison threshold value, and filtering non-specifically amplified fragments.
5. The analytical method of claim 1, wherein the step 3) further comprises: calculating the proportion of primer specific amplification reads, and evaluating the amplification efficiency of the primers and the effective proportion of experimental data.
6. The analysis method according to claim 1, wherein in the step 5), the splicing specifically comprises: the PCR amplification product sequence of the FR1-J primer pair is short, overlap exists between R1 and R2 obtained by sequencing, and a complete amplification sequence mergeFR1 between FR1-J is obtained after splicing; the distance between the Leader-J primer pairs is far, the upstream and downstream sequences have no overlap, the Leader-J primer pairs cannot be spliced, and the Leader-J primer pairs are still independent upstream and downstream sequences, namely Leader-R1 and Leader-R2.
7. The analysis method according to claim 1, wherein step 7) specifically comprises: after clustering, selecting the clusters that each account for more than 5% of the sequences in the mergeFR1 clustering result and the clusters that each account for more than 5% of the sequences in the Leader-R2 clustering result, and completing assembly of the full-length target sequence based on the overlap similarity score between the two sets of sequences.
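The cluster selection and assembly of claim 7 can be sketched as below. Assumptions: exact-sequence grouping stands in for the unspecified clustering method; the 5% cutoff is applied as a strict "higher than" comparison, following the claim wording; the "overlap similarity score" is taken to be the identity over the longest suffix/prefix overlap that meets a hypothetical `min_identity`; and both inputs are assumed to be in the same orientation, with the upstream fragment supplied first.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def abundant_clusters(seqs: List[str], min_fraction: float = 0.05) -> Dict[str, float]:
    """Group identical sequences and keep groups whose share exceeds min_fraction."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items() if n / total > min_fraction}


def best_overlap(upstream: str, downstream: str, min_overlap: int = 20,
                 min_identity: float = 0.95) -> Optional[Tuple[int, float]]:
    """Longest suffix(upstream)/prefix(downstream) overlap with identity >= min_identity,
    returned as (overlap_length, identity), or None."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for olen in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        matches = sum(x == y for x, y in zip(upstream[-olen:], downstream[:olen]))
        identity = matches / olen
        if identity >= min_identity:
            return olen, identity
    return None


def assemble(upstream_seqs: List[str], downstream_seqs: List[str], min_overlap: int = 20,
             min_identity: float = 0.95) -> List[Tuple[str, float]]:
    """Join every overlapping (upstream, downstream) pair; best-scoring assemblies first."""
    results = []
    for up in upstream_seqs:
        for down in downstream_seqs:
            hit = best_overlap(up, down, min_overlap, min_identity)
            if hit is not None:
                olen, identity = hit
                results.append((up + down[olen:], identity))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


clusters = abundant_clusters(["X"] * 35 + ["Y"] * 4 + ["Z"])  # Z is 2.5% and is dropped
full = "ACGTTGCAAGCTTACGGATCCATG"
print(sorted(clusters), assemble([full[:18]], [full[6:]], min_overlap=10, min_identity=1.0))
```

Pairing every abundant cluster from one side with every abundant cluster from the other is cheap here because the 5% cutoff leaves at most about 20 clusters per side.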
8. A method for detecting IGH hypermutation based on NGS amplicon sequencing technology, comprising the sequence processing method of any one of claims 1 to 7, and further comprising the steps of:
8) Aligning each assembled sequence against the IGH V, D and J germline gene segments in the IMGT database, calculating the proportion of each clone from the alignment results, and determining the dominant clone; and calculating the percentage difference between the V gene sequence of the dominant clone and its germline sequence to determine the hypermutation status.
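Step 8) can be illustrated once sequences have been aligned to germline. This sketch assumes the alignment to IMGT germline segments has already been done by an external tool (for example IgBLAST or IMGT/V-QUEST; the claim does not name one), that clone labels are hypothetical strings, and that each V sequence is supplied pre-aligned to its germline with equal length. The 2% difference cutoff (98% germline identity) is the convention commonly used for IGHV mutational status in CLL and is an assumption here, since the claim gives no threshold.

```python
from collections import Counter
from typing import List, Tuple


def dominant_clone(clone_labels: List[str]) -> Tuple[str, float]:
    """Return the most frequent clone label and its share of all assembled sequences."""
    if not clone_labels:
        raise ValueError("no clones supplied")
    label, count = Counter(clone_labels).most_common(1)[0]
    return label, count / len(clone_labels)


def v_difference_percent(query_v: str, germline_v: str) -> float:
    """Percent of aligned positions where the query V sequence differs from germline.
    Both strings must be pre-aligned to the same length (gaps written as '-')."""
    if not query_v or len(query_v) != len(germline_v):
        raise ValueError("sequences must be non-empty and pre-aligned")
    diffs = sum(1 for q, g in zip(query_v, germline_v) if q != g)
    return 100.0 * diffs / len(query_v)


def hypermutation_status(diff_percent: float, cutoff: float = 2.0) -> str:
    """Assumed CLL-style rule: more than 2% difference (below 98% identity) = mutated."""
    return "mutated" if diff_percent > cutoff else "unmutated"


# Hypothetical clone labels and toy V sequences.
label, share = dominant_clone(["IGHV3-23|IGHJ4|cloneA", "IGHV3-23|IGHJ4|cloneA", "IGHV1-69|IGHJ6|cloneB"])
diff = v_difference_percent("ACGTACGTAC", "ACGTACGTAA")
print(label, share, diff, hypermutation_status(diff))
```

Real analyses usually compute identity over a defined V region (for example from FR1 through FR3 under IMGT numbering), which this sketch does not enforce.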
9. An electronic device, comprising a processor and a memory, wherein the processor is coupled to the memory, the memory is configured to store a computer program, and the processor is configured to invoke the computer program to perform the method of any one of claims 1-7.
10. A computer storage medium, characterized in that it stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-7.
CN202211106780.9A 2022-09-09 2022-09-09 IGH hypermutation detection method and system based on NGS amplicon sequencing technology Active CN115433768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211106780.9A CN115433768B (en) 2022-09-09 2022-09-09 IGH hypermutation detection method and system based on NGS amplicon sequencing technology

Publications (2)

Publication Number Publication Date
CN115433768A true CN115433768A (en) 2022-12-06
CN115433768B CN115433768B (en) 2023-09-29

Family

ID=84263133

Country Status (1)

Country Link
CN (1) CN115433768B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109929924A (en) * 2019-03-27 2019-06-25 上海科医联创医学检验所有限公司 IGH gene rearrangement detection method based on high-throughput sequencing
CN111261226A (en) * 2020-03-12 2020-06-09 江苏先声医学诊断有限公司 NGS-based automatic sequencing analysis method and device for minimal residual lesions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GEOFFREY LOWMAN et al.: "Comparison of DNA and RNA input IGH Chain Sequencing Assays for Somatic Hypermutation Analysis", WWW.THERMOFISHER.COM/2021AACR-POSTERS *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116445478A (en) * 2023-06-12 2023-07-18 北京旌准医疗科技有限公司 Primer combination for constructing IGHV gene library and application thereof
CN116445478B (en) * 2023-06-12 2023-09-05 北京旌准医疗科技有限公司 Primer combination for constructing IGHV gene library and application thereof
CN117721191A (en) * 2024-02-07 2024-03-19 深圳赛陆医疗科技有限公司 Gene sequencing method, sequencing device, readable storage medium and gene sequencing system
CN117721191B (en) * 2024-02-07 2024-05-10 深圳赛陆医疗科技有限公司 Gene sequencing method, sequencing device, readable storage medium and gene sequencing system

Also Published As

Publication number Publication date
CN115433768B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US11898206B2 (en) Systems and methods for clonotype screening
Ou et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
US10347365B2 (en) Systems and methods for visualizing a pattern in a dataset
CN105886616B (en) Efficient specific sgRNA recognition site guide sequence for pig gene editing and screening method thereof
Trapnell et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
JP2018535481A5 (en)
CN108197434B (en) Method for removing human gene sequence in metagenome sequencing data
D'Angelo et al. The antibody mining toolbox: an open source tool for the rapid analysis of antibody repertoires
CN111566225A (en) Normalization of tumor mutational burden
CN115312121B (en) Target gene locus detection method, device, equipment and computer storage medium
CN115433768B (en) IGH hypermutation detection method and system based on NGS amplicon sequencing technology
CN107038349B (en) Method and apparatus for determining pre-rearrangement V/J gene sequence
CN112029842A (en) Kit and method for ABO blood type genotyping based on high-throughput sequencing
Chen et al. Practical considerations on performing and analyzing CLIP-seq experiments to identify transcriptomic-wide RNA-protein interactions
Downes et al. An integrated platform to systematically identify causal variants and genes for polygenic human traits
CN115101128A (en) Method for evaluating off-target risk of hybridization capture probe
Marceddu et al. Analysis of machine learning algorithms as integrative tools for validation of next generation sequencing data.
Forsberg et al. CLC Bio Integrated Platform for Handling and Analysis of Tag Sequencing Data
KR20210040714A (en) Method and appartus for detecting false positive variants in nucleic acid sequencing analysis
CN116064818A (en) Primer group, method and system for detecting IGH gene rearrangement and hypermutation
Rosenfeld et al. Bulk gDNA sequencing of antibody heavy-chain gene rearrangements for detection and analysis of B-cell clone distribution: A method by the AIRR community
Lasica et al. Automated ChIPmentation procedure on limited biological material of the human blood fluke Schistosoma mansoni
CN114783518A (en) Method, device, electronic apparatus, program, and medium for predicting gene editing result
CN111349690B (en) Method for detecting protein DNA binding site
Islam et al. CRIS: complete reconstruction of immunoglobulin VDJ sequences from RNA-seq data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant