CN116434837B - Chromosome balance translocation detection analysis system based on NGS - Google Patents
- Publication number
- CN116434837B (application number CN202310687440.8A)
- Authority
- CN
- China
- Prior art keywords
- sequencing data
- module
- translocation
- information
- chromosome
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application provides a chromosome balance translocation detection analysis system based on NGS, which comprises a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from the sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information. The application has the effect of improving the accuracy of chromosome balance translocation detection analysis.
Description
Technical Field
The application relates to the technical field of chromosome translocation detection, in particular to a chromosome balance translocation detection analysis system based on NGS.
Background
An NGS-based chromosome balance translocation detection analysis system uses next-generation sequencing (NGS) technology to detect and analyze balanced chromosomal translocations. A balanced chromosomal translocation is an exchange of segments between chromosomes, usually between two chromosomes, without net gain or loss of genetic material. Such translocations may lead to genetic diseases, tumors, and other conditions.
Traditional chromosome analysis methods, such as conventional karyotyping and fluorescence in situ hybridization, can detect large chromosomal structural changes but are less sensitive to small balanced translocations or complex translocations. NGS-based chromosome balance translocation detection analysis systems, by contrast, offer higher resolution and sensitivity, can detect smaller structural changes, and help identify the specific location of a translocation and the associated genetic variation.
Many chromosome balance translocation detection systems have been developed. A survey of the literature shows that prior-art systems, such as those disclosed in publication Nos. CN110265087A, CN111276189B, CN110428873A, EP3115962A4, US4122518A and JP5219516B2, generally comprise a data acquisition module, a genome comparison module, an analysis module and a result output module. The data acquisition module acquires sample data of sequencing reads; the genome comparison module performs base comparison between the sample data and a reference genome; the analysis module generates analysis information from the comparison result; and the result output module displays the analysis information. Because the detection and analysis workflow of such systems is simplistic and includes no quality control between the acquisition of sample data and the generation of analysis information, the accuracy of chromosome balance translocation detection and analysis is reduced.
Disclosure of Invention
The application aims to provide an NGS-based chromosome balance translocation detection analysis system that addresses the above shortcomings of existing chromosome balance translocation detection analysis systems.
The application adopts the following technical scheme:
an NGS-based chromosome balance translocation detection analysis system comprises a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from a sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information;
the sample preparation terminal comprises a DNA extraction module and a DNA library generation module; the DNA extraction module is used for extracting DNA from the sample to be detected; the DNA library generation module is used for carrying out library construction on the extracted DNA to generate a DNA library suitable for NGS sequencing;
the NGS sequencing terminal comprises an NGS platform calling module and an NGS sequencing module; the NGS platform calling module is used for calling an NGS platform preset in the system; the NGS sequencing module is used for performing high-throughput sequencing on the DNA library by utilizing a corresponding NGS platform to generate corresponding original sequencing data;
the sample data processing terminal comprises a data quality control module, a read segment comparison module, a translocation detection module and a visual information generation module; the data quality control module is used for performing quality filtering on the original sequencing data; the read segment comparison module is used for comparing the quality-filtered sequencing data with a reference genome to generate initial position information and direction information of each sequencing data read; the translocation detection module is used for performing translocation detection according to the initial position information and the direction information of each sequencing data read and generating corresponding chromosome translocation information; the visual information generation module is used for performing graphical processing according to the chromosome translocation information and generating corresponding visual information.
Optionally, the data quality control module comprises a sequence quality screening sub-module and a sequence length screening sub-module; the sequence quality screening submodule is used for carrying out quality evaluation screening on each sequencing data read; the sequence length screening submodule is used for carrying out length evaluation screening on sequencing data reads subjected to quality evaluation screening;
when the sequence quality screening sub-module operates, the following formulas are satisfied:
[two formulas; the formula images are not reproduced in the source text]
the quantities in these formulas are: the sequence quality assessment index of the corresponding sequencing data read; the total number of bases in the read; an exponential conversion coefficient based on the total number of bases, set empirically by the inspector; the quality assessment value of each base in the read; an exponential conversion coefficient based on the quality assessment value, set empirically by the inspector; and the sequencing error probability of each base in the read, which the corresponding NGS platform calculates from the intensity and noise characteristics of the sequencing signal of the read; the sequence quality screening sub-module retains the sequencing data reads whose sequence quality assessment index satisfies the screening condition against the sequence quality screening threshold (the inequality itself is not reproduced in the source text); the sequence quality screening threshold is set empirically by the inspector;
when the sequence length screening sub-module operates, a sequence length evaluation index is calculated for each sequencing data read that passed sequence quality screening, using the following formulas:
[two formulas; the formula images are not reproduced in the source text]
the quantities in these formulas are: the sequence length evaluation index of the corresponding sequencing data read that passed sequence quality screening; a coefficient selection function based on the number of base length anomalies in the read; the number of base length anomalies in the read; the length of each base in the read; and the total number of bases in the read; the sequence length screening sub-module retains the sequencing data reads whose sequence length evaluation index satisfies the screening condition against the sequence length screening threshold (the inequality itself is not reproduced in the source text); the sequence length screening threshold is set empirically by the inspector.
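The sequence quality screening step can be illustrated with a minimal Python sketch. Because the formulas for the sequence quality assessment index are not available in this text, the sketch substitutes a stand-in index: the mean Phred quality of the read, computed from per-base error probabilities with the standard relation Q = -10 log10(P). The function names, the example reads and the threshold value are assumptions made for illustration and are not taken from the application.

```python
import math


def phred_quality(p_error: float) -> float:
    """Standard Phred quality score for one base: Q = -10 * log10(P_error)."""
    return -10.0 * math.log10(p_error)


def mean_read_quality(error_probs: list[float]) -> float:
    """Stand-in quality index (assumption): mean Phred score over all bases of a read."""
    return sum(phred_quality(p) for p in error_probs) / len(error_probs)


def filter_reads(reads: dict[str, list[float]], threshold: float) -> list[str]:
    """Return the IDs of reads whose stand-in index is at or above the threshold."""
    return [
        read_id
        for read_id, probs in reads.items()
        if mean_read_quality(probs) >= threshold
    ]


# Hypothetical per-base sequencing error probabilities for two short reads.
reads = {
    "read_1": [0.001, 0.001, 0.01],  # roughly Q30, Q30, Q20
    "read_2": [0.1, 0.1, 0.01],      # roughly Q10, Q10, Q20
}
kept = filter_reads(reads, threshold=20.0)
```

In this sketch the threshold plays the role of the empirically set sequence quality screening threshold; the two exponential conversion coefficients described above are not modelled.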
Optionally, the read comparison module comprises a base comparison sub-module, a scoring matrix generation sub-module and a comparison information generation sub-module; the base alignment sub-module is used for performing base alignment on the corresponding sequencing data read section screened by the sequence length and a reference genome; the score matrix generation sub-module is used for generating a score matrix corresponding to the sequencing data reading according to the matching score information and the penalty value information in the comparison result; the comparison information generation sub-module is used for generating comparison quality information, initial position information and direction information of the corresponding sequencing data read according to the comparison result and the scoring matrix.
Optionally, the comparison information generation sub-module comprises a comparison quality index unit, a comparison quality information generation unit, an initial position information generation unit and a direction information generation unit; the comparison quality index unit is used for calculating the comparison quality index of the corresponding sequencing data read against the reference genome according to the matching score information and the penalty value information of the read in the comparison result and the scoring matrix; the comparison quality information generation unit generates corresponding comparison quality information according to the comparison quality index; the initial position information generation unit is used for generating initial position information of the corresponding sequencing data read on the reference genome according to the comparison result; the direction information generation unit is used for generating direction information of the corresponding sequencing data read on the reference genome according to the comparison result;
when the comparison quality index unit performs its calculation, the following equation is satisfied:
[formula; the formula image is not reproduced in the source text]
the quantities in this formula are: the comparison quality index of the corresponding sequencing data read, screened by sequence length, against the reference genome; the match score value of each base in the read that receives a match score; the total number of bases in the read that receive a match score; the total number of bases in the read; the penalty value of each base in the read that receives a penalty value; the total number of bases in the read that receive a penalty value; the number of bases on the optimal comparison path in the scoring matrix; the highest match score value in the scoring matrix; and a first weight coefficient and a second weight coefficient, set empirically by the inspector; the match score is the score that the system assigns, according to requirements preset by the inspector, to each base taking part in the comparison, and the more similar a base of the read is to the corresponding base of the reference genome, the higher its match score; the penalty value is the value that the system assigns, according to requirements preset by the inspector, to a base that undergoes an insertion or deletion during the comparison; the scoring matrix records, in matrix form, the comparison score of each base of the read; the optimal comparison path is the path in the scoring matrix that runs step by step from the highest match score to a match score of zero;
the translocation detection module selects the initial position information and the direction information of each sequencing data read whose comparison quality index satisfies the selection condition against the reference quality threshold (the inequality itself is not reproduced in the source text), performs translocation detection on them, and generates corresponding chromosome translocation information; the reference quality threshold is set empirically by the inspector.
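The scoring matrix and the optimal comparison path described above, a path running from the highest score back to a score of zero, resemble the traceback of a Smith-Waterman local alignment. The following Python sketch shows that technique only; it does not implement the application's comparison quality index, whose formula is not available in this text. The scoring values (`match`, `mismatch`, `gap`) and the example sequences are assumptions.

```python
def smith_waterman(read: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment of read against ref.

    Returns (best_score, path_length, ref_start), where path_length is the
    number of traceback steps and ref_start is the 0-based reference offset
    at which the aligned region begins.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cell values never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if read[i - 1] == ref[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution,  # match or mismatch
                score[i - 1][j] + gap,               # gap in the reference
                score[i][j - 1] + gap,               # gap in the read
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    path_length = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        path_length += 1
        substitution = match if read[i - 1] == ref[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + substitution:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, path_length, j


result = smith_waterman("ACGT", "TTACGTTT")
```

Here the highest matrix value and the traceback length correspond loosely to the "highest match score value" and the "number of bases on the optimal comparison path" named in the definitions; how the application combines them with the weight coefficients is not shown.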
Optionally, the translocation detection module comprises an abnormal fragment identification sub-module, a rearrangement boundary positioning sub-module, a mutation type identification sub-module and a chromosome translocation information generation sub-module; the abnormal fragment identification submodule is used for generating chromosome balance translocation event information according to the comparison result; the rearrangement boundary positioning sub-module is used for positioning a rearrangement boundary of the chromosome translocation event according to the chromosome balance translocation event information; the mutation type identification submodule is used for judging the chromosome translocation type according to the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the copy numbers of the two sides of the translocation event; the chromosome translocation information generation submodule is used for generating corresponding chromosome translocation information according to the starting position information, the direction information, the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the mutation type of each sequencing data reading segment.
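One common way to derive translocation candidates from position and direction information is to cluster read pairs whose two ends align to different chromosomes. The Python sketch below illustrates that general idea under simplifying assumptions and is not the application's own procedure: the `ReadPair` record layout, the fixed-size position bins and the `min_support` cut-off are all hypothetical, and copy-number evidence and precise rearrangement boundary positioning are omitted.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Alignment of both ends of one read pair (hypothetical record layout)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def find_translocation_candidates(pairs, bin_size=1000, min_support=3):
    """Group inter-chromosomal read pairs by chromosome, position bin and strand."""
    clusters = defaultdict(list)
    for pair in pairs:
        if pair.chrom1 == pair.chrom2:
            continue  # both ends on one chromosome: no inter-chromosomal signal
        end_a = (pair.chrom1, pair.pos1, pair.strand1)
        end_b = (pair.chrom2, pair.pos2, pair.strand2)
        if end_b[0] < end_a[0]:
            end_a, end_b = end_b, end_a  # canonical order so swapped pairs cluster together
        key = (end_a[0], end_a[1] // bin_size, end_a[2],
               end_b[0], end_b[1] // bin_size, end_b[2])
        clusters[key].append((end_a, end_b))

    candidates = []
    for (chrom_a, _, strand_a, chrom_b, _, strand_b), members in clusters.items():
        if len(members) < min_support:
            continue  # too few supporting pairs; treated as noise in this sketch
        positions_a = [a[1] for a, _ in members]
        positions_b = [b[1] for _, b in members]
        candidates.append({
            "chrom_a": chrom_a,
            "strand_a": strand_a,
            "range_a": (min(positions_a), max(positions_a)),
            "chrom_b": chrom_b,
            "strand_b": strand_b,
            "range_b": (min(positions_b), max(positions_b)),
            "support": len(members),
        })
    return candidates


pairs = [
    ReadPair("chr2", 100100, "+", "chr5", 500600, "-"),
    ReadPair("chr2", 100250, "+", "chr5", 500700, "-"),
    ReadPair("chr5", 500900, "-", "chr2", 100400, "+"),  # same event, ends swapped
    ReadPair("chr1", 5000, "+", "chr3", 9000, "-"),      # lone pair, below min_support
    ReadPair("chr7", 100, "+", "chr7", 400, "-"),        # same chromosome, skipped
]
candidates = find_translocation_candidates(pairs)
```

Fixed bins can split one event across a bin boundary; a production tool would typically use overlap-based clustering and split-read evidence to refine the breakpoint.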
An NGS-based chromosome balance translocation detection analysis method applied to the NGS-based chromosome balance translocation detection analysis system, the chromosome balance translocation detection analysis method comprising:
S1, extracting DNA from a sample to be detected and preparing a DNA library;
S2, performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data;
S3, performing translocation detection data processing on the original sequencing data to generate chromosome translocation information;
S4, generating and displaying translocation analysis result information according to the chromosome translocation information.
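The data processing step of the method chains quality control, read comparison, comparison quality selection and translocation detection. A minimal Python sketch of that ordering follows; the function name, the callable parameters and the `"quality"` key are hypothetical, and the `>=` direction of the comparison quality check is an assumption because the original inequality is not available in this text.

```python
def run_translocation_pipeline(raw_reads, passes_quality, align, min_alignment_quality, detect):
    """Chain the processing stages in order; every stage is supplied by the caller."""
    # Data quality control: drop reads that fail screening.
    passed = [read for read in raw_reads if passes_quality(read)]
    # Read comparison: each alignment is a dict that carries a "quality" value.
    alignments = [align(read) for read in passed]
    # Keep only alignments at or above the reference quality threshold (assumed direction).
    reliable = [a for a in alignments if a["quality"] >= min_alignment_quality]
    # Translocation detection on the remaining alignments.
    return detect(reliable)


# Toy stand-ins for the real stages, used only to show the call shape.
output = run_translocation_pipeline(
    raw_reads=["ACGT", "AC", "ACGTACGT"],
    passes_quality=lambda read: len(read) >= 4,
    align=lambda read: {"read": read, "quality": len(read)},
    min_alignment_quality=5,
    detect=lambda alignments: [a["read"] for a in alignments],
)
```

Passing the stages as callables keeps the ordering explicit while leaving each module's internals open, which mirrors the modular structure of the system.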
The beneficial effects obtained by the application are as follows:
1. the sample preparation terminal, the NGS sequencing terminal, the sample data processing terminal and the analysis result display terminal improve the accuracy of the detection process from DNA library construction onward; chromosome translocation information is obtained from accurate original sequencing data, so the analysis result is more accurate and clearer, which improves the accuracy of chromosome balance translocation detection analysis;
2. the DNA extraction module and the DNA library generation module improve the accuracy and timeliness of DNA extraction and make the DNA library more accurate, which improves the accuracy of the detection process;
3. the NGS platform calling module and the NGS sequencing module improve the suitability and accuracy of NGS platform calling; the constructed DNA library is sequenced at high throughput on a more accurate and more suitable NGS platform, which improves the accuracy of the NGS sequencing process;
4. the data quality control module, the read segment comparison module, the translocation detection module and the visual information generation module improve the accuracy of sample data processing, so the chromosome translocation information is more accurate; the visual information makes display and analysis more accurate, which improves the accuracy of chromosome balance translocation detection analysis;
5. the sequence quality screening sub-module and the sequence length screening sub-module, together with the sequence quality evaluation index algorithm and the sequence length evaluation index algorithm, evaluate and screen the sequencing data reads in sequence; this improves the quality of the screened sequencing data reads and the accuracy of analysis, and thereby the accuracy of chromosome balance translocation detection;
6. the base comparison sub-module, the scoring matrix generation sub-module and the comparison information generation sub-module improve the efficiency and accuracy of base comparison, and the scoring matrix improves the accuracy of analysis, which improves the accuracy of chromosome balance translocation detection;
7. the comparison quality index unit, the comparison quality information generation unit, the initial position information generation unit and the direction information generation unit, together with the comparison quality index algorithm, further select the sequencing data reads with better comparison quality and analyze their comparison results, so the chromosome translocation information is more accurate, which improves the accuracy of chromosome balance translocation detection;
8. the abnormal fragment identification sub-module, the rearrangement boundary positioning sub-module, the mutation type identification sub-module and the chromosome translocation information generation sub-module allow chromosome balance translocation analysis to be completed efficiently and accurately, so more accurate chromosome translocation information is generated;
9. the inspector scoring module, the inspection flow scoring module and the inspection index computing sub-module, together with the inspector scoring algorithm and the inspection flow scoring algorithm, improve the accuracy of inspector scoring and inspection flow scoring, which in turn improves the accuracy of the inspection indexes and of the inspection information, and thereby the accuracy of chromosome balance translocation detection.
For a further understanding of the nature and the technical aspects of the present application, reference should be made to the following detailed description of the application and the accompanying drawings, which are provided for purposes of reference only and are not intended to limit the application.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the present application;
FIG. 2 is a schematic diagram of a data quality control module according to the present application;
FIG. 3 is a schematic flow chart of a method for detecting and analyzing a chromosome balance translocation based on NGS according to the present application;
FIG. 4 is a schematic diagram of the overall structure of an NGS-based chromosome balance translocation detection analysis system according to another embodiment of the present application.
Detailed Description
The following describes embodiments of the present application by way of specific examples, and those skilled in the art will appreciate the advantages and effects of the present application from the disclosure herein. The application is capable of other and different embodiments, and its details may be modified and varied in various respects without departing from the spirit of the present application. The drawings of the present application are merely schematic illustrations and, as stated here in advance, are not drawn to actual dimensions. The following embodiments further illustrate the related technical content of the present application in detail, but the disclosure is not intended to limit the scope of the present application.
Embodiment One: this embodiment provides an NGS-based chromosome balance translocation detection analysis system. Referring to FIG. 1, an NGS-based chromosome balance translocation detection analysis system comprises a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from a sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information;
the sample preparation terminal comprises a DNA extraction module and a DNA library generation module; the DNA extraction module is used for extracting DNA from the sample to be detected; the DNA library generation module is used for carrying out library construction on the extracted DNA to generate a DNA library suitable for NGS sequencing;
the NGS sequencing terminal comprises an NGS platform calling module and an NGS sequencing module; the NGS platform calling module is used for calling an NGS platform preset in the system; the NGS sequencing module is used for performing high-throughput sequencing on the DNA library by utilizing a corresponding NGS platform to generate corresponding original sequencing data;
the sample data processing terminal comprises a data quality control module, a read segment comparison module, a translocation detection module and a visual information generation module; the data quality control module is used for performing quality filtering on the original sequencing data; the read segment comparison module is used for comparing the quality-filtered sequencing data with a reference genome to generate initial position information and direction information of each sequencing data read; the translocation detection module is used for performing translocation detection according to the initial position information and the direction information of each sequencing data read and generating corresponding chromosome translocation information; the visual information generation module is used for performing graphical processing according to the chromosome translocation information and generating corresponding visual information.
Optionally, referring to fig. 2, the data quality control module includes a sequence quality screening sub-module and a sequence length screening sub-module; the sequence quality screening submodule is used for carrying out quality evaluation screening on each sequencing data read; the sequence length screening submodule is used for carrying out length evaluation screening on sequencing data reads subjected to quality evaluation screening;
when the sequence quality screening sub-module operates, the following formulas are satisfied:
[two formulas; the formula images are not reproduced in the source text]
the quantities in these formulas are: the sequence quality assessment index of the corresponding sequencing data read; the total number of bases in the read; an exponential conversion coefficient based on the total number of bases, set empirically by the inspector; the quality assessment value of each base in the read; an exponential conversion coefficient based on the quality assessment value, set empirically by the inspector; and the sequencing error probability of each base in the read, which the corresponding NGS platform calculates from the intensity and noise characteristics of the sequencing signal of the read; the sequence quality screening sub-module retains the sequencing data reads whose sequence quality assessment index satisfies the screening condition against the sequence quality screening threshold (the inequality itself is not reproduced in the source text); the sequence quality screening threshold is set empirically by the inspector;
when the sequence length screening sub-module operates, a sequence length evaluation index is calculated for each sequencing data read that passed sequence quality screening, using the following formulas:
[two formulas; the formula images are not reproduced in the source text]
the quantities in these formulas are: the sequence length evaluation index of the corresponding sequencing data read that passed sequence quality screening; a coefficient selection function based on the number of base length anomalies in the read; the number of base length anomalies in the read; the length of each base in the read; and the total number of bases in the read; the sequence length screening sub-module retains the sequencing data reads whose sequence length evaluation index satisfies the screening condition against the sequence length screening threshold (the inequality itself is not reproduced in the source text); the sequence length screening threshold is set empirically by the inspector.
Optionally, the read comparison module comprises a base comparison sub-module, a scoring matrix generation sub-module and a comparison information generation sub-module; the base alignment sub-module is used for performing base alignment on the corresponding sequencing data read section screened by the sequence length and a reference genome; the score matrix generation sub-module is used for generating a score matrix corresponding to the sequencing data reading according to the matching score information and the penalty value information in the comparison result; the comparison information generation sub-module is used for generating comparison quality information, initial position information and direction information of the corresponding sequencing data read according to the comparison result and the scoring matrix.
Optionally, the comparison information generation sub-module includes a comparison quality index unit, a comparison quality information generation unit, a starting position information generation unit and a direction information generation unit; the comparison quality index unit is used for calculating the comparison quality index between the corresponding sequencing data read and the reference genome according to the matching score information and penalty value information of that read in the comparison result and the scoring matrix; the comparison quality information generation unit generates corresponding comparison quality information according to the comparison quality index; the starting position information generation unit is used for generating starting position information of the corresponding sequencing data read on the reference genome according to the comparison result; the direction information generation unit is used for generating direction information of the corresponding sequencing data read on the reference genome according to the comparison result;
when the comparison quality index unit calculates, the following equation is satisfied:
;
wherein the terms of the above equation are: the comparison quality index between the corresponding sequencing data read that passed sequence length screening and the reference genome; the match score value of each base in the read that obtained a match score, where the match score is the score the system assigns to each base participating in the comparison according to requirements preset by the inspector, and the more similar a base of the read is to the base of the reference genome, the higher the match score; the total number of bases in the read that obtained a match score; the total number of bases in the read; the penalty value of each base in the read that incurred a penalty, where the penalty value is the value the system assigns, according to requirements preset by the inspector, to a base that undergoes an insertion or deletion operation during the comparison; the total number of bases in the read that incurred a penalty; the number of bases on the optimal comparison path in the scoring matrix, where the scoring matrix records the comparison score of each base of the read in matrix form and the optimal comparison path is the path traced in the scoring matrix from the highest match score back to a match score of zero; the highest match score value in the scoring matrix; and a first weight coefficient and a second weight coefficient, both set empirically by the inspector;
the translocation detection module selects the sequencing data reads whose comparison quality index satisfies a comparison quality reference threshold, performs translocation detection on the starting position information and direction information of each selected read, and generates corresponding chromosome translocation information; the comparison quality reference threshold is set empirically by the inspector.
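A scoring matrix whose optimal path runs from the highest match score back to a score of zero resembles the traceback of Smith-Waterman local alignment, although the text does not name an algorithm. The following is a minimal sketch under that assumption; the scoring values (`match`, `mismatch`, `gap`) and the function name are hypothetical, and the returned tuple is not the comparison quality index of this application.

```python
def smith_waterman(read: str, ref: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> tuple[int, int, int]:
    """Return (highest score, 0-based start on ref, number of steps on the traceback path)."""
    rows, cols = len(read) + 1, len(ref) + 1
    # Scoring matrix; row 0 and column 0 stay at zero (local alignment).
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in the reference (insertion in the read)
            left = H[i][j - 1] + gap    # gap in the read (deletion from the read)
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest score until a zero score is reached.
    i, j = best_pos
    path_len = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        path_len += 1
    return best, j, path_len
```

For example, `smith_waterman("ACGT", "TTACGTTT")` aligns the whole read at reference offset 2. Production pipelines normally use an indexed aligner rather than a full dynamic-programming matrix per read; the quadratic version is shown only because it makes the scoring matrix and traceback explicit.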
Optionally, the translocation detection module comprises an abnormal fragment identification sub-module, a rearrangement boundary positioning sub-module, a mutation type identification sub-module and a chromosome translocation information generation sub-module; the abnormal fragment identification submodule is used for generating chromosome balance translocation event information according to the comparison result; the rearrangement boundary positioning sub-module is used for positioning a rearrangement boundary of the chromosome translocation event according to the chromosome balance translocation event information; the mutation type identification submodule is used for judging the chromosome translocation type according to the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the copy numbers of the two sides of the translocation event; the chromosome translocation information generation submodule is used for generating corresponding chromosome translocation information according to the starting position information, the direction information, the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the mutation type of each sequencing data reading segment.
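One way to picture how starting position and direction information can point to a translocation is to count read pairs whose two ends map to different chromosomes. The sketch below assumes paired-end data, which the text does not specify, and groups such pairs into fixed-size position windows. The record type, the `window` and `min_support` parameters, and the function name are all hypothetical; the sketch does not locate rearrangement boundaries or use copy number.

```python
from collections import Counter
from typing import NamedTuple


class PairedAlignment(NamedTuple):
    """Hypothetical record: chromosome, start position and strand for each end of a read pair."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def candidate_translocations(pairs, window: int = 1000, min_support: int = 3) -> dict:
    """Count inter-chromosomal read pairs per pair of position windows."""
    support = Counter()
    for p in pairs:
        if p.chrom1 == p.chrom2:
            continue  # both ends on the same chromosome: not an inter-chromosomal signal
        # Sort the two ends so that A->B and B->A pairs fall into the same cluster.
        ends = sorted([
            (p.chrom1, p.pos1 // window, p.strand1),
            (p.chrom2, p.pos2 // window, p.strand2),
        ])
        support[tuple(ends)] += 1
    # Keep only clusters supported by enough independent read pairs.
    return {key: n for key, n in support.items() if n >= min_support}
```

Each returned key names the two windows and strands joined by the supporting pairs. A balanced translocation would typically be expected to produce two such clusters with complementary orientations, one per derivative chromosome.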
An NGS-based chromosome balance translocation detection analysis method, which is applied to the NGS-based chromosome balance translocation detection analysis system, is shown in fig. 3, and comprises the following steps:
S1, extracting DNA from a sample to be detected and preparing a DNA library;
S2, performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data;
S3, performing translocation detection data processing on the original sequencing data to generate chromosome translocation information;
S4, generating and displaying translocation analysis result information according to the chromosome translocation information.
Embodiment two: the embodiment includes the whole content of the first embodiment, and provides an NGS-based chromosome balance translocation detection analysis system, which, with reference to fig. 4, further includes a detection and assessment terminal; the detection and assessment terminal comprises a detection and assessment index calculation module and a detection and assessment information generation module; the detection assessment index calculation module is used for calculating a corresponding detection assessment index according to the scores of the inspectors and the scores of the detection flow in the detection process; the detection and assessment information generation module is used for generating corresponding detection and assessment information according to the detection and assessment indexes.
The detection assessment index calculation module comprises an inspector scoring sub-module, a detection flow scoring sub-module and a detection assessment index calculation sub-module; the inspector scoring sub-module is used for calculating the inspector score of each inspector according to the inspector's years of work experience, total number of completed detections and supervisor rating; the detection flow scoring sub-module is used for calculating the detection flow score of the corresponding detection according to the number of inspectors in the detection flow, the supervisor ratings of those inspectors and the scores of the corresponding detection flow items; the detection assessment index calculation sub-module is used for calculating the detection assessment index according to the inspector score and the detection flow score.
When the detection assessment index calculation submodule calculates, the following equation is satisfied:
;
wherein the terms of the above equation are: the detection assessment index; the inspector score; the detection flow score; and a first index value conversion coefficient and a second index value conversion coefficient, both set empirically by the inspector.
When the inspector scoring submodule calculates, the following equation is satisfied:
;
;
wherein the terms of the above formulas are: the score value of each participating inspector; the total number of inspectors participating in the current detection process; the total number of detections completed by each participating inspector; a score index reference value, set empirically by the inspector; the supervisor rating of each participating inspector; and the years of work experience of each participating inspector.
When the detection flow scoring sub-module calculates, the following formula is satisfied:
;
;
wherein the terms of the above formulas are: a first detection flow scoring weight coefficient; a second detection flow scoring weight coefficient; the detection flow item score of each participating inspector; a third detection flow scoring weight coefficient, the three weight coefficients all being set empirically by the inspector; and the sum of the scores of the flow items completed by each participating inspector in the corresponding detection flow; the score of each flow item in the detection flow is preset by the inspector.
When the detection assessment index does not meet the evaluation threshold, the detection and assessment information generation module generates detection and assessment information indicating a failed assessment, so as to prompt the inspector to repeat the detection; when the detection assessment index meets the evaluation threshold, the detection and assessment information generation module generates detection and assessment information indicating a passed assessment. The evaluation threshold is set empirically by the inspector.
The foregoing discloses only preferred embodiments of the present application and is not intended to limit its scope; all equivalent technical changes made using the description and accompanying drawings of the present application fall within its scope, and the elements of the application may be updated as the technology develops.
Claims (5)
1. The chromosome balance translocation detection analysis system based on the NGS is characterized by comprising a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from a sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information;
the sample preparation terminal comprises a DNA extraction module and a DNA library generation module; the DNA extraction module is used for extracting DNA from the sample to be detected; the DNA library generation module is used for carrying out library construction on the extracted DNA to generate a DNA library suitable for NGS sequencing;
the NGS sequencing terminal comprises an NGS platform calling module and an NGS sequencing module; the NGS platform calling module is used for calling an NGS platform preset in the system; the NGS sequencing module is used for performing high-throughput sequencing on the DNA library by utilizing a corresponding NGS platform to generate corresponding original sequencing data;
the sample data processing terminal comprises a data quality control module, a read segment comparison module, a translocation detection module and a visual information generation module; the data quality control module is used for carrying out quality filtering treatment on the original sequencing data; the read comparison module is used for comparing the sequencing data subjected to quality filtering with a reference genome to generate initial position information and direction information of each sequencing data read; the translocation detection module is used for carrying out translocation detection according to the initial position information and the direction information of each sequencing data reading segment and generating corresponding chromosome translocation information; the visual information generation module is used for carrying out graphical processing according to the chromosome translocation information and generating corresponding visual information; the data quality control module comprises a sequence quality screening sub-module and a sequence length screening sub-module; the sequence quality screening submodule is used for carrying out quality evaluation screening on each sequencing data read; the sequence length screening submodule is used for carrying out length evaluation screening on sequencing data reads subjected to quality evaluation screening;
when the sequence quality screening submodule works, the following formula is satisfied:
;
;
wherein the terms of the above formulas are: the sequence quality assessment index of the corresponding sequencing data read; the total number of bases in that read; an exponential conversion coefficient based on the total number of bases; the quality assessment value of each individual base in the read; an exponential conversion coefficient based on the quality assessment value; and the sequencing error probability of each individual base in the read, which the corresponding NGS platform calculates from the intensity and noise characteristics of the sequencing signal of that read; the sequence quality screening submodule screens the sequencing data reads by comparing the sequence quality assessment index against a sequence quality screening threshold;
when the sequence length screening submodule is operative, a sequence length evaluation index for each sequencing data read screened by sequence quality is calculated by the following equation:
;
;
wherein the terms of the above formulas are: the sequence length evaluation index of the corresponding sequencing data read that passed sequence quality screening; a coefficient selection function based on the number of base length anomalies in that read; the number of base length anomalies in that read; the length of each individual base in that read; and the total number of bases in that read; the sequence length screening submodule screens the sequencing data reads by comparing the sequence length evaluation index against a sequence length screening threshold.
2. The NGS-based chromosome balance translocation detection analysis system of claim 1, wherein the read alignment module comprises a base alignment sub-module, a scoring matrix generation sub-module, and an alignment information generation sub-module; the base alignment sub-module is used for performing base alignment on the corresponding sequencing data read section screened by the sequence length and a reference genome; the score matrix generation sub-module is used for generating a score matrix corresponding to the sequencing data reading according to the matching score information and the penalty value information in the comparison result; the comparison information generation sub-module is used for generating comparison quality information, initial position information and direction information of the corresponding sequencing data read according to the comparison result and the scoring matrix.
3. The NGS-based chromosome balance translocation detection analysis system of claim 2, wherein the alignment information generation sub-module comprises an alignment quality index unit, an alignment quality information generation unit, a starting position information generation unit, and a direction information generation unit; the alignment quality index unit is used for calculating the alignment quality index between the corresponding sequencing data read and the reference genome according to the matching score information and penalty value information of that read in the comparison result and the scoring matrix; the alignment quality information generation unit generates corresponding alignment quality information according to the alignment quality index; the starting position information generation unit is used for generating starting position information of the corresponding sequencing data read on the reference genome according to the comparison result; the direction information generation unit is used for generating direction information of the corresponding sequencing data read on the reference genome according to the comparison result;
when the alignment quality index unit calculates, the following equation is satisfied:
;
wherein the terms of the above equation are: the alignment quality index between the corresponding sequencing data read that passed sequence length screening and the reference genome; the match score value of each base in the read that obtained a match score, where the match score is the score the system assigns to each base participating in the comparison according to requirements preset by the inspector, and the more similar a base of the read is to the base of the reference genome, the higher the match score; the total number of bases in the read that obtained a match score; the total number of bases in the read; the penalty value of each base in the read that incurred a penalty, where the penalty value is the value the system assigns, according to requirements preset by the inspector, to a base that undergoes an insertion or deletion operation during the comparison; the total number of bases in the read that incurred a penalty; the number of bases on the optimal alignment path in the scoring matrix, where the scoring matrix records the alignment score of each base of the read in matrix form and the optimal alignment path is the path traced in the scoring matrix from the highest match score back to a match score of zero; the highest match score value in the scoring matrix; and a first weight coefficient and a second weight coefficient;
the translocation detection module selects the sequencing data reads whose alignment quality index satisfies a comparison quality reference threshold, performs translocation detection on the starting position information and direction information of each selected read, and generates corresponding chromosome translocation information.
4. The NGS-based chromosome balance translocation detection analysis system of claim 3, wherein the translocation detection module comprises an abnormal segment identification sub-module, a rearrangement boundary positioning sub-module, a variation type identification sub-module, and a chromosome translocation information generation sub-module; the abnormal fragment identification submodule is used for generating chromosome balance translocation event information according to the comparison result; the rearrangement boundary positioning sub-module is used for positioning a rearrangement boundary of the chromosome translocation event according to the chromosome balance translocation event information; the mutation type identification submodule is used for judging the chromosome translocation type according to the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the copy numbers of the two sides of the translocation event; the chromosome translocation information generation submodule is used for generating corresponding chromosome translocation information according to the starting position information, the direction information, the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the mutation type of each sequencing data reading segment.
5. An NGS-based chromosome balance translocation detection analysis method applied to an NGS-based chromosome balance translocation detection analysis system according to claim 4, wherein the chromosome balance translocation detection analysis method comprises:
S1, extracting DNA from a sample to be detected and preparing a DNA library;
S2, performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data;
S3, performing translocation detection data processing on the original sequencing data to generate chromosome translocation information;
S4, generating and displaying translocation analysis result information according to the chromosome translocation information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310687440.8A CN116434837B (en) | 2023-06-12 | 2023-06-12 | Chromosome balance translocation detection analysis system based on NGS |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116434837A CN116434837A (en) | 2023-07-14 |
CN116434837B true CN116434837B (en) | 2023-08-29 |
Family
ID=87087577
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310687440.8A Active CN116434837B (en) | 2023-06-12 | 2023-06-12 | Chromosome balance translocation detection analysis system based on NGS |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116434837B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117012285A (en) * | 2023-10-07 | 2023-11-07 | 广州盛安医学检验有限公司 | High-throughput sequencing data processing and analysis flow management and control system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631242A (en) * | 2015-12-25 | 2016-06-01 | 中国农业大学 | Method for identifying transgenic events through whole genome sequencing data |
CN110093406A (en) * | 2019-05-27 | 2019-08-06 | 新疆农业大学 | A kind of argali and its filial generation gene research method |
CN110189796A (en) * | 2019-05-27 | 2019-08-30 | 新疆农业大学 | A kind of sheep full-length genome resurveys sequence analysis method |
WO2020022733A1 (en) * | 2018-07-27 | 2020-01-30 | 주식회사 녹십자지놈 | Whole genome sequencing-based chromosomal abnormality detection method and use thereof |
CN111276189A (en) * | 2020-02-26 | 2020-06-12 | 广州市金域转化医学研究院有限公司 | Chromosome balance translocation detection and analysis system based on NGS and application thereof |
CN115803447A (en) * | 2020-04-23 | 2023-03-14 | 荷兰皇家科学院 | Detection of structural variation in chromosome proximity experiments |
CN116030892A (en) * | 2023-03-24 | 2023-04-28 | 北京大学第三医院(北京大学第三临床医学院) | System and method for identifying chromosome reciprocal translocation breakpoint position |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021107676A1 (en) * | 2019-11-29 | 2021-06-03 | 주식회사 녹십자지놈 | Artificial intelligence-based chromosomal abnormality detection method |
Non-Patent Citations (1)
Title |
---|
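Precise determination of balanced translocation breakpoints by high-throughput whole-genome sequencing; Yang Chuanchun et al.; Chinese Journal of Birth Health & Heredity; Vol. 26 (No. 5); 14-16, 52, 2 * |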
Also Published As
Publication number | Publication date |
---|---|
CN116434837A (en) | 2023-07-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||