CN116434837B - Chromosome balance translocation detection analysis system based on NGS - Google Patents

Chromosome balance translocation detection analysis system based on NGS

Publication number
CN116434837B
Authority
CN
China
Prior art keywords
sequencing data
module
translocation
information
chromosome
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310687440.8A
Other languages
Chinese (zh)
Other versions
CN116434837A (en
Inventor
谢杰
文妍
邢涛
梁丽敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shengan Medical Laboratory Co ltd
Original Assignee
Guangzhou Shengan Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shengan Medical Laboratory Co ltd
Priority to CN202310687440.8A
Publication of CN116434837A
Application granted
Publication of CN116434837B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The application provides a chromosome balance translocation detection analysis system based on NGS, which comprises a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from the sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information. The application has the effect of improving the accuracy of chromosome balance translocation detection analysis.

Description

Chromosome balance translocation detection analysis system based on NGS
Technical Field
The application relates to the technical field of chromosome translocation detection, in particular to a chromosome balance translocation detection analysis system based on NGS.
Background
An NGS-based chromosome balance translocation detection analysis system uses next-generation sequencing (NGS) technology to detect and analyze chromosome balance translocation. Chromosome balance translocation refers to an exchange of segments that generally involves an exchange event between two chromosomes. Such translocations may lead to genetic diseases, tumors, and the like.
Traditional chromosome analysis methods, such as conventional karyotyping and fluorescence in situ hybridization (FISH), can detect large chromosomal structural changes but are less sensitive to small balanced translocations or complex translocations. NGS-based chromosome balance translocation detection analysis systems, by contrast, provide higher resolution and sensitivity, can detect smaller structural changes, and help identify the specific positions of translocations and the associated genetic variations.
Many systems for detecting chromosome balance translocation have been developed. A search of the prior art found such systems disclosed in publication Nos. CN110265087A, CN111276189B, CN110428873A, EP3115962A4, US4122518A and JP5219516B2. These systems generally comprise a data acquisition module, a genome comparison module, an analysis module and a result output module: the data acquisition module acquires sample data of test reads; the genome comparison module performs base comparison between the sample data and a reference genome; the analysis module generates corresponding analysis information according to the comparison result; and the result output module displays the analysis information. Because the detection and analysis flow of these systems is simplistic and has no quality control process between the acquisition of sample data and the generation of analysis information, the accuracy of chromosome balance translocation detection analysis is reduced.
Disclosure of Invention
The application aims to provide an NGS-based chromosome balance translocation detection analysis system that addresses the above defects of existing chromosome balance translocation detection analysis systems.
The application adopts the following technical scheme:
an NGS-based chromosome balance translocation detection analysis system comprises a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from a sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information;
the sample preparation terminal comprises a DNA extraction module and a DNA library generation module; the DNA extraction module is used for extracting DNA from the sample to be detected; the DNA library generation module is used for carrying out library construction on the extracted DNA to generate a DNA library suitable for NGS sequencing;
the NGS sequencing terminal comprises an NGS platform calling module and an NGS sequencing module; the NGS platform calling module is used for calling an NGS platform preset in the system; the NGS sequencing module is used for performing high-throughput sequencing on the DNA library by utilizing a corresponding NGS platform to generate corresponding original sequencing data;
the sample data processing terminal comprises a data quality control module, a read comparison module, a translocation detection module and a visual information generation module; the data quality control module is used for performing quality filtering on the original sequencing data; the read comparison module is used for comparing the quality-filtered sequencing data with a reference genome to generate starting position information and direction information of each sequencing data read; the translocation detection module is used for performing translocation detection according to the starting position information and the direction information of each sequencing data read and generating corresponding chromosome translocation information; the visual information generation module is used for performing graphical processing on the chromosome translocation information and generating corresponding visual information.
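The data flow through the four modules of the sample data processing terminal can be sketched as below. This is only an illustrative skeleton, not the patented implementation: the names `Read`, `Alignment` and `run_sample_data_processing`, and the idea of passing each module in as a callable, are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
V = TypeVar("V")


@dataclass
class Read:
    name: str
    sequence: str
    qualities: List[int]  # per-base quality values (assumed representation)


@dataclass
class Alignment:
    read: Read
    chromosome: str
    start: int   # starting position information on the reference genome
    strand: str  # direction information: "+" or "-"


def run_sample_data_processing(
    reads: Iterable[Read],
    quality_filter: Callable[[Read], bool],          # data quality control module
    aligner: Callable[[Read], Alignment],            # read comparison module
    detector: Callable[[List[Alignment]], T],        # translocation detection module
    renderer: Callable[[T], V],                      # visual information generation module
) -> V:
    """Chain the four modules in the order the text describes."""
    kept = [r for r in reads if quality_filter(r)]
    alignments = [aligner(r) for r in kept]
    translocation_info = detector(alignments)
    return renderer(translocation_info)
```

Passing the modules as callables keeps the skeleton independent of any particular aligner or caller; each stage only sees the output of the stage before it.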
Optionally, the data quality control module comprises a sequence quality screening sub-module and a sequence length screening sub-module; the sequence quality screening submodule is used for carrying out quality evaluation screening on each sequencing data read; the sequence length screening submodule is used for carrying out length evaluation screening on sequencing data reads subjected to quality evaluation screening;
when the sequence quality screening submodule works, the following formula is satisfied:
[Formula not reproduced in the source text]
wherein the quantities in the formula are: the sequence quality assessment index of the corresponding sequencing data read; the total number of bases in the read; an exponential conversion coefficient based on the total number of bases, set empirically by the inspector; the quality assessment value of each base in the read; an exponential conversion coefficient based on the quality assessment value, set empirically by the inspector; and the probability of a sequencing error for each base in the read, which the corresponding NGS platform calculates from the intensity and noise characteristics of the sequencing signal of the read; the sequence quality screening submodule selects the sequencing data reads whose sequence quality assessment index satisfies the screening condition relative to the sequence quality screening threshold, which is set empirically by the inspector;
when the sequence length screening submodule is operative, a sequence length evaluation index for each sequencing data read that has passed sequence quality screening is calculated by the following equation:
[Formula not reproduced in the source text]
wherein the quantities in the formula are: the sequence length evaluation index of the corresponding sequencing data read that has passed sequence quality screening; a coefficient selection function based on the number of base length anomalies in the read; the number of base length anomalies in the read; the length of each base in the read; and the total number of bases in the read; the sequence length screening submodule selects the sequencing data reads whose sequence length evaluation index satisfies the screening condition relative to the sequence length screening threshold, which is set empirically by the inspector.
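The two-stage screening order (quality first, then length) can be illustrated with a small sketch. Because the patent's own index formulas are not legible in this text, the example swaps in stand-ins and should not be read as the claimed algorithm: the quality index here is the Phred-scaled mean per-base error probability (using the standard Phred relation P = 10^(-Q/10)), and the length screen is a plain length window. The function names and the default thresholds are assumptions.

```python
import math
from typing import List, Tuple

# (sequence, per-base Phred quality scores); an assumed representation of a read
Read = Tuple[str, List[int]]


def phred_to_error_prob(q: int) -> float:
    """Standard Phred relation between quality score and error probability."""
    return 10 ** (-q / 10)


def read_quality_index(qualities: List[int]) -> float:
    """Stand-in index: Phred-scaled mean per-base error probability.

    This is NOT the formula from the patent, which is not available here.
    """
    if not qualities:
        return 0.0
    mean_err = sum(phred_to_error_prob(q) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_err)


def screen_reads(
    reads: List[Read],
    min_quality: float = 20.0,  # hypothetical quality screening threshold
    min_length: int = 50,       # hypothetical length window
    max_length: int = 300,
) -> List[Read]:
    """Apply quality screening first, then length screening on the survivors."""
    quality_ok = [r for r in reads if read_quality_index(r[1]) >= min_quality]
    return [r for r in quality_ok if min_length <= len(r[0]) <= max_length]
```

Averaging error probabilities rather than raw quality scores means a single very poor base pulls the index down sharply, which is one reason this stand-in was chosen for the illustration.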
Optionally, the read comparison module comprises a base comparison sub-module, a score matrix generation sub-module and a comparison information generation sub-module; the base comparison sub-module is used for performing base comparison between the corresponding sequencing data read that has passed sequence length screening and a reference genome; the score matrix generation sub-module is used for generating a score matrix for the corresponding sequencing data read according to the match score information and the penalty value information in the comparison result; the comparison information generation sub-module is used for generating comparison quality information, starting position information and direction information of the corresponding sequencing data read according to the comparison result and the score matrix.
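A score matrix built from match scores and gap penalties, with a path traced from the highest score back to zero, matches the general shape of Smith-Waterman local alignment. The sketch below uses that well-known algorithm as a stand-in to show how a starting position and a direction could be derived; the patent does not name the algorithm, and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and function names are assumptions.

```python
from typing import List, Tuple


def smith_waterman(
    read: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = -2
) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Return (best score, 0-based start on ref, traceback path of matrix cells)."""
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]  # score matrix
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest score until a zero score is reached.
    path: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        path.append((i, j))
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    path.reverse()
    ref_start = path[0][1] - 1 if path else -1
    return best, ref_start, path


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def align_with_direction(read: str, ref: str) -> Tuple[str, int, int]:
    """Try both strands; return (direction, score, start) of the better one."""
    fwd = smith_waterman(read, ref)
    rev = smith_waterman(reverse_complement(read), ref)
    if fwd[0] >= rev[0]:
        return "+", fwd[0], fwd[1]
    return "-", rev[0], rev[1]
```

For example, `align_with_direction("AACG", "GGAACGGG")` reports a forward-strand hit starting at reference position 2. The quadratic matrix fill is only practical for short illustrations; production aligners index the reference instead.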
Optionally, the comparison information generation sub-module includes a comparison quality index unit, a comparison quality information generation unit, a starting position information generation unit and a direction information generation unit; the comparison quality index unit is used for calculating the comparison quality index between the corresponding sequencing data read and the reference genome according to the match score information, the penalty value information and the score matrix of the corresponding sequencing data read in the comparison result; the comparison quality information generation unit generates corresponding comparison quality information according to the corresponding comparison quality index; the starting position information generation unit is used for generating starting position information of the corresponding sequencing data read on the reference genome according to the comparison result; the direction information generation unit is used for generating direction information of the corresponding sequencing data read on the reference genome according to the comparison result;
when the comparison quality index unit calculates, the following equation is satisfied:
[Formula not reproduced in the source text]
wherein the quantities in the formula are: the comparison quality index between the corresponding sequencing data read that has passed sequence length screening and the reference genome; the match score value of each base in the read that obtains a match score, where the match score is the score the system assigns, according to requirements preset by the inspector, to each base participating in the comparison, and is higher the more similar the base of the read is to the base of the reference genome; the total number of bases in the read that obtain a match score; the total number of bases in the read; the penalty value of each base in the read that receives a penalty value, where the penalty value is the value the system assigns, according to requirements preset by the inspector, to a base subjected to an insertion or deletion operation during the comparison; the total number of bases in the read that receive a penalty value; the number of bases on the optimal comparison path in the score matrix, where the score matrix records the comparison score of each base of the read in matrix form and the optimal comparison path is the path that runs in sequence from the highest match score to a match score of zero in the score matrix; the highest match score value in the score matrix; and a first weight coefficient and a second weight coefficient, both set empirically by the inspector;
the translocation detection module selects the sequencing data reads whose comparison quality index satisfies the selection condition relative to the reference quality threshold, performs translocation detection on the starting position information and the direction information of each selected read, and generates corresponding chromosome translocation information; the reference quality threshold is set empirically by the inspector.
Optionally, the translocation detection module comprises an abnormal fragment identification sub-module, a rearrangement boundary positioning sub-module, a mutation type identification sub-module and a chromosome translocation information generation sub-module; the abnormal fragment identification submodule is used for generating chromosome balance translocation event information according to the comparison result; the rearrangement boundary positioning sub-module is used for positioning a rearrangement boundary of the chromosome translocation event according to the chromosome balance translocation event information; the mutation type identification submodule is used for judging the chromosome translocation type according to the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the copy numbers of the two sides of the translocation event; the chromosome translocation information generation submodule is used for generating corresponding chromosome translocation information according to the starting position information, the direction information, the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the mutation type of each sequencing data reading segment.
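One common way to flag translocation events from alignment positions is to look for read pairs whose two mates align to different chromosomes and then cluster them. The sketch below illustrates that idea under stated assumptions: paired-end reads, a per-alignment quality value standing in for the comparison quality index, and fixed-size position bins for clustering. `MateAlignment`, `find_translocation_candidates` and all default thresholds are hypothetical, and the sketch does not perform the copy-number check that the text uses to judge the translocation type.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MateAlignment:
    chrom: str      # chromosome the mate aligned to
    start: int      # starting position on the reference genome
    strand: str     # direction information: "+" or "-"
    quality: float  # stand-in for the comparison quality index


def find_translocation_candidates(
    pairs: List[Tuple[MateAlignment, MateAlignment]],
    min_quality: float = 30.0,  # hypothetical reference quality threshold
    window: int = 500,          # bin size used to group nearby positions
    min_support: int = 3,       # minimum number of supporting pairs
) -> List[Dict[str, object]]:
    """Cluster inter-chromosomal read pairs into candidate translocation events."""
    clusters = defaultdict(list)
    for a, b in pairs:
        # Keep only pairs where both mates pass the quality threshold.
        if a.quality < min_quality or b.quality < min_quality:
            continue
        # Mates on the same chromosome are not inter-chromosomal evidence.
        if a.chrom == b.chrom:
            continue
        # Order the mates so that (chr1, chr5) and (chr5, chr1) share a key.
        if a.chrom > b.chrom:
            a, b = b, a
        key = (a.chrom, a.start // window, b.chrom, b.start // window)
        clusters[key].append((a, b))

    candidates: List[Dict[str, object]] = []
    for (chrom_a, _, chrom_b, _), members in clusters.items():
        if len(members) < min_support:
            continue
        starts_a = [m[0].start for m in members]
        starts_b = [m[1].start for m in members]
        candidates.append({
            "chrom_a": chrom_a,
            "interval_a": (min(starts_a), max(starts_a)),  # rough boundary region
            "chrom_b": chrom_b,
            "interval_b": (min(starts_b), max(starts_b)),
            "support": len(members),
        })
    return candidates
```

Fixed bins are crude: a cluster straddling a bin edge is split in two, and the reported intervals only bracket the rearrangement boundary. Locating the boundary to base resolution would typically rely on reads that span the junction.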
An NGS-based chromosome balance translocation detection analysis method applied to the NGS-based chromosome balance translocation detection analysis system, the chromosome balance translocation detection analysis method comprising:
S1, extracting DNA from a sample to be detected and preparing a DNA library;
S2, performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data;
S3, performing translocation detection data processing on the original sequencing data to generate chromosome translocation information;
S4, generating and displaying translocation analysis result information according to the chromosome translocation information.
The beneficial effects obtained by the application are as follows:
1. The sample preparation terminal, the NGS sequencing terminal, the sample data processing terminal and the analysis result display terminal together improve the accuracy of the detection process starting from construction of the DNA library; chromosome translocation information is obtained from accurate original sequencing data, so the analysis result is more accurate and clearer, which improves the accuracy of chromosome balance translocation detection analysis;
2. The DNA extraction module and the DNA library generation module improve the accuracy and timeliness of DNA extraction, so the DNA library is more accurate, which improves the accuracy of the detection process;
3. The NGS platform calling module and the NGS sequencing module improve the adaptability and accuracy of NGS platform calling; the constructed DNA library is sequenced at high throughput on a more accurate and more suitable NGS platform, which improves the accuracy of the NGS sequencing process;
4. The data quality control module, the read comparison module, the translocation detection module and the visual information generation module improve the accuracy of sample data processing, so the chromosome translocation information is more accurate; the visual information makes display and analysis more accurate, which improves the accuracy of chromosome balance translocation detection analysis;
5. The sequence quality screening submodule and the sequence length screening submodule, together with the sequence quality evaluation index algorithm and the sequence length evaluation index algorithm, evaluate and screen the sequencing data reads in sequence, which improves the quality of the screened sequencing data reads, the accuracy of the analysis, and the accuracy of chromosome balance translocation detection;
6. The base comparison sub-module, the score matrix generation sub-module and the comparison information generation sub-module improve the efficiency and accuracy of base comparison, and the score matrix improves the accuracy of the analysis, which improves the accuracy of chromosome balance translocation detection;
7. The comparison quality index unit, the comparison quality information generation unit, the starting position information generation unit and the direction information generation unit, together with the comparison quality index algorithm, further select the sequencing data reads with better comparison quality, and their comparison results, for analysis, so the chromosome translocation information is more accurate, which improves the accuracy of chromosome balance translocation detection;
8. The abnormal fragment identification sub-module, the rearrangement boundary positioning sub-module, the mutation type identification sub-module and the chromosome translocation information generation sub-module allow chromosome balance translocation analysis to be completed efficiently and accurately, so more accurate chromosome translocation information is generated;
9. The inspector scoring module, the inspection flow scoring module and the inspection index computing sub-module, together with the inspector scoring algorithm and the inspection flow scoring algorithm, improve the accuracy of inspector scoring and inspection flow scoring, and in turn the accuracy of the inspection indexes and the inspection information, which improves the accuracy of chromosome balance translocation detection.
For a further understanding of the nature and the technical aspects of the present application, reference should be made to the following detailed description of the application and the accompanying drawings, which are provided for purposes of reference only and are not intended to limit the application.
Drawings
FIG. 1 is a schematic diagram of the overall structure of the present application;
FIG. 2 is a schematic diagram of a data quality control module according to the present application;
FIG. 3 is a schematic flow chart of a method for detecting and analyzing a chromosome balance translocation based on NGS according to the present application;
FIG. 4 is a schematic diagram of the overall structure of an NGS-based chromosome balance translocation detection analysis system according to another embodiment of the present application.
Detailed Description
The following embodiments of the present application are described by way of specific examples, and those skilled in the art will appreciate the advantages and effects of the present application from the disclosure herein. The application is capable of other and different embodiments, and its several details are capable of modification and variation in various respects, all without departing from the spirit of the present application. Note that the drawings of the present application are merely schematic illustrations and are not drawn to actual dimensions. The following embodiments further illustrate the related art of the present application in detail, but the disclosure is not intended to limit the scope of the present application.
Embodiment one: the present embodiment provides an NGS-based chromosome balance translocation detection analysis system. Referring to FIG. 1, an NGS-based chromosome balance translocation detection analysis system comprises a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from a sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information;
the sample preparation terminal comprises a DNA extraction module and a DNA library generation module; the DNA extraction module is used for extracting DNA from the sample to be detected; the DNA library generation module is used for carrying out library construction on the extracted DNA to generate a DNA library suitable for NGS sequencing;
the NGS sequencing terminal comprises an NGS platform calling module and an NGS sequencing module; the NGS platform calling module is used for calling an NGS platform preset in the system; the NGS sequencing module is used for performing high-throughput sequencing on the DNA library by utilizing a corresponding NGS platform to generate corresponding original sequencing data;
the sample data processing terminal comprises a data quality control module, a read comparison module, a translocation detection module and a visual information generation module; the data quality control module is used for performing quality filtering on the original sequencing data; the read comparison module is used for comparing the quality-filtered sequencing data with a reference genome to generate starting position information and direction information of each sequencing data read; the translocation detection module is used for performing translocation detection according to the starting position information and the direction information of each sequencing data read and generating corresponding chromosome translocation information; the visual information generation module is used for performing graphical processing on the chromosome translocation information and generating corresponding visual information.
Optionally, referring to fig. 2, the data quality control module includes a sequence quality screening sub-module and a sequence length screening sub-module; the sequence quality screening submodule is used for carrying out quality evaluation screening on each sequencing data read; the sequence length screening submodule is used for carrying out length evaluation screening on sequencing data reads subjected to quality evaluation screening;
when the sequence quality screening submodule works, the following formula is satisfied:
[Formula not reproduced in the source text]
wherein the quantities in the formula are: the sequence quality assessment index of the corresponding sequencing data read; the total number of bases in the read; an exponential conversion coefficient based on the total number of bases, set empirically by the inspector; the quality assessment value of each base in the read; an exponential conversion coefficient based on the quality assessment value, set empirically by the inspector; and the probability of a sequencing error for each base in the read, which the corresponding NGS platform calculates from the intensity and noise characteristics of the sequencing signal of the read; the sequence quality screening submodule selects the sequencing data reads whose sequence quality assessment index satisfies the screening condition relative to the sequence quality screening threshold, which is set empirically by the inspector;
when the sequence length screening submodule is operative, a sequence length evaluation index for each sequencing data read that has passed sequence quality screening is calculated by the following equation:
[Formula not reproduced in the source text]
wherein the quantities in the formula are: the sequence length evaluation index of the corresponding sequencing data read that has passed sequence quality screening; a coefficient selection function based on the number of base length anomalies in the read; the number of base length anomalies in the read; the length of each base in the read; and the total number of bases in the read; the sequence length screening submodule selects the sequencing data reads whose sequence length evaluation index satisfies the screening condition relative to the sequence length screening threshold, which is set empirically by the inspector.
Optionally, the read comparison module comprises a base comparison sub-module, a score matrix generation sub-module and a comparison information generation sub-module; the base comparison sub-module is used for performing base comparison between the corresponding sequencing data read that has passed sequence length screening and a reference genome; the score matrix generation sub-module is used for generating a score matrix for the corresponding sequencing data read according to the match score information and the penalty value information in the comparison result; the comparison information generation sub-module is used for generating comparison quality information, starting position information and direction information of the corresponding sequencing data read according to the comparison result and the score matrix.
Optionally, the comparison information generation sub-module includes a comparison quality index unit, a comparison quality information generation unit, a starting position information generation unit and a direction information generation unit; the comparison quality index unit is used for calculating the comparison quality index between the corresponding sequencing data read and the reference genome according to the match score information, the penalty value information and the score matrix of the corresponding sequencing data read in the comparison result; the comparison quality information generation unit generates corresponding comparison quality information according to the corresponding comparison quality index; the starting position information generation unit is used for generating starting position information of the corresponding sequencing data read on the reference genome according to the comparison result; the direction information generation unit is used for generating direction information of the corresponding sequencing data read on the reference genome according to the comparison result;
when the comparison mass calculation unit calculates, the following equation is satisfied:
wherein the quantities in the equation are, in order: the comparison quality index between the reference genome and the corresponding sequencing data read that passed sequence length screening; the matching score value of each indexed base in the read that obtained a matching score, where the matching score is the score the system assigns during comparison to each base taking part in the comparison, according to requirements preset by the inspector, and is higher the more similar the base of the read is to the base of the reference genome; the total number of bases in the read that obtained a matching score; the total number of bases in the read; the penalty value of each indexed base in the read that received a penalty value, where the penalty value is the value the system assigns, according to requirements preset by the inspector, to a base subjected to an insertion or deletion operation during comparison; the total number of bases in the read that received a penalty value; the number of bases on the optimal comparison path in the scoring matrix, where the scoring matrix records the comparison score of each base of the read in matrix form and the optimal comparison path is the path that runs in sequence from the highest matching score in the scoring matrix to a matching score of zero; the highest matching score value in the scoring matrix; and the first weight coefficient and the second weight coefficient, which are set empirically by the inspector;
the translocation detection module selects the starting position information and the direction information of each sequencing data read whose comparison quality index satisfies the condition against the comparison quality reference threshold, performs translocation detection on them and generates the corresponding chromosome translocation information; the comparison quality reference threshold is set empirically by the inspector.
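A scoring matrix whose optimal path runs from the highest score back to a score of zero matches the shape of Smith-Waterman local alignment, so that algorithm is used here as an assumed stand-in to illustrate the idea. The sketch below is not the comparison quality index of this application: the function name `smith_waterman` and the values `match=2`, `mismatch=-1` and `gap=-2` are hypothetical choices. It returns the highest score in the matrix, the number of steps on the traceback path, and the 0-based start position of the aligned segment on the reference.

```python
from typing import List, Tuple


def smith_waterman(
    read: str,
    ref: str,
    match: int = 2,      # assumed score for identical bases
    mismatch: int = -1,  # assumed score for differing bases
    gap: int = -2,       # assumed penalty for an insertion or deletion
) -> Tuple[int, int, int]:
    """Return (highest score, traceback path length, 0-based start on ref)."""
    rows, cols = len(read) + 1, len(ref) + 1
    # Scoring matrix: H[i][j] is the best local score ending at read[i-1], ref[j-1].
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + step, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback: walk from the highest score until a cell with score zero.
    i, j = best_pos
    path_len = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        path_len += 1
        step = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            i, j = i - 1, j - 1  # match or mismatch
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1               # base present in the read only
        else:
            j -= 1               # base present in the reference only
    return best, path_len, j
```

For example, `smith_waterman("ACGT", "TTACGTTT")` aligns the four bases of the read to reference positions 2 to 5. Direction information could be obtained in the same spirit by also aligning the reverse complement of the read and keeping the better of the two scores.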
Optionally, the translocation detection module comprises an abnormal fragment identification sub-module, a rearrangement boundary positioning sub-module, a mutation type identification sub-module and a chromosome translocation information generation sub-module; the abnormal fragment identification submodule is used for generating chromosome balance translocation event information according to the comparison result; the rearrangement boundary positioning sub-module is used for positioning a rearrangement boundary of the chromosome translocation event according to the chromosome balance translocation event information; the mutation type identification submodule is used for judging the chromosome translocation type according to the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the copy numbers of the two sides of the translocation event; the chromosome translocation information generation submodule is used for generating corresponding chromosome translocation information according to the starting position information, the direction information, the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the mutation type of each sequencing data reading segment.
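One common way to turn per-read position and direction information into translocation candidates is to look for read pairs whose two ends map to different chromosomes and to group pairs that support the same junction. The sketch below illustrates that idea only. It assumes paired-end data, which the text above does not state, and the record type `PairedAlignment`, the function `candidate_translocations` and the parameters `bin_size` and `min_support` are hypothetical. It does not locate exact rearrangement boundaries and does not use copy number, both of which the sub-modules above are described as handling.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class PairedAlignment(NamedTuple):
    """Hypothetical record: mapped position and direction of both mates."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def candidate_translocations(
    pairs: List[PairedAlignment],
    bin_size: int = 1000,  # assumed clustering window, in bases
    min_support: int = 3,  # assumed minimum number of supporting pairs
) -> List[Dict[str, object]]:
    """Group inter-chromosomal read pairs into candidate translocation events."""
    clusters: Dict[Tuple[str, int, str, int], list] = defaultdict(list)
    for p in pairs:
        if p.chrom1 == p.chrom2:
            continue  # both mates on one chromosome: not an inter-chromosomal pair
        a = (p.chrom1, p.pos1, p.strand1)
        b = (p.chrom2, p.pos2, p.strand2)
        if a > b:
            a, b = b, a  # canonical order so mate order does not matter
        key = (a[0], a[1] // bin_size, b[0], b[1] // bin_size)
        clusters[key].append((a, b))

    events = []
    for (c1, _, c2, _), members in sorted(clusters.items()):
        if len(members) < min_support:
            continue  # too few supporting pairs, likely a mapping artifact
        events.append({
            "chrom1": c1,
            "approx_pos1": min(m[0][1] for m in members),
            "chrom2": c2,
            "approx_pos2": min(m[1][1] for m in members),
            "support": len(members),
        })
    return events
```

Fixed-size bins keep the sketch short, but a cluster that straddles a bin edge is split in two; a production implementation would more likely merge nearby positions with a sliding window.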
An NGS-based chromosome balance translocation detection analysis method, which is applied to the NGS-based chromosome balance translocation detection analysis system, is shown in fig. 3, and comprises the following steps:
S1, extracting DNA from a sample to be detected and preparing a DNA library;
S2, performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data;
S3, performing translocation detection data processing on the original sequencing data to generate chromosome translocation information;
S4, generating and displaying translocation analysis result information according to the chromosome translocation information.
Embodiment two: the embodiment includes the whole content of the first embodiment, and provides an NGS-based chromosome balance translocation detection analysis system, which, with reference to fig. 4, further includes a detection and assessment terminal; the detection and assessment terminal comprises a detection and assessment index calculation module and a detection and assessment information generation module; the detection assessment index calculation module is used for calculating a corresponding detection assessment index according to the scores of the inspectors and the scores of the detection flow in the detection process; the detection and assessment information generation module is used for generating corresponding detection and assessment information according to the detection and assessment indexes.
The detection assessment index calculation module comprises an inspector scoring sub-module, a detection flow scoring sub-module and a detection assessment index calculation sub-module; the inspector scoring sub-module is used for calculating the inspector score of the corresponding inspectors according to the working ages, the total detection numbers and the teacher ratings of the inspectors; the detection flow scoring sub-module is used for calculating the detection flow score of the corresponding detection according to the number of inspectors in the detection flow, the teacher ratings of the inspectors and the scores of the corresponding detection flow items; the detection assessment index calculation sub-module is used for calculating the detection assessment index according to the inspector score and the detection flow score.
When the detection assessment index calculation submodule calculates, the following equation is satisfied:
wherein the quantities in the equation are, in order: the detection assessment index; the inspector score; the detection flow score; and the first index value conversion coefficient and the second index value conversion coefficient, which are set empirically by the inspector.
When the inspector scoring submodule calculates, the following equation is satisfied:
wherein ,indicating the +.>Scoring values of the individual participating inspectors; />Representing the total number of the inspectors participated in the current inspection process; />Indicate->Total number of tests completed by each participating inspector; />A score index reference value is expressed and empirically set by a inspector; />Indicate->A teacher rating of the individual participating inspectors; />Indicate->The work age values of the individual participating inspectors.
When the detection flow scoring sub-module calculates, the following equation is satisfied:
wherein the quantities in the equation are, in order: the first detection flow scoring weight coefficient; the second detection flow scoring weight coefficient; the detection flow item score of each indexed participating inspector; the third detection flow scoring weight coefficient, all three weight coefficients being set empirically by the inspector; and the sum of the scores of the flow items of the detection process completed by each indexed participating inspector. The score of each flow item in the detection process is preset by the inspector.
When the detection assessment index does not meet the assessment threshold, the detection and assessment information generation module generates detection and assessment information indicating that the detection failed the assessment, so as to prompt the inspector to repeat the detection; when the detection assessment index meets the assessment threshold, the detection and assessment information generation module generates detection and assessment information indicating that the detection passed the assessment. The assessment threshold is set empirically by the inspector.
The foregoing disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the application, so that all equivalent technical changes made by the application of the present application and the accompanying drawings are included in the scope of the application, and in addition, the elements in the application can be updated with the technical development.

Claims (5)

1. The chromosome balance translocation detection analysis system based on the NGS is characterized by comprising a sample preparation terminal, an NGS sequencing terminal, a sample data processing terminal and an analysis result display terminal; the sample preparation terminal is used for extracting DNA from a sample to be detected and preparing a DNA library; the NGS sequencing terminal is used for performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data; the sample data processing terminal is used for performing translocation detection data processing on the original sequencing data to generate chromosome translocation information; the analysis result display terminal is used for generating and displaying translocation analysis result information according to the chromosome translocation information;
the sample preparation terminal comprises a DNA extraction module and a DNA library generation module; the DNA extraction module is used for extracting DNA from the sample to be detected; the DNA library generation module is used for carrying out library construction on the extracted DNA to generate a DNA library suitable for NGS sequencing;
the NGS sequencing terminal comprises an NGS platform calling module and an NGS sequencing module; the NGS platform calling module is used for calling an NGS platform preset in the system; the NGS sequencing module is used for performing high-throughput sequencing on the DNA library by utilizing a corresponding NGS platform to generate corresponding original sequencing data;
the sample data processing terminal comprises a data quality control module, a read segment comparison module, a translocation detection module and a visual information generation module; the data quality control module is used for carrying out quality filtering treatment on the original sequencing data; the read comparison module is used for comparing the sequencing data subjected to quality filtering with a reference genome to generate initial position information and direction information of each sequencing data read; the translocation detection module is used for carrying out translocation detection according to the initial position information and the direction information of each sequencing data reading segment and generating corresponding chromosome translocation information; the visual information generation module is used for carrying out graphical processing according to the chromosome translocation information and generating corresponding visual information; the data quality control module comprises a sequence quality screening sub-module and a sequence length screening sub-module; the sequence quality screening submodule is used for carrying out quality evaluation screening on each sequencing data read; the sequence length screening submodule is used for carrying out length evaluation screening on sequencing data reads subjected to quality evaluation screening;
when the sequence quality screening submodule works, the following formula is satisfied:
wherein the quantities in the equation are, in order: the sequence quality evaluation index of the corresponding sequencing data read; the total number of bases in the read; an exponential conversion coefficient based on the total number of bases; the quality evaluation value of each indexed base in the read; an exponential conversion coefficient based on the quality evaluation value; and the probability of a sequencing error for each indexed base in the read, this probability being calculated by the corresponding NGS platform from the intensity and noise characteristics of the sequencing signal of the read; the sequence quality screening submodule selects the sequencing data reads whose sequence quality evaluation index satisfies the screening condition against the sequence quality screening threshold;
when the sequence length screening submodule is operative, a sequence length evaluation index for each sequencing data read screened by sequence quality is calculated by the following equation:
wherein the quantities in the equation are, in order: the sequence length evaluation index of the corresponding sequencing data read that passed sequence quality screening; a coefficient selection function based on the number of bases with abnormal length in that read; the number of bases with abnormal length in that read; the length of each indexed base in that read; and the total number of bases in that read; the sequence length screening submodule selects the sequencing data reads whose sequence length evaluation index satisfies the screening condition against the sequence length screening threshold.
2. The NGS-based chromosome balance translocation detection analysis system of claim 1, wherein the read alignment module comprises a base alignment sub-module, a scoring matrix generation sub-module, and an alignment information generation sub-module; the base alignment sub-module is used for performing base alignment on the corresponding sequencing data read section screened by the sequence length and a reference genome; the score matrix generation sub-module is used for generating a score matrix corresponding to the sequencing data reading according to the matching score information and the penalty value information in the comparison result; the comparison information generation sub-module is used for generating comparison quality information, initial position information and direction information of the corresponding sequencing data read according to the comparison result and the scoring matrix.
3. The NGS-based chromosome balance translocation detection analysis system of claim 2, wherein the comparison information generation sub-module comprises a comparison quality index unit, a comparison quality information generating unit, a starting position information generating unit, and a direction information generating unit; the comparison quality index unit is used for calculating the comparison quality index of the corresponding sequencing data read and the reference genome according to the matching score information, the penalty value information and the score matrix of the corresponding sequencing data read in the comparison result; the comparison quality information generating unit generates corresponding comparison quality information according to the corresponding comparison quality index; the starting position information generating unit is used for generating starting position information of the corresponding sequencing data read on the reference genome according to the comparison result; the direction information generating unit is used for generating direction information of the corresponding sequencing data read on the reference genome according to the comparison result;
when the comparison quality index unit calculates, the following equation is satisfied:
wherein the quantities in the equation are, in order: the comparison quality index between the reference genome and the corresponding sequencing data read that passed sequence length screening; the matching score value of each indexed base in the read that obtained a matching score, where the matching score is the score the system assigns during comparison to each base taking part in the comparison, according to requirements preset by the inspector, and is higher the more similar the base of the read is to the base of the reference genome; the total number of bases in the read that obtained a matching score; the total number of bases in the read; the penalty value of each indexed base in the read that received a penalty value, where the penalty value is the value the system assigns, according to requirements preset by the inspector, to a base subjected to an insertion or deletion operation during comparison; the total number of bases in the read that received a penalty value; the number of bases on the optimal comparison path in the scoring matrix, where the scoring matrix records the comparison score of each base of the read in matrix form and the optimal comparison path is the path that runs in sequence from the highest matching score in the scoring matrix to a matching score of zero; the highest matching score value in the scoring matrix; and the first weight coefficient and the second weight coefficient;
the translocation detection module selects the starting position information and the direction information of each sequencing data read whose comparison quality index satisfies the condition against the comparison quality reference threshold, performs translocation detection on them and generates the corresponding chromosome translocation information.
4. The NGS-based chromosome balance translocation detection analysis system of claim 3, wherein the translocation detection module comprises an abnormal segment identification sub-module, a rearrangement boundary positioning sub-module, a variation type identification sub-module, and a chromosome translocation information generation sub-module; the abnormal fragment identification submodule is used for generating chromosome balance translocation event information according to the comparison result; the rearrangement boundary positioning sub-module is used for positioning a rearrangement boundary of the chromosome translocation event according to the chromosome balance translocation event information; the mutation type identification submodule is used for judging the chromosome translocation type according to the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the copy numbers of the two sides of the translocation event; the chromosome translocation information generation submodule is used for generating corresponding chromosome translocation information according to the starting position information, the direction information, the chromosome balance translocation event information, the rearrangement boundary of the chromosome translocation event and the mutation type of each sequencing data reading segment.
5. An NGS-based chromosome balance translocation detection analysis method applied to an NGS-based chromosome balance translocation detection analysis system according to claim 4, wherein the chromosome balance translocation detection analysis method comprises:
S1, extracting DNA from a sample to be detected and preparing a DNA library;
S2, performing high-throughput sequencing on the constructed DNA library to generate corresponding original sequencing data;
S3, performing translocation detection data processing on the original sequencing data to generate chromosome translocation information;
S4, generating and displaying translocation analysis result information according to the chromosome translocation information.
CN202310687440.8A 2023-06-12 2023-06-12 Chromosome balance translocation detection analysis system based on NGS Active CN116434837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310687440.8A CN116434837B (en) 2023-06-12 2023-06-12 Chromosome balance translocation detection analysis system based on NGS

Publications (2)

Publication Number Publication Date
CN116434837A CN116434837A (en) 2023-07-14
CN116434837B true CN116434837B (en) 2023-08-29

Family

ID=87087577

Country Status (1)

Country Link
CN (1) CN116434837B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117012285A (en) * 2023-10-07 2023-11-07 广州盛安医学检验有限公司 High-throughput sequencing data processing and analysis flow management and control system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631242A (en) * 2015-12-25 2016-06-01 中国农业大学 Method for identifying transgenic events through whole genome sequencing data
CN110093406A (en) * 2019-05-27 2019-08-06 新疆农业大学 A kind of argali and its filial generation gene research method
CN110189796A (en) * 2019-05-27 2019-08-30 新疆农业大学 A kind of sheep full-length genome resurveys sequence analysis method
WO2020022733A1 (en) * 2018-07-27 2020-01-30 주식회사 녹십자지놈 Whole genome sequencing-based chromosomal abnormality detection method and use thereof
CN111276189A (en) * 2020-02-26 2020-06-12 广州市金域转化医学研究院有限公司 Chromosome balance translocation detection and analysis system based on NGS and application thereof
CN115803447A (en) * 2020-04-23 2023-03-14 荷兰皇家科学院 Detection of structural variation in chromosome proximity experiments
CN116030892A (en) * 2023-03-24 2023-04-28 北京大学第三医院(北京大学第三临床医学院) System and method for identifying chromosome reciprocal translocation breakpoint position

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021107676A1 (en) * 2019-11-29 2021-06-03 주식회사 녹십자지놈 Artificial intelligence-based chromosomal abnormality detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
通过高通量全基因组测序技术精确判断平衡易位断点;杨传春 等;《中国优生与遗传杂志》;第26卷(第5期);14-16、52、2 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant