CN115831237A - DNA data classification method and system based on sequencing platform according to data quality

Info

Publication number
CN115831237A
Authority
CN
China
Legal status
Pending
Application number
CN202111095386.5A
Other languages
Chinese (zh)
Inventor
刘忠
张燕
Current Assignee
SHANGHAI BOQ BIOTECH Ltd
Original Assignee
SHANGHAI BOQ BIOTECH Ltd
Priority date
2021-09-17
Filing date
2021-09-17
Publication date
2023-03-21
Application filed by SHANGHAI BOQ BIOTECH Ltd
Priority to CN202111095386.5A
Publication of CN115831237A

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method, a system, an electronic device and a storage medium for classifying DNA data from a sequencing platform according to data quality. The classification method comprises the following steps: acquiring raw DNA data with an automatic DNA sequencer of the sequencing platform; extracting target parameters from the raw DNA data; and classifying the raw DNA data according to the target parameters. Because the raw data obtained by the automatic DNA sequencer are classified automatically, the workload of manual classification is reduced and both the efficiency and the accuracy of classification are improved.

Description

DNA data classification method and system based on sequencing platform according to data quality
Technical Field
The invention belongs to the technical field of DNA (deoxyribonucleic acid) data classification, and in particular relates to a method and system for classifying DNA data from a sequencing platform according to data quality.
Background
An automatic DNA sequencer based on the Sanger DNA sequencing platform (Sanger sequencing being a DNA sequencing method) generally works by capillary array electrophoresis, separating and analyzing multiple samples simultaneously in a single run. The source, concentration and purity of each sample loaded onto the instrument, as well as the state of its capillary, strongly affect the quality of the result. Because the capillaries of an array run together, all samples in one run must share the same electrophoresis conditions, so the quality of the DNA data varies between samples tested in the same batch. At present, higher-quality and lower-quality data are usually sorted manually before subsequent analysis. Manual screening is inefficient and error-prone.
Disclosure of Invention
The invention aims to overcome the low efficiency and error-proneness of DNA data screening and classification in the prior art, and provides a method and a system for classifying DNA data from a sequencing platform according to data quality.
The invention solves the technical problems through the following technical scheme:
the invention provides a DNA data classification method based on a sequencing platform according to data quality, which comprises the following steps:
acquiring original DNA data by a DNA automatic sequencer based on a sequencing platform;
extracting target parameters from the original DNA data;
the raw DNA data is classified according to the target parameters.
Preferably, the target parameters include at least one of: signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, and influence of bubbles.
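As an illustration only, the parameter list above can be held in a simple typed record. The field names, types and the Python representation below are assumptions, since the text names the parameters but not how they are encoded. Following the "at least one of" wording, every field is optional, but an empty record is rejected.

```python
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class TargetParameters:
    """Quality parameters of one trace. All field names and units are hypothetical."""
    signal_intensity: Optional[float] = None
    signal_to_noise: Optional[float] = None
    total_bases: Optional[int] = None
    reliable_bases: Optional[int] = None
    peak_width: Optional[float] = None
    mixed_peaks: Optional[int] = None        # count of mixed (overlapping) peaks
    capillary_ok: Optional[bool] = None      # capillary running state
    impurity_flag: Optional[bool] = None     # influence of impurities detected
    bubble_flag: Optional[bool] = None       # influence of bubbles detected

    @classmethod
    def from_mapping(cls, raw: Mapping[str, object]) -> "TargetParameters":
        """Build a record from parsed values, rejecting unknown or empty input."""
        if not raw:
            raise ValueError("at least one target parameter is required")
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        return cls(**raw)

    def present(self) -> dict:
        """Return only the parameters that were actually supplied."""
        return {f.name: getattr(self, f.name)
                for f in fields(self) if getattr(self, f.name) is not None}
```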
Preferably, classifying the raw DNA data according to the target parameters comprises:
classifying the original DNA data according to a preset threshold corresponding to the target parameter.
Preferably, classifying the raw DNA data according to a preset threshold corresponding to the target parameter comprises:
if the target parameter is larger than a preset threshold value, dividing the original DNA data into a first class; and if the target parameter is less than or equal to the preset threshold value, dividing the original DNA data into a second class.
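The two-class rule reduces to a single comparison. In this minimal sketch, the function name and the string labels "first" and "second" are hypothetical. Note that a value exactly equal to the threshold falls into the second class.

```python
def classify_by_threshold(value: float, threshold: float) -> str:
    """Strictly greater than the threshold -> first class; otherwise -> second class."""
    return "first" if value > threshold else "second"


# Hypothetical batch: signal-to-noise values checked against a placeholder threshold.
samples = {"A01": 100.0, "A02": 100.5, "A03": 42.0}
classes = {name: classify_by_threshold(v, 100.0) for name, v in samples.items()}
```

With the placeholder threshold of 100, sample A01 (exactly 100) lands in the second class, A02 in the first class and A03 in the second class.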
Preferably, the method further comprises the steps of:
the type of the raw DNA data is adjusted according to the target parameters.
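The text does not specify how the type is adjusted. One possible reading, sketched here purely as an assumption, is an override pass: a trace first placed in the first class is moved to the second class when any flag-type parameter (such as a capillary failure or a bubble) is set. The flag names and the demotion rule are hypothetical.

```python
from typing import Mapping

# Hypothetical flag names; the source does not define the adjustment rule.
DEMOTING_FLAGS = ("capillary_failure", "bubble_detected", "impurity_detected")


def adjust_class(initial: str, params: Mapping[str, object]) -> str:
    """Move a trace from the first to the second class if any demoting flag is set."""
    if initial == "first" and any(bool(params.get(flag)) for flag in DEMOTING_FLAGS):
        return "second"
    return initial
```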
Preferably, the method further comprises the steps of:
obtaining a plurality of batches of original DNA sequence files;
classifying original DNA data contained in an original DNA sequence file to obtain a target type;
and putting the original DNA sequence file into a folder corresponding to the target type.
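Sorting sequence files into per-type folders can be sketched with the Python standard library. This assumes, hypothetically, that each batch is a directory of `.ab1` trace files and that a `classify` callable returns the target type for one file; both the layout and the callable are placeholders, not part of the described method. The function refuses to overwrite a file of the same name already sorted from another batch.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable


def sort_into_folders(batch_dirs: Iterable[Path], output_root: Path,
                      classify: Callable[[Path], str],
                      pattern: str = "*.ab1") -> Dict[str, int]:
    """Classify every trace file in each batch directory and move it into
    output_root/<target type>/. Returns the number of files moved per type."""
    counts: Dict[str, int] = {}
    for batch in batch_dirs:
        for trace in sorted(Path(batch).glob(pattern)):
            target_type = classify(trace)
            dest_dir = Path(output_root) / target_type
            dest_dir.mkdir(parents=True, exist_ok=True)
            dest = dest_dir / trace.name
            if dest.exists():
                # Avoid silently overwriting a same-named file from another batch.
                raise FileExistsError(dest)
            shutil.move(str(trace), str(dest))
            counts[target_type] = counts.get(target_type, 0) + 1
    return counts
```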
Preferably, the sequencing platform is a Sanger platform.
The invention also provides a DNA data classification system based on the sequencing platform according to the data quality, which comprises an acquisition module, an extraction module and a classification module;
the acquisition module is used for acquiring original DNA data by the automatic DNA sequencer based on the sequencing platform;
the extraction module is used for extracting target parameters from the original DNA data;
the classification module is used for classifying the original DNA data according to the target parameters.
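One way to mirror the three-module structure in code is to give each module a single responsibility and compose them. In this sketch the class names, the pluggable `reader` and `parser` callables and the all-thresholds-must-pass rule are assumptions chosen for illustration. A parameter missing from a record is treated as failing its threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Params = Dict[str, float]


@dataclass
class AcquisitionModule:
    """Supplies raw trace records; `reader` is a hypothetical data source."""
    reader: Callable[[], List[bytes]]

    def acquire(self) -> List[bytes]:
        return self.reader()


@dataclass
class ExtractionModule:
    """Turns one raw record into named target parameters via a pluggable parser."""
    parser: Callable[[bytes], Params]

    def extract(self, raw: bytes) -> Params:
        return self.parser(raw)


@dataclass
class ClassificationModule:
    """First class only if every parameter exceeds its threshold."""
    thresholds: Params

    def classify(self, params: Params) -> str:
        # A missing parameter defaults to -inf, so it never exceeds its threshold.
        ok = all(params.get(name, float("-inf")) > limit
                 for name, limit in self.thresholds.items())
        return "first" if ok else "second"


@dataclass
class ClassificationSystem:
    acquisition: AcquisitionModule
    extraction: ExtractionModule
    classification: ClassificationModule

    def run(self) -> List[str]:
        return [self.classification.classify(self.extraction.extract(raw))
                for raw in self.acquisition.acquire()]
```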
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above DNA data classification method based on a sequencing platform according to data quality.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above DNA data classification method based on a sequencing platform according to data quality.
The beneficial effects of the invention are as follows: the raw data obtained by the automatic DNA sequencer of the sequencing platform are classified automatically, which reduces the workload of manual classification and improves both classification efficiency and classification accuracy.
Drawings
FIG. 1 is a flow chart of the method for classifying DNA data from a sequencing platform according to data quality in Example 1 of the present invention.
FIG. 2 is a schematic diagram of the system for classifying DNA data from a sequencing platform according to data quality in Example 2 of the present invention.
FIG. 3 is a schematic structural diagram of the electronic device in Example 3 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
This embodiment provides a method for classifying DNA data from a sequencing platform according to data quality. Referring to FIG. 1, the method includes the following steps:
S1. Obtain raw DNA data with an automatic DNA sequencer of the sequencing platform.
S2. Extract target parameters from the raw DNA data.
S3. Classify the raw DNA data according to the target parameters.
As an alternative embodiment, the sequencing platform is a Sanger platform. In a specific implementation, in step S1, the automatic DNA sequencer of the Sanger sequencing platform acquires the raw DNA data.
Then, in step S2, target parameters are extracted from the raw DNA data. The target parameters include signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, influence of bubbles, and the like. The automatic DNA sequencer of the Sanger sequencing platform records this parameter information while acquiring the raw DNA data, so the raw DNA data contain it. The raw DNA data have a preset data format, and the parameter information is stored in the raw DNA data according to that format.
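The text does not define "reliable base number". A common convention in Sanger analysis software is to count bases whose Phred quality value (QV) is at least 20, i.e. an estimated base-call error probability of at most 1%. Assuming that convention, the total and reliable base counts follow directly from the per-base quality values; the function name and the default cut-off are assumptions.

```python
from typing import Sequence, Tuple


def base_counts(quality_values: Sequence[int], min_qv: int = 20) -> Tuple[int, int]:
    """Return (total bases, reliable bases) for one trace.

    A base is counted as reliable when its Phred QV >= min_qv
    (QV 20 corresponds to an estimated 1% error probability)."""
    total = len(quality_values)
    reliable = sum(1 for qv in quality_values if qv >= min_qv)
    return total, reliable
```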
In step S2, the raw DNA data are parsed according to the preset format to obtain the parameter information listed above (signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, and influence of bubbles).
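The text does not name the preset format. Capillary sequencers from Applied Biosystems store raw traces in the ABIF (`.ab1`) format, a big-endian file whose 28-byte directory entries point to tagged data items. Assuming ABIF input, a minimal standard-library reader might look like the sketch below. The tags used (`S/N%` number 1, described in the ABIF specification as per-dye signal strength, and `PCON` number 2, per-base quality values) come from that specification; the function names are hypothetical, and a production tool would more likely rely on an established parser such as Biopython's ABI reader.

```python
import struct
from typing import Dict, List, Tuple

# One ABIF directory entry (28 bytes, big-endian): tag name, tag number,
# element type, element size, number of elements, data size, data offset, data handle.
_DIR_ENTRY = struct.Struct(">4sihhiiii")


def read_abif_tags(blob: bytes) -> Dict[Tuple[str, int], bytes]:
    """Map (tag name, tag number) -> raw data bytes for every directory entry."""
    if blob[:4] != b"ABIF":
        raise ValueError("not an ABIF file")
    # The root directory entry follows the 4-byte magic and the 2-byte version.
    _, _, _, _, n_entries, _, dir_offset, _ = _DIR_ENTRY.unpack_from(blob, 6)
    tags: Dict[Tuple[str, int], bytes] = {}
    for i in range(n_entries):
        entry_start = dir_offset + i * _DIR_ENTRY.size
        name, number, _etype, _esize, _n, size, offset, _handle = \
            _DIR_ENTRY.unpack_from(blob, entry_start)
        if size <= 4:
            # Payloads of 4 bytes or less are stored inline in the data-offset field.
            inline = entry_start + 20
            data = blob[inline:inline + size]
        else:
            data = blob[offset:offset + size]
        tags[(name.decode("latin-1"), number)] = data
    return tags


def signal_strength_per_dye(tags: Dict[Tuple[str, int], bytes]) -> Tuple[int, ...]:
    """Decode S/N%1: four signed 16-bit values, one per dye."""
    return struct.unpack(">4h", tags[("S/N%", 1)])


def quality_values(tags: Dict[Tuple[str, int], bytes]) -> List[int]:
    """Decode PCON2: one quality value per base, stored as single bytes."""
    return list(tags[("PCON", 2)])
```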
Next, in step S3, the raw DNA data is classified according to the target parameters.
In a specific implementation, the DNA data are divided into several types, each type being defined by the numerical interval into which each item of parameter information (signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, influence of bubbles, and the like) falls. In some alternative embodiments, the criteria for dividing the DNA data into types follow current standards in the art. These types are well known to those skilled in the art, who can assign the raw DNA data to the corresponding type according to the parameter information, so this is not described in detail here. In other alternative embodiments, the criteria for dividing the DNA data into types may be set as required.
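Interval-based typing of a single parameter can be sketched with the standard `bisect` module. The cut points and labels in the usage example are hypothetical placeholders, not values from the text or from any standard. With `bisect_right`, a value equal to a cut point falls into the higher interval.

```python
from bisect import bisect_right
from typing import Sequence


def class_for_value(value: float, cut_points: Sequence[float],
                    labels: Sequence[str]) -> str:
    """Return the label of the interval containing value.

    cut_points must be strictly ascending and len(labels) == len(cut_points) + 1;
    interval i covers [cut_points[i-1], cut_points[i]), open at both extremes."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    if any(a >= b for a, b in zip(cut_points, cut_points[1:])):
        raise ValueError("cut points must be strictly ascending")
    return labels[bisect_right(cut_points, value)]


# Hypothetical scheme for signal-to-noise ratio.
SNR_CUTS = [50.0, 150.0]
SNR_LABELS = ["poor", "acceptable", "good"]
```

For example, with this placeholder scheme a signal-to-noise ratio of 40 maps to "poor", 50 to "acceptable" and 150 to "good".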
As an optional implementation, a corresponding preset threshold is set for each item of parameter information, such as signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, and influence of bubbles.
In an alternative embodiment, in step S3, the raw DNA data are classified according to the preset thresholds corresponding to the target parameters. That is, a preset threshold is set for each item of parameter information, and the raw DNA data are then classified according to how each of their parameters compares with the corresponding preset threshold. Those skilled in the art can set these preset thresholds reasonably according to the corresponding standards in the field of DNA data, so they are not described in detail here.
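A multi-parameter version of the threshold comparison is sketched below. The per-parameter direction flag is an assumption added for illustration: for parameters such as peak width or mixed-peak count, a smaller value usually indicates better data, so the comparison is mirrored. Parameter names and threshold values are hypothetical, and requiring every threshold to pass for the first class is one possible choice, not one stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class Threshold:
    limit: float
    higher_is_better: bool = True  # False for e.g. peak width or mixed-peak count


def passes(value: float, rule: Threshold) -> bool:
    """Strict comparison in the direction that indicates good data."""
    return value > rule.limit if rule.higher_is_better else value < rule.limit


def classify_all(params: Mapping[str, float],
                 rules: Mapping[str, Threshold]) -> Dict[str, object]:
    """First class only if every rule passes; also report which parameters failed.
    A parameter missing from params counts as a failure."""
    failed: List[str] = [name for name, rule in rules.items()
                         if name not in params or not passes(params[name], rule)]
    return {"class": "first" if not failed else "second", "failed": failed}


# Hypothetical rules and values, for illustration only.
RULES = {
    "signal_to_noise": Threshold(100.0),
    "reliable_bases": Threshold(400.0),
    "peak_width": Threshold(12.0, higher_is_better=False),
}
```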
In an alternative embodiment, the method further comprises adjusting the type of the raw DNA data according to the target parameters.
The method further comprises the following steps:
obtaining multiple batches of raw DNA sequence files; classifying the raw DNA data contained in each raw DNA sequence file to obtain a target type; and putting the raw DNA sequence file into the folder corresponding to the target type.
With the technical solution of this embodiment, DNA data can be classified automatically on the basis of the raw data obtained by the automatic DNA sequencer of the sequencing platform, which reduces the workload of manual classification and improves both classification efficiency and classification accuracy.
Example 2
This embodiment provides a system for classifying DNA data from a sequencing platform according to data quality. Referring to FIG. 2, the system includes an acquisition module 101, an extraction module 102, and a classification module 103.
The acquisition module 101 is used for acquiring original DNA data by a DNA automatic sequencer based on a sequencing platform; the extraction module 102 is used for extracting target parameters from the original DNA data; the classification module 103 is used for classifying the raw DNA data according to the target parameters.
In a specific implementation, the sequencing platform is a Sanger platform, and the acquisition module 101 acquires the raw DNA data from the automatic DNA sequencer of the Sanger sequencing platform.
The extraction module 102 then extracts the target parameters from the raw DNA data. The target parameters include signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, influence of bubbles, and the like. The automatic DNA sequencer of the Sanger sequencing platform records this parameter information while acquiring the raw DNA data, so the raw DNA data contain it. The raw DNA data have a preset data format, and the parameter information is stored in the raw DNA data according to that format.
The extraction module 102 parses the raw DNA data according to the preset format to obtain the parameter information listed above (signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, and influence of bubbles).
Next, the classification module 103 classifies the raw DNA data according to the target parameters.
In a specific implementation, the DNA data are divided into several types, each type being defined by the numerical interval into which each item of parameter information (signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, influence of bubbles, and the like) falls. In some alternative embodiments, the criteria for dividing the DNA data into types follow current standards in the art. These types are well known to those skilled in the art, who can assign the raw DNA data to the corresponding type according to the parameter information, so this is not described in detail here. In other alternative embodiments, the criteria for dividing the DNA data into types may be set as required.
As an optional implementation, a corresponding preset threshold is set for each item of parameter information, such as signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, and influence of bubbles.
In an alternative embodiment, the classification module 103 classifies the raw DNA data according to the preset thresholds corresponding to the target parameters. That is, a preset threshold is set for each item of parameter information, and the raw DNA data are then classified according to how each of their parameters compares with the corresponding preset threshold. Those skilled in the art can set these preset thresholds reasonably according to the corresponding standards in the field of DNA data, so they are not described in detail here.
In an alternative embodiment, the classification module 103 also adjusts the type of raw DNA data according to the target parameters.
The classification module 103 also obtains multiple batches of raw DNA sequence files, classifies the raw DNA data contained in each raw DNA sequence file to obtain a target type, and puts the raw DNA sequence file into the folder corresponding to the target type.
With the technical solution of this embodiment, DNA data can be classified automatically on the basis of the raw data obtained by the automatic DNA sequencer of the sequencing platform, which reduces the workload of manual classification and improves both classification efficiency and classification accuracy.
Example 3
FIG. 3 is a schematic structural diagram of the electronic device provided in this embodiment. The electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the DNA data classification method of Example 1. The electronic device 30 shown in FIG. 3 is only an example and should not limit the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 3, the electronic device 30 may take the form of a general-purpose computing device, for example a server device. The components of the electronic device 30 may include, but are not limited to: at least one processor 31, at least one memory 32, and a bus 33 connecting the various system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus, and a control bus.
The memory 32 may include volatile memory, such as Random Access Memory (RAM) 321 and/or cache memory 322, and may further include Read Only Memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the DNA data classification method of Example 1 of the present invention, by running the computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard, a pointing device, etc.). Such communication may take place through input/output (I/O) interfaces 35. The electronic device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (redundant array of independent disks) systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided among and embodied by a plurality of units/modules.
Example 4
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the DNA data classification method of Example 1.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented as a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to carry out the steps of the DNA data classification method of Example 1.
Program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes or modifications to these embodiments may be made by those skilled in the art without departing from the principle and spirit of this invention, and these changes and modifications are within the scope of this invention.

Claims (10)

1. A DNA data classification method based on a sequencing platform according to data quality is characterized by comprising the following steps:
acquiring original DNA data by a DNA automatic sequencer based on a sequencing platform;
extracting target parameters from the raw DNA data;
and classifying the original DNA data according to the target parameters.
2. The sequencing platform-based method for DNA data classification by data quality of claim 1, wherein the target parameter comprises at least one of: signal intensity, signal-to-noise ratio, total base number, reliable base number, peak width, mixed peaks, capillary running state, influence of impurities, and influence of bubbles.
3. The sequencing platform-based DNA data classification method according to data quality of claim 1, wherein said classifying the raw DNA data according to the target parameter comprises:
classifying the original DNA data according to a preset threshold corresponding to the target parameter.
4. The sequencing platform-based DNA data classification method according to data quality of claim 3, wherein the classifying the raw DNA data according to the preset threshold corresponding to the target parameter comprises:
if the target parameter is larger than the preset threshold value, dividing the original DNA data into a first class; and if the target parameter is less than or equal to the preset threshold value, dividing the original DNA data into a second class.
5. The sequencing platform based DNA data classification method according to data quality of claim 1, further comprising the steps of:
adjusting the type of the raw DNA data according to the target parameter.
6. The sequencing platform based DNA data classification method according to data quality of claim 1, further comprising the steps of:
obtaining a plurality of batches of original DNA sequence files;
classifying original DNA data contained in the original DNA sequence file to obtain a target type;
and putting the original DNA sequence file into a folder corresponding to the target type.
7. The sequencing platform-based method for DNA data classification according to data quality of claim 1, wherein the sequencing platform is a Sanger platform.
8. A DNA data classification system based on a sequencing platform according to data quality is characterized by comprising an acquisition module, an extraction module and a classification module;
the acquisition module is used for acquiring original DNA data by a DNA automatic sequencer based on a sequencing platform;
the extraction module is used for extracting target parameters from the original DNA data;
the classification module is used for classifying the original DNA data according to the target parameters.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for DNA data classification by data quality based on a sequencing platform of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the DNA data classification method according to data quality based on a sequencing platform of any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111095386.5A CN115831237A (en) 2021-09-17 2021-09-17 DNA data classification method and system based on sequencing platform according to data quality

Publications (1)

Publication Number Publication Date
CN115831237A 2023-03-21

Family

ID=85515964


Legal Events

Date Code Title Description
PB01 Publication